Screenshot of the National Risk Index’s interactive mapping and data-based interface
Read each part of the assignment carefully, and use the check boxes to ensure you’ve addressed all elements of the assignment!
Part I: Choosing the right graphic form
Learning Outcomes
- identify which types of visualizations are most appropriate for your data and your audience
- prepare (e.g. clean, explore, wrangle) data so that it’s appropriately formatted for building data visualizations
- build effective, responsible, accessible, and aesthetically-pleasing visualizations using the R programming language, and specifically {ggplot2} + ggplot2 extension packages
Description
In class, we’ve been discussing strategies and considerations for choosing the right graphic form to represent your data and convey your intended message. Here, you’ll apply what we’re learning to natural hazards and demographics data, courtesy of the FEMA National Risk Index (NRI) and the US Census Bureau’s American Community Survey (ACS).
1a. Background reading
Unfold the following note to read more about the data before continuing on (collapsed to save space):
About FEMA’s National Risk Index (NRI) for Natural Hazards
FEMA (Federal Emergency Management Agency) is a government agency with a mission of helping people before, during, and after disasters. In 2021, FEMA launched the National Risk Index (NRI), “a dataset and online tool to help illustrate the United States communities most at risk for 18 natural hazards”.
Risk is defined as the potential for negative impacts resulting from natural hazards. It’s calculated using the following equation (and illustrated in this graphic; read more about determining risk):
\[Risk\:Index = Expected\:Annual\:Loss \times \frac{Social\:Vulnerability}{Community\:Resilience}\]
NRI provides hazard type-specific scores, as well as a composite score, which adds together the risk from all 18 hazard types. A community’s risk score is represented by its percentile ranking among all other communities at the same level for Risk, Expected Annual Loss, Social Vulnerability and Community Resilience – for example, if a given county’s Risk Index percentile for a hazard type is 84.32 then its Risk Index value is greater than 84.32% of all US counties. Each community is also assigned a risk rating, which is a qualitative rating that describes the community in comparison to all other communities at the same level, ranging from “Very Low” to “Very High.”
You can learn more about the NRI at hazards.fema.gov/nri.
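To make the equation and the percentile interpretation above concrete, here is a minimal sketch in base R. The five counties and all of their values are made up, and the percentile is computed simply as the share of counties with a strictly lower value. FEMA's actual scoring procedure involves additional steps, so treat this as an illustration of the idea rather than a reproduction of the NRI method.

```r
# Hypothetical toy data: five made-up counties with made-up component values
toy <- data.frame(
  county = c("A", "B", "C", "D", "E"),
  expected_annual_loss = c(100, 250, 50, 400, 175),
  social_vulnerability = c(0.4, 0.7, 0.2, 0.9, 0.5),
  community_resilience = c(0.8, 0.5, 0.9, 0.3, 0.6)
)

# Risk Index = Expected Annual Loss x (Social Vulnerability / Community Resilience)
toy$risk_value <- toy$expected_annual_loss *
  (toy$social_vulnerability / toy$community_resilience)

# Simplified percentile (an assumption, not FEMA's exact procedure):
# the percentage of counties whose risk value is strictly lower
toy$risk_percentile <- sapply(
  toy$risk_value,
  function(x) 100 * mean(toy$risk_value < x)
)

# Order counties from highest to lowest risk for inspection
toy[order(-toy$risk_value), c("county", "risk_value", "risk_percentile")]
```

In this toy example, county D has the highest risk value, so its percentile reflects that its value exceeds that of the four other counties.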
Accessing NRI Data
Data at the county- and census tract-level are available for download in multiple formats (including Shapefiles & CSVs) from NRI’s Data Resources page.
About the US Census Bureau’s American Community Survey (ACS)
The American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely social, economic, housing, and demographic data every year. Unlike the Decennial Census (which counts every person in the US every 10 years for the purpose of congressional apportionment), the ACS collects detailed information from a small subset of the population (~3.5 million households) at 1- and 5-year intervals. Learn more about the differences between these 1- and 5-year estimates.
Accessing ACS Data
The US Census Bureau provides a couple of tools for accessing their data, including:
- data.census.gov: a browser-based portal for exploring the many available data tables (e.g. Table B02001: Race)
- The Census Data API: a data service that enables software developers to access and use Census Bureau data within their applications
However, when working in R, the {tidycensus} package is arguably the easiest way to query and retrieve Census data. Use the get_acs() function to obtain ACS data for specified geographies (e.g. counties or census tracts), tables (e.g. B02001), variables (e.g. B02001_002, B02001_003), years (e.g. 2023), states (e.g. CA), surveys (e.g. acs1, acs5), etc.
The following sections (Parts 1b through 1d) should be completed via GitHub Classroom (find and accept the assignment link on Slack). Read on for the full assignment description.
1b. Create viz #1 + answer questions
Create a data viz that helps to answer the question, How do FEMA National Risk Index scores for counties in California compare to those in other states?, following these steps:
- Download and unzip the data: You’ll use the All Counties - County-level detail (Table) (2023 Release; accessed on the NRI Data Resources page). Unzip the file, then drop the whole NRI_Table_Counties/ folder into a data/ folder in your HW repository.
- Add your data/ folder to .gitignore: So we don’t accidentally push our data to GitHub!
- Read in NRIDataDictionary.csv: NRI_Table_Counties/ contains a few different files, including this CSV file, which describes each of the NRI variables found in NRI_Table_Counties.csv. This is a helpful place to start!
- Read in NRI_Table_Counties.csv: This is your data.
- Build your viz: This may require some data wrangling first. Your final viz should:
- identify / jot down your variables of interest and consider which data types they are
- use online tools like from Data to Viz to help determine appropriate graph types, given your variables
- roughly sketch out your plots by hand (I find this incredibly helpful for understanding how my data needs to be wrangled to achieve my desired output(s))
(You’ll also want to repeat this process when creating your second data viz in Part 1c)
- Answer the following questions:
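As a starting point for the wrangling step in 1b, the sketch below shows one way to contrast California with all other states before choosing a graphic form. It uses a small made-up data frame in place of NRI_Table_Counties.csv, and the column names STATEABBRV and RISK_SCORE are assumptions that should be confirmed against NRIDataDictionary.csv. It is not the required approach, only one possible first step.

```r
library(dplyr)

# Hypothetical stand-in for NRI_Table_Counties.csv (all values are made up).
# STATEABBRV and RISK_SCORE are assumed column names; confirm them in
# NRIDataDictionary.csv before relying on them.
nri_toy <- tibble(
  STATEABBRV = c("CA", "CA", "OR", "NV", "TX", "TX"),
  RISK_SCORE = c(95.1, 88.4, 60.2, 45.0, 91.3, 30.7)
)

# Flag California counties so they can be contrasted with all other states
nri_grouped <- nri_toy |>
  mutate(state_group = if_else(STATEABBRV == "CA", "California", "Other states"))

# Quick numeric check before plotting: median score and county count per group
nri_summary <- nri_grouped |>
  group_by(state_group) |>
  summarize(median_score = median(RISK_SCORE), n_counties = n())

nri_summary
```

A grouping column like state_group can then be mapped to an aesthetic (e.g. fill or color) in {ggplot2}, whichever graphic form you settle on.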
1c. Create viz #2 + answer questions
Create a data viz that helps to answer the question, How does climate hazard risk exposure vary across racial / ethnic groups in California?, following these steps:
- Import ACS data using tidycensus::get_acs(): You’ll need your API key to use {tidycensus} (revisit week 2 pre-class prep instructions, if necessary). You may use the following code:
#.........see all available ACS variables + descriptions.........
acs_vars <- tidycensus::load_variables(year = 2023,
                                       dataset = "acs1")

#..................import race & ethnicity data..................
race_ethnicity <- tidycensus::get_acs(
  geography = "county",
  survey = "acs1",
  variables = c("B01003_001", "B02001_002", "B02001_003",
                "B02001_004", "B02001_005", "B02001_006",
                "B02001_007", "B02001_008", "B03002_012",
                "B03002_002"),
  state = "CA",
  year = 2023) |>
  dplyr::left_join(acs_vars, by = dplyr::join_by(variable == name)) # join variable descriptions (so we know what's what!)
- Optionally, write your data to .csv: It’s always a good idea to write your data (i.e. the race_ethnicity data frame, from above) to file, in case the Census Bureau’s API goes down. You may use the following code:
readr::write_csv(race_ethnicity, here::here("data", "ACS-race-ethnicity.csv"))
- Build your viz: This will require some data wrangling first (including joining the NRI and ACS data). Your final viz should:
- Answer the following questions:
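For the joining step mentioned in 1c, here is a minimal sketch of combining ACS and NRI data by county FIPS code. Both data frames are small made-up stand-ins. GEOID is the identifier column returned by tidycensus::get_acs(), while STCOFIPS and RISK_RATNG are assumed NRI column names that should be checked against NRIDataDictionary.csv. Both keys are stored as character here so that leading zeros (e.g. "06") are preserved.

```r
library(dplyr)

# Hypothetical stand-in for the race_ethnicity data frame (values are made up)
acs_toy <- tibble(
  GEOID = c("06001", "06001", "06037", "06037"),
  variable = c("B02001_002", "B02001_003", "B02001_002", "B02001_003"),
  estimate = c(500, 200, 3000, 800)
)

# Hypothetical stand-in for the NRI county table (ratings are made up).
# STCOFIPS and RISK_RATNG are assumed column names.
nri_toy <- tibble(
  STCOFIPS = c("06001", "06037"),
  RISK_RATNG = c("Relatively High", "Very High")
)

# Attach each county's risk rating to every ACS row for that county
joined <- acs_toy |>
  left_join(nri_toy, by = join_by(GEOID == STCOFIPS))

# Total estimated population per ACS variable within each risk rating
pop_by_rating <- joined |>
  group_by(variable, RISK_RATNG) |>
  summarize(total_estimate = sum(estimate), .groups = "drop")

pop_by_rating
```

After a join like this, it is worth checking for NA values in the joined columns, since unmatched FIPS codes (for example, keys read in as numbers rather than text) will silently produce missing ratings.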
1d. Polish your .qmd file
Your rendered .qmd file should be polished and neatly organized. Be sure to consider / implement (as appropriate) the following:
Rubric (specifications)
You must complete the following, as detailed below, to receive a “Satisfactory” mark for Assignment #2, Part I:
eds240-hw2-username/Part1.qmd
):
Everyone receives one “free pass” for not successfully submitting assignments via specified channels, after which you will receive a “Not Yet” mark.
Choosing an incorrect graphic form (i.e. one that’s inappropriate for your data) will result in a “Not Yet” score. However, there are numerous graphic forms which may be appropriate. Your final plots should clearly display the variables of interest, and you should be able to justify your choice in your written responses.
Part II: Data wrangling & exploratory data viz using your chosen data
Learning Outcomes
Note: This part of HW #2 is a continuation of HW #1, Part II and is the next step in working towards your final course assignment. Your final assignment is meant to combine nearly all of the course learning outcomes(!):
- identify which types of visualizations are most appropriate for your data and your audience
- prepare (e.g. clean, explore, wrangle) data so that it’s appropriately formatted for building data visualizations
- build effective, responsible, accessible, and aesthetically-pleasing visualizations using the R programming language, and specifically {ggplot2} + ggplot2 extension packages
- write code from scratch and read and adapt code written by others
- apply a DEI (Diversity, Equity & Inclusion) lens to the process of designing data visualizations
Description
2a. Review HW #4 instructions
Please begin by re-reading HW #4 in full as a reminder of the options, goals, and requirements for your final class assignment.
2b. Import & wrangle data, then create exploratory data viz
This week, you’ll focus on importing and wrangling your data (found as part of HW #1, Part II), followed by the exploratory data visualization phase. Complete the following:
2c. Answer questions
After completing the above steps, answer the following questions:
Rubric (specifications)
You must complete the following, as detailed below, to receive a “Satisfactory” mark for Assignment #2, Part II:
lastName-eds240-HW4
repo, not in GitHub Classroom:
See details below.
---
title: "your HW #2 title"
author: "your Name"
date: xxxx-xx-xx
format:
  html:
    embed-resources: true # this ensures that your rendered .html file is self-contained, so we (your instructors) can open it and view all your work
---