You must earn a “Satisfactory” mark for each individual Part (I and II) to earn a “Satisfactory” mark for Assignment #2.

Read each part of the assignment carefully, and use the check boxes to ensure you’ve addressed all elements of the assignment!

Part I: Choosing the right graphic form

Learning Outcomes

identify which types of visualizations are most appropriate for your data and your audience
prepare (e.g. clean, explore, wrangle) data so that it’s appropriately formatted for building data visualizations
build effective, responsible, accessible, and aesthetically-pleasing visualizations using the R programming language, and specifically {ggplot2} + ggplot2 extension packages

Description

In class, we’ve been discussing strategies and considerations for choosing the right graphic form to represent your data and convey your intended message. Here, you’ll apply what we’re learning to natural hazards and demographics data, courtesy of the FEMA National Risk Index (NRI) and the US Census Bureau’s American Community Survey (ACS).

1a. Background reading

Unfold the following note to read more about the data before continuing on (collapsed to save space):

About the data

About FEMA’s National Risk Index (NRI) for Natural Hazards

FEMA (Federal Emergency Management Agency) is a government agency with a mission of helping people before, during, and after disasters. In 2021, FEMA launched the National Risk Index (NRI), “a dataset and online tool to help illustrate the United States communities most at risk for 18 natural hazards”.

Risk is defined as the potential for negative impacts resulting from natural hazards. It’s calculated using the following equation (and illustrated in this graphic; read more about determining risk):

\[Risk\:Index = Expected\:Annual\:Loss \times \frac{Social\:Vulnerability}{Community\:Resilience}\]

NRI provides hazard type-specific scores, as well as a composite score, which adds together the risk from all 18 hazard types. A community’s risk score is represented by its percentile ranking among all other communities at the same level for Risk, Expected Annual Loss, Social Vulnerability and Community Resilience – for example, if a given county’s Risk Index percentile for a hazard type is 84.32 then its Risk Index value is greater than 84.32% of all US counties. Each community is also assigned a risk rating, which is a qualitative rating that describes the community in comparison to all other communities at the same level, ranging from “Very Low” to “Very High.”
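
To make the equation and the percentile interpretation above more concrete, here is a minimal R sketch. All input values are made up for illustration (they are not NRI data), and the helper name `percentile_rank()` is hypothetical. FEMA's published methodology may involve additional processing steps that are not shown here.

```r
# Illustrative inputs only (made-up values, not NRI data)
expected_annual_loss <- 50
social_vulnerability <- 1.2
community_resilience <- 0.8

# Risk Index = Expected Annual Loss x (Social Vulnerability / Community Resilience)
risk_index <- expected_annual_loss * social_vulnerability / community_resilience

# Hypothetical helper: percent of values in `all_values` that fall below `value`
percentile_rank <- function(value, all_values) {
  mean(all_values < value) * 100
}

# Toy "communities" with made-up risk values
toy_risk <- c(10, 20, 30, 40, 50)

# 40 is greater than 3 of the 5 toy values
percentile_rank(40, toy_risk)
```

In this toy example, higher social vulnerability raises the risk value, while higher community resilience lowers it.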

You can learn more about the NRI at hazards.fema.gov/nri.

Screenshot of the National Risk Index’s interactive mapping and data-based interface

Accessing NRI Data

Data at the county- and census tract-level are available for download in multiple formats (including Shapefiles & CSVs) from NRI’s Data Resources page.

About the US Census Bureau’s American Community Survey (ACS)

The American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely social, economic, housing, and demographic data every year. Unlike the Decennial Census (which counts every person in the US every 10 years for the purpose of congressional apportionment), the ACS collects detailed information from a small subset of the population (~3.5 million households) at 1- and 5-year intervals. Learn more about the differences between these 1- and 5-year estimates.

Accessing ACS Data

The US Census Bureau provides a couple of tools for accessing their data, including:

data.census.gov: a browser-based portal for exploring the many available data tables (e.g. Table B02001: Race)
The Census Data API: a data service that enables software developers to access and use Census Bureau data within their applications

However, when working in R, the {tidycensus} package is arguably the easiest way to query and retrieve Census data – use the get_acs() function to obtain ACS data for specified geographies (e.g. counties or census tracts), tables (e.g. B02001), variables (e.g. B02001_002, B02001_003), years (e.g. 2023), states (e.g. CA), surveys (e.g. acs1, acs5), etc.

The following sections (Part 1b - 1d) should be completed via GitHub Classroom (find and accept the assignment link on Slack). Read on for the full assignment description.

1b. Create viz #1 + answer questions

Create a data viz that helps to answer the question, How do FEMA National Risk Index scores for counties in California compare to those in other states?, following these steps:

Download and unzip the data: You’ll use the All Counties - County-level detail (Table) (2023 Release; accessed on the NRI Data Resources page). Unzip the file, then drop the whole NRI_Table_Counties/ folder into a data/ folder in your HW repository.
Add your data/ folder to .gitignore: So we don’t accidentally push our data to GitHub!
Read in NRIDataDictionary.csv: NRI_Table_Counties/ contains a few different files, including this CSV file which describes each of the NRI variables found in NRI_Table_Counties.csv. This is a helpful place to start!
Read in NRI_Table_Counties.csv: This is your data.
Build your viz: This may require some data wrangling first. Your final viz should:
- include data for the 50 US states only (no territories)
- include a title (short, descriptive) & subtitle (describes main takeaway) (see Fundamentals of Data Visualization, Ch 22 for an example), a caption (describes the data source, e.g. “Data: FEMA National Risk Index (2023 Release)”), and alt text (following the formula described in week 3 discussion; use the fig-alt code chunk option to apply your alt text)
- consider and implement strategies for highlighting trends / important information (e.g. arranging data, highlighting data, adjusting scales)
- use custom colors (if applicable), rather than ggplot defaults
- have an updated / polished theme (you’ll learn more about fine-tuning ggplot themes in week 4 discussion)
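
One possible way to handle the "50 US states only" requirement and set up a highlight for California is sketched below. It relies on base R's built-in `state.name` vector, which holds the 50 state names (so the District of Columbia and territories are excluded). The column names `STATE` and `RISK_SCORE`, the toy data frame, and the helper `keep_50_states()` are assumptions for illustration; check NRIDataDictionary.csv for the actual variable names.

```r
# Toy stand-in for the NRI county table (made-up rows; column names are assumed)
toy_nri <- data.frame(
  STATE = c("California", "California", "Texas", "Puerto Rico", "Guam"),
  RISK_SCORE = c(99.1, 85.4, 97.3, 60.2, 41.7)
)

# Hypothetical helper: keep only rows whose state name is one of the 50 US states
keep_50_states <- function(df, state_col = "STATE") {
  df[df[[state_col]] %in% state.name, , drop = FALSE]
}

states_only <- keep_50_states(toy_nri)

# Flag California so it can be mapped to a custom highlight color later
states_only$is_california <- states_only$STATE == "California"

states_only
```

The `is_california` column could then be mapped to `fill` or `color` and paired with `scale_fill_manual()` or `scale_color_manual()` to replace the ggplot defaults.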

BEFORE you start wrangling / building your viz, be sure to…

identify / jot down your variables of interest and consider which data types they are
use online tools like from Data to Viz to help determine appropriate graph types, given your variables
roughly sketch out your plots by hand (I find this incredibly helpful for understanding how my data needs to be wrangled to achieve my desired output(s))

(You’ll also want to repeat this process when creating your second data viz in Part 1c)

Answer the following questions:
- a. What are your variables of interest and what kinds of data (e.g. numeric, categorical, ordered, etc.) are they (a bullet point list is fine)?
- b. How did you decide which type of graphic form was best suited for answering the question? What alternative graphic forms could you have used instead? Why did you settle on this particular graphic form?
- c. What modifications did you make to this viz to make it more easily readable?
- d. Is there anything you wanted to implement, but didn’t know how? If so, please describe.

1c. Create viz #2 + answer questions

Create a data viz that helps to answer the question, How does climate hazard risk exposure vary across racial / ethnic groups in California?, following these steps:

Import ACS data using tidycensus::get_acs(): You’ll need your API key to use {tidycensus} (revisit week 2 pre-class prep instructions, if necessary). You may use the following code:

#.........see all available ACS variables + descriptions.........
acs_vars <- tidycensus::load_variables(year = 2023,
                                       dataset = "acs1")

#..................import race & ethnicity data..................
race_ethnicity <- tidycensus::get_acs(
  geography = "county",
  survey = "acs1",
  variables = c("B01003_001", "B02001_002", "B02001_003",
                "B02001_004", "B02001_005", "B02001_006",
                "B02001_007", "B02001_008", "B03002_012",
                "B03002_002"),
  state = "CA", 
  year = 2023) |>
  dplyr::left_join(acs_vars, by = dplyr::join_by(variable == name)) # join variable descriptions (so we know what's what!)

Optionally, write your data to .csv: It’s always a good idea to write your data (i.e. the race_ethnicity data frame, from above) to file, in case the Census Bureau’s API goes down. You may use the following code:

readr::write_csv(race_ethnicity, here::here("data", "ACS-race-ethnicity.csv"))

Build your viz: This will require some data wrangling first (including joining the NRI and ACS data). Your final viz should:
- include the following racial / ethnic groups: White, Black or African American, American Indian and Alaska Native, Asian, Native Hawaiian and Other Pacific Islander, Some Other Race, Two or More Races, Hispanic or Latino
- include a title (short, descriptive) & subtitle (describes main takeaway) (see Fundamentals of Data Visualization, Ch 22 for an example), a caption (describes the data source, e.g. “Data: FEMA National Risk Index (2023 Release)”), and alt text (following the formula described in week 3 discussion; use the fig-alt code chunk option to apply your alt text)
- consider and implement strategies for highlighting trends / important information (e.g. arranging data, highlighting data, adjusting scales)
- use custom colors (if applicable), rather than ggplot defaults
- have an updated / polished theme (you’ll learn more about fine-tuning ggplot themes in week 4 discussion)
Answer the following questions:
- a. What are your variables of interest and what kinds of data (e.g. numeric, categorical, ordered, etc.) are they (a bullet point list is fine)?
- b. How did you decide which type of graphic form was best suited for answering the question? What alternative graphic forms could you have used instead? Why did you settle on this particular graphic form?
- c. What modifications did you make to this viz to make it more easily readable?
- d. Is there anything you wanted to implement, but didn’t know how? If so, please describe.
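
The join mentioned in the "Build your viz" step of Part 1c could look roughly like the sketch below, which uses small made-up data frames in place of the real tables. `get_acs()` returns a `GEOID` column; the NRI key column name `STCOFIPS` is an assumption here (confirm it in the data dictionary), as are all values shown. Both keys are kept as character strings so that the leading zero in California's FIPS codes is preserved.

```r
# Toy stand-in for ACS output (made-up estimates; real output comes from get_acs())
toy_acs <- data.frame(
  GEOID = c("06001", "06001", "06037", "06037"),
  variable = c("B01003_001", "B02001_002", "B01003_001", "B02001_002"),
  estimate = c(1000, 400, 2000, 900)
)

# Toy stand-in for NRI county data (key column name STCOFIPS is assumed)
toy_nri <- data.frame(
  STCOFIPS = c("06001", "06037", "48201"),
  RISK_SCORE = c(98.5, 100, 99.9)
)

# Keep every ACS row and attach the matching county's risk score
acs_nri <- merge(toy_acs, toy_nri,
                 by.x = "GEOID", by.y = "STCOFIPS",
                 all.x = TRUE)

acs_nri
```

The same left join can be written with `dplyr::left_join()` and `dplyr::join_by(GEOID == STCOFIPS)`, matching the style of the import code above.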

1d. Polish your `.qmd` file

Your rendered .qmd file should be polished and neatly organized. Be sure to consider / implement (as appropriate) the following:

update the YAML with your name using the author option
set appropriate code chunk options, such that:
- your code and outputs (i.e. data viz) render (see eval, echo)
- warnings and messages are suppressed (see warning, message)
- (optional) outputs are center-aligned (if you find this more visually-pleasing than the default left-alignment; see fig-align: "center")
- adjust the aspect ratio of your plot(s) so that your data / groups are easy to read (see fig-asp, which makes adjusting aspect ratios for rendered outputs quite easy; values > 1 make your plot taller and values < 1 make your plot wider)
code is appropriately annotated (NOTE: you do not need to annotate every line (like in HW #1 Part 1), but you should include enough so that someone else reading your code understands the purpose of each discrete block of code)
include any necessary section headers / prose in the body of your Quarto doc to effectively organize your document
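
As a rough illustration of the chunk options listed above, the contents of a code chunk in a .qmd file might start with `#|` option lines like the ones below. The data frame, plot, and alt text are placeholders, and the `fig-asp` value is arbitrary. Only the chunk contents are shown; in your document they sit inside an `{r}` code chunk.

```r
#| eval: true
#| echo: true
#| warning: false
#| message: false
#| fig-align: "center"
#| fig-asp: 0.7
#| fig-alt: "Placeholder alt text describing the chart type, the data shown, and the main takeaway."

library(ggplot2)

# Placeholder data (made-up values) standing in for your wrangled data
toy_df <- data.frame(group = c("A", "B", "C"), value = c(3, 7, 5))

# Simple horizontal bar chart with a title, subtitle, and caption
p <- ggplot(toy_df, aes(x = value, y = group)) +
  geom_col() +
  labs(title = "Placeholder title",
       subtitle = "Placeholder subtitle stating the main takeaway",
       caption = "Data: made-up values for illustration")

p
```

Options that apply to every chunk (e.g. `warning: false`) can alternatively be set once under `execute:` in the YAML header rather than repeated per chunk.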

Part II: Data wrangling & exploratory data viz using your chosen data

Learning Outcomes

Note: This part of HW #2 is a continuation of HW #1, Part II and is the next step in working towards your final course assignment. Your final assignment is meant to combine nearly all of the course learning outcomes(!):

identify which types of visualizations are most appropriate for your data and your audience
prepare (e.g. clean, explore, wrangle) data so that it’s appropriately formatted for building data visualizations
build effective, responsible, accessible, and aesthetically-pleasing visualizations using the R programming language, and specifically {ggplot2} + ggplot2 extension packages
write code from scratch and read and adapt code written by others
apply a DEI (Diversity, Equity & Inclusion) lens to the process of designing data visualizations

Description

2a. Review HW #4 instructions

Please begin by re-reading HW #4 in full as a reminder of the options, goals, and requirements for your final class assignment.

2b. Import & wrangle data, then create exploratory data viz

This week, you’ll focus on importing and wrangling your data (found as part of HW #1, Part II), followed by the exploratory data visualization phase. Complete the following:

Create a file named HW2-exploration.qmd within your lastName-eds240-HW4 repo and add appropriate YAML fields.
Load necessary packages and read in your data.
Clean & wrangle your data.
Create at least three exploratory data visualizations (but of course feel free to create more!). The goal of these visualizations is to explore your data for any potentially interesting patterns or trends, which you may or may not decide to pursue further as you iterate on your final project deliverable. These plots do not need to be polished (e.g. updated theme), but it may be helpful to try arranging data to identify any trends.
IMPORTANT: If you have a downloaded data file saved to your repo (e.g. you’re not reading in your data directly from online, from a server, etc.) be sure to add your data folder / file to your .gitignore, particularly if this file is large.

2c. Answer questions

After completing the above steps, answer the following questions:

1. What have you learned about your data? Have any potentially interesting patterns emerged?
2. In HW #1, you outlined some questions that you wanted to answer using these data. Have you made any strides towards answering those questions? If yes, how so? If no, what next steps do you need to take (e.g. I need to create X plot type, I still need to track down Y data, I need to restructure existing data so that I can visualize it in Z ways, etc.)?
3. What challenges do you foresee encountering with your data? These can be data wrangling and / or visualization challenges.

Rubric (specifications)

You must complete the following, as detailed below, to receive a “Satisfactory” mark for Assignment #2, Part II:

Complete the following steps under your lastName-eds240-HW4 repo, not in GitHub Classroom:

See details below.

Perform all Part 2b steps (as described above) in your HW2-exploration.qmd file.
Answer Part 2c questions in your HW2-exploration.qmd file. There is no set length requirement, but you must answer each question in full to receive a Satisfactory score.
All three plot outputs should appear in your rendered doc.
HW2-exploration.qmd should be neatly organized – this does not need to be a perfectly polished document, but sections should be clearly labeled with prose and / or annotations so that we can easily follow along.
Code chunks should have appropriate chunk options set (e.g. code should render and execute, but warnings and messages should be suppressed, long data frames should not be printed out, etc.).
Send your instructor (Sam Shanny-Csik) and TA (Annie Adams) a rendered version of your HW2-exploration.qmd file by 11:59pm PT on Sat 02/04/2025 following these steps:
- a. ensure that your YAML specifies these options, at a minimum:

---
title: "your HW #2 title"
author: "your Name"
date: xxxx-xx-xx
format:
  html:
    embed-resources: true # this ensures that your rendered .html file is self-contained, so we (your instructors) can open it and view all your work
---

- b. render your HW2-exploration.qmd file, verify that you can open the resulting HW2-exploration.html file in your browser, and that all formatting looks good
- c. save a copy of HW2-exploration.html to your Desktop, rename it so that it has your first initial / last name at the start (e.g. SShannyCsik-HW2-exploration.html), then send it to both your instructor (Sam Shanny-Csik) and TA (Annie Adams) on Slack via direct message

End Part II
