It’s up to you to organize your own week1-discussion.qmd file (i.e. there is no template). You may (should) discuss and work through today’s exercise with a partner (or two!).
Your goal is to transform the raw fracking data into a wrangled data frame that looks like this (only the first 6 rows are shown here):
Load the
{tidyverse},{janitor}, and{usdata}packages, read inregistryupload_1.csv, and perform some basic data exploration. Use the{here}package to read in your data and check out things like the dimensions, structure, types of variables, unique observations, etc. Read through the metadata, which accompanied ourregistryupload_1.csvfile when downloaded from FracFocus.Consider what data cleaning you might need to perform. Discuss any curious data formatting (e.g. data types, inconsistent observation names, missing values, etc.) that you discovered during your data exploration step, above. What might you need to address in your next data wrangling steps?
Take a moment to share some of your findings with the rest of the class.
- Clean / wrangle the data! Using the packages loaded in step 1, perform the following operations (NOTE: Each step below corresponds to a single line of code):
- convert all column names from CamelCase to snake_case (for ease of readability and to adhere to the Tidyverse style guide recommendation)
- remove
"AM"from thejob_start_datecolumn observations - convert observations in the
job_start_datecolumn from character strings to datetime objects and save them to a new column nameddatetime_start - add a column named
yearthat contains only the year fromdatetime_start - keep only the necessary columns (
datetime_start,year,state_name,well_name,total_base_water_volume) - convert all observations in the
state_namecolumn to Title Case - remove any rows where the state name ends in
? - convert any state abbreviations to full names in the
state_namecolumn (Hint: use a combination of themutate(),ifelse()&usdata::abbr2state()functions) - correct any misspelled state names in the
state_namecolumn (Note: figure out how to update the names of a few misspelled states, the feel free to copy the complete code for this step from the expandable section, below) - remove any rows that do not have a true US state name in the
state_namecolumn - add a column named
state_abbwith just US state abbreviations - move the
state_abbcolumn so that it sits immediately after thestate_namecolumn - rename
total_base_water_volumeastotal_base_water_volume_galfor clarity - remove any observations that don’t have a measurement for
total_base_water_volume_gal(i.e. NA) - reorder rows from earliest to most recent
datetime_startvalues
mutate(state_name = case_when(
state_name == "Colordao" ~ "Colorado",
state_name == "Loiusiana" ~ "Louisiana",
state_name == "Louisianna" ~ "Louisiana",
state_name == "Lousiana" ~ "Louisiana",
state_name == "New Mexcio" ~ "New Mexico",
state_name == "Norh Dakota" ~ "North Dakota",
state_name == "Norht Dakota" ~ "North Dakota",
state_name == "North Dakota" ~ "North Dakota",
state_name == "North Dakata" ~ "North Dakota",
state_name == "North Dakotta" ~ "North Dakota",
state_name == "Noth Dakota" ~ "North Dakota",
state_name == "Pennslvania" ~ "Pennsylvania",
state_name == "Pennsylavania" ~ "Pennsylvania",
state_name == "Pennsylvanya" ~ "Pennsylvania",
state_name == "Penssylvania" ~ "Pennsylvania",
state_name == "Texasa" ~ "Texas",
state_name == "Texs" ~ "Texas",
state_name == "West Viginia" ~ "West Virginia",
state_name == "Wyominng" ~ "Wyoming",
TRUE ~ state_name # copy over rest of state names from as-is
))