Note
This template follows lecture 1.3 slides. Please be sure to cross-reference the slides, which contain important information and additional context!
Setup
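The code below needs a few packages loaded first. A minimal setup sketch, assuming the tidyverse (for `tribble()`, `pivot_longer()`, dplyr, and ggplot2) and palmerpenguins (for the `penguins` data frame) are the intended packages; check the slides for the exact setup:

```r
# load packages (assumed; cross-reference the slides) ----
library(tidyverse)
library(palmerpenguins)
```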
Tidy Data Review
Example untidy / wide data:
# create some untidy temperature data ----
temp_data_wide <- tribble(
  ~date, ~station1, ~station2,  ~station3,
  "2023-10-01", 30.1, 29.8,  31.2,
  "2023-11-01", 28.6, 29.1,  33.4,
  "2023-12-01", 29.9, 28.5,  32.3
)
# print it out ----
print(temp_data_wide)
# A tibble: 3 × 4
  date       station1 station2 station3
  <chr>         <dbl>    <dbl>    <dbl>
1 2023-10-01     30.1     29.8     31.2
2 2023-11-01     28.6     29.1     33.4
3 2023-12-01     29.9     28.5     32.3
Using pivot_longer() to “lengthen” / tidy our data:
# convert data from wide > long ----
temp_data_long <- temp_data_wide |> 
  pivot_longer(cols = starts_with("station"),
               names_to = "station_id",
               values_to = "temp_c")
# print it out ----
print(temp_data_long)
# A tibble: 9 × 3
  date       station_id temp_c
  <chr>      <chr>       <dbl>
1 2023-10-01 station1     30.1
2 2023-10-01 station2     29.8
3 2023-10-01 station3     31.2
4 2023-11-01 station1     28.6
5 2023-11-01 station2     29.1
6 2023-11-01 station3     33.4
7 2023-12-01 station1     29.9
8 2023-12-01 station2     28.5
9 2023-12-01 station3     32.3
Plot #1
Explore the relationship between penguin bill length and bill depth. Our goals are to review:
- initializing a plot object and adding a geometry layer to represent our data
- when to define data & map variables globally (e.g. within ggplot()) vs. locally (e.g. within a geom_*())
- updating how aesthetic mappings manifest visually aka scaling
- piping into a ggplot
Example 1 (the basics)
- initialize a plot object
- map aesthetics
- show that you can omit argument names
- add a geometry layer
- update how aesthetic mappings manifest visually (i.e. scaling)
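One possible way to walk through the steps above, as a sketch rather than the slides' exact code. The axis breaks passed to `scale_x_continuous()` are an arbitrary choice for illustration, and the object names are hypothetical:

```r
library(ggplot2)
library(palmerpenguins)

# initialize a plot object and map aesthetics, with argument names spelled out ----
basic_plot <- ggplot(data = penguins,
                     mapping = aes(x = bill_length_mm, y = bill_depth_mm))

# the same call with argument names omitted (arguments are matched by position) ----
basic_plot <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm))

# add a geometry layer, then adjust how the x mapping is displayed (illustrative breaks) ----
basic_scatter <- basic_plot +
  geom_point() +
  scale_x_continuous(breaks = seq(30, 60, by = 5))

basic_scatter
```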
Example 2 (mapping custom colors)
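A minimal sketch, assuming color is mapped to `species` and that the three colors used later in Plot #3 are the intended palette. `scale_color_manual()` assigns one value per level of the mapped variable:

```r
library(ggplot2)
library(palmerpenguins)

# map color to species (discrete), then supply one custom color per species ----
custom_color_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  scale_color_manual(values = c("darkorange", "purple", "cyan4"))

custom_color_plot
```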
Example 3 (mapping color to continuous variable)
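A sketch of mapping color to a continuous variable. Using `body_mass_g` and these two hex colors is an assumption made for illustration; any numeric column and any pair of colors would work the same way:

```r
library(ggplot2)
library(palmerpenguins)

# map color to a continuous variable, then set the gradient endpoints ----
continuous_color_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  geom_point() +
  scale_color_gradient(low = "#132B43", high = "#F7DD4C")

continuous_color_plot
```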
Example 4 (updating color for all points)
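To give every point the same color, the color is set as a constant outside of `aes()` rather than mapped to a variable. A minimal sketch, with "blue" chosen arbitrarily:

```r
library(ggplot2)
library(palmerpenguins)

# set (not map) color: the constant goes outside aes() ----
blue_points_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(color = "blue")

blue_points_plot
```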
Example 5 (local vs. global data & variable mappings)
- define data and mappings within geom_*() (locally) rather than in ggplot() (globally) – helpful if you plan to include multiple geoms with different mappings (e.g. you’re plotting data from multiple data frames):
# create a separate penguins_summary df ----
penguins_summary <- penguins |>
  drop_na() |> 
  group_by(species) |> 
  summarize(
    mean_bill_length_mm = mean(bill_length_mm),
    mean_bill_depth_mm = mean(bill_depth_mm)
  )
# create ggplot with layers from different dfs ----
ggplot() +
  geom_point(data = penguins, 
             mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species), 
             alpha = 0.5) +
  geom_point(data = penguins_summary,
             mapping = aes(x = mean_bill_length_mm, y = mean_bill_depth_mm, fill = species),
             size = 5, shape = 24)
Example 6 & 7 (mapping variables globally vs. locally)
- global mappings are passed down to all subsequent layers:
# scatterplot with lms fitted to species ---- 
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm")
- local mappings only apply to that particular layer:
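A sketch of the local case, assuming the same scatterplot as above: because `color = species` is mapped only inside `geom_point()`, `geom_smooth()` does not inherit it and fits a single line to all penguins instead of one per species:

```r
library(ggplot2)
library(palmerpenguins)

# color is mapped locally, so only the points are colored by species ----
local_map_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm")

local_map_plot
```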
Example 8 (piping data into a ggplot)
- useful if you need to do a small bit of wrangling first, but don’t want to create a whole new df
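A minimal sketch of piping into a plot. Filtering to Gentoo penguins is a hypothetical wrangling step chosen for illustration. Note the switch from the pipe (`|>`) to `+` once `ggplot()` is called:

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# wrangle first, then pipe the result into ggplot() as its data argument ----
gentoo_plot <- penguins |>
  filter(species == "Gentoo") |>
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

gentoo_plot
```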
Plot #2
Explore penguin species counts. Our goals are to review:
- statistical transformations – what are they, how to identify the default, and how to update them
- position adjustments – what are they, how to identify the default, and how to update them
- coordinate systems – what are they, how to identify the default, and how to update them
 
- updating non-data elements using pre-built themes and the theme() function
Example 1 (the basics + default stat)
- initialize a plot object
- map aesthetics
- add a geometry layer
- explore default statistical transformations
- reading documentation is important here!
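The steps above might look like the following sketch. The `geom_bar()` documentation lists `stat = "count"` as its default, so only an x variable is mapped and the bar heights are computed for us; calling `stat_count()` directly (default `geom = "bar"`) builds the same plot:

```r
library(ggplot2)
library(palmerpenguins)

# geom_bar() uses stat = "count" by default, so no y mapping is needed ----
count_bar_plot <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# equivalent: start from the stat, which uses geom = "bar" by default ----
count_stat_plot <- ggplot(penguins, aes(x = species)) +
  stat_count()

count_bar_plot
```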
 
Example 2 (override default stat)
- let’s say we have a df that already contains calculated species count values, and we want the height of the bars to be based on those count values:
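A sketch of that situation, assuming the pre-calculated counts come from `dplyr::count()` (the `penguins_count` name is hypothetical). Setting `stat = "identity"` tells `geom_bar()` to use the y values as given; `geom_col()` is a shortcut for the same thing:

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# a df that already holds one count per species (column n) ----
penguins_count <- penguins |>
  count(species)

# override the default stat so bar height comes straight from n ----
identity_bar_plot <- ggplot(penguins_count, aes(x = species, y = n)) +
  geom_bar(stat = "identity")

identity_bar_plot
```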
Example 3 (override default stat mapping)
- e.g. we can display the same data but with y-axis values as proportions rather than counts
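One way to do this, sketched with `after_stat()`: `stat_count()` computes both `count` and `prop`, and mapping y to `after_stat(prop)` swaps in the proportion. `group = 1` is needed so proportions are calculated across all species rather than within each bar:

```r
library(ggplot2)
library(palmerpenguins)

# map y to the computed prop variable instead of the default count ----
prop_bar_plot <- ggplot(penguins, aes(x = species, y = after_stat(prop), group = 1)) +
  geom_bar()

prop_bar_plot
```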
Example 4 (default position adjustment)
- position adjustments tweak position of elements to resolve overlapping geoms
- all geoms have a default position (e.g. barplots have position = "stack") – see documentation
- let’s say we now want to visualize penguin counts by species (bar height) and by island (color):
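A minimal sketch of that plot: mapping `fill` to `island` splits each species bar into island segments, and the default `position = "stack"` stacks them on top of one another:

```r
library(ggplot2)
library(palmerpenguins)

# fill by island; segments are stacked by default (position = "stack") ----
stacked_bar_plot <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar()

stacked_bar_plot
```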
Example 5 & 6 (override default position adjustments)
- position = "fill" creates stacked bars of the same height (easier to compare proportions):
- position = "dodge" places overlapping bars directly beside one another (easier to compare individual values):
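Both adjustments can be sketched on the species-by-island bar plot by passing a string to the `position` argument (object names here are hypothetical):

```r
library(ggplot2)
library(palmerpenguins)

# position = "fill": every bar is scaled to a total height of 1 ----
fill_bar_plot <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar(position = "fill")

# position = "dodge": island bars sit side by side within each species ----
dodge_bar_plot <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar(position = "dodge")

fill_bar_plot
dodge_bar_plot
```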
Example 7 (alternatively, use position_*())
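The `position_*()` functions do the same job as the strings but expose extra arguments. A sketch using `position_dodge2()`, where `preserve = "single"` (chosen here for illustration) keeps every bar the same width even when a species occurs on only one island:

```r
library(ggplot2)
library(palmerpenguins)

# use a position function instead of a string to access its arguments ----
dodge2_bar_plot <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar(position = position_dodge2(preserve = "single"))

dodge2_bar_plot
```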
Example 8 (default coordinate system)
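ggplot2 uses a Cartesian coordinate system unless told otherwise, so adding `coord_cartesian()` explicitly, as in this sketch, gives the same plot as leaving it off:

```r
library(ggplot2)
library(palmerpenguins)

# coord_cartesian() is the default; writing it out changes nothing ----
cartesian_bar_plot <- ggplot(penguins, aes(x = species)) +
  geom_bar() +
  coord_cartesian()

cartesian_bar_plot
```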
Example 9 & 10 (alternative coordinate systems)
- flip x & y axes:
- polar coordinates:
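Both alternatives can be sketched by swapping the coordinate layer on the same species bar plot (object names are hypothetical):

```r
library(ggplot2)
library(palmerpenguins)

# shared starting point: species counts as bars ----
species_bars <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# flip x & y axes so the bars run horizontally ----
flipped_bar_plot <- species_bars + coord_flip()

# polar coordinates: x is mapped to angle, y to radius ----
polar_bar_plot <- species_bars + coord_polar()

flipped_bar_plot
polar_bar_plot
```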
Example 11 (pre-made themes)
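A sketch of applying one of ggplot2's built-in complete themes. `theme_light()` is picked arbitrarily; `theme_minimal()`, `theme_bw()`, `theme_classic()`, and others are used the same way:

```r
library(ggplot2)
library(palmerpenguins)

# add a pre-made theme as the final layer ----
themed_bar_plot <- ggplot(penguins, aes(x = species)) +
  geom_bar() +
  theme_light()

themed_bar_plot
```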
Example 12 (customize further with theme())
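A sketch of layering `theme()` on top of a pre-made theme to adjust individual non-data elements. The axis title size and color here are arbitrary values for illustration:

```r
library(ggplot2)
library(palmerpenguins)

# start from a pre-made theme, then override specific elements with theme() ----
custom_theme_plot <- ggplot(penguins, aes(x = species)) +
  geom_bar() +
  theme_light() +
  theme(axis.title = element_text(size = 17, color = "purple"))

custom_theme_plot
```

Order matters here: a complete theme such as `theme_light()` added after `theme()` would overwrite the customizations.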
Plot #3
Explore penguin flipper lengths. Our goals are to review:
- more position adjustments and scales
- updating plot labels
- faceting
Example 1 (all in one go!)
- initialize a plot object
- map aesthetics
- add a geometry layer
- map color to species
- update position adjustment
- update labels
- facet by species
ggplot(penguins, aes(x = flipper_length_mm, fill = species)) + 
  geom_histogram(position = "identity", alpha = 0.5) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) + 
  labs(x = "Flipper Length (mm)",
       y = "Frequency",
       fill = "Species",
       title = "Penguin Flipper Lengths") +
  facet_wrap(~species, ncol = 1)

















