EDS 240: Lecture 6.2

Annotations


Week 6 | February 12th, 2024

Good data visualization design considers:


  • data-ink ratio (less is more, within reason)
  • how to reduce eye movement and improve readability / interpretability (e.g. through alternative legend positions, direct annotations)
  • putting things in context
  • how to draw the main attention to the most important info
  • consistent use of colors, spacing, typefaces, weights
  • typeface / font choices and how they affect both readability and emotions and perceptions
  • using visual hierarchy to guide the reader
  • color choices (incl. palette types, emotions, readability)
  • how to tell an interesting story
  • how to center the people and communities represented in your data
  • accessibility through colorblind-friendly palettes & alt text (see week 2 discussion)

This lesson will focus on the use of annotations in a good data visualization.



Why annotate?


  • clarify meaning / significance of data (especially particular data points or groups)
  • facilitate interpretation
  • build a narrative

An often-cited (though contested) figure puts the average attention span of an internet user at ~8 seconds (shorter than a goldfish’s!). Whatever the exact number, it’s imperative that we respect our readers’ time.

Aim to:

  • tell your readers what you want them to see
  • guide your readers’ eyes & attention
  • remind your readers what they’re looking at

The more time you spend making your visualization crystal clear, the more time you save your readers needing to decipher it.

We’ll be annotating these plots



Metabolism Effects on Foraging Across Temperatures

Adapted from Csik et al. 2023, Figure 5

Mono Lake levels

Borrowed from Allison Horst’s Customized Data Visualization in {ggplot2} materials

These two plots (and likely many others that you’ll create moving forward) will benefit from some custom annotations.

Lobster plot starter code


Note that this starter code incorporates many of the strategies we’ve discussed in past lectures: turning a theme into a function; defining a color palette (along with point shape and size scales) and axis labels outside of the ggplot code; and using {ggtext} to apply Markdown to plot text:

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                    setup                                 ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#.........................load libraries.........................
library(tidyverse)

#..........................read in data..........................

# read in Google Sheet ----
lobs <- googlesheets4::read_sheet("https://docs.google.com/spreadsheets/d/1DkDVcl_9rlaqznHfa_v1V1jtZqcuL75Q6wvAHpnCHuk/edit#gid=2143433533") |>
  mutate(temp = as.factor(temp))

# alternatively, read in csv file ----
lobs <- read_csv(here::here("week6", "data", "metabolism-foraging-data.csv")) |>
  mutate(temp = as.factor(temp))

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            create lobster plot                           ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#..........................create theme..........................
lob_theme <- function(){
  theme_light() +
    theme(
      axis.title.x = ggtext::element_markdown(size = 13,
                                              margin = margin(t = 1, r = 0, b = 0, l = 0, unit = "lines")),
      axis.title.y = ggtext::element_markdown(size = 13,
                                              margin = margin(t = 0, r = 1, b = 0, l = 0, unit = "lines")),
      axis.text = element_text(color = "black", size = 12),
      panel.border = element_rect(colour = "black", linewidth = 0.7), # `linewidth` replaces the deprecated `size` (ggplot2 >= 3.4.0)
      panel.grid = element_blank(),
      legend.title = element_text(size = 11),
      legend.text = element_text(size = 10),
      legend.position = c(0.95, 0.95),
      legend.justification = c(0.95, 0.95),
      legend.box.background = element_rect(color = "black", linewidth = 1.1) # `linewidth` replaces the deprecated `size` (ggplot2 >= 3.4.0)

    )
}

#..........................create scales.........................
lob_palette <- c("11" = "#7B8698",
                 "16" = "#BAD7E5",
                 "21" = "#DC7E7C",
                 "26" = "#7D3E40")

lob_shapes <-  c("11" = 15,
                 "16" = 16,
                 "21" = 17,
                 "26" = 18)

lob_sizes <- c("11" = 6,
               "16" = 6,
               "21" = 6,
               "26" = 7)

#........................create plot text........................
x_axis_lab <- glue::glue("Resting Metabolic Rate<br>
                         (mg O<sub>2</sub> kg<sup>-1</sup> min<sup>-1</sup>)")

y_axis_lab <- glue::glue("Maximum Consumption Rate<br>
                         (prey consumed predator<sup>-1</sup> 24hr<sup>-1</sup>)")

#............................plot data...........................
lob_plot <- ggplot(lobs, aes(x = SMR, y = avg_eaten,
                 color = temp, shape = temp, size = temp)) +
  geom_point() +
  scale_color_manual(values = lob_palette, name = "Temperature (°C)") +
  scale_shape_manual(values = lob_shapes, name = "Temperature (°C)") +
  scale_size_manual(values = lob_sizes, name = "Temperature (°C)") +
  scale_x_continuous(breaks = seq(0, 1.5, by = 0.2)) +
  scale_y_continuous(breaks = seq(0, 35, by = 5)) +
  labs(x = x_axis_lab,
       y = y_axis_lab) +
  lob_theme()

lob_plot

Building custom annotations



There are two primary ways to add custom text annotations:


  1. geom_text() (for plain text) & geom_label() (adds a rectangle behind text), which take aesthetic mappings; these draw the geom once for each row of the data frame
  2. annotate(), which does not take aesthetic mappings and instead draws only the information provided to it
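
One way to see the difference between the two approaches is to count how many rows each text layer actually draws. The sketch below is only an illustration: it uses the built-in mtcars data as a stand-in for our lobster data, and the coordinates and label are arbitrary assumptions.

```r
library(ggplot2)

# global mappings, as in a typical ggplot() call (mtcars is a stand-in dataset)
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# geom_text() inherits the global mappings, so the label is drawn once per row
with_geom <- base_plot +
  geom_text(x = 3, y = 30, label = "Note")

# annotate() ignores the plot data and draws only what it is given
with_annotate <- base_plot +
  annotate(geom = "text", x = 3, y = 30, label = "Note")

# layer 2 is the text layer in both plots; count the rows it will draw
n_geom <- nrow(layer_data(with_geom, i = 2))
n_annotate <- nrow(layer_data(with_annotate, i = 2))
```

Here n_geom should match nrow(mtcars), while n_annotate should be 1. Stacking the same label many times is what makes text look heavy and blurry.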


Let’s try to add an annotation to our plot using both approaches to better understand the difference.


Our goal: add a rectangle that bounds / highlights a subset of points, add text nearby that reads Important lobsters, and draw an arrow from the text pointing to the box.

geom_text() + geom_rect() doesn’t look right . . .


Here, we use geom_text() + geom_rect() to add text and a rectangle to our plot. We need to supply coordinates to place each on our plot.

lob_plot +
  geom_text(
    x = 0.1,
    y = 25,
    label = "Important lobsters",
    size = 4,
    color = "black",
    hjust = "inward"
  ) +
  geom_rect(
    xmin = 0.25, xmax = 0.85,
    ymin = 8.5, ymax = 18,
    alpha = 0.5,
    fill = "gray40", color = "black",
    show.legend = FALSE
  )

Notice that our text looks oddly blurry and bold, and our rectangle is opaque (despite adjusting alpha) and has a weird, thick border.

geom_text() inherits aesthetic mappings from ggplot()


Like all other geom_*() functions we’ve worked with, geom_text() & geom_label() take aesthetic mappings. You can either define aes() within the geom, or it’ll inherit global mappings from ggplot() (as in our case).


Here, geom_text() and geom_rect() are plotting our label (Important lobsters) and box 22 times each (once for each of the 22 observations in our data frame).

str(lobs)
tibble [22 × 7] (S3: tbl_df/tbl/data.frame)
 $ lobster_id: chr [1:22] "N18" "L4" "N14" "L3" ...
 $ temp      : Factor w/ 4 levels "11","16","21",..: 3 3 3 3 3 3 1 1 1 1 ...
 $ SMR       : num [1:22] 0.709 0.551 0.582 1.084 0.575 ...
 $ MMR       : num [1:22] 4.5 3.75 5.64 4.66 4.85 ...
 $ AAS       : num [1:22] 3.79 3.2 5.06 3.58 4.28 ...
 $ FAS       : num [1:22] 6.35 6.81 9.69 4.3 8.44 ...
 $ avg_eaten : num [1:22] 23.3 11 21.3 9 14.3 ...


geom_rect() is also inheriting the size aesthetic, which explains the thick box border.

# from our `lob_plot` code
scale_size_manual(values = lob_sizes, name = "Temperature (°C)")
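
If you do want to stay with geom_text(), one possible workaround is to hand it its own one-row data frame and switch off inheritance with inherit.aes = FALSE. This is a minimal sketch, assuming mtcars as stand-in data; label_df and its column names are made up for illustration.

```r
library(ggplot2)

# global color mapping, similar in spirit to our lobster plot
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# hypothetical one-row data frame holding the annotation
label_df <- data.frame(x = 3, y = 30, label = "Note")

# inherit.aes = FALSE stops the layer from picking up the global mappings,
# and the one-row data frame means the label is drawn exactly once
p <- base_plot +
  geom_text(
    data = label_df,
    mapping = aes(x = x, y = y, label = label),
    inherit.aes = FALSE,
    color = "black",
    show.legend = FALSE
  )

n_labels <- nrow(layer_data(p, i = 2))
```

This works, but it takes more typing than annotate() for a single label.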

This is exactly the situation annotate() was made for


Unlike geom_text(), annotate() requires that we define a geom type (e.g. "text", "rect"). We can also remove the show.legend argument, since annotate() doesn’t produce a legend.

lob_plot +
  annotate(
    geom = "text",
    x = 0.1,
    y = 25,
    label = "Important lobsters",
    size = 4,
    color = "black",
    hjust = "inward"
  ) +
  annotate(
    geom = "rect",
    xmin = 0.25, xmax = 0.85,
    ymin = 8.5, ymax = 18,
    alpha = 0.5,
    fill = "gray70", color = "black"
  )

Note: Determining coordinates for any annotation requires a lot of trial and error. Pick values that you think are close and then tweak from there.
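
To cut down on the trial and error, you can compute a starting bounding box from the data and then tweak it by hand. The sketch below assumes mtcars as stand-in data and an arbitrary subset (8-cylinder cars); highlight, pad_x, pad_y and box are illustrative names, not part of the lecture code.

```r
library(ggplot2)

# the subset of points we want to highlight (arbitrary choice for illustration)
highlight <- subset(mtcars, cyl == 8)

# small padding so the box does not sit exactly on the outermost points
pad_x <- 0.1
pad_y <- 0.5

# starting coordinates taken from the range of the subset
box <- list(
  xmin = min(highlight$wt) - pad_x,
  xmax = max(highlight$wt) + pad_x,
  ymin = min(highlight$mpg) - pad_y,
  ymax = max(highlight$mpg) + pad_y
)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  annotate(
    geom = "rect",
    xmin = box$xmin, xmax = box$xmax,
    ymin = box$ymin, ymax = box$ymax,
    alpha = 0.5,
    fill = "gray70", color = "black"
  )
```

The padding values are guesses; adjust them until the box looks right.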

Draw an arrow between our label and rectangle


We can specify the "curve" geom type to draw a curved line. Use the arrow argument + arrow() function to add an arrow tip on the end:

lob_plot +
  annotate(
    geom = "text",
    x = 0.1,
    y = 25,
    label = "Important lobsters",
    size = 4,
    color = "black",
    hjust = "inward"
  ) +
  annotate(
    geom = "rect",
    xmin = 0.25, xmax = 0.85,
    ymin = 8.5, ymax = 18,
    alpha = 0.5,
    fill = "gray70", color = "black"
  ) +
  annotate(
    geom = "curve",
    x = 0.3, xend = 0.5,
    y = 23.8, yend = 19,
    curvature = -0.15,
    arrow = arrow(length = unit(0.3, "cm"))
  )
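
The arrow() function has a few more options worth knowing about. The sketch below, using mtcars as stand-in data and arbitrary coordinates, draws a straight "segment" instead of a curve and asks for a closed (filled) arrowhead. The name closed_arrow is just for illustration.

```r
library(ggplot2)

# a closed arrowhead at the end of the line ("last" is the default end)
closed_arrow <- arrow(length = unit(0.3, "cm"), type = "closed", ends = "last")

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  annotate(
    geom = "segment",         # straight line rather than "curve"
    x = 4.5, xend = 5.2,
    y = 25, yend = 11.5,
    arrow = closed_arrow,
    arrow.fill = "black",     # fill color for the closed arrowhead
    linewidth = 0.6
  )
```

Swap geom = "segment" for geom = "curve" (and add curvature) to get the curved version shown above.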

Use geom_text/label() to annotate each point


geom_text() adds plain text

lob_plot +
  geom_text(aes(label = lobster_id),
            size = 6,
            show.legend = FALSE)

geom_label() adds a rectangle behind text

lob_plot +
  geom_label(aes(label = lobster_id),
             size = 6,
             show.legend = FALSE)

Annotations sit on top of data points, which may be undesirable…

Use {ggrepel} to repel annotations


geom_text_repel() adds plain text

lob_plot +
  ggrepel::geom_text_repel(aes(label = lobster_id),
                           size = 4,
                           color = "gray10",
                           nudge_x = 0.1, nudge_y = 0.3,
                           arrow = arrow(length = unit(0.25, "cm")))

geom_label_repel() adds a rectangle behind text

lob_plot +
  ggrepel::geom_label_repel(aes(label = lobster_id),
                            size = 4,
                            color = "gray10",
                            nudge_x = 0.1, nudge_y = 0.3,
                            arrow = arrow(length = unit(0.25, "cm")))

Manually label just a few important points


If we have just a few lobsters that we want to call attention to, we can use annotate() to label them. Let’s start with lobster IV10:

lob_plot +
  annotate(
    geom = "text",
    x = 0.3, y = 20.1,
    label = "IV10",
    hjust = "left",
    size = 5
    ) +
  annotate(
    geom = "curve",
    x = 0.29, xend = 0.184,
    y = 20, yend = 9.43,
    arrow = arrow(length = unit(0.3, "cm")),
    linewidth = 0.6
    ) 

Manually label just a few important points


Your turn! Create another text label and arrow pointing to lobster IV19 (the farthest dark red diamond to the right). You don’t need to choose this exact location for your text and arrow:


Manually label just a few important points


A solution (you may have chosen a different placement for your text and arrow):

lob_plot +
  annotate(
    geom = "text",
    x = 0.3, y = 20.1,
    label = "IV10",
    hjust = "left",
    size = 5
    ) +
  annotate(
    geom = "curve",
    x = 0.29, xend = 0.184,
    y = 20, yend = 9.43,
    arrow = arrow(length = unit(0.3, "cm")),
    linewidth = 0.6
    ) +
  annotate(
    geom = "text",
    x = 1.19,
    y = 5.25,
    label = "IV19",
    hjust = "right",
    size = 5
    ) +
  annotate(
    geom = "curve",
    x = 1.2, xend = 1.31,
    y = 5, yend = 14,
    arrow = arrow(length = unit(0.3, "cm")),
    linewidth = 0.6
    )
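
Hand-typed coordinates are fine for one or two labels. When the labels already exist in your data, an alternative is to filter the data down to the rows of interest and let geom_text() place the labels. This is a rough sketch, assuming mtcars as stand-in data; car_data, to_label and the two car models are arbitrary choices for illustration.

```r
library(ggplot2)

# move the row names into a column so they can be used as labels
car_data <- data.frame(model = rownames(mtcars), mtcars)

# keep only the rows we want to call attention to
to_label <- subset(car_data, model %in% c("Fiat 128", "Maserati Bora"))

p <- ggplot(car_data, aes(x = wt, y = mpg)) +
  geom_point() +
  # this layer uses the filtered data, so only two labels are drawn;
  # nudge_y shifts each label up, away from its point
  geom_text(
    data = to_label,
    mapping = aes(label = model),
    nudge_y = 1.2,
    size = 4
  )

n_labels <- nrow(layer_data(p, i = 2))
```

The trade-off: label positions follow the data automatically, but you lose the fine control over placement and arrows that annotate() gives you.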

Mono Lake plot starter code


##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                    setup                                 ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#.........................load libraries.........................
library(tidyverse)

#..........................read in data..........................

# read in Google Sheet ----
mono <- googlesheets4::read_sheet("https://docs.google.com/spreadsheets/d/1o0-89RFp2rI2y8hMQWy-kquf_VIzidmhmVDXQ02JjCA/edit#gid=164128885")

# alternatively, read in csv ----
mono <- read_csv(here::here("week6", "data", "mono.csv"))

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            create Mono Lake plot                         ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ggplot(data = mono, aes(x = year, y = lake_level)) +
  geom_line() +
  labs(x = "\nYear",
       y = "Lake surface level\n(feet above sea level)\n",
       title = "Mono Lake levels (1850 - 2017)\n",
       caption = "Data: Mono Basin Clearinghouse") +
  scale_x_continuous(limits = c(1850, 2020),
                     expand = c(0,0),
                     breaks = seq(1850, 2010, by = 20)) +
  scale_y_continuous(limits = c(6350, 6440),
                     breaks = c(6370, 6400, 6430),
                     expand = c(0,0),
                     labels = scales::label_comma()) +
  theme_light() +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(size = 16),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    plot.caption = element_text(face = "italic")
    )

Highlight years of interest


Let’s say we want to call particular attention to the sharp decline in lake surface level between 1941 and 1983 as a result of unrestricted water diversions. Let’s do so using annotate(). Note that the order of our layers matters: the rectangle is added before geom_line() so that the line is drawn on top of it.

ggplot(data = mono, aes(x = year, y = lake_level)) +
  annotate(
    geom = "rect",
    xmin = 1941, xmax = 1983,
    ymin = 6350, ymax = 6440,
    fill = "gray90"
  ) +
  geom_line() +
  labs(x = "\nYear",
       y = "Lake surface level\n(feet above sea level)\n",
       title = "Mono Lake levels (1850 - 2017)\n",
       caption = "Data: Mono Basin Clearinghouse") +
  scale_x_continuous(limits = c(1850, 2020),
                     expand = c(0,0),
                     breaks = seq(1850, 2010, by = 20)) +
  scale_y_continuous(limits = c(6350, 6440),
                     breaks = c(6370, 6400, 6430),
                     expand = c(0,0),
                     labels = scales::label_comma()) +
  annotate(
    geom = "text", 
    x = 1962, y = 6425,
    label = "unrestricted diversions\n(1941 - 1983)",
    size = 3
  ) +
  theme_light() +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(size = 16),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    plot.caption = element_text(face = "italic")
    )
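
To check layer order without squinting at the plot, you can inspect the layers stored in the plot object. The sketch below is an illustration under stated assumptions: it uses R's built-in LakeHuron series as a stand-in for the Mono Lake data, and the highlighted years are arbitrary.

```r
library(ggplot2)

# LakeHuron: built-in annual lake levels, 1875-1972 (stand-in for `mono`)
lake <- data.frame(year = 1875:1972, level = as.numeric(LakeHuron))

p <- ggplot(lake, aes(x = year, y = level)) +
  # added first, so it is drawn underneath everything that follows;
  # -Inf / Inf stretch the rectangle to the panel edges
  annotate(
    geom = "rect",
    xmin = 1930, xmax = 1940,
    ymin = -Inf, ymax = Inf,
    fill = "gray90"
  ) +
  geom_line()

# layers are drawn in the order they are stored
layer_order <- vapply(p$layers, function(l) class(l$geom)[1], character(1))
```

layer_order lists the rectangle geom before the line geom, so the line is drawn last and stays visible. Using -Inf and Inf for ymin and ymax is an optional shortcut that avoids hard-coding the axis limits.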

Add other important context


We can add any other important information to provide better context for our readers. Let’s say we’re also interested in shrimp abundances, which are expected to decline once the lake surface drops to 6,360 feet above sea level. Here, we add a dashed reference line at that elevation, along with text:

ggplot(data = mono, aes(x = year, y = lake_level)) +
  annotate(
    geom = "rect",
    xmin = 1941, xmax = 1983,
    ymin = 6350, ymax = 6440,
    fill = "gray90"
  ) +
  geom_line() +
  labs(x = "\nYear",
       y = "Lake surface level\n(feet above sea level)\n",
       title = "Mono Lake levels (1850 - 2017)\n",
       caption = "Data: Mono Basin Clearinghouse") +
  scale_x_continuous(limits = c(1850, 2020),
                     expand = c(0,0),
                     breaks = seq(1850, 2010, by = 20)) +
  scale_y_continuous(limits = c(6350, 6440),
                     breaks = c(6370, 6400, 6430),
                     expand = c(0,0),
                     labels = scales::label_comma()) +
  annotate(
    geom = "text", 
    x = 1962, y = 6425,
    label = "unrestricted diversions\n(1941 - 1983)",
    size = 3
  ) +
  geom_hline(yintercept = 6360, 
             linetype = "dashed") +
  annotate(
    geom = "text",
    x = 1910, y = 6366,
    label = "Decreased shrimp abundance expected\n(6,360 feet above sea level)",
    size = 3
    ) +
  theme_light() +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(size = 16),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    plot.caption = element_text(face = "italic")
    )

Bonus: Annotating facets requires some patience and mapping. We’ll demonstrate on our occupations plot from the last lesson.

Add annotations to separate facet panels


Here, we create a separate data frame with all necessary information (e.g. labels, label positions, arrow positions, etc.) for building our annotations. Then, we pass it to geom_segment() and geom_label() to map this information onto the appropriate facets. This requires a lot of manual adjustment! It helps to focus on one small piece (e.g. one label or one arrow) at a time.

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                    setup                                 ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#..........................load packages.........................
library(tidyverse)
library(showtext)

#..........................import data...........................
jobs <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/jobs_gender.csv")

#..........................import fonts..........................
font_add_google(name = "Josefin Sans", family = "josefin")
font_add_google(name = "Sen", family = "sen")

#....................import Font Awesome fonts...................
font_add(family = "fa-brands",
         regular = here::here("fonts", "Font Awesome 6 Brands-Regular-400.otf"))
font_add(family = "fa-regular",
         regular = here::here("fonts", "Font Awesome 6 Free-Regular-400.otf")) 
font_add(family = "fa-solid",
         regular = here::here("fonts", "Font Awesome 6 Free-Solid-900.otf"))

#................enable {showtext} for rendering.................
showtext_auto()

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                wrangle data                              ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

jobs_clean <- jobs |>

  # add cols (needed for dumbbell plot) ----
  mutate(percent_male = 100 - percent_female, # % of females within each industry was already included
       difference_earnings = total_earnings_male - total_earnings_female) |>  # diff in earnings between M & F

  # rearrange columns ----
  relocate(year, major_category, minor_category, occupation,
           total_workers, workers_male, workers_female,
           percent_male, percent_female,
           total_earnings, total_earnings_male, total_earnings_female, difference_earnings,
           wage_percent_of_male) |>

  # drop rows with missing earning data ----
  drop_na(total_earnings_male, total_earnings_female) |>

  # make occupation a factor ----
  mutate(occupation = as.factor(occupation)) |>

# ---- this next step is for creating our dumbbell plots ----

# classify jobs by percentage male or female ----
  mutate(group_label = case_when(
    percent_female >= 75 ~ "Occupations that are 75%+ female",
    percent_female >= 45 & percent_female <= 55 ~ "Occupations that are 45-55% female",
    percent_male >= 75 ~ "Occupations that are 75%+ male"
  ))

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                              create subset df                            ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#....guarantee the same random samples each time we run code.....
set.seed(0)

#.........get 10 random jobs that are 75%+ female (2016).........
f75 <- jobs_clean |>
  filter(year == 2016, group_label == "Occupations that are 75%+ female") |>
  slice_sample(n = 10)

#..........get 10 random jobs that are 75%+ male (2016)..........
m75 <- jobs_clean |>
  filter(year == 2016, group_label == "Occupations that are 75%+ male") |>
  slice_sample(n = 10)

#........get 10 random jobs that are 45-55% female (2016)........
f50 <- jobs_clean |>
  filter(year == 2016, group_label == "Occupations that are 45-55% female") |>
  slice_sample(n = 10)

#.......combine dfs & relevel factors (for plotting order).......
subset_jobs <- rbind(f75, m75, f50) |>
  mutate(group_label = fct_relevel(group_label, 
                                   "Occupations that are 75%+ female",
                                   "Occupations that are 45-55% female", 
                                   "Occupations that are 75%+ male"))

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                create plot                               ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#..........................build palette.........................
earnings_pal <- c("males" = "#2D7787",
                  "females" = "#FC6B4B",
                  dark_text = "#0C1509",
                  light_text = "#4E514D") 

#.........................create caption.........................
github_icon <- "&#xf09b"
github_username <- "samanthacsik"

caption <- glue::glue(
  "Data Source: TidyTuesday (March 5, 2019) |
  <span style='font-family:fa-brands;'>{github_icon};</span>
  {github_username}"
)

#........................create subtitle.........................
money_icon <- "&#xf3d1"

subtitle <- glue::glue("Median earnings <span style='font-family:fa-regular;'>{money_icon};</span>
                       of full-time
                       <span style='color:#2D7787;font-size:20pt;'>**male**</span>
                       versus <span style='color:#FC6B4B;font-size:20pt;'>**female**</span>
                       workers by occupation in 2016")

#.................create df with annotation info.................
facet_labs <- data.frame(my_text = c("Males make $23,644 more than\nfemales in this female-\ndominated occupation",
                                     "Male & female probation officers &\ncorrectional treatment specialists\nmake about the same",
                                     "Here's another annotation\nwith a horizontal arrow"),
                         group_label = c("Occupations that are 75%+ female", "Occupations that are 45-55% female", "Occupations that are 75%+ male"),
                         text_x = c(70000, 45000, 60000),
                         text_y = c(5, 3, 5),
                         arrow_x = c(80000, 48500, 60000),
                         arrow_xend = c(80000, 48500, 37500),
                         arrow_y = c(6, 4, 5),
                         arrow_yend = c(10, 6.5, 5)) |>
  
  # need to reset factor levels so that facets are ordered as before ----
  mutate(group_label = fct_relevel(group_label, "Occupations that are 75%+ female",
                                   "Occupations that are 45-55% female", "Occupations that are 75%+ male"))

#..........................create plot...........................
ggplot(subset_jobs) +
  geom_segment(aes(x = total_earnings_female, xend = total_earnings_male,
                   y = fct_reorder(occupation, total_earnings), yend = occupation)) +
  geom_point(aes(x = total_earnings_male, y = occupation),
             color = earnings_pal["males"], size = 3.25) +
  geom_point(aes(x = total_earnings_female, y = occupation),
             color = earnings_pal["females"], size = 3.25) +
  facet_wrap(~group_label, nrow = 3, scales = "free_y") +
  scale_x_continuous(labels = scales::label_dollar(scale = 0.001, suffix = "k"),
                     breaks = c(25000, 50000, 75000, 100000, 125000)) +
  labs(title = "Males earn more than females across most occupations",
       subtitle = subtitle,
       caption = caption) +
  geom_segment(
    data = facet_labs,
    mapping = aes(x = arrow_x, xend = arrow_xend,
                  y = arrow_y, yend = arrow_yend),
    linewidth = 1, arrow = arrow(length = unit(0.3, "cm"))
    ) +
  geom_label(
    data = facet_labs,
    mapping = aes(x = text_x, y = text_y, label = my_text),
    size = 3,
    hjust = "left"
    ) +
  theme_minimal() +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(family = "josefin",
                              face = "bold",
                              size = 25,
                              color = earnings_pal["dark_text"]),
    plot.subtitle = ggtext::element_textbox_simple(family = "sen",
                                                   size = 17,
                                                   color = earnings_pal["light_text"],
                                                   margin = margin(t = 0.5, r = 0, b = 1, l = 0, unit = "lines")),
    plot.caption = ggtext::element_textbox(family = "sen",
                                           face = "italic",
                                           color = earnings_pal["light_text"],
                                           margin = margin(t = 3, r = 0, b = 0, l = 0, unit = "lines")),
    strip.text.x = element_text(family = "josefin",
                                face = "bold",
                                size = 12,
                                hjust = 0),
    panel.spacing.y = unit(x = 1, "lines"),
    axis.text = element_text(family = "sen",
                             color = earnings_pal["light_text"]),
    axis.text.x = element_text(size = 10),
    axis.title = element_blank()
  )
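
The key idea in the code above is that the annotation data frame carries the faceting variable, so each label lands only in its matching panel. Here is a stripped-down sketch of that idea, assuming mtcars as stand-in data; panel_labs and its columns are hypothetical names.

```r
library(ggplot2)

# one row per annotation; `cyl` matches the faceting variable,
# so each label is routed to its own panel (no label for cyl == 6)
panel_labs <- data.frame(
  cyl = c(4, 8),
  x = c(4, 2.5),
  y = c(30, 25),
  my_text = c("Label for the 4-cylinder panel",
              "Label for the 8-cylinder panel")
)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl, nrow = 3) +
  geom_label(
    data = panel_labs,
    mapping = aes(x = x, y = y, label = my_text),
    size = 3
  )

# PANEL records which facet each label was assigned to
label_panels <- layer_data(p, i = 2)$PANEL
```

A panel with no matching row in panel_labs simply gets no annotation.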

Use geom_textbox() to apply Markdown to annotations


The {ggtext} package provides geom_textbox(), which allows for Markdown styling and font family specification. Note that we had to adjust the box coordinates slightly:

#.................create df with annotation info.................
facet_labs <- data.frame(my_text = c("<span style='color:#2D7787;'>**Males**</span> **make $23,644 more than** <span style='color:#FC6B4B;'>**females**</span> in this female-dominated occupation",
                                     "Male & female probation officers & correctional treatment specialists make about the same",
                                     "Here's another annotation with a horizontal arrow"), 
                         group_label = c("Occupations that are 75%+ female", "Occupations that are 45-55% female", "Occupations that are 75%+ male"),
                         x = c(70000, 55000, 65000),
                         y = c(5, 3, 5),
                         arrow_x = c(80000, 48500, 60000),
                         arrow_xend = c(80000, 48500, 37500),
                         arrow_y = c(6, 4, 5),
                         arrow_yend = c(10, 6.5, 5)) |>
  mutate(group_label = fct_relevel(group_label, "Occupations that are 75%+ female",
                                   "Occupations that are 45-55% female", "Occupations that are 75%+ male"))

#..........................create plot...........................
ggplot(subset_jobs) +
  geom_segment(aes(x = total_earnings_female, xend = total_earnings_male,
                   y = fct_reorder(occupation, total_earnings), yend = occupation)) +
  geom_point(aes(x = total_earnings_male, y = occupation),
             color = earnings_pal["males"], size = 3.25) +
  geom_point(aes(x = total_earnings_female, y = occupation),
             color = earnings_pal["females"], size = 3.25) +
  facet_wrap(~group_label, nrow = 3, scales = "free_y") +
  scale_x_continuous(labels = scales::label_dollar(scale = 0.001, suffix = "k"),
                     breaks = c(25000, 50000, 75000, 100000, 125000)) +
  labs(title = "Males earn more than females across most occupations",
       subtitle = subtitle,
       caption = caption) +
  geom_segment(data = facet_labs,
               mapping = aes(x = arrow_x, xend = arrow_xend,
                             y = arrow_y, yend = arrow_yend),
               linewidth = 1, arrow = arrow(length = unit(0.3, "cm"))) +
  ggtext::geom_textbox(data = facet_labs,
                       mapping = aes(x = x, y = y, label = my_text),
                       size = 3,
                       family = "sen") +
  theme_minimal() +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(family = "josefin",
                              face = "bold",
                              size = 25,
                              color = earnings_pal["dark_text"]),
    plot.subtitle = ggtext::element_textbox_simple(family = "sen",
                                                   size = 17,
                                                   color = earnings_pal["light_text"],
                                                   margin = margin(t = 0.5, r = 0, b = 1, l = 0, unit = "lines")),
    plot.caption = ggtext::element_textbox(family = "sen",
                                           face = "italic",
                                           color = earnings_pal["light_text"],
                                           margin = margin(t = 3, r = 0, b = 0, l = 0, unit = "lines")),
    strip.text.x = element_text(family = "josefin",
                                face = "bold",
                                size = 12,
                                hjust = 0),
    panel.spacing.y = unit(x = 1, "lines"),
    axis.text = element_text(family = "sen",
                             color = earnings_pal["light_text"]),
    axis.text.x = element_text(size = 10),
    axis.title = element_blank()
  )

Keep these additional tips, tools & tutorials in mind!


Tools & packages:

Tutorials:

See you next week!

~ This is the end of Lesson 2 (of 2) ~