EDS 240: Lecture 5.2

Colors


Week 5 | February 3rd, 2025

Good data visualization design considers:


  • data-ink ratio (less is more, within reason)
  • how to reduce eye movement and improve readability / interpretability (e.g. through alternative legend positions, direct annotations)
  • putting things in context
  • how to draw the main attention to the most important info
  • consistent use of colors, spacing, typefaces, weights
  • typeface / font choices and how they affect both readability and emotions and perceptions
  • using visual hierarchy to guide the reader
  • color choices (incl. palette types, emotions, readability)
  • how to tell an interesting story
  • how to center the people and communities represented in your data
  • accessibility through colorblind-friendly palettes & alt text

This lesson will focus on the use of colors in a good data visualization.

Why do we use color?

Spend a couple minutes discussing with your Learning Partners the following:

Why and / or when should we use color in data visualizations?

Find one or more examples of a data viz that uses color to convey information, and share them in #eds-240-data viz. Note some of your own observations about the color choices (e.g. why these colors? how is the palette arranged?).


Choosing colors is difficult and they should be purposefully chosen



You’ll probably iterate on them as you sit with your visualization and of course, as you get feedback from others.


Some places to start / things to consider:

  • is using color the best and / or only way to visually represent your variable(s)?
  • are you designing for a particular organization / brand?
  • what emotions are you trying (or not trying) to elicit?
  • who is your audience?
  • are your data commonly represented using a particular color scheme?
  • what data types (e.g. numeric vs. categorical, discrete vs. continuous?) are you working with?

What is color?


There are a number of different color spaces that are used to represent and define color. HSV and HSL are used commonly in color pickers (e.g. Google color picker). HCL underlies some default {ggplot2} parameters.

HSV

HCL

You don’t need to worry much about the underlying theory of color spaces, but know that changing any of the parameters (e.g. hue, saturation, etc.) can influence how we perceive information in a data visualization.
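As a rough illustration of the point above, here is a minimal sketch that changes one color parameter at a time. It assumes only base R's hcl() and hsv() functions from {grDevices}; the specific hue, chroma, and luminance values are arbitrary choices for demonstration, not values taken from this lesson.

```r
# Base R only; the numeric values below are illustrative assumptions.

# vary hue, holding chroma and luminance fixed (HCL space) ----
vary_hue <- hcl(h = c(15, 135, 255), c = 100, l = 65)

# vary luminance, holding hue and chroma fixed (HCL space) ----
vary_lum <- hcl(h = 255, c = 60, l = c(30, 60, 90))

# vary saturation, holding hue and value fixed (HSV space) ----
vary_sat <- hsv(h = 0.6, s = c(0.2, 0.6, 1), v = 0.9)

# preview all nine swatches side by side ----
barplot(rep(1, 9), col = c(vary_hue, vary_lum, vary_sat), border = NA, axes = FALSE)
```

Each group of three swatches differs in only one parameter, yet the groups read quite differently: hue changes look like different categories, while luminance and saturation changes look like "more" or "less" of something.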

Different color scales for different data types





Categorical scales


  • mainly formed by selecting different hues
  • hues assigned to each group must be distinct and ideally have different lightnesses
  • groups don’t have an intrinsic order
  • limit to no more than 7 hues

Sequential scales


  • colors assigned to data values in a continuum, based on lightness, hue, or both
  • lower values typically associated with lighter colors & higher values associated with darker colors (though not a hard and fast rule; make choices clear with legend)
  • can use a single hue or two hues

Diverging scales


  • combination of two sequential palettes with a shared endpoint at the central value
  • central value is assigned a light color (light gray is best)
  • use a distinctive hue for each of the component palettes
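To make the three scale types concrete, here is a small sketch that pulls one example of each using base R's hcl.colors() (available in R 3.6 and later). The palette names "Dark 3", "Blues 3", and "Blue-Red" are simply illustrative picks from hcl.pals(), not palettes used elsewhere in this lesson.

```r
# Base R only (R >= 3.6); palette choices are illustrative assumptions.

# categorical: distinct hues, no implied order ----
cat_pal <- hcl.colors(5, palette = "Dark 3")

# sequential: a single hue that changes in lightness ----
seq_pal <- hcl.colors(5, palette = "Blues 3")

# diverging: two hues that meet at a light, neutral midpoint ----
div_pal <- hcl.colors(5, palette = "Blue-Red")

# preview the three palettes as rows of swatches ----
op <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, 5), col = cat_pal, border = NA, axes = FALSE, main = "categorical")
barplot(rep(1, 5), col = seq_pal, border = NA, axes = FALSE, main = "sequential")
barplot(rep(1, 5), col = div_pal, border = NA, axes = FALSE, main = "diverging")
par(op)
```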

Base plots (for applying color scales to)


We’ll be testing out different palettes throughout this lesson. Instead of having to retype the code for our plots each time, let’s create and save two versions of a penguin scatter plot. We can then call either of these plot objects to modify with different color scales:

library(palmerpenguins)
library(tidyverse)

Requires a categorical color scale

cat_color_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)) +
  geom_point(size = 4, alpha = 0.8)

cat_color_plot 

Requires a continuous color scale

cont_color_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  geom_point(size = 4, alpha = 0.8) 

cont_color_plot 

Ensuring inclusive and accessible design through your color choices

What is colorblindness?


Color vision deficiency aka colorblindness is the decreased ability to see color or differences in color. It’s estimated that about 1 in 12 men (8%) and 1 in 200 women (0.5%) are affected (Wikipedia).

Color plate tests are used to help identify different forms of color blindness. Try using the Let’s get color blind Chrome extension to emulate different forms of colorblindness while looking at the above plates. Image source: American Optometric Association

The problem with rainbow color maps


  • colors don’t follow any natural perceived ordering (no innate sense of higher or lower)

  • perceptual changes in rainbow colors are not uniform (e.g. colors appear to change faster in yellow region than green region)

  • insensitive to color vision deficiencies
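One way to check the "no natural ordering" problem yourself is to convert a palette to approximate grayscale brightness and see whether it rises or falls steadily. The sketch below assumes base R only; the helper names luma() and is_ordered() are hypothetical, and the brightness weights are the common Rec. 601 approximation rather than a perceptually exact measure.

```r
# Base R only; luma() and is_ordered() are hypothetical helper names.

# approximate grayscale brightness (0-255) of each color ----
luma <- function(cols) {
  rgb_vals <- col2rgb(cols)
  as.numeric(0.299 * rgb_vals["red", ] + 0.587 * rgb_vals["green", ] + 0.114 * rgb_vals["blue", ])
}

# does brightness move in one direction only? ----
is_ordered <- function(x) all(diff(x) > 0) || all(diff(x) < 0)

rainbow_luma <- luma(rainbow(7))
viridis_luma <- luma(hcl.colors(7, palette = "viridis"))

is_ordered(rainbow_luma) # brightness of the rainbow palette jumps up and down
is_ordered(viridis_luma) # brightness of the viridis-like palette changes steadily
```

Note that hcl.colors(palette = "viridis") is base R's approximation of the viridis palette, used here only to avoid extra package dependencies.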


Rainbow colormaps aren’t all bad




Problematic, perceptually nonuniform and unordered rainbow colormaps

Improved, perceptually uniform and diverging rainbow colormaps

ALTERNATIVE: Viridis


The viridis color scales are perceptually-uniform (even when printed in gray scale) and colorblindness-friendly:


Continuous viridis scales

Binned viridis scales

There are a number of different ways to apply viridis color scales, but I often opt for scale_*_viridis_*() functions, which come pre-loaded with {ggplot2}.

Using viridis color scales


Try out the palette options below, then check out the documentation and play around with some alternative options as well.


Discrete viridis scales

cat_color_plot +
  scale_color_viridis_d(option = "viridis") 

Continuous viridis scales

cont_color_plot +
  scale_color_viridis_c(option = "magma")
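For completeness, here is a sketch of the binned variant, scale_color_viridis_b(), which {ggplot2} provides alongside the discrete and continuous versions. The plot is rebuilt from scratch so the example runs on its own; the object name binned_color_plot is just an illustrative choice.

```r
library(palmerpenguins)
library(ggplot2)

# same scatter plot as cont_color_plot, rebuilt so this example is self-contained ----
binned_color_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_viridis_b(option = "viridis") # bins the continuous variable into color classes

binned_color_plot
```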


ALTERNATIVE: RColorBrewer


ColorBrewer offers a number of colorblind-friendly color schemes for maps and other graphics. Check them out using {RColorBrewer} or the web-based interface.


RColorBrewer::display.brewer.all(colorblindFriendly = TRUE)

Colorblind-friendly palettes, viewed using display.brewer.all()

ColorBrewer’s web-based interface for exploring palettes

ALTERNATIVE: RColorBrewer



{RColorBrewer} comes with a couple useful functions for quickly viewing and assembling your palette’s HEX codes:


Preview a palette with your number of desired colors:

RColorBrewer::display.brewer.pal(n = 4, name = 'Dark2')

Print the HEX codes of your palette:

RColorBrewer::brewer.pal(n = 4, name = 'Dark2')
[1] "#1B9E77" "#D95F02" "#7570B3" "#E7298A"

Using RColorBrewer color scales


Use the right function (all pre-loaded with {ggplot2}) for the type of data / palette:


Use scale_color_brewer() to apply qualitative palettes

cat_color_plot +
  scale_color_brewer(palette = "Dark2") 

Use scale_color_distiller() for unclassed continuous color scales

cont_color_plot +
  scale_color_distiller(palette = "BuPu")

Use scale_color_fermenter() for classed continuous color scales

cont_color_plot +
  scale_color_fermenter(palette = "YlGnBu")

Check out the documentation and play around with some alternative options.


Accessibility tip: outline points to make light colors more visible


Rather than color points by body_mass_g, we can fill points by body_mass_g. Then, we need to change the shape of our points to 21, which is the code for an outlined, fill-able point:

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, fill = body_mass_g)) +
  geom_point(shape = 21, size = 4, alpha = 0.8) +
  scale_fill_distiller(palette = "BuPu")

Accessibility tip: use redundant mapping whenever possible


Recall that colors are low on the hierarchy of elementary perceptual tasks. When possible, avoid conveying important information purely through color – consider how you might additionally use shapes, symbols, typography, or annotations.



There are so many other great pre-made color palettes to explore, many of which take into consideration color vision deficiencies (but always double check!)

Use paletteer to access TONS of pre-made palettes


The {paletteer} package provides a common interface for accessing a near-comprehensive list of palettes (over 2,500!!) across various packages.




  • two groups of palettes: discrete and continuous
  • discrete palettes can be fixed (have a set # of colors) or dynamic (adjustable # of colors based on your specifications)
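If you prefer to browse from the console, {paletteer} ships data frames that list the available palettes by group. The sketch below assumes the palettes_d_names, palettes_c_names, and palettes_dynamic_names objects with package and palette columns; exact columns may differ slightly across {paletteer} versions.

```r
library(paletteer)

# peek at each group of palettes ----
head(palettes_d_names)       # discrete, fixed
head(palettes_c_names)       # continuous
head(palettes_dynamic_names) # discrete, dynamic

# list all discrete palettes that come from a single package ----
wes_palettes <- subset(palettes_d_names, package == "wesanderson")
wes_palettes$palette
```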

Take a couple minutes to explore palettes


There are a number of ways to browse the extensive list of supported palette packages, including in the {paletteer} documentation, on the r-color-palettes GitHub repo, on the R Color Palettes website, or my personal favorite, the R-Graph Gallery’s Color Palette Finder (built on {paletteer}; see below).


{paletteer} is useful in a number of ways – let’s consider two options next

1. Apply a palette using scale_*_paletteer_*()


  • where scale_* can be scale_color or scale_fill
  • and paletteer_* can be paletteer_d (discrete), paletteer_c (continuous), or paletteer_binned

superbloom3 palette from {calecopal}

cat_color_plot +
  paletteer::scale_color_paletteer_d("calecopal::superbloom3")

batlow palette from {scico}

cont_color_plot +
  paletteer::scale_color_paletteer_c("scico::batlow", direction = -1)
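The binned variant follows the same pattern. This sketch assumes a {paletteer} version that includes scale_color_paletteer_binned() and that {scico} is installed; the plot is rebuilt so the example stands alone, and the object name binned_batlow_plot is illustrative.

```r
library(palmerpenguins)
library(ggplot2)

# batlow palette from {scico}, applied as a binned (classed) scale ----
binned_batlow_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  geom_point(size = 4, alpha = 0.8) +
  paletteer::scale_color_paletteer_binned("scico::batlow", direction = -1)

binned_batlow_plot
```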

2a. Create vector of HEX codes using paletteer_*()


  • be sure to specify the number of desired colors (n) and, optionally, the direction

GrandBudapest1 palette from {wesanderson}

(discrete, with 3 colors)

pal_d <- paletteer::paletteer_d("wesanderson::GrandBudapest1", n = 3)
pal_d
<colors>
#F1BB7BFF #FD6467FF #5B1A18FF 

Gold-Purple Diverging palette from {ggthemes}

(continuous, with 5 colors)

pal_c <- paletteer::paletteer_c("ggthemes::Gold-Purple Diverging", n = 5)
pal_c
<colors>
#AD9024FF #CBAE4DFF #E3D8CFFF #CA96B9FF #AC7299FF 


We can now apply our palette to our plot using the appropriate ggplot2::scale_*() function.

See the next slide for some commonly used options.

Some common functions for scaling colors


For qualitative (categorical) data:

(Image: a qualitative color scale with 5 distinct colors: dark blue, medium blue, yellow, orange, red.)

  • scale_*_manual()

For quantitative (numeric) data:

Unclassed palettes:

(Image: an unclassed sequential color scale which transitions from light to dark blue moving left to right.)

  • scale_*_gradient(): creates a two color gradient (low-high)
  • scale_*_gradient2(): creates a diverging color gradient (low-mid-high)
  • scale_*_gradientn(): creates an n-color gradient

Classed palettes:

(Image: a classed sequential color scale with 5 binned colors ranging from light blue on the left to dark blue on the right.)

  • scale_*_steps(): creates a two color binned gradient (low-high)
  • scale_*_steps2(): creates a diverging binned color gradient (low-mid-high)
  • scale_*_stepsn(): creates an n-color binned gradient

Use the fill variant of the above functions for areas, bars, etc. and the color variant for points, lines, etc.
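As one example from the list above, here is a sketch of scale_color_gradient2() applied to the penguin scatter plot. The three HEX codes are borrowed from the Gold-Purple Diverging palette shown earlier, and centering the scale on mean body mass is an assumption made for illustration; pick a midpoint that is meaningful for your own data.

```r
library(palmerpenguins)
library(ggplot2)

# center the diverging scale on the mean body mass (illustrative choice) ----
mid_mass <- mean(penguins$body_mass_g, na.rm = TRUE)

diverging_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_gradient2(low = "#AD9024", mid = "#E3D8CF", high = "#AC7299",
                        midpoint = mid_mass)

diverging_plot
```

Without the midpoint argument, scale_*_gradient2() centers on 0, which would place every penguin on the "high" side of the scale.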

2b. Apply palette using ggplot2::scale_*()


Examples using our discrete color palette:

pal_d <- paletteer::paletteer_d("wesanderson::GrandBudapest1", n = 3)


apply to scatter plot using the color variant

cat_color_plot + 
  scale_color_manual(values = pal_d)

apply to histogram using the fill variant

ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram() +
  scale_fill_manual(values = pal_d)

2b. Apply palette using ggplot2::scale_*()


Examples using our continuous color palette:

pal_c <- paletteer::paletteer_c("ggthemes::Gold-Purple Diverging", n = 5)


apply to scatter plot as an unclassed palette (use gradientn variant)

cont_color_plot + 
  scale_color_gradientn(colors = pal_c)

apply to scatter plot as a classed (binned) palette (use stepsn variant)

cont_color_plot + 
  scale_color_stepsn(colors = pal_c)

Climate and environmental science visualizations can (should) draw from community standards, when possible

Some widely-used climate science palettes




Figure 4. Appropriate diverging and sequential colour schemes for the following climate data: (a) absolute temperature, (b) absolute precipitation, (c) temperature anomaly, (d) precipitation or runoff anomaly, (e and f) other climate variables with no symbolic association. Schemes in this figure are 7-class ones designed by Cynthia Brewer (Brewer et al. 2003).







Want to design your own palette? Here are some helpful guidelines and considerations…

Select hues using color wheels / pickers


There are lots of different variations of color wheels, but look for hues along the outer edge:




Common color models: RYB (used by painters), RGB (used in electronic displays), CMYK (used in modern printing). Image source: medium.com

When using a color picker, adjust the HEX code sliding scale to pick a hue and ensure that the selector is set to the far right edge of the box:

There are lots of great color pickers out there, though Google color picker is a quick one to navigate to. HTML color codes is my personal favorite.


Use color wheels to identify color harmonies


Image source: htmlcolorcodes.com



blue-green & red-orange are complementary and therefore offer the strongest possible contrast
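If you want to compute a complement rather than read it off a wheel, one rough approach is to rotate the hue by 180° in HSV space. The sketch below uses base R only, and complement() is a hypothetical helper name. Keep in mind that this uses the RGB / HSV wheel, so the pairs it returns can differ from those on an RYB (painter's) wheel.

```r
# Base R only; complement() is a hypothetical helper, not a package function.
complement <- function(hex) {
  hsv_vals <- rgb2hsv(col2rgb(hex))
  # rotate hue halfway around the wheel (hue is stored on a 0-1 scale)
  hsv(h = (hsv_vals["h", ] + 0.5) %% 1,
      s = hsv_vals["s", ],
      v = hsv_vals["v", ])
}

# complement of a blue-green, previewed next to the original ----
blue_green <- "#0D98BA"
barplot(c(1, 1), col = c(blue_green, complement(blue_green)), border = NA, axes = FALSE)
```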

Find descriptions of blue-green & red-orange on htmlcolorcodes.com

Hues have associated meaning


We associate meaning with different hues (e.g. cold / sad = blue, hot / angry = red), and importantly, these associations can differ among cultures.

Some associations span multiple cultures




Colors elicit emotional responses


“lightness, brightness, and saturation can communicate the level of seriousness, intensity, and emotional weight in a visual work” -Cédric Scherer



(Right) COVID-19 tracker by the Johns Hopkins University (screenshot from 2020-07-27, courtesy of Cédric Scherer). Red tends to elicit panic / fear. (Left) A map of confirmed COVID-19 cases by Datawrapper (screenshot from 2020-07-27, courtesy of Cédric Scherer). Blues and greens help to avoid such a strong fearful emotional response.

Colors elicit emotional responses




“We show the current or confirmed cases in another color than red. The coronavirus is not a death sentence. Most infected people will survive. If you’re infected, you want to find yourself on a map as a blue (or yellow, or beige, or purple…) dot, not as a “attention, danger, run!”-screaming red dot. Related, we show deaths in black, not red – it feels more respectful.”


Using pure hues can be overwhelming


Though it may be tempting to use bright / bold colors to grab attention, it can lead to eye strain and make it more challenging for your readers to focus on your chart.

Use more subdued colors instead



A few approaches for subduing a pure hue


1. adjust the saturation (i.e. the level of intensity of a color)

2. adjust value: tint (add white), tone (add gray), or shade (add black)

3. increase transparency (e.g. using the alpha argument)

Green (HEX #00FF33 / 132° on the color wheel) at 100% saturation

Green (HEX #00FF33 / 132° on the color wheel) at 40% saturation


The default chroma for ggplots is set to 100 (the c argument of scale_*_hue())

ggplot(na.omit(penguins), aes(x = species, fill = sex)) +
  geom_bar()

Use scale_*_hue() to adjust chroma (saturation)

ggplot(na.omit(penguins), aes(x = species, fill = sex)) +
  geom_bar() +
  scale_fill_hue(c = 70)


Green (HEX #00FF33 / 132° on the color wheel) with lightness adjusted to 90% (more white)

Green (HEX #00FF33 / 132° on the color wheel) with lightness adjusted to 10% (more black)


The default lightness for ggplots is set to 65%

ggplot(na.omit(penguins), aes(x = bill_length_mm, y = bill_depth_mm, color = sex)) +
  geom_point()

Use scale_*_hue() to adjust lightness / darkness

ggplot(na.omit(penguins), aes(x = bill_length_mm, y = bill_depth_mm, color = sex)) +
  geom_point() +
  scale_color_hue(l = 45)


Green (HEX #00FF33 / 132° on the color wheel) with default opacity (100%)

Green (HEX #00FF33 / 132° on the color wheel) with opacity reduced to 50%
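The three approaches for subduing a pure hue can also be sketched outside of a plot, using base R color utilities. This is a minimal illustration that assumes the same pure green (#00FF33) used in the swatches; the object names are illustrative.

```r
# Base R only; object names are illustrative.
pure_green <- "#00FF33"

# 1. adjust saturation: drop to 40%, keeping hue and value ----
green_hsv <- rgb2hsv(col2rgb(pure_green))
muted_green <- hsv(h = green_hsv["h", 1], s = 0.4, v = green_hsv["v", 1])

# 2. adjust value: blend toward white (tints) or black (shades) ----
tints <- colorRampPalette(c(pure_green, "white"))(5)
shades <- colorRampPalette(c(pure_green, "black"))(5)

# 3. increase transparency: 50% opacity adds an alpha channel to the HEX code ----
transparent_green <- adjustcolor(pure_green, alpha.f = 0.5)

# preview ----
barplot(rep(1, 3), col = c(pure_green, muted_green, transparent_green), border = NA, axes = FALSE)
```

Any of the resulting HEX codes can be passed to scale_*_manual(), just like the palettes elsewhere in this lesson.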

Building your own color palette


Be sure to consider what we’ve already discussed:

  • ensure that you’re picking colorblind-friendly color combos
  • use color wheels to identify color harmonies
  • think carefully about what emotions / messages your color choices will convey
  • avoid lots of pure / fully-saturated hues


And also consider some other important sources of inspiration:

  • your company or organization’s brand / logo
  • steal colors from your favorite / relevant images using tools like Color Thief
  • use a randomized palette generator, like coolors.co

TIP: Save your palette outside of your ggplot


I recommend saving your palette to a named vector outside of your ggplot – this prevents lengthy palettes from creating cluttered ggplot code and allows you to reuse your palette across multiple plots:

# create palette ----
my_palette <- c("#32DE8A", "#E36414", "#0F4C5C")

# apply to plot ----
cat_color_plot +
  scale_color_manual(values = my_palette) # alternatively, `scale_color_manual(values = c("#32DE8A", "#E36414", "#0F4C5C"))`

TIP: Set color names (1/2)


We should always be consistent with our colors. E.g. if Gentoo penguins are blue in one plot, they should be blue in all plots. Notice that our colors don’t “stick” with the species they represent, but rather they’re applied in the order that they appear in our palette:

my_palette <- c("#32DE8A", "#E36414", "#0F4C5C")

Adelie, Chinstrap & Gentoo penguins

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_manual(values = my_palette)

Just Adelie & Gentoo penguins

penguins |> 
  filter(species != "Chinstrap") |> 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_manual(values = my_palette)

TIP: Set color names (2/2)


Setting the names of our vector elements (colors) ensures that they stick with those factor levels across all of our visualizations:

my_palette_named <- c("Adelie" = "#32DE8A","Chinstrap" = "#E36414", "Gentoo" = "#0F4C5C")

Adelie, Chinstrap & Gentoo penguins

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_manual(values = my_palette_named)

Just Adelie & Gentoo penguins

penguins |> 
  filter(species != "Chinstrap") |> 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_manual(values = my_palette_named)
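If your palette already exists as an unnamed vector, you can attach names instead of retyping the HEX codes. A minimal sketch using base R's setNames() follows; here the species names are typed by hand (an assumption made so the example runs without any packages), though in practice you could use levels(penguins$species).

```r
# Base R only; species_levels is typed by hand for illustration.
my_palette <- c("#32DE8A", "#E36414", "#0F4C5C")
species_levels <- c("Adelie", "Chinstrap", "Gentoo")

# attach names so each color sticks with its species ----
my_palette_named <- setNames(my_palette, species_levels)
my_palette_named

# pull out just the colors needed for a subset of species ----
my_palette_named[c("Adelie", "Gentoo")]
```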

Tip: modify df to apply colors to observations


The scale_*_identity() functions allow you to map aesthetic values from your data frame to individual points. They will not produce a legend unless specified using guide = "legend".

penguins |> 
  mutate(
    my_color = case_when(
      bill_length_mm < 40 ~ "#D7263D",
      between(bill_length_mm, 40, 50) ~ "#E4BB97",
      bill_length_mm > 50 ~ "#386150"
    )
  ) |> 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = my_color)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_identity()

penguins |> 
  mutate(
    my_color = case_when(
      body_mass_g > 6000 ~ "#D7263D",
      TRUE ~ "gray50"
    )
  ) |> 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = my_color)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_identity(guide = "legend", 
                       name = "Body mass (g)", labels = c(">6000", "<= 6000"))

There are also some additional rules / guidelines that you should pretty much always abide by when selecting colors

High saturation = greater / more important values



It’s okay to use saturated / brighter colors in moderation.

We tend to associate more saturated colors with greater values.

Image source: New York Times

Image source: {ggdensity} pkgdown site.

No more than 7 colors


If you need more than seven colors, consider alternative chart types.



Use colors consistently


Ensure consistent use of colors across multiple visualizations that display the same groups.



Explain what your colors encode


Always include a color key, in the form of a traditional legend or otherwise.



Highlight important values


Use gray for less important groups / values, annotations, contextual information, etc.
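One way to apply this with the penguins data is a named palette that assigns gray to the context groups and a single saturated color to the group you want to emphasize. Treating Gentoo as the group of interest is an assumption made purely for illustration.

```r
library(palmerpenguins)
library(ggplot2)

# gray for context, one color for the group of interest (Gentoo, chosen for illustration) ----
highlight_palette <- c("Adelie" = "gray70", "Chinstrap" = "gray70", "Gentoo" = "#0F4C5C")

highlight_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_manual(values = highlight_palette)

highlight_plot
```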



Be predictable in your color choices


Use intuitive colors (e.g. green for forest, blue for water) but avoid stereotypes (e.g. pink for women, blue for men).



Try a cold color for men (e.g. blue or purple) and a warmer color for women (e.g. yellow, orange or a warm green; see this great blog post for more information).


Bright = low, dark = high


In most cases, readers will associate bright colors with lower values and darker colors with higher values. Build gradients accordingly.
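It is worth checking which direction a palette runs by default. For example, scale_*_distiller() reverses ColorBrewer palettes by default (direction = -1), which maps darker colors to lower values for a light-to-dark palette such as BuPu. A sketch of flipping it so that light = low and dark = high:

```r
library(palmerpenguins)
library(ggplot2)

# direction = 1 keeps the ColorBrewer order: light colors for low values, dark for high ----
light_low_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  geom_point(size = 4, alpha = 0.8) +
  scale_color_distiller(palette = "BuPu", direction = 1)

light_low_plot
```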



Except in some cases. . .


“humans perceive bright colors on elevation maps to represent a high altitude, with darker colors representing naturally low-lying and shady areas like valley” (Cédric Scherer, Colors and Emotions in Data Visualization)

Filled contour plot of Mt. Shasta. Image source: EarthLab

USGS Digital Elevation Model of Pohnpei (Micronesia). Image source: PacIOOS

Gradient palettes for continuous data only


Most readers will associate dark colors with “high / important” and bright or light colors with “low / less”. Using a gradient palette with categorical data may imply a ranking of categories where there shouldn’t be one.



Use lightness, not just hue, to build gradients


Gradients should also work in black and white.



Two hues are sometimes better than one


Readers are generally better able to distinguish colors on a gradient if they are encoded through both lightness and two (sometimes three) carefully-selected hues.



Take a Break

~ This is the end of Lesson 2 (of 3) ~
