EDS 240: Lecture 1.2

Data visualization: an intro


Week 1 | January 6th, 2025

What is data visualization?


“…the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items.”

-from Wikipedia



Created using {ggplot2}

Created using {gganimate}

Created using {shiny}





“any graphical representation of information and data”


“part art and part science”

(A little bit of the) History of data visualization


16,500 years ago, Pleistocene

The Lascaux Cave Paintings are thought to include some of the earliest charts of stars and constellations.



~1150 BC

The Turin Papyrus Map, the oldest known geologic map, depicts a dry riverbed (Wadi Hammamat) and a major mining region in Egypt’s Eastern Desert.




1400 - 1532 AD

Quipus (kee-poos) were recording devices used by the Inca Empire for data collection, census records, calendaring, etc.




1644

Michael Florent van Langren, Flemish astronomer, created the first (known) statistical graph showing differences in estimates of longitudinal distance between Toledo and Rome.




1786

William Playfair, Scottish engineer and political economist, is credited as the creator of the first bar chart (featuring Scottish trade data, 1780 - 1781), as well as line and pie charts.





1846

Emma Hart Willard, America’s first professional female cartographer, created the Temple of Time, which depicts the rise and fall of empires throughout history. It won a medal at the 1851 World’s Fair in London.





1856

Florence Nightingale was an English wartime nurse who campaigned to improve sanitary conditions of military hospitals. The Diagram of the Causes of Mortality in the Army of the East shows that deaths from preventable diseases (blue) outnumbered combat fatalities (red) in military hospitals in 1854 & early 1855.





1869

Charles Minard, a French civil engineer, produced what has been referred to as “the greatest visualization created.” Napoleon’s Russian Campaign displays six types of data in two dimensions: number of troops, distance traveled, temperature, latitude / longitude, direction of travel, and location relative to specific dates.





1900

W.E.B. DuBois was an African American writer, scholar, and activist. He used photographs and data visualizations to commemorate the lives of African Americans at the turn of the century and to challenge the racist caricatures and stereotypes of the day.

Assessed Value of Household and Kitchen Furniture Owned by Georgia Negroes. Recreation by Ijeamaka Anyene for the 2021 #DuBoisChallenge.

Proportion of Freemen and Slaves Among American Negroes. Recreation by Luis Freites for the 2021 #DuBoisChallenge. Source: Twitter





The emergence of programming languages and tools in recent years has made data visualization design easier than ever before.

Why do we visualize data?

Spend the next few minutes discussing with your Learning Partners and, if possible, pull up some example visualizations that illustrate your discussion points.


. . . to answer questions / derive insights


Fig Caption: Unusual climate anomalies in 2023 (the red line, which appears bold in print). Sea ice extent (a, b), temperatures (c–e), and area burned in Canada (f) are presently far outside their historical ranges. These anomalies may be due to both climate change and other factors. Sources and additional details about each variable are provided in supplemental file S1. Each line corresponds to a different year, with darker gray representing later years.


A nice Twitter thread on key takeaways from the above paper

. . . to explore & generate new questions


“Exploratory data analysis (EDA) is not a formal process with a strict set of rules. More than anything, EDA is a state of mind…you should feel free to investigate every idea that occurs to you. Some of these ideas will pan out, and some will be dead ends. As your exploration continues, you will home in on a few particularly productive insights that you’ll eventually write up and communicate to others.”

-Hadley Wickham, author of R for Data Science

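library(ggplot2)  # plotting functions, plus the diamonds and mpg datasets
library(forcats)  # fct_reorder()

# Distribution of diamond weights (carat), in 0.5-carat bins
ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.5)

# Highway mileage by vehicle class, with classes ordered by median mileage
ggplot(mpg, aes(x = fct_reorder(class, hwy, median), y = hwy)) +
  geom_boxplot()

# Density of diamond prices for each cut quality, in $500 bins
ggplot(diamonds, aes(x = price, y = after_stat(density))) +
  geom_freqpoly(aes(color = cut), binwidth = 500, linewidth = 0.75)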

. . . to prompt discussion








gif created from Antti Lipponen’s Temperature Anomalies.





gif created from Mark SubbaRao’s Climate Spiral. For a similar visualization with accompanying {ggplot2} code, see Nicola Rennie’s TidyTuesday contribution!

. . . to create art / tell a story




Patchwork Kingdoms, by Nadieh Bremer, portrays the “digital divide” in schools across the world.






Vertices of Visualization


Why R for data viz?






  • a great ecosystem of data wrangling & visualization packages (including a massive and growing collection of {ggplot2} extensions)
  • amazing online learning communities
  • data viz fundamentals apply no matter the language / tool



Take a Break

~ This is the end of Lesson 2 (of 3) ~
