This answer to ‘What is a good color scheme for representing multiple data sets on a scatter plot?’ shows examples of:
- Ineffective versus effective scatter plots.
- How small multiples help represent multiple data sets on a scatter plot.
- Effective use of color (symbolism, intensity, size) to convey quantitative information when displaying scientific data.
Edward Tufte espouses, and vividly demonstrates in his collector-quality books, the precepts of excellent data presentation. Today his influence is so pervasive, in the form of small multiples for example, that many of the best visual displays of data, in the New York Times or in Oliver Uberti’s work for National Geographic, borrow heavily from him, as does this answer.
Ineffective versus Effective Scatter Plot
According to Tufte, effective visual data display maximizes the data-ink ratio, i.e., the share of a graphic’s ink devoted to the data itself, by minimizing non-data ink (aka chart junk) as much as possible. Below is a compelling example of a scatter plot from his book ‘The Visual Display of Quantitative Information’ (1) (apologies: both figures were photographed by arguably the world’s worst photographer, yours truly).
As this example shows, when the design is transparent and stays out of the way, the reader’s attention goes to the information rather than to its presentation.
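For reference, Tufte states the data-ink ratio in ‘The Visual Display of Quantitative Information’ roughly as follows:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

As a purely hypothetical illustration: if heavy gridlines, a box frame and background shading account for 40% of a plot’s ink and can be erased without losing any data, the ratio is 1 − 0.4 = 0.6, and erasing them pushes it toward 1.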
Small Multiples
In his book ‘Envisioning Information’ (3), Tufte described small multiples as follows:
“Illustrations of postage-stamp size are indexed by category or a label, sequenced over time like the frames of a movie, or ordered by a quantitative variable not used in the single image itself.”
Small multiples, i.e., series of similar graphs, help:
- Make large data sets coherent by breaking the data down into more easily digestible ‘small bites’.
- Let the reader/audience quickly shift from figuring out how the figure works to digesting the information it conveys.
- Draw the eye to comparisons between data sets, making trends and patterns pop out.
- Engender simultaneous engagement with the data at different levels, both as a broad overview and in fine grain.
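The mechanics behind small multiples can be sketched in a few lines of Python. This is a minimal, hypothetical helper (`small_multiples_layout` is my own name, not a library function), assuming the data arrive as (group, x, y) records: it makes one panel per group, orders the panels by label, and gives every panel the same axis limits, since shared scales are what let the eye compare panels directly.

```python
import math
from collections import defaultdict


def small_multiples_layout(records, ncols=3):
    """Hypothetical sketch: split (group, x, y) records into one panel per group.

    Returns (panels, (nrows, ncols), (xlim, ylim)). Every panel is meant to be
    drawn with the same xlim and ylim so the panels can be compared at a glance.
    """
    if not records:
        raise ValueError("records must not be empty")

    # One list of (x, y) points per group label.
    panels = defaultdict(list)
    for group, x, y in records:
        panels[group].append((x, y))

    # Order panels by label; sorting by time or by a quantitative
    # variable would only change the sort key.
    ordered = dict(sorted(panels.items()))

    # Grid shape: never more columns than panels.
    ncols = min(ncols, len(ordered))
    nrows = math.ceil(len(ordered) / ncols)

    # Shared axis limits computed over ALL records, not per panel.
    xs = [x for _, x, _ in records]
    ys = [y for _, _, y in records]
    shared = ((min(xs), max(xs)), (min(ys), max(ys)))
    return ordered, (nrows, ncols), shared


data = [("dose A", 1, 2.0), ("dose B", 1, 3.5), ("dose A", 2, 2.4), ("control", 1, 1.1)]
panels, grid, (xlim, ylim) = small_multiples_layout(data, ncols=2)
```

With matplotlib, the grid shape could feed `plt.subplots(nrows, ncols, sharex=True, sharey=True)`, which enforces the shared axes for you.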
Effective Color Use
Both aesthetics and the need to convey quantitative information accurately should guide the choice of color in data visualization. In practical terms, this means:
- Usage should be consistent, i.e., the same color should be used throughout the data series or paper to convey the same attribute.
- Universally accepted color symbolisms can be leveraged to one’s advantage. For example, in traffic systems, green, amber/yellow/orange and red have stereotypical meanings. These can easily be extrapolated to convey similar meanings about biological data, e.g., green for stimulation/response increase and red for inhibition/response reduction.
- Black, white and shades of gray can be used to convey quantitative differences between groups within a data set. For example, rather than using the entire rainbow of hues for a dose-response series, choosing white for zero/no Rx, black for the maximum dose, and increasingly darker shades of gray for the doses in between reduces unnecessary noise and communicates the information within the data series far more effectively.
- Keep color blindness in mind when choosing colors for figures. For example, one suggestion for accommodating readers with red-green color blindness is to replace red with magenta in multi-color plots.
- Combining color with size to convey quantitative information embodies the adage ‘less is more’. Symbol size can accurately convey quantitative differences, i.e., effect size (see an example below).
- Color intensity can also convey quantitative information (see a beautiful example below).
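Several of the color guidelines above boil down to simple mappings. Below is a minimal Python sketch; the helper names (`RESPONSE_COLORS`, `gray_for_dose`, `marker_area`, `intensity`) and the exact hex values and size limits are my own assumptions for illustration. It shows a fixed response-to-color dictionary (traffic-light symbolism, with magenta in place of red), a white-to-black gray ramp for a dose series, marker area proportional to effect size, and opacity standing in for color intensity.

```python
# Hypothetical helpers illustrating the color/size guidelines; values are assumptions.

# One fixed attribute -> color mapping, reused in every figure (consistency).
# Traffic-light symbolism, with magenta standing in for red, as suggested
# for readers with red-green color blindness.
RESPONSE_COLORS = {
    "increase": "#008000",   # green: stimulation / response increase
    "no change": "#ffbf00",  # amber: little or no change
    "decrease": "#ff00ff",   # magenta instead of red: inhibition / reduction
}


def gray_for_dose(dose, max_dose):
    """White (#ffffff) at zero dose, black (#000000) at max_dose, grays between."""
    if max_dose <= 0:
        raise ValueError("max_dose must be positive")
    frac = min(max(dose / max_dose, 0.0), 1.0)  # clamp to [0, 1]
    level = round(255 * (1.0 - frac))           # 255 = white, 0 = black
    return "#{0:02x}{0:02x}{0:02x}".format(level)


def marker_area(effect, max_effect, max_area=200.0):
    """Marker AREA proportional to effect size.

    Scaling the diameter linearly instead would make the area grow with the
    square of the effect and exaggerate differences.
    """
    if max_effect <= 0:
        raise ValueError("max_effect must be positive")
    return max_area * abs(effect) / max_effect


def intensity(value, vmin, vmax, floor=0.15):
    """Opacity in [floor, 1]: larger values are drawn more intensely."""
    if vmax <= vmin:
        raise ValueError("vmax must exceed vmin")
    frac = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)  # clamp to [0, 1]
    return floor + (1.0 - floor) * frac
```

In matplotlib’s `Axes.scatter`, `s` is the marker area in points², so `marker_area` output can be passed to it directly, and the hex strings and an opacity can go to the `c` and `alpha` arguments. White zero-dose markers vanish on a white background, so it helps to give them a thin dark edge (`edgecolors`).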
See below scatter plots that use both small multiples and color to display a substantial amount of data while adhering, as far as possible, to Tufte’s precepts for good visual data display. Things I’d do differently today:
- Use either different colors alone or different symbols alone, not both (overkill).
- Use a less harsh, less obtrusive approach to indicate the average.
- Make the axes and tick marks thinner (less obtrusive).
IMO, scientists interested in improving their data visualization skills would benefit from attending Tufte’s one-day courses, typically held in the DC area and in California. US research institutions such as the NIH usually sign off on such educational expenses, or at least they used to, and course attendance used to come with the bonus of getting all his books for free.
1. Tufte, Edward R. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983.
2. Kelley, Stanley, Richard E. Ayres, and William G. Bowen. “Registration and Voting: Putting First Things First.” American Political Science Review 61.2 (1967): 359-379.
3. Tufte, Edward R. Envisioning Information. Cheshire, CT: Graphics Press, 1990.
7. Yoo, Ha-Na, Jin-Won Lee, and Jeong-Chil Yoo. “Asymmetry of eye color in the common cuckoo.” Scientific Reports 7.1 (2017): 7612.
8. Kamala, Tirumalai, and Navreet K. Nanda. “Protective response to Leishmania major in BALB/c mice requires antigen processing in the absence of DM.” The Journal of Immunology 182.8 (2009): 4882-4890.