Stanford CS448B 15 Deconstructing Visualizations
TLDR
This article contains my notes from Stanford's CS448B (Data Visualization) course, specifically focusing on the fifteenth lecture about deconstructing visualizations. I'll discuss the importance of classification, mark extraction, data extraction, and redesign.
Notes
Forward Thinking
- For the data explainer project, do we have to find one dataset and create our three visualizations off of that one dataset, or is it alright if we find a high-level topic that we are interested in, and create three visualizations within that topic but using separate datasets?
- When using social network analysis, how do you validate your findings and/or determine if your findings are statistically significant? Is there an analogous "p value" standard for graph analysis? [Do you use qualitative or quantitative measures of validity?]
- Why do we go for complex graphs if we can break down a complex concept into multiple, easily digestible graphs [e.g. broken down into strongly connected components]? Wouldn't this also help with making the structure more intuitive?
Pixels are a poor representation of charts and graphs
Cannot index, search, manipulate or interact with the data
Goal: Reconstruct higher-level representation of charts and graphs that lets machines and people redesign, reuse and revitalize them
What is a good representation?
Approach
- Classification: Determine chart type
- Mark extraction: Retrieve graphical marks
- Data extraction: Retrieve underlying data table
Classification
Training the Classifier
Method | Accuracy |
---|---|
[Prasad 2007] Multi-class SVM | 84% |
ReVision: Multi-class SVM | 88% |
ReVision: Binary SVM (yes/no per type) | 96% |
Corpus
Over 2500 labeled images and 10 chart types
ReVision binary SVMs give 96% classification accuracy.
http://vis.berkeley.edu/papers/revision/
Mark and Data Extraction
Assumptions
Bar charts and pie charts only
No shading or texture, 3D, stacked bars, or exploded pies
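Under these assumptions, data extraction for a bar chart can be pictured as mapping pixel positions back to data values through the axis. The sketch below is only an illustration of that idea, not ReVision's actual algorithm: it assumes a linear y-axis, assumes the tick labels have already been located and read, and uses made-up pixel coordinates. The names `fit_linear_axis` and `bars_to_values` are hypothetical.

```python
def fit_linear_axis(label_pixels, label_values):
    """Least-squares fit of value = a * pixel + b from axis tick labels.

    label_pixels: y pixel coordinate of each tick label (assumed known).
    label_values: numeric value printed at each tick (assumed already read).
    """
    n = len(label_pixels)
    mean_p = sum(label_pixels) / n
    mean_v = sum(label_values) / n
    var_p = sum((p - mean_p) ** 2 for p in label_pixels)
    if var_p == 0:
        raise ValueError("need at least two distinct tick positions")
    cov = sum((p - mean_p) * (v - mean_v)
              for p, v in zip(label_pixels, label_values))
    a = cov / var_p
    b = mean_v - a * mean_p
    return a, b


def bars_to_values(bar_top_pixels, a, b):
    """Convert the y pixel of each bar's top edge into a data value."""
    return [a * y + b for y in bar_top_pixels]


# Illustrative (made-up) coordinates: image y grows downward,
# so larger values sit at smaller pixel rows.
a, b = fit_linear_axis([400, 300, 200], [0, 50, 100])
values = bars_to_values([350, 250], a, b)
```

With more than two tick labels the least-squares fit averages out small localization errors in the labels, which is one reason to fit rather than interpolate between two ticks.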
Extraction Results
Data Extraction Error
Redesign
Limitations
Graphical Overlays
Visual elements that are layered onto a chart to facilitate the perceptual and cognitive processes involved in chart reading
Taxonomy
Demo
Reference Structures
Break marks into regular segments and aid in reading axis values
Highlights
Draw viewers’ attention to specific marks
Redundant Encodings
Emphasize data values or trends
Summary Statistics
Enable comparison with statistics based on the data
Annotation
Provide context and support collaboration
Most overlays only require access to marks
- Reference structures (marks)
- Highlights (marks)
- Redundant encodings (marks and data)
- Summary statistics (marks)
- Annotations (marks)
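To see why the list above marks several overlays as needing marks only, consider a bar chart with a linear value axis: the mean of the bar-top pixel positions is the pixel position of the mean value, so a mean line or a highlight can be placed without recovering the data table. The following is a minimal sketch under that assumption (vertical bars, linear axis, shared baseline); the `Bar` type and function names are hypothetical and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    x: float       # left edge, in pixels
    top: float     # y pixel of the top edge (image y grows downward)
    width: float
    height: float


def mean_line(bars):
    """Summary-statistic overlay: a horizontal line at the mean bar top.

    Returns (x0, y, x1, y) in pixel coordinates. Valid only for a linear
    value axis, where averaging pixels equals averaging data values.
    """
    y = sum(b.top for b in bars) / len(bars)
    x0 = min(b.x for b in bars)
    x1 = max(b.x + b.width for b in bars)
    return (x0, y, x1, y)


def tallest_bar_index(bars):
    """Highlight overlay: index of the bar to emphasize."""
    return max(range(len(bars)), key=lambda i: bars[i].height)


bars = [Bar(10, 300, 20, 100), Bar(40, 200, 20, 200), Bar(70, 250, 20, 150)]
line = mean_line(bars)
highlight = tallest_bar_index(bars)
```

A redundant encoding such as printing each bar's numeric value on top of it cannot be built this way, because it needs the actual data values rather than pixel positions.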
Interactive Documents
How can we facilitate reading text and charts together?
Goal: Extract references between text and chart
Problem: Diversity of writing styles
Example 1: Pew Research
Before:
Skepticism for capitalism is lowest in Brazil (22%), China (19%), Germany (29%) (although East Germans are less supportive than West Germans) and the U.S. (24%). Skepticism for free markets is highest in Mexico (60%) and Japan (60%).
After:
Skepticism for capitalism is lowest in Brazil (22%), China (19%), Germany (29%) (although East Germans are less supportive than West Germans) and the U.S. (24%). Skepticism for free markets is highest in Mexico (60%) and Japan (60%).
Example 2: Economist
Before:
Top earners have attracted more opprobrium as their salaries and the performance of the economy have headed in opposite directions. Europeans and Latin Americans tend to have similar attitudes to the rich; the Anglo-Saxon world is a bit more forgiving.
After:
Top earners have attracted more opprobrium as their salaries and the performance of the economy have headed in opposite directions. Europeans and Latin Americans tend to have similar attitudes to the rich; the Anglo-Saxon world is a bit more forgiving.
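The two examples illustrate why diversity of writing styles is a problem. A naive matcher that looks for a chart label followed by its value in parentheses would find every reference in the Pew text and none in the Economist text. The sketch below is such a naive baseline, written only to make that contrast concrete; it is not the method from the lecture. The `table` is a hypothetical data table assembled from the values quoted in Example 1, and `find_references` is a hypothetical name.

```python
import re


def find_references(text, table):
    """Return (label, start, end) for each 'Label (NN%)' span whose
    number matches the value stored for that label in the table."""
    refs = []
    for label, value in table.items():
        pattern = re.escape(label) + r"\s*\((\d+)%\)"
        for m in re.finditer(pattern, text):
            if int(m.group(1)) == value:
                refs.append((label, m.start(), m.end()))
    return refs


# Hypothetical table, built from the percentages quoted in Example 1.
table = {"Brazil": 22, "China": 19, "Germany": 29,
         "U.S.": 24, "Mexico": 60, "Japan": 60}

pew = ("Skepticism for capitalism is lowest in Brazil (22%), China (19%), "
       "Germany (29%) (although East Germans are less supportive than West "
       "Germans) and the U.S. (24%). Skepticism for free markets is highest "
       "in Mexico (60%) and Japan (60%).")
economist = ("Europeans and Latin Americans tend to have similar attitudes "
             "to the rich; the Anglo-Saxon world is a bit more forgiving.")

pew_refs = find_references(pew, table)
economist_refs = find_references(economist, table)
```

The Economist sentence refers to groups of countries and to a qualitative comparison, so pattern matching on labels and numbers has nothing to latch onto.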
Evaluation
Avg. F1 distance: expert-specified references vs. crowd-specified references
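As a reminder of what F1 measures here, the sketch below computes the F1 score between a set of expert references and a set of crowd references, treating the expert set as ground truth. The notes do not say how the score is turned into a "distance"; using 1 - F1 is an assumption made for illustration, and the reference identifiers are invented.

```python
def f1_score(expert, crowd):
    """F1 between two sets of references, with `expert` as ground truth."""
    if not expert or not crowd:
        return 0.0
    true_pos = len(expert & crowd)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(crowd)
    recall = true_pos / len(expert)
    return 2 * precision * recall / (precision + recall)


def f1_distance(expert, crowd):
    """Assumed definition: 0 for identical sets, 1 for disjoint sets."""
    return 1.0 - f1_score(expert, crowd)


# Invented identifiers standing in for (text span, chart mark) references.
expert = {"a", "b", "c", "d"}
crowd = {"b", "c", "d", "e", "f"}
score = f1_score(expert, crowd)      # precision 3/5, recall 3/4
distance = f1_distance(expert, crowd)
```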
Deconstructing D3 Charts
Automatically convert D3 code into a mapping-based representation to enable redesign and style reuse
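A mapping-based representation can be thought of as a list of (data field, mark attribute) pairs together with the function relating them. The sketch below illustrates the idea for linear mappings only: given marks that each carry their bound datum and their visual attributes, it tests every field/attribute pair for an exact linear relationship. This is a simplified illustration, not the deconstruction tool's actual algorithm; the mark structure, the field names, and the function names are all hypothetical.

```python
def linear_mapping(xs, ys, tol=1e-6):
    """Return (slope, intercept) if ys is a linear function of xs, else None."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # a constant field cannot explain anything
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    if all(abs(slope * x + intercept - y) <= tol for x, y in zip(xs, ys)):
        return slope, intercept
    return None


def deconstruct(marks):
    """Find every (field, attribute) pair related by a linear mapping.

    marks: list of {"data": {field: number}, "attrs": {attr: number}}.
    """
    mappings = {}
    for field in marks[0]["data"]:
        for attr in marks[0]["attrs"]:
            fit = linear_mapping([m["data"][field] for m in marks],
                                 [m["attrs"][attr] for m in marks])
            if fit is not None:
                mappings[(field, attr)] = fit
    return mappings


# Made-up bar marks: x encodes year, height encodes sales.
marks = [
    {"data": {"year": 2000, "sales": 10}, "attrs": {"x": 0, "height": 20}},
    {"data": {"year": 2001, "sales": 30}, "attrs": {"x": 50, "height": 60}},
    {"data": {"year": 2002, "sales": 15}, "attrs": {"x": 100, "height": 30}},
]
mappings = deconstruct(marks)
```

Once the mappings are known, a redesign amounts to swapping which attribute a field drives, and style reuse amounts to applying the recovered mappings to a different data table. Note that an attribute that is constant across all marks would match every field with slope 0, so a fuller version would need to filter those out.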
Automatic Redesign
Can we automatically redesign charts to improve:
- Perceptual effectiveness?
- Visual aesthetics?
- Accessibility for vision impaired users?
Many specialized collections
- Scientific: PLOS, JSTOR, ACM DL, ...
- Web visualizations: D3, Processing, ...
- News: New York Times, Pew Research, ...
How can deconstruction aid search?
- Search by chart type, data type, marks, data, ...
- Similarity search with inexact matching
- Query expansion
Takeaways
A chart is a collection of mappings between data and marks
We can reconstruct this representation from chart bitmaps
Such reconstruction enables redesign, reuse and revitalization