
Stanford CS448B 15 Deconstructing Visualizations


TLDR

This article contains my notes from Stanford's CS448B (Data Visualization) course, specifically focusing on the fifteenth lecture about deconstructing visualizations. I'll discuss the importance of classification, mark extraction, data extraction, and redesign.


Notes

Forward Thinking

  • For the data explainer project, do we have to find one dataset and create our three visualizations off of that one dataset, or is it alright if we find a high-level topic that we are interested in, and create three visualizations within that topic but using separate datasets?
  • When using social network analysis, how do you validate your findings and/or determine if your findings are statistically significant? Is there an analogous "p value" standard for graph analysis? [Do you use qualitative or quantitative measures of validity?]
  • Why do we go for complex graphs if we can break down a complex concept into multiple, easily digestible graphs [e.g. broken down into strongly connected components]?
    Wouldn't this also help with making the structure more intuitive?

Pixels are a poor representation of charts and graphs

Cannot index, search, manipulate or interact with the data

Goal: Reconstruct a higher-level representation of charts and graphs that lets machines and people redesign, reuse, and revitalize them


What is a good representation?

Approach

  • Classification: Determine chart type
  • Mark extraction: Retrieve graphical marks
  • Data extraction: Retrieve underlying data table
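One way to picture the output of these three steps is as a single record that holds the chart type, the marks, and the data table. The sketch below is only an illustration: the class names `Mark` and `DeconstructedChart` and their fields are hypothetical and do not come from the lecture or from ReVision.

```python
from dataclasses import dataclass, field


@dataclass
class Mark:
    kind: str    # e.g. "bar" or "pie_slice" (hypothetical labels)
    bbox: tuple  # (x, y, width, height) in image pixels


@dataclass
class DeconstructedChart:
    chart_type: str                            # result of classification
    marks: list = field(default_factory=list)  # result of mark extraction
    data: list = field(default_factory=list)   # result of data extraction (table rows)


# Fill the record in the same order as the three steps above.
chart = DeconstructedChart(chart_type="bar")
chart.marks.append(Mark("bar", (40, 60, 20, 140)))
chart.data.append({"label": "A", "value": 7.0})
```

Keeping marks and data side by side matters later in the notes, because some uses need only the marks while others need the data table too.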

Classification

Training the Classifier


| Method | Accuracy |
| --- | --- |
| [Prasad 2007] Multi-class SVM | 84% |
| ReVision: Multi-class SVM | 88% |
| ReVision: Binary SVM (yes/no per type) | 96% |

Corpus

Over 2500 labeled images and 10 chart types. ReVision binary SVMs give 96% classification accuracy.
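The "yes/no per type" setup means there is one binary classifier per chart type, each answering "is this image a chart of my type?". These notes do not say how the individual answers are combined, so the sketch below assumes one common approach: take the type whose classifier gives the highest positive decision score, and report no type if every classifier says no. The function name and the example scores are made up for illustration.

```python
def classify(scores, threshold=0.0):
    """Pick a chart type from per-type binary classifier scores.

    scores: dict mapping chart type -> signed decision value from that
    type's yes/no classifier (positive means "yes"). Assumed input format.
    """
    accepted = {t: s for t, s in scores.items() if s > threshold}
    if not accepted:
        return None  # every classifier rejected the image
    return max(accepted, key=accepted.get)


# Hypothetical scores for two images.
first = classify({"bar": 1.2, "pie": -0.4, "line": 0.3})
second = classify({"bar": -1.0, "pie": -0.2})
```

Unlike a single multi-class model, this arrangement can decline to label an image at all, which is useful when the image is not one of the known chart types.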

http://vis.berkeley.edu/papers/revision/

Mark and Data Extraction

Assumptions

Bar charts and pie charts only
No shading or texture, 3D, stacked bars, or exploded pies
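Under these assumptions, turning marks into data is mostly geometry. The following is a minimal sketch, not ReVision's actual method: it assumes a linear value axis, that two axis ticks have already been located and read (pixel position plus labeled value), and that bar tops and pie slice angles come from mark extraction. All function names are hypothetical.

```python
def pixel_to_value(y_pixel, tick_a, tick_b):
    """Linearly interpolate a data value from a pixel row.

    tick_a, tick_b: (y_pixel, value) pairs for two labeled axis ticks.
    """
    (ya, va), (yb, vb) = tick_a, tick_b
    return va + (y_pixel - ya) * (vb - va) / (yb - ya)


def bar_values(bar_tops, tick_a, tick_b):
    """Convert the top edge of each bar (in pixels) to a data value."""
    return [pixel_to_value(y, tick_a, tick_b) for y in bar_tops]


def pie_percentages(slice_angles_deg):
    """Convert pie slice angles to percentages of the whole."""
    total = sum(slice_angles_deg)
    return [100.0 * a / total for a in slice_angles_deg]


# Image y grows downward: the tick at y=200 is labeled 0, the tick at y=100 is labeled 50.
values = bar_values([150, 100, 180], (200, 0), (100, 50))
shares = pie_percentages([90, 90, 180])
```

The excluded cases (3D, stacked bars, exploded pies) are exactly the ones where a bar's top edge or a slice's angle no longer maps this directly to a value.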

Extraction Results

Data Extraction Error

Redesign

Limitations

Graphical Overlays

Visual elements that are layered onto a chart to facilitate the perceptual and cognitive processes involved in chart reading


Taxonomy


Demo


Reference Structures

Help by breaking marks into regular segments and aid reading axis values


Highlights

Draws viewers’ attention to specific marks


Redundant Encodings

Emphasize data values or trends


Summary Statistics

Enables comparison with statistics based on the data


Annotation

Provide context and support collaboration

Most overlays only require access to marks

  • Reference structures (marks)
  • Highlights (marks)
  • Redundant encodings (marks and data)
  • Summary statistics (marks)
  • Annotations (marks)
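To see why marks alone are often enough, consider a bar chart with a linear axis and a shared baseline. The sketch below works purely in pixel space and never looks at the data values; the function names are hypothetical and the geometry is an assumption for illustration.

```python
def mean_line_y(bar_tops):
    """Summary statistic: pixel row for a mean line.

    With a linear axis, the mean of the bar tops in pixels sits at the
    same place as the mean of the underlying values.
    """
    return sum(bar_tops) / len(bar_tops)


def tallest_bar_index(bar_tops):
    """Highlight: index of the tallest bar (smallest y, since y grows downward)."""
    return min(range(len(bar_tops)), key=lambda i: bar_tops[i])


def gridline_ys(baseline_y, top_y, segments):
    """Reference structure: evenly spaced gridlines between baseline and top."""
    step = (top_y - baseline_y) / segments
    return [baseline_y + k * step for k in range(segments + 1)]


tops = [150, 100, 200]  # top edges of three bars, in pixels
mean_y = mean_line_y(tops)
tallest = tallest_bar_index(tops)
grid = gridline_ys(200, 100, 4)
```

A redundant encoding such as a value label on each bar is different: drawing the number requires the data value itself, which is why that overlay needs both marks and data.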

Interactive Documents

How can we facilitate reading text and charts together?

Goal: Extract references between text and chart
Problem: Diversity of writing styles


Example 1: Pew Research

Before:

Skepticism for capitalism is lowest in Brazil (22%), China (19%), Germany (29%) (although East Germans are less supportive than West Germans) and the U.S. (24%). Skepticism for free markets is highest in Mexico (60%) and Japan (60%).

After:

Skepticism for capitalism is lowest in Brazil (22%), China (19%), Germany (29%) (although East Germans are less supportive than West Germans) and the U.S. (24%). Skepticism for free markets is highest in Mexico (60%) and Japan (60%).
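For a sentence like the Pew one, a naive baseline is to look for chart labels and values that appear literally in the text. The sketch below does only that and is not the method from the lecture. The `rows` table is a hypothetical data table rebuilt from the numbers quoted in the sentence, and `find_references` is an illustrative name.

```python
import re


def find_references(sentence, rows):
    """Return indices of table rows whose label and value both appear in the sentence."""
    refs = []
    lowered = sentence.lower()
    for i, row in enumerate(rows):
        label_hit = row["label"].lower() in lowered
        # Match e.g. "22%" but not the tail of a longer number such as "122%".
        pattern = r"(?<!\d)" + re.escape(str(row["value"])) + "%"
        value_hit = re.search(pattern, sentence) is not None
        if label_hit and value_hit:
            refs.append(i)
    return refs


# Hypothetical data table matching the values quoted in the text.
rows = [
    {"label": "Brazil", "value": 22},
    {"label": "China", "value": 19},
    {"label": "Germany", "value": 29},
    {"label": "U.S.", "value": 24},
    {"label": "Mexico", "value": 60},
    {"label": "Japan", "value": 60},
]
s1 = ("Skepticism for capitalism is lowest in Brazil (22%), China (19%), "
      "Germany (29%) (although East Germans are less supportive than West "
      "Germans) and the U.S. (24%).")
s2 = "Skepticism for free markets is highest in Mexico (60%) and Japan (60%)."
refs1 = find_references(s1, rows)
refs2 = find_references(s2, rows)
```

Literal matching works here because the text names every country and value. It fails on the Economist example, where "Europeans" and "the Anglo-Saxon world" refer to groups of marks without naming any of them, which is the writing-style problem noted above.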


Example 2: Economist

Before:

Top earners have attracted more opprobrium as their salaries and the performance of the economy have headed in opposite directions. Europeans and Latin Americans tend to have similar attitudes to the rich; the Anglo-Saxon world is a bit more forgiving.

After:

Top earners have attracted more opprobrium as their salaries and the performance of the economy have headed in opposite directions. Europeans and Latin Americans tend to have similar attitudes to the rich; the Anglo-Saxon world is a bit more forgiving.



Evaluation

Avg. F1 distance: expert-specified references vs. crowd-specified references
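As a reminder of what F1 measures here, the sketch below compares two sets of referenced marks for one sentence, treating the expert set as ground truth. The notes do not say how the score is turned into a "distance" or how it is averaged, so this only computes the standard F1 score; the mark indices are made up.

```python
def f1_score(expert, crowd):
    """Standard F1 between expert references (truth) and crowd references."""
    expert, crowd = set(expert), set(crowd)
    true_pos = len(expert & crowd)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(crowd)  # share of crowd references that are correct
    recall = true_pos / len(expert)    # share of expert references that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical mark indices referenced by one sentence.
score = f1_score(expert=[0, 1, 2, 3], crowd=[1, 2, 3, 4, 5])
```

In this example three of the five crowd references are correct (precision 0.6) and three of the four expert references are found (recall 0.75), which gives an F1 of 2/3.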


Deconstructing D3 Charts

Automatically convert D3 code into a mapping-based representation to enable redesign and style reuse
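A "mapping" here ties a data field to a mark attribute. As a rough sketch of the idea, assume each mark has been read from the rendered chart together with its bound datum and its attributes; we can then test whether an attribute is a linear function of a data field. The input format and the name `infer_linear_mapping` are assumptions for illustration, written in Python rather than JavaScript, and are not the lecture's algorithm.

```python
def infer_linear_mapping(marks, data_field, attr, tol=1e-6):
    """Return slope and intercept if attr = slope * data_field + intercept for every mark."""
    xs = [m["data"][data_field] for m in marks]
    ys = [m["attrs"][attr] for m in marks]
    # Find a second mark with a different data value to fix the line.
    other = next((i for i, x in enumerate(xs) if x != xs[0]), None)
    if other is None:
        return None  # all data values are equal, so no mapping can be inferred
    slope = (ys[other] - ys[0]) / (xs[other] - xs[0])
    intercept = ys[0] - slope * xs[0]
    if all(abs(slope * x + intercept - y) <= tol for x, y in zip(xs, ys)):
        return {"field": data_field, "attr": attr,
                "slope": slope, "intercept": intercept}
    return None


# Hypothetical bars: height is 5 pixels per unit of "value"; "x" is not linear in "value".
marks = [
    {"data": {"value": 10}, "attrs": {"height": 50, "x": 0}},
    {"data": {"value": 20}, "attrs": {"height": 100, "x": 30}},
    {"data": {"value": 5}, "attrs": {"height": 25, "x": 60}},
]
height_map = infer_linear_mapping(marks, "value", "height")
x_map = infer_linear_mapping(marks, "value", "x")
```

Once a mapping like `value -> height` is known, a redesign can keep the data and swap the attribute, or reuse the same mapping and style with a different dataset.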

Automatic Redesign

Can we automatically redesign charts to improve:

  • Perceptual effectiveness?
  • Visual aesthetics?
  • Accessibility for vision-impaired users?


Many specialized collections

  • Scientific: PLOS, JSTOR, ACM DL, ...
  • Web visualizations: D3, Processing, ...
  • News: New York Times, Pew Research, ...

How can deconstruction aid search?

  • Search by chart type, data type, marks, data, ...
  • Similarity search with inexact matching
  • Query expansion
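Once charts are deconstructed, search can run over their structure instead of their pixels. The sketch below assumes a small in-memory index of deconstructed charts and uses Jaccard similarity over data labels as one possible form of inexact matching. The index entries, field names, and threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two label collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def search(index, chart_type, labels, min_score=0.3):
    """Return ids of charts of the given type, ranked by label overlap."""
    hits = []
    for chart in index:
        if chart["type"] != chart_type:
            continue  # exact filter on chart type
        score = jaccard(labels, chart["labels"])
        if score >= min_score:  # inexact match on the data labels
            hits.append((score, chart["id"]))
    return [chart_id for _, chart_id in sorted(hits, reverse=True)]


# Hypothetical index of deconstructed charts.
index = [
    {"id": "pew-1", "type": "bar", "labels": ["brazil", "china", "germany", "japan"]},
    {"id": "econ-1", "type": "bar", "labels": ["brazil", "france"]},
    {"id": "pie-1", "type": "pie", "labels": ["brazil", "china", "germany"]},
]
query = ["brazil", "china", "germany"]
strict = search(index, "bar", query)
loose = search(index, "bar", query, min_score=0.2)
```

Lowering `min_score` trades precision for recall. Query expansion would widen the `labels` set before matching, for example by adding synonyms.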

Takeaways

A chart is a collection of mappings between data and marks
We can reconstruct this representation from chart bitmaps
Such reconstruction enables redesign, reuse and revitalization