Stanford CS448B 18 TextVis

September 10, 2022 · 6 min read

TLDR

This article contains my notes from Stanford's CS448B (Data Visualization) course, specifically focusing on the eighteenth lecture about text visualization. I'll discuss the importance of documents, collections of documents, and the different types of visualizations for text data.

Original

Notes

Text as data

Documents
- Articles, books and novels
- Computer programs
- E-mails, web pages, blogs
- Tags, comments
Collections of documents
- Messages (e-mail, blogs, tags, comments)
- Social networks (personal profiles)
- Academic collaborations (publications)

Why visualize text?

Understanding: get the “gist” of a document
Grouping: cluster for overview or classification
Comparison: compare document collections, or inspect the evolution of a collection over time
Correlation: compare patterns in text to those in other data, e.g., correlate with a social network

Example: Health Care Reform

Background:
- Initiatives by President Clinton
- Overhaul by President Obama

Text data:
- News articles
- Speech transcriptions
- Legal documents

What questions might you want to answer?
What visualizations might help?

A Concrete Example

Word/Tag Clouds: Word Count

President Obama’s Health Care Speech to Congress
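
The model behind a word cloud is just a word count: tokenize the text, drop very common words, and size each remaining word by how often it occurs. Below is a minimal Python sketch of that counting step. The tokenizer regex, the small stop-word list, and the sample sentence are all illustrative assumptions, not material from the lecture or the speech.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "we", "our"}

def word_counts(text, stop_words=STOP_WORDS):
    """Count lowercase word tokens, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Made-up sample text (not a quote from the speech).
sample = "The plan will cover care. The plan is about care and insurance."
counts = word_counts(sample)

# The most frequent words would be drawn largest in the cloud.
print(counts.most_common(3))
```

Everything the cloud shows comes from `counts`, so choices such as the stop-word list or whether to stem words change the picture as much as the layout does.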

WordTree: Word Sequences
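
A word tree replaces single-word counts with word sequences: pick a root word, then merge every continuation that follows it into a tree whose branches are weighted by how often they occur. The sketch below builds such a tree as nested dictionaries. The function name `build_word_tree`, the `depth` parameter, and the sample text are hypothetical, and the sketch ignores sentence boundaries for simplicity.

```python
import re

def build_word_tree(text, root, depth=3):
    """Collect the words that follow `root`, merged into a counted prefix tree."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tree = {}
    for i, tok in enumerate(tokens):
        if tok != root:
            continue
        node = tree
        # Walk up to `depth` words after this occurrence of the root.
        for nxt in tokens[i + 1 : i + 1 + depth]:
            child = node.setdefault(nxt, {"count": 0, "children": {}})
            child["count"] += 1
            node = child["children"]
    return tree

# Made-up sample text for illustration.
sample = "We will reform care. We will protect care. We can act."
tree = build_word_tree(sample, "we")

for word, info in tree.items():
    print(word, info["count"], sorted(info["children"]))
```

A branch's count is what a word tree would typically encode as font size, so shared prefixes such as "we will" stand out.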

Gulf of Evaluation

Many (most?) text visualizations do not represent text directly. They represent the output of a language model (word counts, word sequences, etc.).
Can you interpret the visualization?
- How well does it convey the properties of the model?
Do you trust the model?
- How does the model enable us to reason about the text?

Text as Data

Text Processing Pipeline

Keyword Weighting

Limitations of Frequency Statistics

How do people describe text?

Yelp: Review Spotlight

Tips: Descriptive Keyphrases

Visualizing Document Content

Information Retrieval

Concordance
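
A concordance lists every occurrence of a word together with its surrounding context (often called keyword in context). The notes give only the heading, so the following is a generic Python sketch rather than the lecture's example. The function name `concordance`, the `width` parameter, and the sample text are assumptions.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Made-up sample text for illustration.
sample = "Care matters. Good care costs money, and care takes time."
rows = concordance(sample, "care", width=2)

# Align the keyword in a center column, as a concordance view would.
for left, match, right in rows:
    print(f"{left:>15} | {match} | {right}")
```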

Glimpses of structure

Phrase Nets [van Ham 2009]

Visualizing Conversation

Usenet Visualization [Viégas]

Themail (Viégas)

Document Collections

ThemeRiver (Havre et al 99)

Termite: Visualizing Topic Models [Chuang ’12]

Stanford Dissertation Browser

Summary
