
Stanford CS448B 17 VisML


TLDR

These are my notes from the seventeenth lecture of Stanford's CS448B (Data Visualization) course, on visualization and machine learning. They cover how visualizations can support understanding data, model training, and model evaluation, and close with some guidelines for designing ML visualizations.


Notes

In what ways can visualizations support machine learning tasks?

| Understanding Data | Model Training | Model Evaluation |
|---|---|---|
| What is the quality of the data? | What is the structure of the model? | How accurate is the model output? |
| Is the data representative? | How can I refine the model? | How can I explain the model output? |
| What features are available in the dataset? | Is this the best technique for modeling the learning task? | Is the model fair? |
| Is the test set representative of the dataset as a whole? | How does the model output change with changes in model parameters? | |
| Is the data labeled correctly? | Why is the model behaving the way it is? | |

Visualizations for Understanding ML Datasets

  • Understanding Data Characteristics before Modeling
  • Selecting Features for Modeling
  • Debugging Data based on Model Outputs

https://pair-code.github.io/facets/


http://archive.ics.uci.edu/ml/datasets/Census+Income

  • 1994 Census dataset
  • ~50k rows
  • 14 attributes (categorical and integer)

The prediction task is to determine whether a person makes over $50K a year.
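Questions such as "Is the data representative?" and "What is the quality of the data?" usually start with per-feature summaries of the kind that tools like Facets present visually. Below is a minimal sketch of such a summary (class balance and missing-value counts). The rows, column names, and values are illustrative assumptions, not the real dataset, and the `summarize` helper is hypothetical.

```python
from collections import Counter

# Toy rows standing in for census records; columns and values are
# illustrative assumptions, not the actual dataset.
rows = [
    {"age": 39, "education": "Bachelors", "income": "<=50K"},
    {"age": 50, "education": "Bachelors", "income": "<=50K"},
    {"age": 38, "education": "HS-grad", "income": "<=50K"},
    {"age": 52, "education": None, "income": ">50K"},
    {"age": None, "education": "Masters", "income": ">50K"},
    {"age": 28, "education": "HS-grad", "income": "<=50K"},
]

def summarize(rows, label):
    """Return the class balance and per-column missing-value counts."""
    n = len(rows)
    balance = {k: c / n for k, c in Counter(r[label] for r in rows).items()}
    missing = {col: sum(r[col] is None for r in rows) for col in rows[0]}
    return balance, missing

balance, missing = summarize(rows, "income")
print("class balance:", balance)
print("missing values:", missing)
```

A skewed class balance or a column with many missing values is the kind of issue worth spotting before any model is trained.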


https://qz.com/994486/the-way-you-draw-circles-says-a-lot-about-you


https://medium.com/analytics-vidhya/analyzing-sketches-around-the-world-with-sketch-rnn-c6cbe9b5ac80


INFUSE (INteractive FeatUre SElection)

Domain Task: Predict if a patient is at risk of developing diabetes.

  • ML Task 1: Comparison of feature selection algorithms (4 algorithms)
  • ML Task 2: Comparison of classification algorithms (4 classifiers)
  • ML Task 3: Manual selection and testing of feature sets
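To make ML Task 1 concrete, here is a small sketch that ranks features with two different scoring methods and puts the rankings side by side. It is not INFUSE's method: the two scorers (absolute Pearson correlation and best single-threshold accuracy), the feature names, and the values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical toy data: feature columns and a binary label per patient.
features = {
    "glucose": [85, 90, 150, 160, 95, 170],
    "bmi":     [22, 30, 31, 24, 27, 29],
    "age":     [25, 40, 45, 50, 30, 60],
}
labels = [0, 0, 1, 1, 0, 1]

def abs_correlation(xs, ys):
    """Absolute Pearson correlation between a feature and the label."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(cov / (pstdev(xs) * pstdev(ys)))

def stump_accuracy(xs, ys):
    """Best accuracy of any single-threshold rule on this feature."""
    best = 0.0
    for t in xs:
        hits = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        acc = hits / len(xs)
        best = max(best, acc, 1 - acc)  # the rule may point either way
    return best

def rank(score):
    """Feature names ordered from highest to lowest score."""
    return sorted(features, key=lambda f: score(features[f], labels), reverse=True)

rankings = {"correlation": rank(abs_correlation), "stump": rank(stump_accuracy)}
for method, order in rankings.items():
    print(method, order)
```

Features that rank highly under several methods are stronger candidates than ones picked by a single method, which is the kind of agreement a comparison view can surface.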

How expressive/effective is this visualization?


  • Clothes Image Dataset
  • 37,000 Instances
  • 14 Categories

ML Task: Classify article of clothing based on image

Dimensionality Reduction
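These notes do not record which dimensionality reduction technique the lecture example used. As one standard option, the sketch below estimates the first principal component (PCA) with power iteration and projects each point onto it. The toy points and function names are assumptions for illustration.

```python
import math

def first_principal_component(points, iterations=100):
    """Estimate the top principal direction by power iteration."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Covariance matrix (d x d) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(points, means, v):
    """1D coordinate of each point along direction v."""
    return [sum((p[j] - means[j]) * v[j] for j in range(len(v))) for p in points]

# Toy 3D points that vary mostly along the direction (1, 2, 0).
points = [(0, 0, 0.1), (1, 2, -0.1), (2, 4, 0.1), (3, 6, -0.1), (4, 8, 0.0)]
means, direction = first_principal_component(points)
coords = project(points, means, direction)
print("direction:", direction)
print("1D coordinates:", coords)
```

Plotting high-dimensional items by a few such coordinates is what lets similar instances appear near each other in a scatterplot.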


https://github.com/uwdata/errudite

Error Analysis by:

  1. Expressive grouping of error instances
  2. Counterfactual evaluation
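The two ideas can be sketched in a few lines. This is not Errudite's API: the toy sentiment model, the data, and the helper names below are all hypothetical. Grouping computes an error rate per user-defined group; counterfactual evaluation rewrites each instance and measures how often the prediction changes.

```python
from collections import defaultdict

def toy_model(text):
    """Stand-in classifier (an assumption): keys only on the word 'not'."""
    return "negative" if "not" in text.split() else "positive"

# Hypothetical labeled examples: (text, gold label).
data = [
    ("the film was good", "positive"),
    ("the film was not good", "negative"),
    ("the plot was not bad", "positive"),
    ("the acting was not boring", "positive"),
    ("the ending was dull", "negative"),
]

def error_rate_by_group(data, model, group_of):
    """Error rate for each group defined by a user-supplied function."""
    totals, errors = defaultdict(int), defaultdict(int)
    for text, gold in data:
        g = group_of(text)
        totals[g] += 1
        errors[g] += int(model(text) != gold)
    return {g: errors[g] / totals[g] for g in totals}

def flip_rate(data, model, rewrite):
    """Fraction of instances whose prediction changes after a rewrite."""
    flips = [model(text) != model(rewrite(text)) for text, _ in data]
    return sum(flips) / len(flips)

def has_negation(text):
    return "not" in text.split()

def drop_not(text):
    return " ".join(w for w in text.split() if w != "not")

rates = error_rate_by_group(data, toy_model, has_negation)
flips = flip_rate(data, toy_model, drop_not)
print("error rate by group:", rates)
print("prediction flip rate after removing 'not':", flips)
```

Grouping suggests a hypothesis (the model struggles with negation), and the rewrite tests whether that one word actually drives the predictions.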

How does this approach compare to conventional GUI input elements?

Visualizations for Modeling

RuleMatrix

Domain Questions:

  • What knowledge has the model learned?
  • How certain is the model for each piece of knowledge?
  • What knowledge does the model utilize to make a prediction?
  • When and where is the model likely to fail?


SMILY (Similar Medical Images Like Yours)

Domain Task:
Pathologists need to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient.

Control which types of similarity matter

Visualizations for Model Evaluation

Confusion Wheel

In what other ways can we visualize this data?
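One familiar alternative is the plain confusion matrix, which tabulates the same true-versus-predicted class counts. A minimal sketch, with made-up labels and predictions:

```python
from collections import Counter

def confusion_counts(gold, pred):
    """Count (true class, predicted class) pairs."""
    return Counter(zip(gold, pred))

# Hypothetical labels and predictions for a three-class task.
gold = ["cat", "cat", "dog", "dog", "bird", "bird", "bird"]
pred = ["cat", "dog", "dog", "dog", "bird", "cat", "bird"]

counts = confusion_counts(gold, pred)
classes = sorted(set(gold) | set(pred))

# Plain-text matrix: rows are true classes, columns are predictions.
print("true\\pred".ljust(10) + "".join(c.ljust(6) for c in classes))
for t in classes:
    print(t.ljust(10) + "".join(str(counts[(t, p)]).ljust(6) for p in classes))

# Off-diagonal pairs are the confusions between classes.
confusions = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
print("confusions:", confusions)
```

The matrix scales poorly as the number of classes grows, which is one motivation for alternative encodings of the off-diagonal counts.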


The What-If Tool


FairSight

https://research.google.com/bigpicture/attacking-discrimination-in-ml/
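A question like "Is the model fair?" can be made concrete by comparing per-group rates for a threshold classifier. The sketch below computes each group's positive rate (the quantity behind demographic parity) and true positive rate (the quantity behind equal opportunity). The scores, labels, group names, and threshold are invented for illustration.

```python
def group_rates(scores, labels, groups, threshold):
    """Per-group positive rate and true positive rate at one threshold.

    Assumes every group has at least one instance with label 1.
    """
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        approved = [i for i in idx if scores[i] >= threshold]
        qualified = [i for i in idx if labels[i] == 1]
        true_pos = [i for i in approved if labels[i] == 1]
        out[g] = {
            "positive_rate": len(approved) / len(idx),
            "true_positive_rate": len(true_pos) / len(qualified),
        }
    return out

# Hypothetical scores, outcomes (1 = would repay), and group membership.
scores = [40, 55, 60, 70, 80, 35, 45, 50, 65, 75]
labels = [0, 1, 1, 1, 1, 0, 0, 1, 1, 1]
groups = ["blue"] * 5 + ["orange"] * 5

rates = group_rates(scores, labels, groups, threshold=58)
for g, r in rates.items():
    print(g, r)
```

With a single shared threshold the two groups end up with different rates here, so the choice of threshold (shared or per group) determines which fairness criterion, if any, is satisfied.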

What is different/unique about visualizing ML data?

Some Guidelines for ML Visualizations

  1. Visualizations should align with user expertise
    • Model Developers and Builders
    • Model Users
    • Domain Experts
    • Non-Experts
    • Learners/Students
  2. Provide effective data representations for the task
    • Debugging and Improving Models
    • Comparing and Selecting Models
    • Interpretability and Explainability
    • Teaching ML Concepts
  3. Support understanding of model uncertainty
    Uncertainty is an inevitable feature of data-driven models in most real-world applications.
  4. Exploit interactivity and promote rich interactions
    • Editing data points
    • Evaluating Hypotheses
    • Constructing Explanations
  5. Support expressive inputs
    • Direct Manipulation
    • Query-by-demonstration
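For guideline 3, one simple way to obtain an uncertainty estimate worth displaying is a percentile bootstrap interval around a metric such as accuracy. This is a sketch under stated assumptions: the test outcomes are made up, and the function name and parameters are hypothetical.

```python
import random

def bootstrap_accuracy_interval(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy from 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    # Resample the test set with replacement and recompute accuracy each time.
    stats = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical outcomes: 42 of 50 test instances classified correctly.
correct = [1] * 42 + [0] * 8
accuracy = sum(correct) / len(correct)
low, high = bootstrap_accuracy_interval(correct)
print(f"accuracy = {accuracy:.2f}, 95% interval ~ [{low:.2f}, {high:.2f}]")
```

Drawing the interval as an error bar next to the point estimate keeps a small test set from looking more conclusive than it is.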

Additional Resources