Stanford CS448B 17 VisML
TLDR
This article contains my notes from Stanford's CS448B (Data Visualization) course, specifically focusing on the seventeenth lecture about visualization and machine learning. I'll discuss the importance of understanding data, model training, model evaluation, and the different types of visualizations for machine learning tasks.
Notes
In what ways can visualizations support machine learning tasks?
| Understanding Data | Model Training | Model Evaluation |
|---|---|---|
| What is the quality of the data? | What is the structure of the model? | How accurate is the model output? |
| Is the data representative? | How can I refine the model? | How can I explain the model output? |
| What features are available in the dataset? | Is this the best technique for modeling the learning task? | Is the model fair? |
| Is the test set representative of the dataset as a whole? | How does the model output change with changes in model parameters? | |
| Is the data labeled correctly? | Why is the model behaving the way it is? | |
Visualizations for Understanding ML Datasets
- Understanding Data Characteristics before Modeling
- Selecting Features for Modeling
- Debugging Data based on Model Outputs
https://pair-code.github.io/facets/
http://archive.ics.uci.edu/ml/datasets/Census+Income
- 1994 Census dataset
- ~50k rows
- 14 attributes (categorical and integer)
The prediction task is to determine whether a person makes over $50K a year.
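Before modeling a dataset like this, two of the questions from the table above (is the data representative, is it labeled and filled in properly) can be checked with simple counts. The following is a minimal sketch, not the Facets tool itself: the rows are made-up records shaped like the census data, and the `?` missing-value token and the `summarize` helper are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows shaped like the census data (not real records).
rows = [
    {"age": 39, "workclass": "State-gov", "sex": "Male", "income": "<=50K"},
    {"age": 50, "workclass": "Private", "sex": "Male", "income": ">50K"},
    {"age": 38, "workclass": "?", "sex": "Female", "income": "<=50K"},
    {"age": 28, "workclass": "Private", "sex": "Female", "income": "<=50K"},
]


def summarize(rows, label_key, missing_token="?"):
    """Return label counts and per-column missing-value counts."""
    label_counts = Counter(row[label_key] for row in rows)
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value == missing_token:
                missing[column] += 1
    return label_counts, dict(missing)


label_counts, missing = summarize(rows, "income")
print("Label balance:", dict(label_counts))
print("Missing values per column:", missing)
```

A skewed label balance or a column with many missing values is the kind of finding a histogram-per-feature overview is meant to surface at a glance.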
https://qz.com/994486/the-way-you-draw-circles-says-a-lot-about-you
https://medium.com/analytics-vidhya/analyzing-sketches-around-the-world-with-sketch-rnn-c6cbe9b5ac80
INFUSE (INteractive FeatUre SElection)
Domain Task: Predict if a patient is at risk of developing diabetes.
- ML Task 1: Comparison of feature selection algorithms (4 algorithms)
- ML Task 2: Comparison of classification algorithms (4 classifiers)
- ML Task 3: Manual selection and testing of feature sets
How expressive/effective is this visualization?
- Clothes Image Dataset
- 37,000 Instances
- 14 Categories
ML Task: Classify article of clothing based on image
Dimensionality Reduction
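The notes do not record which dimensionality reduction technique the lecture used for the image dataset. As a stand-in, here is a minimal sketch of PCA (computed with power iteration and deflation) that projects high-dimensional points down to two coordinates suitable for a scatter plot. The sample `points` and all function names are assumptions for illustration.

```python
import math
import random


def center(points):
    """Subtract the per-dimension mean from every point."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - means[j] for j in range(d)] for p in points]


def covariance(centered):
    """Sample covariance matrix of already-centered points."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=200, seed=0):
    """Power iteration: approximate the dominant eigenvector and eigenvalue."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return v, eigenvalue


def pca_2d(points):
    """Project points onto their top two principal components."""
    centered = center(points)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        v, lam = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the runner-up.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    embedding = [
        [sum(a * b for a, b in zip(row, c)) for c in components] for row in centered
    ]
    return embedding, components


# Hypothetical 3-D points that mostly vary along one direction.
points = [[0, 0, 0.0], [1, 2, 0.1], [2, 4, -0.1], [3, 6, 0.1], [4, 8, -0.1]]
embedding, components = pca_2d(points)
```

Each row of `embedding` is an (x, y) pair that could be plotted directly. Nonlinear methods such as t-SNE preserve different structure, so the choice of technique changes what the resulting plot can be trusted to show.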
https://github.com/uwdata/errudite
Error Analysis by:
- Expressive grouping of error instances
- Counterfactual evaluation
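The two ideas above can be illustrated without any tooling. This is a minimal sketch and does not use the Errudite API: `toy_model`, the example sentences, and the helper functions are all hypothetical. It groups instances by a predicate, compares error rates between groups, then rewrites the instances in one group to see how many predictions flip.

```python
def toy_model(text):
    """Hypothetical stand-in classifier: predicts 'negative' whenever it sees 'not'."""
    return "negative" if "not" in text.split() else "positive"


# Made-up (text, gold label) pairs.
examples = [
    ("the film is good", "positive"),
    ("the film is not good", "negative"),
    ("the film is not bad", "positive"),
    ("not a bad film", "positive"),
    ("a dull film", "negative"),
]


def has_not(text):
    """Grouping predicate: does the text contain the token 'not'?"""
    return "not" in text.split()


def lacks_not(text):
    return not has_not(text)


def drop_not(text):
    """Counterfactual rewrite rule: delete every 'not' token."""
    return " ".join(word for word in text.split() if word != "not")


def error_rate(examples, model, in_group):
    """Return (error rate, group size) for the instances matching the predicate."""
    group = [(x, y) for x, y in examples if in_group(x)]
    if not group:
        return 0.0, 0
    errors = sum(model(x) != y for x, y in group)
    return errors / len(group), len(group)


def counterfactual_flips(examples, model, in_group, rewrite):
    """Count group members whose prediction changes after the rewrite."""
    group = [x for x, _ in examples if in_group(x)]
    return sum(model(x) != model(rewrite(x)) for x in group)


rate_with_not, n_with_not = error_rate(examples, toy_model, has_not)
rate_without, n_without = error_rate(examples, toy_model, lacks_not)
flips = counterfactual_flips(examples, toy_model, has_not, drop_not)
```

Comparing the group error rate against the rest of the data tests whether a suspected pattern really concentrates errors, and the flip count tests whether the grouped feature actually drives the prediction.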
How does this approach compare to conventional GUI input elements?
Visualizations for Modeling
RuleMatrix
Domain Questions:
- What knowledge has the model learned?
- How certain is the model for each piece of knowledge?
- What knowledge does the model utilize to make a prediction?
- When and where is the model likely to fail?
SMILY (Similar Medical Images Like Yours)
Domain Task:
Pathologists need to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient.
Control which types of similarity matter
Visualizations for Model Evaluation
Confusion Wheel
In what other ways can we visualize this data?
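One common alternative encoding of the same information is a confusion matrix, where rows are true classes and columns are predicted classes. A minimal sketch follows, with made-up labels and predictions; the `confusion_matrix` helper is written here for illustration and is not taken from any library.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical clothing labels and model predictions.
labels = ["shirt", "dress", "coat"]
y_true = ["shirt", "shirt", "dress", "coat", "coat", "dress"]
y_pred = ["shirt", "coat", "dress", "coat", "shirt", "dress"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>6}: {row}")
```

The diagonal holds correct predictions, and off-diagonal cells show which pairs of classes the model confuses. A heatmap of this matrix is the conventional rendering.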
The What-If Tool
FairSight
https://research.google.com/bigpicture/attacking-discrimination-in-ml/
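To make the fairness question concrete, here is a minimal sketch assuming a classifier that approves anyone whose score clears a threshold. It compares the approval rate and the true positive rate per group under a shared threshold and under per-group thresholds. The groups, scores, thresholds, and the `group_rates` helper are all made up for illustration.

```python
def group_rates(records, thresholds):
    """records: iterable of (group, score, actually_positive) tuples.

    Returns per-group approval rate and true positive rate.
    """
    stats = {}
    for group, score, label in records:
        s = stats.setdefault(
            group, {"n": 0, "approved": 0, "positives": 0, "true_positives": 0}
        )
        approved = score >= thresholds[group]
        s["n"] += 1
        s["approved"] += int(approved)
        s["positives"] += int(label)
        s["true_positives"] += int(approved and label)
    return {
        group: {
            "positive_rate": s["approved"] / s["n"],
            "true_positive_rate": (
                s["true_positives"] / s["positives"] if s["positives"] else float("nan")
            ),
        }
        for group, s in stats.items()
    }


# Hypothetical (group, model score, ground truth) records.
records = [
    ("blue", 0.90, True), ("blue", 0.60, True),
    ("blue", 0.40, False), ("blue", 0.30, False),
    ("orange", 0.70, True), ("orange", 0.45, True),
    ("orange", 0.35, False), ("orange", 0.20, False),
]

same_threshold = group_rates(records, {"blue": 0.5, "orange": 0.5})
per_group = group_rates(records, {"blue": 0.5, "orange": 0.4})
```

With a shared threshold the two groups end up with different true positive rates in this toy data, while the per-group thresholds equalize them. Which of these outcomes counts as fair depends on the criterion chosen, and that trade-off is what interactive fairness visualizations let users explore.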
What is different/unique about visualizing ML data?
Some Guidelines for ML Visualizations
- Visualizations should align with user expertise
  - Model Developers and Builders
  - Model Users
  - Domain Experts
  - Non-Experts
  - Learners/Students
- Provide effective data representations for the task
  - Debugging and Improving Models
  - Comparing and Selecting Models
  - Interpretability and Explainability
  - Teaching ML Concepts
- Support understanding of model uncertainty
  - Uncertainty is an inevitable feature of data-driven models in most real-world applications.
- Exploit interactivity and promote rich interactions
  - Editing data points
  - Evaluating Hypotheses
  - Constructing Explanations
- Support expressive inputs
  - Direct Manipulation
  - Query-by-demonstration
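For the guideline on model uncertainty, one quantity a visualization could encode (for example as opacity or a sort order) is the entropy of each predicted class distribution. This is a minimal sketch of that idea and an assumption on my part, not something prescribed in the lecture. The prediction values and names are made up.

```python
import math


def predictive_entropy(probabilities):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical per-instance class probabilities from some classifier.
predictions = {
    "image_a": [0.98, 0.01, 0.01],
    "image_b": [0.40, 0.35, 0.25],
    "image_c": [0.70, 0.20, 0.10],
}

# Surface the least certain predictions first for human review.
most_uncertain_first = sorted(
    predictions,
    key=lambda name: predictive_entropy(predictions[name]),
    reverse=True,
)
```

Entropy is zero when the model puts all probability on one class and largest when the probabilities are uniform, so sorting by it brings ambiguous cases to the top.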
Additional Resources
- A visual introduction to machine learning
  http://www.r2d3.us/visual-intro-to-machine-learning-part-1
- How to Use t-SNE Effectively
  https://distill.pub/2016/misread-tsne/
- The Building Blocks of Interpretability
  https://distill.pub/2018/building-blocks/
- But what is a Neural Network?
  https://www.3blue1brown.com/topics/neural-networks
- Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers
  Hohman, F., Kahng, M., Pienta, R., & Chau, D. H. (2018). IEEE Transactions on Visualization and Computer Graphics, 25(8), 2674-2693.