Paper Reading: SineStream
TLDR
This article contains my notes from the paper "SineStream: Improving the Readability of Streamgraphs by Minimizing Sine Illusion Effects". I'll discuss the problem statement, the evaluation criteria, the authors' approach, and the case study.
Paper
Notes
Stream Graph
Definition
A stream graph, sometimes written as streamgraph, is a stacked area chart whose layers are stacked around a "central axis" rather than a fixed baseline, resulting in a flowing, organic shape. Stream graphs show how many topics (or other categories) evolve over time.
Problem Statement:
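Given n time series $f_1, \dots, f_n$ and a baseline $g_0$, compute the top boundary of each layer as follows:
$$g_i = g_0 + \sum_{j=1}^{i} f_j, \quad i = 1, \dots, n$$
The question of how to compute $g_0$ remains open. Two simple strategies are:
- Simple model: $g_0 = 0$
- Symmetric around the x-axis: $g_0 = -\frac{1}{2}\sum_{i=1}^{n} f_i$
Remark: $g_n = g_0 + \sum_{i=1}^{n} f_i$, thus the symmetric model gives $g_n = -g_0$, i.e., the top and bottom silhouettes mirror each other about the x-axis.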
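To make the stacking formula concrete, here is a minimal Python sketch that computes the boundaries $g_0, \dots, g_n$ at a single time point under the two simple baselines. The function names and the input values are illustrative and not taken from the paper.

```python
from typing import List

def stack_layers(f: List[float], g0: float) -> List[float]:
    """Return [g_0, g_1, ..., g_n] for one time point, where g_i = g_0 + f_1 + ... + f_i."""
    g = [g0]
    for thickness in f:
        g.append(g[-1] + thickness)
    return g

def zero_baseline(f: List[float]) -> float:
    # Simple model: g_0 = 0
    return 0.0

def symmetric_baseline(f: List[float]) -> float:
    # Symmetric model: g_0 = -1/2 * sum(f_i), which makes g_n = -g_0
    return -0.5 * sum(f)

# Thicknesses of n = 3 layers at one time point (made-up values)
f = [1.0, 2.0, 3.0]
print(stack_layers(f, zero_baseline(f)))       # [0.0, 1.0, 3.0, 6.0]
print(stack_layers(f, symmetric_baseline(f)))  # [-3.0, -2.0, 0.0, 3.0]
```

In a real streamgraph the same computation is repeated at every time point, so $g_0$ becomes a curve over time.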
How to compute the baseline
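Reference: Stacked Graphs - Geometry & Aesthetics, L. Byron, M. Wattenberg
- Simple model: $g_0 = 0$
- Solution 1: minimize the silhouette (equivalent to the symmetric model)
  silhouette: $g_0^2 + g_n^2$, minimized by $g_0 = -\frac{1}{2}\sum_{i=1}^{n} f_i$
- Solution 2: minimize the deviation from the x-axis
  deviation: $\sum_{i=0}^{n} g_i^2$, minimized by $g_0 = -\frac{1}{n+1}\sum_{i=1}^{n} (n - i + 1) f_i$
  (applying the same least-squares idea to the slopes $g_i'$ instead of the positions $g_i$ gives the wiggle-minimizing baseline)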
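The two closed-form baselines can be checked numerically. The sketch below (plain Python, illustrative names, made-up data) evaluates both at one time point and confirms that the deviation baseline gives a smaller sum of squared boundary positions than the silhouette baseline.

```python
from typing import List

def silhouette_baseline(f: List[float]) -> float:
    # Minimizes g_0^2 + g_n^2  ->  g_0 = -1/2 * sum(f_i)
    return -0.5 * sum(f)

def deviation_baseline(f: List[float]) -> float:
    # Minimizes sum_{i=0}^{n} g_i^2  ->  g_0 = -1/(n+1) * sum_{i=1}^{n} (n - i + 1) f_i
    n = len(f)
    return -sum((n - i + 1) * f[i - 1] for i in range(1, n + 1)) / (n + 1)

def deviation(f: List[float], g0: float) -> float:
    # Sum of squared boundary positions g_0, ..., g_n
    total, g = g0 * g0, g0
    for thickness in f:
        g += thickness
        total += g * g
    return total

f = [1.0, 2.0, 3.0]
g0 = deviation_baseline(f)  # -(3*1 + 2*2 + 1*3) / 4 = -2.5
print(g0, deviation(f, g0), deviation(f, silhouette_baseline(f)))  # -2.5 21.0 22.0
```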
Sine Illusion Effects
(a) A line with uniform thickness is drawn along a sinusoidal curve. Our perception uses the orthogonal distance rather than the vertical distance to judge its thickness. (b) The green layer with a dotted border in the streamgraph seems to have a constant thickness. However, its vertical thickness has a peak (c).
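The gap between the two measurements follows from elementary geometry: two parallel lines with slope $s$ that are $h$ apart vertically are $h / \sqrt{1 + s^2}$ apart orthogonally. The sketch below applies this locally to a band of constant vertical thickness drawn along $y = \sin(x)$; treating the boundaries as locally parallel is a simplifying assumption.

```python
import math

def orthogonal_thickness(vertical: float, slope: float) -> float:
    """Perpendicular distance between two locally parallel boundaries.

    Lines y = s*x + a and y = s*x + a + h are h / sqrt(1 + s^2) apart
    when measured orthogonally, although they are h apart vertically.
    """
    return vertical / math.sqrt(1.0 + slope * slope)

# A band of constant vertical thickness 1.0 drawn along y = sin(x)
for x in (0.0, math.pi / 4, math.pi / 2):
    slope = math.cos(x)  # derivative of sin(x)
    print(f"x={x:.2f}  vertical=1.00  orthogonal={orthogonal_thickness(1.0, slope):.2f}")
# At x = 0 the slope is 1, so the orthogonal thickness drops to about 0.71.
```

The band looks thinnest where the curve is steepest even though its vertical thickness never changes, which is the sine illusion in panel (a).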
Approach:
The geometry of a streamgraph is controlled by its baseline (the bottom-most curve) and by the ordering of its layers, so the authors re-interpret baseline computation and layer ordering algorithms in terms of reducing sine illusion effects.
For baseline computation, they improve previous methods by introducing a Gaussian weight that penalizes layers with large thickness changes.
For layer ordering, they propose three design requirements and implement them through a hierarchical clustering algorithm.
Baseline computation
Evaluation criteria
- Wiggle metric
- Byron and Wattenberg: $\text{wiggle} = \sum_{i=0}^{n} (g_i')^2$, the sum of squared slopes of all layer boundaries
- Bartolomeo and Hu:
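- Byron and Wattenberg (weighted): $\text{wiggle}_w = \sum_{i=1}^{n} f_i \left(\frac{g_i' + g_{i-1}'}{2}\right)^2$, where the slope of each layer's midline is weighted by its thickness $w_i = f_i$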
Although the wiggle metric is widely used, optimizing a streamgraph layout with it is based mainly on empirical observation and lacks a clear perceptual foundation. This is why the authors introduce the sine illusion as an additional criterion.
- Sine illusion:
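As a sketch of how Byron and Wattenberg's two wiggle metrics can be evaluated on a sampled layout, the code below uses forward differences for the slopes and, as an assumption, takes each layer's thickness at the start of an interval as its weight. `layers[i][t]` is the thickness of layer i at time t and `baseline[t]` is $g_0$; these names are illustrative.

```python
from typing import List

def boundary_slopes(layers: List[List[float]], baseline: List[float], t: int, dt: float) -> List[float]:
    """Forward-difference slopes g_0', ..., g_n' of all boundaries between t and t + 1."""
    def boundaries(k: int) -> List[float]:
        g = [baseline[k]]
        for layer in layers:
            g.append(g[-1] + layer[k])
        return g
    now, nxt = boundaries(t), boundaries(t + 1)
    return [(b - a) / dt for a, b in zip(now, nxt)]

def wiggle(layers: List[List[float]], baseline: List[float], dt: float = 1.0) -> float:
    # Unweighted: sum over time of sum_{i=0}^{n} (g_i')^2
    total = 0.0
    for t in range(len(baseline) - 1):
        total += sum(s * s for s in boundary_slopes(layers, baseline, t, dt))
    return total

def weighted_wiggle(layers: List[List[float]], baseline: List[float], dt: float = 1.0) -> float:
    # Weighted: sum over time of sum_{i=1}^{n} f_i * ((g_i' + g_{i-1}') / 2)^2
    total = 0.0
    for t in range(len(baseline) - 1):
        s = boundary_slopes(layers, baseline, t, dt)
        for i, layer in enumerate(layers, start=1):
            total += layer[t] * ((s[i] + s[i - 1]) / 2) ** 2
    return total

layers = [[1, 1], [1, 3]]  # two layers, two time points (made-up data)
print(weighted_wiggle(layers, [0, 0]), weighted_wiggle(layers, [0, -0.5]))  # 1.0 0.5
```

Lowering the baseline while the top layer grows spreads the slope across both layers, which reduces the weighted wiggle.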
Authors' approach
The authors modify the original weight $w_i = f_i$ with a Gaussian weight to reduce the influence of a layer when its thickness changes sharply:
where c can be the median, arithmetic mean, harmonic mean, or geometric mean of the
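The weighted-wiggle baseline has a closed form for arbitrary per-layer weights: layer i's midline slope is $g_0' + a_i$ with $a_i = \sum_{j<i} f_j' + \frac{1}{2} f_i'$, so the weighted least-squares optimum is $g_0' = -\sum_i w_i a_i / \sum_i w_i$. With $w_i = f_i$ this is Byron and Wattenberg's baseline. The sketch below makes the weight pluggable. Because the exact Gaussian formula is not reproduced in these notes, `hypothetical_gaussian_weight` uses an assumed form $f_i \exp(-f_i'^2 / 2c^2)$ and may differ from the paper.

```python
import math
from typing import Callable, List

WeightFn = Callable[[float, float], float]  # (f_i, f_i') -> w_i

def byron_wattenberg_weight(f: float, df: float) -> float:
    # Original weighting: w_i = f_i
    return f

def hypothetical_gaussian_weight(c: float) -> WeightFn:
    # HYPOTHETICAL form, not taken from the paper: w_i = f_i * exp(-(f_i')^2 / (2 c^2)).
    # Layers whose thickness changes quickly (large |f_i'|) get a smaller weight.
    def weight(f: float, df: float) -> float:
        return f * math.exp(-(df * df) / (2.0 * c * c))
    return weight

def baseline_slope(f: List[float], df: List[float], weight: WeightFn) -> float:
    """g_0' minimizing sum_i w_i * ((g_i' + g_{i-1}') / 2)^2 at one time point."""
    num = den = 0.0
    below = 0.0  # sum of f_j' over the layers beneath layer i
    for fi, dfi in zip(f, df):
        w = weight(fi, dfi)
        num += w * (below + dfi / 2.0)
        den += w
        below += dfi
    return -num / den if den > 0 else 0.0

def integrate_baseline(layers: List[List[float]], weight: WeightFn, dt: float = 1.0) -> List[float]:
    # Forward differences give f_i'; accumulate g_0 starting from g_0 = 0.
    g0 = [0.0]
    for t in range(len(layers[0]) - 1):
        f_t = [layer[t] for layer in layers]
        df_t = [(layer[t + 1] - layer[t]) / dt for layer in layers]
        g0.append(g0[-1] + baseline_slope(f_t, df_t, weight) * dt)
    return g0

layers = [[1, 1], [1, 3]]  # made-up data: the top layer thickens quickly
print(integrate_baseline(layers, byron_wattenberg_weight))  # [0.0, -0.5]
print(integrate_baseline(layers, hypothetical_gaussian_weight(1.0)))
```

With the damped weight, the fast-changing top layer pulls the baseline less, so the baseline moves less than with $w_i = f_i$.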
Layer ordering algorithms
Traditional calculation
- Byron and Wattenberg: LateOnset
- Bartolomeo and Hu: TwoOpt
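LateOnset can be sketched as follows, assuming the commonly described "inside-out" scheme: sort layers by their first non-zero time point, then add each one to the outside of whichever side (top or bottom) currently has the smaller total thickness. This is a simplified illustration, not Byron and Wattenberg's exact implementation, and the function names are hypothetical.

```python
from typing import List

def onset(layer: List[float]) -> int:
    # First time index with non-zero thickness (len(layer) if the layer is empty)
    return next((t for t, v in enumerate(layer) if v > 0), len(layer))

def late_onset_order(layers: List[List[float]]) -> List[int]:
    """Return layer indices from bottom to top."""
    by_onset = sorted(range(len(layers)), key=lambda i: onset(layers[i]))
    top: List[int] = []     # grows upward, away from the center
    bottom: List[int] = []  # grows downward, away from the center
    top_sum = bottom_sum = 0.0
    for i in by_onset:
        total = sum(layers[i])
        if top_sum <= bottom_sum:
            top.append(i)
            top_sum += total
        else:
            bottom.append(i)
            bottom_sum += total
    # bottom was filled from the center outward, so reverse it to list bottom-to-top
    return bottom[::-1] + top

layers = [[0, 0, 5, 5], [1, 1, 1, 1], [0, 2, 2, 2]]  # made-up data
print(late_onset_order(layers))  # [2, 1, 0]
```

Late-starting layers end up on the outside, which keeps the early part of the graph stable but places new layers on whatever slope the stack already has.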
Authors' approach
A comparison of the ordering algorithms is illustrated in the figure. With LateOnset, layers are added to the streamgraph by start time. New layers (e.g., Layer 2 in pink) are usually placed on a slanted baseline, which distorts them and introduces sine illusion effects. TwoOpt tends to put thick layers (e.g., Layer 6 in light green) in the middle, causing large distortions and strong sine illusion effects in the neighboring layers. Compared with LateOnset and TwoOpt, the authors' ordering algorithm (c) produces a visually pleasing streamgraph: the orthogonal and vertical directions are aligned in most layers, so sine illusions are minimized.
- compensation degree
The authors define the compensation degree to describe the mutual compensation between every pair of layers as:
where L indicates the length of the combined layer. The authors define when
- thickness weight
The authors introduce a thickness weight to express a preference for compensating relatively thin layers:
where denotes the value of at the time point and indicates the number of time points.
- length weight
The authors use a length weight to express a preference for compensating relatively long layers:
where is the length of layer
- dist
The smaller the dist between two layers, the higher the priority given to placing them next to each other.
- Hierarchical-Clustering-Based Ordering.
At each step (numbered in the blue circles), the two layers with the shortest distance are merged into a new combined layer. The distances between this new layer and all other layers are then recalculated, and merging repeats.
To guarantee an ordering that minimizes sine illusions, the authors create the final order by minimizing the sum of distances between adjacent layers:
where the two layers in each term correspond to two adjacent leaf nodes of the hierarchical clustering tree.
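The merge-then-order procedure can be sketched with a pluggable distance. Two assumptions are labeled here: the paper's dist (built from the compensation degree and the thickness and length weights) is not reproduced, so `placeholder_dist` is a HYPOTHETICAL stand-in that is small when two layers' thickness changes cancel; and a combined layer is assumed to be the element-wise sum of its members. Instead of an exact minimization over the tree, this sketch greedily orients the two children at each merge so the leaves meeting at the junction are as close as possible.

```python
from typing import Callable, List, Optional, Tuple

Layer = List[float]
DistFn = Callable[[Layer, Layer], float]

def placeholder_dist(a: Layer, b: Layer) -> float:
    # HYPOTHETICAL stand-in for the paper's dist: small when the changes of a and b cancel out
    return sum(abs((a[t + 1] - a[t]) + (b[t + 1] - b[t])) for t in range(len(a) - 1))

def combine(a: Layer, b: Layer) -> Layer:
    # ASSUMPTION: a combined layer is the element-wise sum of its members' thicknesses
    return [x + y for x, y in zip(a, b)]

def cluster_order(layers: List[Layer], dist: DistFn) -> List[int]:
    """Agglomerative merging with greedy leaf orientation; returns indices bottom-to-top."""
    # Each cluster: (combined thickness series, ordered member indices)
    clusters: List[Tuple[Layer, List[int]]] = [(list(l), [i]) for i, l in enumerate(layers)]
    while len(clusters) > 1:
        # 1. Find the pair of clusters with the smallest distance
        best: Optional[Tuple[float, int, int]] = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = dist(clusters[p][0], clusters[q][0])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        (la, ia), (lb, ib) = clusters[p], clusters[q]
        # 2. Greedy orientation: keep the concatenation whose junction leaves are closest
        options = [(x, y) for x in (ia, ia[::-1]) for y in (ib, ib[::-1])]
        x, y = min(options, key=lambda xy: dist(layers[xy[0][-1]], layers[xy[1][0]]))
        # 3. Replace both clusters with the merged one; distances are recomputed next round
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append((combine(la, lb), x + y))
    return clusters[0][1]

def total_adjacent_distance(order: List[int], layers: List[Layer], dist: DistFn) -> float:
    # The quantity the final ordering tries to keep small
    return sum(dist(layers[a], layers[b]) for a, b in zip(order, order[1:]))

layers = [[1, 2, 3], [3, 2, 1], [1, 1, 1]]  # made-up data
order = cluster_order(layers, placeholder_dist)
print(order, total_adjacent_distance(order, layers, placeholder_dist))  # [2, 0, 1] 2
```

Layers 0 and 1 (one growing, one shrinking) compensate each other perfectly, so they are merged first and stay adjacent in the final order. An exact alternative to the greedy step is an optimal leaf ordering over the whole tree.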
Case
I reorganized the SineStream code used in the demo above. To view the original source code of the paper, visit https://github.com/Ideas-Laboratory/SineStream. If you're interested in other mature streamgraph layout methods, see https://d3js.org/d3-shape/stack.