Paper Reading ShapeWordle
TLDR
This article contains my notes from the paper "ShapeWordle: Tailoring Wordles using Shape-aware Archimedean Spirals". I'll discuss the definition, the original, the paper's contribution, and the code.
Paper
Notes
Definition
A word cloud is a visual representation of text data, often used to depict keyword metadata on websites or to visualize free-form text.
Pros
- Easy to understand
- Visually appealing
- Highlights keywords
Cons
- May omit data (some words find no space to be placed)
- Font size is an insensitive encoding (small differences in weight are hard to see)
See "Word clouds considered harmful" by Jacob Harris, a New York Times senior software architect (via FlowingData).
Original
The layout algorithm itself is incredibly simple. For each word, starting with the most “important”:
- Attempt to place the word at some starting point: usually near the middle, or somewhere on a central horizontal line.
- If the word intersects with any previously-placed words, move it one step along an increasing spiral. Repeat until no intersections are found.
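The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code of any particular word cloud library: words are reduced to axis-aligned boxes, and the names `overlaps` and `place_words` as well as the `step` and `spacing` parameters are assumptions made for this example.

```python
import math

def overlaps(a, b):
    """Axis-aligned boxes given as (cx, cy, w, h), with (cx, cy) the box centre."""
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2]
            and abs(a[1] - b[1]) * 2 < a[3] + b[3])

def place_words(sizes, step=0.1, spacing=1.0, max_steps=100_000):
    """Place boxes (w, h), most important first, along an increasing spiral."""
    placed = []
    for w, h in sizes:
        theta = 0.0
        for _ in range(max_steps):
            r = spacing * theta  # radius grows linearly with the angle
            box = (r * math.cos(theta), r * math.sin(theta), w, h)
            if not any(overlaps(box, other) for other in placed):
                placed.append(box)  # first collision-free position wins
                break
            theta += step  # move one step along the spiral and retry
        else:
            raise RuntimeError("no free position found")
    return placed

# Hypothetical word boxes, sorted from most to least important.
sizes = [(40, 20), (30, 15), (20, 10), (20, 10)]
placed = place_words(sizes)
```

The first box always lands at the centre, and every later box walks outward until it no longer intersects anything already placed. Real implementations test glyph outlines rather than bounding boxes, which packs words more tightly.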
Paper's Contribution
- Shape-aware Archimedean spirals
- Multi-centric layout
- Editable
- Effect evaluation
Shape-aware Archimedean spirals
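The Archimedean spiral is one of the most widely used Euclidean spirals, and it can be readily defined in polar coordinates:
$$r = m\theta + b$$
where $\theta$ is the polar angle, $r$ is the radial distance from the origin, $b$ is the initial distance of the starting point from the origin, and $m$ controls the spacing between successive turns. Having a uniform spacing between successive turns is an important and useful characteristic of the Archimedean spiral.
The Archimedean spiral can also be expressed in Cartesian coordinates, $x$ and $y$, by using trigonometric functions:
$$x = r\cos\theta = (m\theta + b)\cos\theta, \qquad y = r\sin\theta = (m\theta + b)\sin\theta$$
Taking the derivatives with respect to $\theta$ yields
$$dx = (m\cos\theta - r\sin\theta)\,d\theta, \qquad dy = (m\sin\theta + r\cos\theta)\,d\theta$$
$(dx, dy)$ is the movement direction (denoted as $\mathbf{S}$) of the spiral at $(x, y)$ in the 2D space. $\mathbf{S}$ can be decomposed along $\mathbf{N}$ and $\mathbf{T}$:
$$\mathbf{S} = m\,d\theta\,\mathbf{N} + r\,d\theta\,\mathbf{T}$$
where $\mathbf{N} = (\cos\theta, \sin\theta)$ and $\mathbf{T} = (-\sin\theta, \cos\theta)$ are the unit normal vector and unit tangent vector, respectively, at point $(x, y)$ on a circle of radius $r$ co-centered with the spiral. Such a circle can be interpreted as the isoline of an underlying distance field which measures the Euclidean distance from $(x, y)$ to the origin.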
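As a quick numerical check of this decomposition, the sketch below compares the actual displacement between two nearby points on the spiral $r = m\theta + b$ with the first-order movement $m\,d\theta\,\mathbf{N} + r\,d\theta\,\mathbf{T}$. The function names and the chosen values of `theta` and `dtheta` are only illustrative.

```python
import math

def spiral_point(theta, m=1.0, b=0.0):
    """Cartesian point on the Archimedean spiral r = m*theta + b."""
    r = m * theta + b
    return r * math.cos(theta), r * math.sin(theta)

def spiral_step(theta, dtheta, m=1.0, b=0.0):
    """First-order movement S = m*dtheta*N + r*dtheta*T at angle theta."""
    r = m * theta + b
    n = (math.cos(theta), math.sin(theta))    # unit normal of the co-centred circle
    t = (-math.sin(theta), math.cos(theta))   # unit tangent of that circle
    return (m * dtheta * n[0] + r * dtheta * t[0],
            m * dtheta * n[1] + r * dtheta * t[1])

theta, dtheta = 2.0, 1e-6
x0, y0 = spiral_point(theta)
x1, y1 = spiral_point(theta + dtheta)
sx, sy = spiral_step(theta, dtheta)

# Differences between the true displacement and the decomposed step.
err_x = abs((x1 - x0) - sx)
err_y = abs((y1 - y0) - sy)
```

Both errors shrink quadratically with `dtheta`, which is what one expects from a first-order (differential) description of the curve.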
Computing the distance field. A distance field is an effective shape representation that has been used for edge bundling and trail data visualization. It is a scalar field, written $\phi$ here, that gives for each point the shortest distance to the shape contour, and it is computed by a distance transform.
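To make the idea concrete, here is a brute-force distance field on a tiny binary grid. It is a toy sketch rather than the paper's implementation: the helper name `distance_field` is made up, distances are measured to the nearest outside cell, and the quadratic running time is only acceptable for very small grids.

```python
import math

def distance_field(mask):
    """For each inside cell (1), return the distance to the nearest outside cell (0).

    Brute force over all outside cells; assumes at least one outside cell exists.
    """
    h, w = len(mask), len(mask[0])
    outside = [(x, y) for y in range(h) for x in range(w) if not mask[y][x]]
    field = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                field[y][x] = min(math.hypot(x - ox, y - oy) for ox, oy in outside)
    return field

# A 3x3 square shape surrounded by a one-cell border of background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
field = distance_field(mask)
```

The centre cell gets the largest value and cells next to the contour get the smallest, so the isolines of the field follow the shape of the contour. For real images, a library routine such as OpenCV's `distanceTransform` would be the usual replacement for the nested loops.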
Extending the Archimedean spiral. To extend the Archimedean spiral to be shape-aware, the main question is how to guide the movement of the spiral, or how to define the movement direction $\mathbf{S}$ of the spiral at any point in the given shape. Rather than explicitly constructing the isoline and then computing the isoline normal at $(x, y)$, the authors take the gradient of the distance field $\phi$ as $\mathbf{N}$. This strategy can accurately approximate the normal for continuous scalar fields, like $\phi$. Once $\mathbf{N}$ is available, $\mathbf{T}$ can be easily obtained because it is a unit vector that is orthogonal to $\mathbf{N}$. The spiral equation can then be rewritten in a differential form:
$$\begin{pmatrix} dx \\ dy \end{pmatrix} = m\,d\theta\,\mathbf{N}(x, y) + r\,d\theta\,\mathbf{T}(x, y)$$
However, using the same step $d\theta$ at every point in an arbitrary shape may not be appropriate, since the generated spiral might not be able to adapt to regions of high curvature.
To characterize such sharp features, the authors consider the local curvature along the spiral and approximate the curve by small tangential movements (denoted by $dl$) perpendicular to $\mathbf{N}$: by $dl = r\,d\theta$ and also by $dl = R\,\omega$, where $R$ is the local curvature radius and $\omega$ is a user-specified parameter for the angular speed.
Equating the two expressions gives the step in $\theta$:
$$d\theta = \frac{R\,\omega}{r}$$
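The constant-step differential form from earlier in this section (before the curvature-based adjustment) can be tried out with a simple Euler integration. The sketch below is my own illustration and rests on several assumptions that are not taken from the paper: the field is given as a Python function `phi`, its gradient is estimated by central differences, the normal is taken as the negated and normalised gradient so that the spiral moves away from the centre, and $r$ is taken as `phi_max - phi(x, y)`.

```python
import math

def gradient(phi, x, y, h=1e-4):
    """Central-difference gradient of the scalar field phi at (x, y)."""
    return ((phi(x + h, y) - phi(x - h, y)) / (2 * h),
            (phi(x, y + h) - phi(x, y - h)) / (2 * h))

def shape_aware_spiral(phi, phi_max, start, m=1.0, dtheta=0.001, steps=5000):
    """Euler-integrate (dx, dy) = m*dtheta*N + r*dtheta*T inside a shape.

    Assumptions of this sketch: N is the negated, normalised gradient of phi
    (pointing away from the centre), and r = phi_max - phi(x, y).
    """
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = gradient(phi, x, y)
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            break  # flat spot: no direction to follow
        nx, ny = -gx / norm, -gy / norm  # outward normal of the isoline
        tx, ty = -ny, nx                 # unit tangent, orthogonal to N
        r = phi_max - phi(x, y)
        x += m * dtheta * nx + r * dtheta * tx
        y += m * dtheta * ny + r * dtheta * ty
        path.append((x, y))
    return path

def disk(x, y):
    """Distance to the contour of a disk of radius 10 centred at the origin."""
    return 10.0 - math.hypot(x, y)

path = shape_aware_spiral(disk, phi_max=10.0, start=(0.5, 0.0))
end_radius = math.hypot(*path[-1])
```

For a disk the isolines are circles, so the result should fall back to an ordinary Archimedean spiral: after a total angle of 5 radians starting at radius 0.5 with `m = 1`, the end radius is close to 5.5. With a non-circular field the same loop bends the spiral along the isolines of the shape.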
Multi-centric
Shape Segmentation. Given a shape, the method detects the connected components in the shape and generates a distance field per component. It then uses an iterative gradient-descent procedure to locate the local maxima and the associated shape regions (called parts) in each component. This implicitly segments a component into a few parts.
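A rough sketch of this segmentation idea on a pixel grid follows. It is not the paper's procedure: connected components are found with a plain flood fill, and the iterative search is replaced by a discrete steepest-ascent walk over the 8 neighbours of each cell. The function names and the toy field values are made up for the example.

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected regions of truthy cells; returns a list of cell sets."""
    h, w = len(mask), len(mask[0])
    seen, components = set(), []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (x, y) not in seen:
                queue, cells = deque([(x, y)]), set()
                seen.add((x, y))
                while queue:
                    cx, cy = queue.popleft()
                    cells.add((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < w and 0 <= ny < h
                                and mask[ny][nx] and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
                components.append(cells)
    return components

def climb(field, cells, start):
    """Follow the steepest uphill neighbour until a local maximum is reached."""
    x, y = start
    while True:
        neighbours = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx or dy) and (x + dx, y + dy) in cells]
        best = max(neighbours, key=lambda c: field[c[1]][c[0]], default=(x, y))
        if field[best[1]][best[0]] <= field[y][x]:
            return (x, y)
        x, y = best

def segment(field, cells):
    """Group the cells of one component by the local maximum they climb to."""
    parts = {}
    for cell in cells:
        parts.setdefault(climb(field, cells, cell), set()).add(cell)
    return parts

mask = [[1] * 7]
field = [[1, 2, 3, 1, 2, 4, 1]]  # toy values with two peaks, not a real distance field
component = connected_components(mask)[0]
parts = segment(field, component)
```

Each key of `parts` is a local maximum and each value is the set of cells that drain to it, which corresponds to one part of the component. Plateaus of equal values would need extra handling that this sketch leaves out.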
Word Assignment. Given a list of words to fill a shape, first set a font size for each word such that the sum of the areas of all the words is 70% of the total shape area. Then use a greedy strategy to assign words to the different parts of the shape. Denoting $P_{ij}$ as the j-th part of the i-th component, $A_{ij}$ as the area of $P_{ij}$, and $n$ as the total number of input words, the number of words to be assigned to $P_{ij}$ is:
$$n_{ij} = n \cdot \frac{A_{ij}}{\sum_{i}\sum_{j} A_{ij}}$$
Assuming that the word with the largest weight should be assigned to the largest part, the paper then defines the largest weight that the words in each part may have.
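The 70% area budget and the area-proportional word counts can be sketched as follows. Several details here are assumptions for illustration only: each word is approximated by a box whose width is `len(text) * char_aspect` times its font size, font size is taken as proportional to the word weight, and fractional counts are resolved with largest-remainder rounding so that the counts add up to the number of words.

```python
import math

def font_scale(words, shape_area, fill=0.7, char_aspect=0.6):
    """Scale s such that the word boxes cover fill * shape_area in total.

    Each (text, weight) is drawn at font size s * weight, with an assumed box of
    width len(text) * char_aspect * size and height size.
    """
    unit_area = sum(len(text) * char_aspect * weight ** 2 for text, weight in words)
    return math.sqrt(fill * shape_area / unit_area)

def words_per_part(part_areas, n_words):
    """Split n_words across parts in proportion to area (largest-remainder rounding)."""
    total = sum(part_areas)
    quotas = [n_words * area / total for area in part_areas]
    counts = [math.floor(q) for q in quotas]
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[: n_words - sum(counts)]:
        counts[i] += 1  # hand out the words lost to rounding down
    return counts

# Hypothetical input: (text, weight) pairs and three part areas in pixels.
words = [("shape", 1.0), ("wordle", 0.8), ("spiral", 0.5), ("cloud", 0.5)]
scale = font_scale(words, shape_area=10_000)
covered = sum(len(t) * 0.6 * (scale * w) ** 2 for t, w in words)
counts = words_per_part([500, 300, 200], n_words=24)
```

Here `covered` comes out at 70% of the shape area by construction, and the 24 words are split 12, 7 and 5 across the three parts. A greedy pass would then hand the heaviest words to the largest part first.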
Editable
Although the authors put considerable effort into this part and achieved impressive results, I have not yet come across a use case that needs these interactive editing features, so I'll skip it for now.
Code
The demo requires loading OpenCV resources. If the demo appears blank, it may be due to resource loading failure. You can try reloading using the button below.
I reorganized the ShapeWordle code for the demo above. If you need the original source code of the paper, please visit https://github.com/RealKai42/Shape_Wordle.
I'm not sure my code is 100% correct. Some parts may still need to be corrected and optimized.