Skip to main content

Paper Reading ShapeWordle

· 6 min read

TLDR

This article contains my notes from the paper "ShapeWordle: Tailoring Wordles using Shape-aware Archimedean Spirals". I'll discuss the definition, the original, the paper's contribution, and the code.

Paper

Download PDF

.

Notes

Definition

Word cloud is a visual representation of text data which is often used to depict keyword metadata on websites, or to visualize free form text

Pros

  • Easy to understand
  • Beauty
  • Highlight key words

Cons

  • Maybe missing data(No space to put)
  • Size is not sensitive

《Word clouds considered harmful》 by Jacob Harris, a New York Times senior software architect (via FlowingData).

Original

https://www.jasondavies.com/wordcloud/about/

The layout algorithm itself is incredibly simple. For each word, starting with the most “important”:

  1. Attempt to place the word at some starting point: usually near the middle, or somewhere on a central horizontal line.

  2. If the word intersects with any previously-placed words, move it one step along an increasing spiral. Repeat until no intersections are found.

Paper's Contribution

  • Shape-aware Archimedean spirals
  • Multy Center
  • Editable
  • Effect evaluation

Shape-aware Archimedean spirals

The Archimedean spiral is one of most widely-used Euclidean spirals,which can be readily defined in polar coordinates:

r(θ)=mθ+br(\theta) = m\theta + b

where θ\theta is the polar angle, rr is the radial distance from the origin, b=r(0)b = r(0) is the initial distance of the starting point from the origin, and mm controls the spacing between successive turns. Having a uniform spacing 2mπ2m\pi between successive turns is an important and useful characteristic of the Archimedean spiral.

The Archimedean spiral can also be expressed in Cartesian coordinates, xx and yy, by using trigonometric functions:

(xy)=r(θ)(cosθsinθ)\left(\begin{array}{c}x\\y\end{array}\right) = r(\theta)\left(\begin{array}{c}cos\theta\\sin\theta\end{array}\right)

Taking the derivatives with respect to θ\theta yields

{dxdθ=mcosθr(θ)sinθdydθ=msinθ+r(θ)cosθ\left\{\begin{array}{l}{\frac{dx}{d\theta} = mcos\theta-r(\theta)sin\theta}\\{\frac{dy}{d\theta} = msin\theta+r(\theta)cos\theta}\end{array}\right.

(dxdθ,dydθ)(\frac{dx}{d\theta},\frac{dy}{d\theta}) is the movement direction (denoted as UU) of the spiral at (x,y)(x,y) in the 2D space. Can decompose UU along N=(cosθ,sinθ)TN=(cos\theta, sin\theta)^T and T=(sinθ,cosθ)TT=(-sin\theta, cos\theta)^T

U=m(cosθsinθ)+r(θ)(sinθcosθ)=mN+r(θ)TU = m \left(\begin{array}{c}cos\theta\\sin\theta\end{array}\right) + r(\theta) \left(\begin{array}{c}-sin\theta\\cos\theta\end{array}\right) = mN + r(\theta)T

where NN and TT are the unit normal vector and unit tangent vector, resp.,at point (x,y)(x,y) on a circle of radius x2+y2\sqrt{x^2+y^2} co-centered with the spiral.Such a circle can be interpreted as the isoline of an underlying distance field ϕ(x,y)=x2+y2\phi(x,y) = \sqrt{x^2+y^2} which measures the Euclidean distance from (x,y)(x,y) to the origin.

Computing the distance field. A distance field is an effective shape representation that has been used for edge bundling and trail data visualization. It is a scalar field that specifies the shortest distance to a shape contour specified by a distance transform R2R+R^2 \rightarrow R_+.

ϕ(pR2,Ω)=minqΩpq\phi(p\in R^2,\Omega) = min_{q\in\Omega}||p-q||

Extending the Archimedean spiral.To extend the Archimedean spiral to be shape-aware, the main question is how to guide the movement of the spiral, or how to define the movement direction of the spiral at any point pp in the given shape. Rather than explicitly constructing the isoline and then computing the isoline normal at pp, take the gradient of the distance field as NN. This strategy can accurately approximate the normal for continuous scalar fields, like ϕ\phi. Once NN is available, TT can be easily obtained because it is a unit vector that is orthogonal to NN.Then can rewrite the equation in a differential form:

(dxdy)=mdθN+rdθT\left(\begin{array}{c}dx\\dy\end{array}\right) = md\theta N + rd\theta T

However, using the same θ at every point in an arbitrary shape might not be proper, since the generated spiral might not be able to adapt to regions of high-curvature.

To characterize such sharp features, authors consider the local curvature along the spiral and approximate the curve by small tangential movements (denoted by dsds) perpendicular to NN By RdηRd\eta and also by RdθRd\theta, where R is the local curvature radius and η\eta is a user-specified parameter for the angular speed.

dθ=Rdηrd\theta = \frac{Rd\eta}{r}

also same like with:

(dxdy)=mRdηrN+RdηT\left(\begin{array}{c}dx\\dy\end{array}\right) = \frac{mRd\eta}{r}N + Rd\eta T

Multi-centric

Shape Segmentation. Given a shape, detect connected components in the shape and generate a distance field per component. Then use an iterative gradient-descent procedure to locate the local maxima(s) and the associated shape region(s) (called as parts) in each component. This is allowed to implicitly segment a component into a few parts.

Word Assignment. Given a list of words to fill a shape, first set a font size for each word such that the sum of the areas of all the words is 70% of the total shape area.Then use a greedy strategy to assign words to the different parts of the shape.Denoting pi,jp_{i,j} as the j-th part of the i-th component, Ai,jA_{i,j} as the area of pi,jp_{i,j}, and NN as the total number of input words, the number of words to be assigned to pi,jp_{i,j} is:

Ai,juvAu,vN\frac{A_{i,j}}{\sum_{u}\sum_{v}A_{u,v}}N

Assuming that the word with the largest weight should be assigned to the largest part, then define the largest weight of the words in each part as:

Wi,j=Ai,jmaxu,vAu,vW_{i,j} = \frac{A_{i,j}}{max_{u,v}A_{u,v}}

Editable

info

Although the author has made great efforts and remarkable achievements in this regard, I have not encountered suitable character scenes for the interaction of this part of the content. So just skip it for now.

Code

warning

The demo requires loading OpenCV resources. If the demo appears blank, it may be due to resource loading failure. You can try reloading using the button below.

tip

In the demo above, the code for ShapeWordle has been reorganized by me. If you need to view the original source code of the paper, please visit https://github.com/RealKai42/Shape_Wordle.
I'm not sure the code is 100% correct. Some place maybe need correct and optimize.