Stanford CS448B 14 Network Analysis
TLDR
This article contains my notes from Stanford's CS448B (Data Visualization) course, specifically focusing on the fourteenth lecture about network analysis. I'll discuss the importance of centrality, community structure, and simulating network models.
Original
Notes
-
Diseases
-
Transportation
-
Actors and movies (bipartite)
Characterizing networks
What does it look like?
- Size?
- Density?
- Centrality?
- Clustering?
- Components?
- Cliques?
- Motifs?
- Avg. path length?
...
Centrality
How far apart are things?
Distance: shortest paths
Shortest path (geodesic path)
- The shortest sequence of links connecting two nodes
- Not always unique
- A and C are connected by 2 shortest paths
- A –E –B -C
- A –E –D -C
Shortest path from 2 to 3: 1
Shortest path from 2 to 3?
Most important node?
Degree centrality (undirected)
Normalized degree centrality
When is degree not sufficient?
Does not capture
- Ability to broker between groups
- Likelihood that information originating anywhere in the network reaches you
Betweenness
Assuming nodes communicate using the most direct (shortest) route, how many pairs of nodes have to pass information through target node?
examples
definition
the number of shortest paths connecting jk
the number that node iis on.
Normalization:
number of pairs of vertices excluding the vertex itself.
When are Cd, Cb not sufficient?
Do not capture
Likelihood that information originating anywhere in the network reaches you
Closeness
definition
Being close to the center of the graph
Closeness Centrality:
Normalized Closeness Centrality
example
Community Structure
How dense is it?
Max. possible edges:
- Directed:
- Undirected:
Is everything connected?
Connected Components - Directed
Strongly connected components
- Each node in component can be reached from every other node in component by following directed links
- B C D E
- A
- G H
- F
Weakly connected components
- Each node can be reached from every other node by following links in either direction
- A B C D E
- G H F
Community finding (clustering)
Hierarchical clustering
Process:
- Calculate affinity weights W for all pairs of vertices
- Start: N disconnected vertices
- Adding edges (one by one) between pairs of clusters in order of decreasing weight (use closest distance to compare clusters)
- Result: nested components
Cluster Dendrograms
Betweenness clustering
Girvan and Newman 2002 iterative algorithm:
- Compute of all edges
- Remove edge i where
- Recalculate betweenness
Simulating network models
Small world network
Milgram (1967)
- Mean path length in US social networks
- ~ 6 hops separate any two people
Watts and Strogatz 1998 a few random links in an otherwise structured graph make the network a small world
Defining small world phenomenon
Pattern:
- high clustering
- low mean shortest path
Examples
- neural network of C. elegans,
- semantic networks of languages,
- actor collaboration graph
- food webs
Summary
Structural analysis
- Centrality
- Community structure
- Pattern finding
Widely applicable across domains