Skip to main content

Stanford CS448B 14 Network Analysis

· 4 min read

TLDR

This article contains my notes from Stanford's CS448B (Data Visualization) course, specifically focusing on the fourteenth lecture about network analysis. I'll discuss the importance of centrality, community structure, and simulating network models.

Original

Download PDF

.

Notes

  • Diseases

  • Transportation

  • Actors and movies (bipartite)

Characterizing networks

What does it look like?


  • Size?
  • Density?
  • Centrality?
  • Clustering?
  • Components?
  • Cliques?
  • Motifs?
  • Avg. path length?
    ...

Centrality

How far apart are things?

Distance: shortest paths

Shortest path (geodesic path)

  • The shortest sequence of links connecting two nodes
  • Not always unique

  • A and C are connected by 2 shortest paths
  • A –E –B -C
  • A –E –D -C

Shortest path from 2 to 3: 1

Shortest path from 2 to 3?


Most important node?

Degree centrality (undirected)

CD=d(ni)=Ai+=jAijC_D = d(n_i)=A_{i+}=\sum_jA_{ij}

Normalized degree centrality

CD(i)=d(i)N1C_D(i) = \frac{d(i)}{N-1}

When is degree not sufficient?

Does not capture

  • Ability to broker between groups
  • Likelihood that information originating anywhere in the network reaches you

Betweenness

Assuming nodes communicate using the most direct (shortest) route, how many pairs of nodes have to pass information through target node?

examples

definition
CB(i)=j,ki,j<kgjk(i)/gjkC_B(i)=\sum_{j,k \neq i,j<k}g_{jk}(i)/g_{jk}

gjkg_{jk} the number of shortest paths connecting jk

gjk(i)g_{jk}(i) the number that node iis on.

Normalization:

CB(i)=CB(i)/[(n1)(n2)/2]C_B'(i)=C_B(i)/[(n-1)(n-2)/2]

number of pairs of vertices excluding the vertex itself.


When are Cd, Cb not sufficient?

Do not capture

Likelihood that information originating anywhere in the network reaches you

Closeness

definition

Being close to the center of the graph

Closeness Centrality:

CC(i)=[j=1,jiNd(i,j)]1C_C(i)=[\sum_{j=1,j \neq i}^Nd(i,j)]^{-1}

Normalized Closeness Centrality

CC(i)=(CC(i))/(N1)=N1j=1,jiNd(i,j)C_C'(i)=(C_C(i))/(N-1)=\frac{N-1}{\sum_{j=1,j \neq i}^Nd(i,j)}
example

Community Structure

How dense is it?

density=e/emaxdensity=e/e_{max}

Max. possible edges:

  • Directed: emax=n(n1)e_{max}=n*(n-1)
  • Undirected: emax=n(n1)/2e_{max}=n*(n-1)/2

Is everything connected?


Connected Components - Directed

Strongly connected components

  • Each node in component can be reached from every other node in component by following directed links
    • B C D E
    • A
    • G H
    • F

Weakly connected components

  • Each node can be reached from every other node by following links in either direction
    • A B C D E
    • G H F

Community finding (clustering)


Hierarchical clustering

Process:

  • Calculate affinity weights W for all pairs of vertices
  • Start: N disconnected vertices
  • Adding edges (one by one) between pairs of clusters in order of decreasing weight (use closest distance to compare clusters)
  • Result: nested components


Cluster Dendrograms


Betweenness clustering

Girvan and Newman 2002 iterative algorithm:

  • Compute CbC_b of all edges
  • Remove edge i where Cb(i)==max(Cb)C_b(i) == max(C_b)
  • Recalculate betweenness

Simulating network models

Small world network

Milgram (1967)

  • Mean path length in US social networks
  • ~ 6 hops separate any two people

Watts and Strogatz 1998 a few random links in an otherwise structured graph make the network a small world

Defining small world phenomenon

Cnetwork>>CrandomgraphC_{network}>>C{random graph}
lnetworkln(N)l_{network} \approx ln(N)

Pattern:

  • high clustering
  • low mean shortest path

Examples

  • neural network of C. elegans,
  • semantic networks of languages,
  • actor collaboration graph
  • food webs

Summary

Structural analysis

  • Centrality
  • Community structure
  • Pattern finding

Widely applicable across domains