Node similarity as a network formation model

Node similarity

Node similarity describes how many properties two nodes have in common (e.g., similar interests). The term is sometimes used for structural similarity, i.e., the closeness of two nodes within the network structure, but that is not the meaning used here.
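A minimal sketch of this definition follows. It assumes that each node is described by a binary vector whose entries mark whether the node has a given property. The two example vectors are made up for illustration. The similarity of two nodes is then the number of properties they share:

```matlab
% hypothetical binary property vectors of two nodes (1 = node has the property)
a = [1 0 1 1 0 1];
b = [1 1 1 0 0 1];

% node similarity = number of properties both nodes have in common
s = sum(a & b);
disp(s)   % displays 3
```

For real-valued properties, such as the random data matrix in the Matlab code further down, a distance measure such as the Euclidean distance plays the same role: the smaller the distance, the more similar the two nodes.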

Why do people connect? Similarity is often seen as a major factor for connectivity in social networks. People tend to connect with those who share similar interests, tastes, beliefs, and social backgrounds, and even similar popularity. This is often expressed by the adage ‘Birds of a feather flock together’. In biology, too, interactions between proteins or other molecules require an exact fit, or complementarity, of their complex surfaces, which in the context of connectivity can be treated as equivalent to similarity.

Node similarity as a network formation model can reproduce the frequently observed power-law (scale-free) distributions of sparsely connected networks. More importantly, it lets us study networks of different link densities, from sparsely connected (power-law) to densely connected (non-power-law) networks, simply by varying the similarity threshold. A similarity model can reproduce the characteristics of real networks of different densities, and hence it can serve as a model of the topological transition from weakly to strongly connected societies.
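The effect of the threshold on link density can be checked with a small sketch. It uses the same Euclidean-distance construction as the Matlab code further down, with these changes:

* It uses fewer nodes (N=1000, an arbitrary choice so that it runs quickly).
* It assumes standard-normal random properties.
* The listed threshold values are illustrative only.

Raising the threshold can only add links, never remove them, so the mean node degree grows with the threshold:

```matlab
m = 100; N = 1000;               % properties per node, number of nodes (illustrative sizes)
X = randn(m, N);                 % assumed: standard-normal random properties

% pairwise Euclidean distances between the node columns of X
XX = sum(X.^2, 1);
D = sqrt(max(bsxfun(@plus, XX', XX) - 2*(X'*X), 0));
D(1:N+1:end) = Inf;              % exclude self-links on the diagonal

thresholds = [11 12 13 14];      % illustrative similarity thresholds
meanDegree = zeros(size(thresholds));
for i = 1:numel(thresholds)
    A = D < thresholds(i);       % link = distance below the threshold
    meanDegree(i) = mean(sum(A, 2));
end
disp([thresholds; meanDegree])   % first row: threshold, second row: mean degree
```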


From sparsely connected to densely connected networks

Complex networks of a highly connected society

Node-degree distributions of increasingly connected networks. Complex networks of different link densities show very different node-degree distributions, as shown in the Flickr network analysis. Sparsely connected networks (A) show the typically observed scale-free, power-law-like distribution. A higher density of interconnections (E) leads to distributions that are very distinct from a power law. This behaviour can be reproduced by a node similarity model. In such a model, different thresholds are used to define a link, which generates networks of different link densities.

Matlab code of the similarity-based network formation model

Example of generating a power-law-like network topology (Fig. A) with a similarity model based on Euclidean distance:

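  % generate a random data matrix X: m=100 properties (rows) of N=8000 nodes (columns)
  % (assumed here: standard-normal entries; the threshold th=11 below relies on this)
        m = 100; N = 8000;
        X = randn(m, N);

  % Euclidean distance between column vectors in X:
  % D(i,j)^2 = |x_i|^2 + |x_j|^2 - 2*x_i'*x_j
        XX = sum(X.^2, 1);
        D = bsxfun(@plus, XX', XX) - 2*(X'*X);
        D = sqrt(max(D, 0));   % clip tiny negative round-off before the square root
  % set threshold of similarity for defining a link
  % (for getting a power-law like distributed network: th=11 in case of a 100x8000 random matrix X)
        th = 11;

  % get the adjacency matrix: two nodes are linked if their distance is below th
        A = D < th;
        A(1:N+1:end) = false;  % remove self-links (D(i,i) = 0 is always below th)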

The node-degree distribution can be calculated with the script cn_node_degree_distribution.m and then plotted on log-log scales:

     [k,pk,nk] = cn_node_degree_distribution(A);
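If cn_node_degree_distribution.m is not at hand, a rough equivalent can be computed directly from the adjacency matrix A. This sketch is not the author's script. The outputs are an assumption about what such a function returns:

* `k`: the distinct node degrees.
* `nk`: the number of nodes with each degree.
* `pk`: the fraction of nodes with each degree.

If A does not exist yet, a small stand-in network is generated with the same model. The stand-in sizes (N=1000, threshold 12) are arbitrary choices. Nodes of degree zero are dropped before plotting because they cannot be shown on a log scale:

```matlab
% A: logical adjacency matrix from the code above; if it is missing,
% build a small stand-in network with the same model (arbitrary sizes)
if ~exist('A', 'var')
    N = 1000; X = randn(100, N);
    XX = sum(X.^2, 1);
    A = sqrt(max(bsxfun(@plus, XX', XX) - 2*(X'*X), 0)) < 12;
    A(1:N+1:end) = false;        % no self-links
end

% node degree = number of links of each node
deg = full(sum(A, 2));

% count how many nodes have each distinct degree
[k, ~, idx] = unique(deg);
nk = accumarray(idx, 1);         % number of nodes with degree k
pk = nk / numel(deg);            % fraction of nodes with degree k, P(k)

% log-log plot, skipping k = 0, which has no logarithm
keep = k > 0;
loglog(k(keep), pk(keep), 'o');
xlabel('node degree k'); ylabel('P(k)');
```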


Node similarity as a basic principle behind connectivity in complex networks
Journal of Data Mining & Digital Humanities (2015) jdmdh:77

Matthias Scholz