# Graph Generation Algorithms

As you might have already heard about, generative AI—especially those based on large language models like ChatGPT—are quite powerful tools that (are supposed to) serve to augment human routines and reasoning. Specifically, ChatGPT builds on the GPT (Generative Pre-trained Transformer) model that has been pre-trained on LOTS of text data to be able to generate contextually appropriate responses to user input. If you have not tried ChatGPT yet, I highly recommend giving it a go!

Other types of generative AI technologies include the GAN (Generative Adversarial Network) and VAE (Variational Autoencoder) deep learning models. From a simple web search, you can learn more about the applications of generative technologies, ranging from creating poetry to generating realistic-looking images and video. Another important application of generative technologies is graph data, which defines our friendships on Facebook or relationships on LinkedIn. In general, graphs are a powerful abstraction for representing and analyzing complex systems, from biological systems to social networks. However, generating realistic graphs that capture certain properties of naturally-occuring networks can be challenging. A promising approach to tackle this problem is the use of stochastic Kronecker graphs.

Stochastic Kronecker graphs can be used to generate graphs from a starting seed and by applying Kronecker multiplication/product. The Kronecker product is a mathematical operation that combines two matrices to produce a larger matrix. By repeatedly applying this operation, we can generate a large and complex graph. For example, a starting 2×2 matrix seed could look like this $S = 1/16\cdot\begin{pmatrix}9 & 3 \\ 3 & 1 \end{pmatrix}.$

To generate a graph using the seed $S$, we can apply the Kronecker product on $S$, repeatedly, so that $S^{\otimes t}$ would define a distribution over a graph on $2^t$ vertices such the any entry $(u, v)\in S^{\otimes t}$ would define the probability of node $u$ being connected to node $v$. This process, essentially, defines the generative process of stochastic Kronecker graphs. An alternative way to look at the resulting matrix is that each element of the resulting matrix is chosen to be 1 with some probability $p$, and 0 with probability $1-p$. This probability is chosen independently for each element of the matrix.

It is well-known that many real-world networks have degree distributions that follow a lognormal or power-law distribution. However, stochastic Kronecker graph models cannot always generate graphs with degree distributions that follow lognormal or power-law. The degree distributions seem to have significant oscillations. In a recent work, my friend/collaborator and I (paper forthcoming) analyze techniques that could get rid of oscillations in the degree distribution of generated graphs—see Figure 1 below. We believe our work could lead to a better understanding and classification of graph generation algorithms.