What is the difference between graphs/networks?

Question

Note: the question itself is below the "Question:" heading.

Background:
In a previous question I asked how to group what I would call nodes on a network graph based on a connectivity matrix. (link)

The nomenclature used for a group of points was "cluster", "community" or "clique".
The general question behind the question is a fundamental one in numerics: "what is self?", or "how is a chicken like a sphere?". As cognitively dissonant as the hyperbole sounds, this idea underlies image recognition and the finite element and finite difference methods. It lives behind the idea of a limit.

When are two distinct things close enough to be considered essentially the same?

Here is the sketch that I used:

Let's say that I had two separate graphs, one with the node labels 1...10, and another with labels a...j but otherwise the same connectivity. Outside of labels they would be the same graph.

What if I added a $k^{th}$ node, or broke a link? The two graphs would otherwise be quite similar.
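One way to make "quite similar" concrete for this small example is a unit-cost graph edit distance: the minimum number of node insertions/deletions plus edge insertions/deletions needed to turn one graph into the other. The sketch below is a brute-force version under my own assumptions (simple undirected graphs, nodes numbered 0..n-1; `small_graph_edit_distance` is a hypothetical helper, not a library function). It pads the smaller graph with isolated dummy nodes and tries every node mapping, so relabeling costs 0, while an extra node or a broken link each cost 1. It is O(n!), so it is only practical for a handful of nodes and is already slow for 10.

```python
from itertools import combinations, permutations


def small_graph_edit_distance(edges1, n1, edges2, n2):
    """Unit-cost edit distance between two small simple undirected graphs.

    Nodes are 0..n-1 and edges are (u, v) pairs. Every node mapping is tried,
    so the cost grows as n! and this is only usable for a handful of nodes.
    """
    n = max(n1, n2)
    # The smaller graph is implicitly padded with isolated dummy nodes
    # (indices n1..n-1 or n2..n-1), so both graphs have n nodes.
    a = {frozenset(e) for e in edges1}
    b = {frozenset(e) for e in edges2}
    pairs = list(combinations(range(n), 2))
    best = None
    for perm in permutations(range(n)):
        # Map node i of graph 1 to node perm[i] of graph 2 and count the
        # node pairs whose edge status differs (each is one edge edit).
        cost = sum(
            (frozenset((i, j)) in a) != (frozenset((perm[i], perm[j])) in b)
            for i, j in pairs
        )
        if best is None or cost < best:
            best = cost
    # Each dummy node corresponds to one node insertion or deletion.
    return abs(n1 - n2) + best


path = [(0, 1), (1, 2), (2, 3)]
print(small_graph_edit_distance(path, 4, [(2, 0), (0, 3), (3, 1)], 4))  # relabeled: 0
print(small_graph_edit_distance(path, 4, [(0, 1), (1, 2)], 4))  # broken link: 1
print(small_graph_edit_distance(path, 4, path + [(3, 4)], 5))  # extra node + edge: 2
```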

Question:

My question is:
- How do I measure a general "distance" between two graphs?

The answer isn't going to be meaningful if it doesn't in some way address the following:
- What if they do not have the same number of "nodes"?
- Is there something like "image registration" for graphs?
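A cheaper, polynomial-time heuristic that tolerates different node counts is to compare graph spectra. The sketch below assumes NumPy and undirected 0/1 adjacency matrices; `laplacian_spectrum` and `spectral_distance` are hypothetical helper names. It compares the sorted eigenvalues of the Laplacian $L = D - A$ and zero-pads the shorter spectrum, since an isolated node contributes exactly one zero eigenvalue. The spectrum ignores labels, so relabeled graphs are at distance 0, but the converse fails: non-isomorphic cospectral graphs also get distance 0, and isolated nodes are invisible to this measure.

```python
import numpy as np


def laplacian_spectrum(adj):
    """Eigenvalues of the Laplacian L = D - A, largest first."""
    a = np.asarray(adj, dtype=float)
    lap = np.diag(a.sum(axis=1)) - a
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(lap)[::-1]


def spectral_distance(adj1, adj2):
    """Euclidean distance between zero-padded Laplacian spectra."""
    s1, s2 = laplacian_spectrum(adj1), laplacian_spectrum(adj2)
    n = max(len(s1), len(s2))
    # Pad the shorter spectrum with zeros at the small end: an isolated
    # node adds exactly one zero eigenvalue to a Laplacian spectrum.
    s1 = np.pad(s1, (0, n - len(s1)))
    s2 = np.pad(s2, (0, n - len(s2)))
    return float(np.linalg.norm(s1 - s2))
```

Because it only needs one symmetric eigendecomposition per graph, this scales to graphs far larger than the brute-force edit distance can handle.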

I am not looking for the perfect solution to an unsolved problem. Candidate sub-optimal semi-hacks that work in some cases are the "soup du jour" of this question.

One option is to say "they have to have the same number of nodes": you contrive each graph as a time series via some characteristic spanning walk, and then use time-series metrics to compare one graph to another. I don't know if such a thing exists.
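A minimal sketch of that idea, with choices that are my own assumptions rather than an established method: the "characteristic walk" is a breadth-first traversal that starts at the highest-degree node, the time series is the node degree at each step, and two such series (possibly of different lengths) are compared with dynamic time warping (DTW). Ties are broken by label, so the walk is only approximately label-invariant. `bfs_degree_series` and `dtw` are hypothetical helpers.

```python
from collections import deque


def bfs_degree_series(adj):
    """adj: dict node -> set of neighbours. Returns node degrees in BFS order."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    # Visit high-degree nodes first; ties fall back to label order, so the
    # walk is only approximately label-invariant.
    order = sorted(adj, key=lambda v: (-deg[v], v))
    seen, series = set(), []
    for root in order:  # restart on each not-yet-visited component
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            v = queue.popleft()
            series.append(deg[v])
            for w in sorted(adj[v], key=lambda u: (-deg[u], u)):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return series


def dtw(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping with |a - b| cost."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(x)][len(y)]


star_num = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
star_abc = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
print(dtw(bfs_degree_series(star_num), bfs_degree_series(star_abc)))  # 0.0
```

DTW is used here because it compares sequences of unequal length, which sidesteps the "same number of nodes" requirement.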

Image registration allows sub-pixel overlaying of otherwise dissimilar images of the same object. It uses cross-correlation functions, which are very fast to compute with FFT methods, and interpolation around the correlation "peak". An analogous approach would be to move the graph into image space and then perform image registration.
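The registration step itself can be sketched as below, assuming NumPy; `register_shift` is a hypothetical helper. It computes the circular cross-correlation via the FFT, takes the integer peak, and refines each axis with a three-point parabolic fit for a sub-pixel estimate. Note the assumption this carries over to graphs: if the "images" are adjacency matrices, registration only recovers translations of a fixed node ordering, not arbitrary relabelings, so the ordering itself must already be meaningful.

```python
import numpy as np


def register_shift(img, ref):
    """Estimate the (sub-pixel) circular shift s with ref ~= np.roll(img, s)."""
    # Circular cross-correlation via the FFT: corr[k] = sum_n img[n] * ref[n + k].
    f = np.fft.fft2(ref) * np.conj(np.fft.fft2(img))
    corr = np.real(np.fft.ifft2(f))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    shift = []
    for axis, p in enumerate(peak):
        n = corr.shape[axis]
        # Neighbours of the peak along this axis (with wrap-around).
        idx_m = list(peak)
        idx_m[axis] = (p - 1) % n
        idx_p = list(peak)
        idx_p[axis] = (p + 1) % n
        ym, y0, yp = corr[tuple(idx_m)], corr[peak], corr[tuple(idx_p)]
        # Vertex of the parabola through (-1, ym), (0, y0), (1, yp).
        denom = ym - 2 * y0 + yp
        frac = 0.0 if denom == 0 else 0.5 * (ym - yp) / denom
        shift.append(float(p) + frac)
    return tuple(shift)


rng = np.random.default_rng(0)
img = rng.random((16, 16))
ref = np.roll(img, shift=(3, 5), axis=(0, 1))
print(register_shift(img, ref))  # approximately (3.0, 5.0)
```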

A general measure might not be about moving to a different domain and using domain-specific methods. It might be a topology solution.

Explanation, references, or sample code are always welcome. Same goes for comments and suggestions.

By the way, I'm not sure why you've chosen a graph-based approach to cluster time series, but you might find [my answer](http://datascience.stackexchange.com/a/3764/2452) on different approaches to the problem helpful. — Aleksandr Blekh, Mar 05 '15 at 18:08
The motivation is related to "SAX", Symbolic Aggregate approXimation. In reading through the SAX material, I found that some time-series "fingerprints" come from characterizing state-to-state transition rates. I like the idea of converting a 24000-element multivariate time series to a weighted sum of a few eigen-transition-probability matrices, each on the order of 10x10. — EngrStudent, Mar 05 '15 at 18:58
I see. Interesting approach - never heard of it before. Thank you for clarification (I will look up the information for further reading). Hope that my answer below is helpful. — Aleksandr Blekh, Mar 05 '15 at 19:13
Finding out whether the graphs have the same structure (without regard to labels) is the well-known problem of [graph isomorphism](http://en.wikipedia.org/wiki/Graph_isomorphism). It is in NP but is not known to be in P or to be NP-complete; in particular, no polynomial-time algorithm for it is known. Therefore just determining whether two graphs have distance 0 is hard. — Palec, Mar 05 '15 at 22:15
Anony-Mousse, gung, Nick, Glen and John: I have reformatted the question to state its intended scope more clearly. I think this gives it a narrower definition; can you please consider reopening my question? I think Aleksandr's answer wasn't bad. — EngrStudent, Mar 06 '15 at 19:45
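The transition-rate "fingerprint" mentioned in the comments can be sketched as follows. This assumes the series has already been discretized into symbols (for example by SAX, which is not implemented here); `transition_matrix` and `frobenius_distance` are hypothetical helpers. The resulting row-stochastic matrix is itself a weighted directed graph on the symbol alphabet, and since two series share that alphabet (same nodes, same ordering), a plain element-wise distance between their matrices is meaningful.

```python
import math


def transition_matrix(symbols, alphabet):
    """Row-stochastic first-order transition matrix estimated from a symbol sequence."""
    index = {s: i for i, s in enumerate(alphabet)}
    k = len(alphabet)
    counts = [[0] * k for _ in range(k)]
    # Count each consecutive (from, to) pair of symbols.
    for a, b in zip(symbols, symbols[1:]):
        counts[index[a]][index[b]] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets an all-zero row instead of dividing by zero.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def frobenius_distance(p, q):
    """Element-wise (Frobenius) distance between two same-size matrices."""
    return math.sqrt(sum((x - y) ** 2 for rp, rq in zip(p, q) for x, y in zip(rp, rq)))


m1 = transition_matrix("aabab", "ab")  # [[1/3, 2/3], [1.0, 0.0]]
m2 = transition_matrix("abab", "ab")   # [[0.0, 1.0], [1.0, 0.0]]
print(frobenius_distance(m1, m2))
```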

score 4 · Answer 1 · edited Apr 13 '17 at 12:50

I doubt that it is possible to answer all your questions comprehensively in a single answer. Moreover, I don't know how familiar you are with the topic, and I am not very familiar with it myself, so I would suggest reading a bit about graph theory first.

In terms of measuring the distance between graphs, there are probably many metrics, but the one I'm aware of is the Hamming distance. You can use the R packages sna, igraph and others for graph-related functionality. In particular, you can calculate the Hamming distance with the sna::hdist() function.
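The idea behind a Hamming distance between graphs is simply to count the adjacency-matrix entries on which two graphs over the same labeled node set disagree. The sketch below illustrates that idea in plain Python; `hamming_distance` is a hypothetical helper, not a port of sna::hdist, and its options and exact conventions may differ. It counts every ordered pair, so for an undirected graph each differing edge is counted twice. It also depends on labels: a relabeled but identical graph generally has a nonzero distance, which is exactly the isomorphism issue raised in the comments.

```python
def hamming_distance(adj1, adj2):
    """Number of entries at which two same-size 0/1 adjacency matrices differ.

    Assumes both matrices use the same node labels in the same order.
    For undirected graphs each differing edge appears twice (i,j and j,i).
    """
    if len(adj1) != len(adj2) or any(len(r1) != len(r2) for r1, r2 in zip(adj1, adj2)):
        raise ValueError("Hamming distance needs matrices over the same node set")
    return sum(a != b for row1, row2 in zip(adj1, adj2) for a, b in zip(row1, row2))


p = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # path 0-1-2-3
q = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  # edge 2-3 removed
print(hamming_distance(p, q))  # 2 (one undirected edge, counted twice)
```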

In terms of graph similarity measures, I'm sure that there are a number of them. You might take a look at SimRank, a general object-similarity measure based on graph theory.
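SimRank scores pairs of nodes within one graph, following the intuition that two nodes are similar if their in-neighbours are similar: $s(a,a) = 1$ and, for $a \ne b$, $s(a,b) = \frac{c}{|I(a)|\,|I(b)|} \sum_{i \in I(a)} \sum_{j \in I(b)} s(i,j)$, with $s(a,b) = 0$ when either node has no in-neighbours. A naive iterative sketch is below; `simrank` is a hypothetical helper, and the decay factor `c = 0.8` and the fixed iteration count are arbitrary choices. It is quadratic in the number of node pairs per iteration, so it is only meant for small graphs.

```python
def simrank(in_nbrs, c=0.8, iterations=10):
    """Naive iterative SimRank.

    in_nbrs: dict node -> list of in-neighbours (for an undirected graph,
    simply the neighbours). Returns a dict (a, b) -> similarity in [0, 1].
    """
    nodes = list(in_nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0  # no in-neighbours: no evidence of similarity
                else:
                    total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# Undirected star: centre 0, leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
s = simrank(star)
print(s[(1, 2)])  # 0.8: two leaves share their only neighbour
```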

Since your motivation for this question is using graph similarity for clustering, I think that the following two research papers on this topic will be of interest to you. The first paper (Schaeffer, 2007) is more general and, IMHO, presents an excellent overview of approaches to and methods of graph clustering. The second paper (Zhou, Cheng & Yu, 2009) contains a brief introduction to the topic, but focuses on proposing a novel graph clustering algorithm based on graph similarity measures.

Finally, if you want or need to visualize graphs based on your data, besides the above-mentioned R packages there is the diagram package, among many others. My related answer also mentions graph visualization software that is not necessarily R-based, including standalone tools.

References

Schaeffer, S. E. (2007). Survey: Graph clustering. Computer Science Review, 1(1), 27-64. doi:10.1016/j.cosrev.2007.05.001 Retrieved from http://dollar.biz.uiowa.edu/~street/graphClustering.pdf

Zhou, Y., Cheng, H., & Yu, J. X. (2009). Graph clustering based on structural/attribute similarities. Proceedings of the 35th International Conference on Very Large Data Bases (VLDB '09), 2(1), 718-729. doi:10.14778/1687627.1687709 Retrieved from http://www.vldb.org/pvldb/2/vldb09-175.pdf

