Graph Theory - Creating an Index of Familiarity, Given Trade Frequency Counts

Question

Set Up

I'm hoping to create an "index of familiarity" between traders on a barter market.

I have data from a peer-to-peer barter market (i.e. people come in with their wares, and can trade with other individuals). The dataset records how many times each trader traded with every other trader on the market (i.e. a symmetric n-by-n matrix of trade frequency counts). Here's a sample dataset (.CSV & .RData). I also have access to a similar symmetric n-by-n matrix with VALUES of items exchanged (which I suspect will be more telling).

Middlemen: I also need to be careful about 'middlemen'. These traders trade frequently with many others (they are middlemen, after all), but they are not really 'familiar' with their counterparties. I have a list of who these traders are (in my sample dataset, trader 5 is the only middleman), and so hope to control for this. At the very least, I hope their existence doesn't throw off index values for other traders. I suspect I should exclude middlemen from my matrix.
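One way the exclusion mentioned above could look in R is sketched below. This is only a minimal illustration: the matrix `trades` and the vector `middlemen` are hypothetical toy values, not the sample dataset, and dropping middlemen entirely is just one possible way to control for them.

```r
# Hypothetical toy counts for 4 traders; NOT the sample dataset from the question
trades <- matrix(c(NA, 3, 2, 5,
                    3, NA, 0, 4,
                    2, 0, NA, 6,
                    5, 4, 6, NA),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("t", 1:4), paste0("t", 1:4)))

# Assumed list of known middlemen (hypothetical)
middlemen <- c("t4")

# Keep only the rows and columns of traders who are not middlemen
keep <- !(rownames(trades) %in% middlemen)
tradesNoMiddlemen <- trades[keep, keep, drop = FALSE]
```

Note that this also discards any indirect link that exists only through a middleman, which may or may not be what is wanted.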

Example Clusters: If you look at my example dataset (see CSV or RData links above), it should be clear that traders t1, t2, t3, and t4 form a familiarity cluster (I'll call it Cluster 1), and so they should all have values that suggest they are close to one another and distant from t6, t7 and t8. Likewise, traders t6, t7, and t8 form Cluster 2.

Also: As an added layer of nuance I hope to capture, note that t2 only trades with the middleman and with one counterparty in her cluster (Cluster 1). Thus I'd like the familiarity index to suggest that t2 is far "more familiar" with t1, t3 and t4 (folk in Cluster 1) than with t6, t7 and t8 (traders in Cluster 2).

The output I am aiming for is an n-by-n symmetric matrix of familiarity index values between each pair of traders on the market.

I have no experience with graph theory or existing tools, but am happy to do what it takes to learn. I have this weird feeling that my question is perhaps a slight tweak on a well understood problem - one with lots of tools available to me. Thus I'm hoping there's a method you can recommend with an existing implementation.

My Background: I mostly program in R (I'd prefer to work with CRAN packages), but can work with Python and Java if that's going to be preferable.

Thanks for any advice you might have!

My Best Stab at a Familiarity Function (in R code)

FamiliarityIndex <- function(data) {
  # Input: symmetric n-by-n matrix of trade frequency counts (NAs allowed)
  # Output: n-by-n 'familiarity index' matrix
  # How to interpret: how familiar trader COLUMN is with trader ROW
  #    e.g., if col 4 row 1 is large, that's how familiar trader 4 is with trader 1
  #            (it might differ for col 1 row 4!)

  data <- as.matrix(data)

  # Rescale counts so that each column sums to 1
  shares <- sweep(data, 2, colSums(data, na.rm = TRUE), "/")

  # Add indirect familiarity through shared counterparties:
  # for each (row, col), sum over k of shares[row, k] * shares[k, col],
  # treating NA entries as 0 inside the sum
  sharesNoNA <- shares
  sharesNoNA[is.na(sharesNoNA)] <- 0
  index <- shares + sharesNoNA %*% sharesNoNA

  # Cells that were NA after rescaling stay NA, because NA + x is NA
  return(index)
}

score 1 · Answer 1 · answered Dec 21 '13 at 22:06

What similarity measure would you use?

Once you have defined a similarity, you have a wide array of clustering algorithms available, such as single-link clustering, DBSCAN or affinity propagation.
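As a rough illustration of that workflow in R, here is a minimal sketch of single-link clustering with base R's `hclust`. The similarity matrix `sim` is made up for the example, and turning similarity into distance as `1 - sim` is just one simple choice, not something prescribed above.

```r
# Hypothetical similarity matrix for 5 traders (values in [0, 1]); illustrative only
sim <- matrix(c(1.0, 0.9, 0.8, 0.1, 0.2,
                0.9, 1.0, 0.7, 0.2, 0.1,
                0.8, 0.7, 1.0, 0.1, 0.1,
                0.1, 0.2, 0.1, 1.0, 0.9,
                0.2, 0.1, 0.1, 0.9, 1.0),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("t", 1:5), paste0("t", 1:5)))

# Convert similarity to dissimilarity (an assumed, simple transformation)
d <- as.dist(1 - sim)

# Single-link (nearest-neighbour) hierarchical clustering from the stats package
hc <- hclust(d, method = "single")

# Cut the dendrogram into two clusters; returns a named vector of cluster labels
clusters <- cutree(hc, k = 2)
```

With this toy input, t1, t2 and t3 end up in one cluster and t4 and t5 in the other. DBSCAN and affinity propagation are available through add-on packages rather than base R.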

Has QUIT--Anony-Mousse


Could you perhaps expand on what you mean by a 'similarity measure'? (perhaps I misunderstand that term, but I thought that is what I had hoped to explain in my text and example?) – EconomiCurtis Dec 21 '13 at 22:45
https://en.wikipedia.org/wiki/Similarity_measure – Has QUIT--Anony-Mousse Dec 21 '13 at 22:52
I tried to add code that computationally gets at what I'm aiming for. Perhaps you have additional insight or suggestions you'd like to share. – EconomiCurtis Dec 22 '13 at 00:00
Well, I'm not good at reading R. The R syntax is awful. But once you've defined a similarity measure (if you make your matrix binary, Jaccard similarity might be an option, btw), you can try any distance or similarity based clustering methods next! – Has QUIT--Anony-Mousse Dec 22 '13 at 14:27
@EconomiCurtis, If you mean to program computation of a binary (such as Jaccard) measure this question might help: http://stats.stackexchange.com/q/49453/3277 – ttnphns Dec 22 '13 at 19:24
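For reference, the Jaccard suggestion from the comments above could be sketched in R roughly as follows. The matrix `counts` is a hypothetical toy example, not the sample dataset, and binarizing at "traded at least once" is an assumption.

```r
# Hypothetical toy trade counts for 4 traders; NOT the sample dataset
counts <- matrix(c(0, 4, 2, 0,
                   4, 0, 3, 0,
                   2, 3, 0, 1,
                   0, 0, 1, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("t", 1:4), paste0("t", 1:4)))

# Binarize: 1 if the pair ever traded, 0 otherwise
traded <- (counts > 0) * 1

# Jaccard similarity between two traders:
# (number of shared counterparties) / (number of counterparties of either)
shared <- traded %*% t(traded)                  # size of the intersection
degree <- rowSums(traded)                       # number of counterparties per trader
either <- outer(degree, degree, "+") - shared   # size of the union
jaccard <- shared / either

# Turn similarity into a distance for distance-based clustering methods
jaccardDist <- as.dist(1 - jaccard)
```

Here t1 and t2 share one counterparty (t3) out of three in total, so their Jaccard similarity is 1/3. A trader with no trades at all would produce 0/0 and would need separate handling.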
