Set Up
I'm hoping to create an "index of familiarity" between traders on a barter market.
I have data from a peer-to-peer barter market (i.e. people come in with their wares, and can trade with other individuals). The dataset records how many times each trader traded with each of the other traders on the market (i.e. a symmetric n-by-n matrix of trade frequency counts). Here's a sample dataset (.CSV & .RData). I also have access to a similar symmetric n-by-n matrix with the VALUES of items exchanged (which I suspect will be more telling).
Middlemen: I also need to be careful about 'middlemen'. These traders trade frequently with many counterparties (they are middlemen, after all), but are not really 'familiar' with them. I have a list of who these traders are (in my sample dataset, trader 5 is the only middleman), and so hope to control for this. At the very least, I hope their existence doesn't throw off index values for other traders. I suspect I should exclude middlemen from my matrix.
Example Clusters: If you look at my example dataset (see CSV or RData links above), it should be clear that traders t1, t2, t3, and t4 form a familiarity cluster (I'll call it Cluster 1), so they should all have values that suggest they are close to one another and distant from t6, t7, and t8. Likewise, traders t6, t7, and t8 form Cluster 2.
Also: As an added layer of nuance I hope to capture, note that t2 only trades with the middleman and with one counterparty in her cluster (Cluster 1). Even so, I'd like the calculated familiarity index to suggest that t2 is far "more familiar" with t1, t3, and t4 (folk in Cluster 1) than with t6, t7, and t8 (traders in Cluster 2).
The output I am aiming for is an n-by-n symmetric matrix of familiarity index values between each pair of traders on the market.
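For what it's worth, here is a minimal sketch of how excluding middlemen could look in R, assuming the counts are stored in a matrix whose row and column names are the trader IDs. The toy matrix `trades` and the vector `middlemen` below are made up for illustration; they are not the sample dataset.

```r
# Hypothetical toy counts (NOT the sample dataset): symmetric, NA on the diagonal
traders <- paste0("t", 1:4)
trades <- matrix(c(NA,  3,  1,  5,
                    3, NA,  0,  4,
                    1,  0, NA,  2,
                    5,  4,  2, NA),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(traders, traders))

# Assumed list of known middlemen (here, arbitrarily, t4)
middlemen <- c("t4")

# Keep only the rows and columns that belong to non-middlemen
keep <- !(rownames(trades) %in% middlemen)
trades_no_mm <- trades[keep, keep, drop = FALSE]
```

Dropping the rows and columns also removes any indirect link that runs only through a middleman, which may or may not be what is wanted.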
I have no experience with graph theory or existing tools, but am happy to do what it takes to learn. I have this weird feeling that my question is perhaps a slight tweak on a well understood problem - one with lots of tools available to me. Thus I'm hoping there's a method you can recommend with an existing implementation.
My Background: I mostly program in R (I'd prefer to work with CRAN packages), but can work with Python or Java if that would be preferable.
Thanks for any advice you might have!
My Best Stab at a Familiarity Function (in R code)
FamiliarityIndex <- function(data){
  # Takes a symmetric n-by-n matrix of trade frequency counts
  # and returns a 'familiarity index' matrix.
  # How to interpret: how familiar Trader COLUMN is with Trader ROW,
  # e.g., if col 4 row 1 is large, that's how familiar trader 4 is with trader 1
  # (it might differ for col 1 row 4!)

  # Rescale counts so that each column sums to 1
  newData <- data
  for (i in seq_len(ncol(data))){
    colSum <- sum(data[,i], na.rm=TRUE)
    newData[,i] <- data[,i] / colSum
  }

  # Add the indirect (two-step) term, always reading from the rescaled copy
  data <- newData
  for (col in seq_len(ncol(data))){
    for (row in seq_len(nrow(data))){
      if (!is.na(newData[row,col])){
        newData[row,col] <- newData[row,col] +
          sum(data[row,] * data[,col], na.rm=TRUE)
      }
    }
  }
  return(newData)
}
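As far as I can tell, the two loops above amount to column-normalizing the counts into a matrix P and then computing P + P %*% P, with NA cells treated as 0 inside the product. A vectorized sketch of that reading follows, assuming the input is a numeric matrix with NA on the diagonal. `FamiliarityIndexVec` and the `toy` matrix are hypothetical names and data for illustration only. Since the result is generally not symmetric, the last line averages it with its transpose, which is just one possible way to get the symmetric output I'm after.

```r
# Sketch only: vectorized form of the loop-based calculation
FamiliarityIndexVec <- function(counts) {
  counts <- as.matrix(counts)
  # Column-normalize: each column sums to 1 (NAs ignored)
  P <- sweep(counts, 2, colSums(counts, na.rm = TRUE), "/")
  # Treat NA as 0 in the two-step term, mirroring na.rm = TRUE in the loop
  P0 <- P
  P0[is.na(P0)] <- 0
  # Direct share plus two-step share; NA cells of P stay NA
  P + P0 %*% P0
}

# Hypothetical toy counts (NOT the sample dataset)
toy <- matrix(c(NA,  2,  2,
                 2, NA,  0,
                 2,  0, NA), nrow = 3, byrow = TRUE)

idx <- FamiliarityIndexVec(toy)

# One way to force symmetry: average each pair of directed values
sym <- (idx + t(idx)) / 2
```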