I am a vegetation ecologist and poor student of computer science who recently learned of the Wasserstein metric. I find its application to 1d distributions fairly intuitive, and inspecting the wasserstein1d function from the transport package in R helped me understand how it is computed. The following line was the most critical to my understanding:
mean(abs(sort(b) - sort(a))^p)^(1/p)
In the case where the two vectors a and b are of unequal length, it appears that this function interpolates, inserting duplicates of the source values into each vector until the lengths are equal.
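To make my understanding of the 1d case concrete, here is a minimal base-R sketch. The helper name wasserstein1d_sketch is my own and is not part of the transport package. It assumes every observation carries equal weight, and it handles unequal lengths by repeating each value, which leaves both empirical distributions unchanged but may not be how the package does it internally.

```r
# Hypothetical helper (not from the transport package): p-Wasserstein distance
# between two 1-d samples, assuming uniform weights on the observations.
wasserstein1d_sketch <- function(a, b, p = 1) {
  stopifnot(length(a) > 0, length(b) > 0, p >= 1)
  # Repeating every value of a length(b) times, and every value of b
  # length(a) times, gives two sorted vectors of equal length that represent
  # the same two empirical distributions.
  aa <- rep(sort(a), each = length(b))
  bb <- rep(sort(b), each = length(a))
  # Same formula as the line quoted above, applied to the matched vectors.
  mean(abs(bb - aa)^p)^(1 / p)
}

# Equal lengths: every sorted pair differs by 5, so the distance is 5.
wasserstein1d_sketch(c(0, 1, 3), c(5, 6, 8))

# Unequal lengths: half the mass sits at 0 and half at 2, and all of it
# has to move to 1, so the distance is 1.
wasserstein1d_sketch(c(0, 2), c(1))
```

For vectors of equal length this reduces to the sorted-difference formula, since each matched pair is simply counted the same number of times.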
My question has to do with extending the Wasserstein metric to n-dimensional distributions. With the following 7d example dataset generated in R:
d <- 7
obs <- 100
d7a <- matrix(nrow = obs, ncol = d, data = 0)
d7b <- matrix(nrow = obs, ncol = d, data = 0)
set.seed(123)
for (i in seq_len(d)) {
d7a[,i] <- rnorm(obs)
d7b[,i] <- rnorm(obs)
}
wassersteindNd(d7a, d7b) #fictitious function here
Is it possible to compute this distance, and are there packages available in R or Python that do this?