1

The medoid function is defined in this graph neural network paper as: $$ t := \arg\min_{y\in \mathcal{X}}\sum_{j=1}^N||x_j-y||$$ which is a "multivariate generalization of the Median" and $\mathcal{X}=\{x_1,x_2,...,x_n\}$ is a collection of points in $\mathbb{R}^d$.

How does this relate to the median in higher dimensions? In $\mathbb{R}^1$, the median uses an ordered list of data-points and finds the central point, what does it mean to find the median in $\mathbb{R}^2$ and beyond, and how does this function do it?

I've never seen this function before, so i wonder if it has it's drawbacks and if there are better formulas to use? Furthermore, Wikipedia states the medoid "is not equivalent to a median" so is this paper wrong?

Jia
  • 31
  • 2

1 Answers1

1

In one dimension, the median minimises the sum of absolute values of distances, $$\textrm{median} = \arg\min_m \sum_{j=1}^n |x_i-m|$$ and thus is a medoid. In that sense it's a generalisation of the median, and it's a reasonable location summary. As Wikipedia goes on to say, medoids aren't equivalent to medians in general, because medians are really a one-dimensional concept.

There are multiple generalisations of the median to higher dimensions, all trying to preserve some properties of the median, and they are all different. A geometric median is like a medoid except it doesn't have to be one of the sample points; this paper reviews some other definitions of higher-dimensional medians

Thomas Lumley
  • 21,784
  • 1
  • 22
  • 73
  • OK and in $\mathbb{R}^2$, the median would just be finding the minimum value of $m_j=(a,b)$ of the euclidean distance? i.e. $\arg \min_m \sum_{i=1}^N \sqrt{(x_{i1}-m_1)^2+ (x_{i2}-m_2)^2}$? – Jia Jul 31 '21 at 18:08
  • That would be another possible generalisation, and it's called the "geometric median". But the minimum over all $(m_1, m_2)$ is not (in general) one of the data points. – Thomas Lumley Aug 01 '21 at 03:23