Suppose the following training data describes the heights, weights, and foot sizes of people of various sexes:

SEX         HEIGHT (feet)   WEIGHT (lbs)    FOOT-SIZE (inches)
male        6               180             12
male        5.92 (5'11")    190             11
male        5.58 (5'7")     170             12
male        5.92 (5'11")    165             10
female      5               100             6
female      5.5 (5'6")      150             8
female      5.42 (5'5")     130             7
female      5.75 (5'9")     150             9
trans       4               200             5
trans       4.10            150             8
trans       5.42            190             7
trans       5.50            150             9

Now I want to classify a person with the following properties (test data) to find their sex:

 HEIGHT (feet)  WEIGHT (lbs)    FOOT-SIZE (inches)
 4              150             2

Suppose I isolate only the male portion of the data and arrange it in a matrix:

$$ males = \begin{bmatrix} 6.0000 & 180.0000 & 12.0000 \\ 5.9200 & 190.0000 & 11.0000 \\ 5.5800 & 170.0000 & 12.0000 \\ 5.9200 & 165.0000 & 10.0000 \\ \end{bmatrix} $$

and I want to evaluate its Parzen density function at the following row matrix, which holds the same attributes for another person (male/female/trans):

$$ dataPoint = \begin{bmatrix} 4 & 150 & 2 \end{bmatrix} $$

($dataPoint$ may have multiple rows.)

so that we can measure how closely this data matches that of the males.

In other words, my intention is to find its class using a Bayes classifier.
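That is, as I understand it, the Parzen estimate would serve as the class-conditional density $p(\mathbf{x}\mid\text{class})$, to be combined with a class prior (e.g., the class frequencies) via Bayes' rule:

$$ P(\text{male} \mid \mathbf{x}) \;\propto\; p(\mathbf{x} \mid \text{male})\, P(\text{male}), $$

and the predicted sex is the class with the largest posterior.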


My attempted solution:

$ variance(male) = \begin{bmatrix} 3.5033 \times 10^{-2} & 1.2292 \times 10^{2} & 9.1667 \times 10^{-1} \end{bmatrix}$

$(males - dataPoint)^2 = \begin{bmatrix} 4.0000 & 900.0000 & 100.0000\\ 3.6864 & 1600.0000 & 81.0000\\ 2.4964 & 400.0000 & 100.0000\\ 3.6864 & 225.0000 & 64.0000 \end{bmatrix}$

$firstPart = \frac{1}{\sqrt{2 \cdot \pi \cdot variance\_of\_male}} = \begin{bmatrix} 2.131421 & 0.035984 & 0.416682\end{bmatrix}$

$secondPart = e^{\frac{-(males - dataPoint)^2}{2 \cdot variance\_of\_male}} = \frac{-\begin{bmatrix} 4.0000 & 900.0000 & 100.0000\\ 3.6864 & 1600.0000 & 81.0000\\ 2.4964 & 400.0000 & 100.0000\\ 3.6864 & 225.0000 & 64.0000 \end{bmatrix}}{\begin{bmatrix}7.0067 \times 10^{-2} & 2.4583 \times 10^{2} & 1.8333\end{bmatrix}} = !?!?$

$parzen\_density = mean(firstPart \cdot secondPart) = ???$
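To make the intent unambiguous, here is the same computation written as a NumPy sketch (my own restatement; the names mirror my matrices above, I take the product across the three features before averaging across the four rows, and the elementwise division of each row by the row vector of variances is exactly the step I cannot express in matrix notation):

```python
import numpy as np

# Training data for the male class: height (ft), weight (lbs), foot size (in)
males = np.array([
    [6.00, 180.0, 12.0],
    [5.92, 190.0, 11.0],
    [5.58, 170.0, 12.0],
    [5.92, 165.0, 10.0],
])

# Query point: the person to classify
dataPoint = np.array([4.0, 150.0, 2.0])

# Per-feature sample variance (ddof=1 reproduces the values above)
variance = males.var(axis=0, ddof=1)

# firstPart: 1 / sqrt(2*pi*variance), one value per feature
firstPart = 1.0 / np.sqrt(2.0 * np.pi * variance)

# secondPart: exp(-(males - dataPoint)^2 / (2*variance));
# broadcasting divides each row of the 4x3 matrix by the 1x3 variance row
secondPart = np.exp(-(males - dataPoint) ** 2 / (2.0 * variance))

# Product of the univariate Gaussian kernels per sample, then the average
parzen_density = np.mean(np.prod(firstPart * secondPart, axis=1))
print(parzen_density)
```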


(1) I am unable to calculate $secondPart$ because of the dimensional mismatch between the matrices. How can I fix this?

(2) Is this approach correct?

user366312
  • Your title seems to mismatch your question. Your title suggests that you are interested in [multivariate kernel density estimation](https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation), but your question suggests that you want to do some kind of classification. Please edit and clarify what exactly your data is and what you want to achieve. It is hard to comment on whether a solution is correct if the problem is unknown. Moreover, what do you mean by estimating a KDE "against a row of a matrix"? – Tim Nov 06 '16 at 15:49
  • @Tim, done editing. Hope that helps. – user366312 Nov 06 '16 at 16:02
  • Two issues seem to prevail in the many questions you have been asking about KDE and both are at the forefront of this one. One concerns understanding mathematical notation. Our ability to help with that on this site is limited: studying a good introductory textbook on matrix algebra would be a good next step for you. The other concerns developing an intuition for what a KDE actually is doing. To that end, you might get something out of my post at http://gis.stackexchange.com/a/14376/664, which depicts various two-dimensional KDEs in two different ways. – whuber Nov 07 '16 at 14:59

1 Answer


Multivariate kernel density estimation is a generalization of univariate kernel density estimation. If in the univariate case we define the kernel density estimator as

$$ \hat{f_h}(x) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big) $$

then in the multivariate case we can simply use a product of univariate kernels (i.e., assume independence of the variables):

$$ \hat{f_h}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d \frac{1}{h_j} K\Big(\frac{x_j-\mathbf{X}_{ij}}{h_j}\Big) $$

where $\mathbf{x} = (x_1,\dots,x_d)$ and $\mathbf{X}$ is an $n\times d$ matrix of training data; or we can use a multivariate kernel

$$ \hat{f_h}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n \frac{1}{\det(\mathbf{H})} K\Big(\mathbf{H}^{-1}(\mathbf{x}-\mathbf{X}_i)\Big) $$

where $\mathbf{H}$ is a bandwidth matrix. So if you choose to use a product of univariate kernels, this does not really differ from what you would do in the univariate case. If you choose a multivariate kernel instead, you need a genuinely multivariate function; the one most commonly used is the multivariate normal density, parametrized by a mean vector $\mathbf{X}_i = (\mathbf{X}_{i1},\dots,\mathbf{X}_{id})$ and a covariance matrix $\mathbf{H}$. If we denote the multivariate normal probability density function by $\phi$, the equation becomes

$$ \hat{f_h}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n \phi\Big(\mathbf{x};\mathbf{X}_i,\mathbf{H}\Big) $$

(no separate $1/\det(\mathbf{H})$ factor is needed here, since $\phi$ already contains its own normalizing constant and $\mathbf{H}$ now plays the role of a covariance matrix).

The simplest choice of $\mathbf{H}$ is a rescaled empirical covariance matrix $\Big(h \hat {\mathbf{\Sigma}}^{1/2}\Big)^2 = h^2 \hat{\mathbf{\Sigma}}$, where $h$ can be chosen, for example, using Scott's rule of thumb $h_\text{opt} = n^{-1/(d+4)}$.
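To connect this to the question's data, here is a minimal NumPy/SciPy sketch of this estimator (the helper name `parzen_density` and the use of `scipy.stats.multivariate_normal` for $\phi$ are my own choices, not part of the question):

```python
import numpy as np
from scipy.stats import multivariate_normal

def parzen_density(X, x, h=None):
    """Multivariate-normal KDE: the mean of phi(x; X_i, H) over the rows X_i."""
    n, d = X.shape
    if h is None:
        h = n ** (-1.0 / (d + 4))              # Scott's rule of thumb
    H = h ** 2 * np.cov(X, rowvar=False)       # H = (h * Sigma_hat^{1/2})^2
    return np.mean([multivariate_normal.pdf(x, mean=Xi, cov=H) for Xi in X])

# The question's male class and query point
males = np.array([
    [6.00, 180.0, 12.0],
    [5.92, 190.0, 11.0],
    [5.58, 170.0, 12.0],
    [5.92, 165.0, 10.0],
])
x = np.array([4.0, 150.0, 2.0])

print(parzen_density(males, x))   # class-conditional density p(x | male)
```

Computing the same quantity for the female and trans rows and weighting each by a class prior then yields the Bayes classifier the question is after.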

More details are given in the multiple handbooks that deal with this topic and in the Wikipedia article on [multivariate kernel density estimation](https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation); you can also check the slides by Härdle, Müller, Sperlich, and Werwatz.

Tim
  • Thanks, but that was mostly theory. Could you please shed some light on my specific example? That would be easier for me to understand. – user366312 Nov 07 '16 at 12:37
  • @anonymous What exactly is unclear for you? Code example would be something like: `for (i in 1:n) { est – Tim Nov 07 '16 at 13:14