
I have a dataset ($300 \times 14$ matrix). This means it has 14 features and 300 observations.

Here $n=14$: $$ \begin{pmatrix} a_{11} & 0 & \ldots & a_{1n}\\ 0 & a_{22} & \ldots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 &\ldots & a_{300,14} \end{pmatrix} $$

Now, is it unreasonable to estimate the density (p.d.f.) of each row of this matrix, $$\begin{pmatrix} a_{11} & 0 & \ldots & a_{1n}\end{pmatrix},$$ separately using kernel density estimation? Or should I estimate the p.d.f. of the whole matrix? In other words, I want to estimate the density of each multivariate vector (a vector containing the readings of the 14 sensors) separately. When we have multivariate data, should we use a multivariate estimator for the whole dataset?
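A single row is one observation of the 14 sensors, so a density cannot be estimated from that row alone; the standard approach is to fit one 14-dimensional KDE to all 300 rows and then evaluate it at each row. Below is a minimal NumPy sketch of that, assuming a Gaussian kernel with a Scott's-rule bandwidth matrix (the same rule `scipy.stats.gaussian_kde` uses by default). The function name `gaussian_kde_fit` and the random matrix standing in for the sensor data are hypothetical.

```python
import numpy as np


def gaussian_kde_fit(data):
    """Fit a multivariate Gaussian KDE to `data` (shape (N, d), one row per observation).

    Returns a function giving the log density at points of shape (m, d).
    Bandwidth matrix (assumption): Scott's rule, H = N**(-2/(d+4)) * sample covariance.
    """
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    H = np.atleast_2d(np.cov(data, rowvar=False)) * n ** (-2.0 / (d + 4))
    H_inv = np.linalg.inv(H)
    _, logdet = np.linalg.slogdet(H)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + logdet)  # log normalizing constant of each kernel

    def log_density(points):
        points = np.atleast_2d(np.asarray(points, dtype=float))
        diff = points[:, None, :] - data[None, :, :]  # (m, N, d) differences to every data row
        log_k = log_norm - 0.5 * np.einsum("mnd,de,mne->mn", diff, H_inv, diff)
        # log of the average kernel value, via log-sum-exp to avoid underflow in 14-D
        mx = log_k.max(axis=1, keepdims=True)
        return mx[:, 0] + np.log(np.exp(log_k - mx).sum(axis=1)) - np.log(n)

    return log_density


# Hypothetical stand-in for the 300 x 14 sensor matrix (rows = time points, columns = sensors).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 14))

log_f = gaussian_kde_fit(X)  # one density estimate for the whole dataset...
scores = log_f(X)            # ...evaluated at every row: one log density per 14-sensor vector
print(scores.shape)          # (300,)
```

If SciPy is available, `scipy.stats.gaussian_kde` performs the same kind of fit; note that it expects the data transposed, i.e. with shape `(14, 300)`. Working with log densities is a deliberate choice here, since raw density values in 14 dimensions are often tiny.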

Waldir Leoncio
Arkan
  • Could you go into more details? In general, yes, for multivariate data we use multivariate kernel density estimation. – Tim Apr 26 '17 at 11:21
  • Estimating separately for each coordinate would not work! You must do it multivariately. You could also look into using a saddlepoint approximation based on the empirical moment generating function, see https://stats.stackexchange.com/questions/191492/how-does-saddlepoint-approximation-work/192380#192380 – kjetil b halvorsen Apr 26 '17 at 12:18
  • Thanks, Tim. I have 14 sensors in a plant that measure the same quantity, but in different sections. Each sensor has a number, and the entries of the vector follow that order (sensor 1 value, ..., sensor 14 value). I have the values of these sensors for one day (300 × 14). I want to estimate the density of each vector so that I can find a vector that contains an outlier, i.e. one that differs from the others. Is estimating the density of each vector reasonable, or does one usually estimate the density of the whole matrix (the whole dataset)? – Arkan Apr 26 '17 at 13:01
  • Thanks, kjetil. I have started reading the linked post, but please let me know your thoughts given the description above. – Arkan Apr 26 '17 at 13:02
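For the outlier use case described in the comments, one possible approach (a suggestion, not something proposed in the thread) is to score each 14-sensor vector by its leave-one-out density under a joint Gaussian KDE and flag the lowest scores. The sketch below also contrasts this with the per-coordinate approach (summing 1-D KDE log densities, i.e. treating sensors as independent) to illustrate why estimating each coordinate separately falls short: on simulated data where all 14 sensors track the same quantity, a vector whose sensors disagree stands out under the joint estimate even though each of its values is ordinary on its own. The simulated data, the injected row 123, the Scott's-rule bandwidth, and the function names are all hypothetical.

```python
import numpy as np


def loo_log_density(data):
    """Leave-one-out log density of each row under a Gaussian KDE fitted to the other rows.

    data: shape (N, d) or (N,). Bandwidth (assumption): Scott's rule,
    H = N**(-2/(d+4)) * sample covariance.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n, d = data.shape
    H = np.atleast_2d(np.cov(data, rowvar=False)) * n ** (-2.0 / (d + 4))
    H_inv = np.linalg.inv(H)
    _, logdet = np.linalg.slogdet(H)
    diff = data[:, None, :] - data[None, :, :]  # (N, N, d) pairwise differences
    log_k = -0.5 * np.einsum("ijd,de,ije->ij", diff, H_inv, diff)
    log_k -= 0.5 * (d * np.log(2 * np.pi) + logdet)
    np.fill_diagonal(log_k, -np.inf)  # exclude each row's own kernel
    mx = log_k.max(axis=1, keepdims=True)
    # log of the mean over the other N - 1 kernels (log-sum-exp for stability)
    return mx[:, 0] + np.log(np.exp(log_k - mx).sum(axis=1)) - np.log(n - 1)


def marginal_loo_log_density(data):
    """Per-coordinate alternative: sum of 1-D leave-one-out KDE log densities."""
    return sum(loo_log_density(data[:, j]) for j in range(data.shape[1]))


# Hypothetical simulation: 14 sensors reading the same underlying value plus small noise.
rng = np.random.default_rng(0)
z = rng.normal(size=(300, 1))              # common process value at each time point
X = z + 0.1 * rng.normal(size=(300, 14))   # 300 x 14 sensor matrix
X[123] = -1.5                              # injected faulty vector: sensor 1 disagrees with
X[123, 0] = 1.5                            # the rest, yet every value is marginally typical

joint = loo_log_density(X)
marginal = marginal_loo_log_density(X)
print(np.argmin(joint))     # 123, the injected row
print(np.argmin(marginal))  # not 123: extreme-but-consistent rows score lower marginally
```

With 300 points in 14 dimensions a KDE is data-hungry and the Scott's-rule bandwidth is only a default, so these scores are probably best read as a ranking of how unusual each vector is rather than as calibrated densities.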
