
I'm somewhat confused about how the density is estimated from the histogram. I have attached a screenshot of the paper as well. Any insights?


I don't understand why you divide it into cubes, why $N=(1/h)^d$, and where the formula for the density estimator comes from.

kjetil b halvorsen
user34790
  • For $d=2$ this question is answered at http://stats.stackexchange.com/questions/24568. (The statement is strangely stated. What it means is that if you divide each of the $d$ sides of the cube into, say, $m$ equal intervals of length $h=1/m$, then, by elementary geometry, the cube itself will be composed of $N = m^d = (1/h)^d$ little cubelets. (This will not be the case for arbitrary values of $h$, which is why the statement is so backwards.) The definition of $\hat{\pi}_j$ is bad because it refers to undefined symbols $n$ and $X_i$: presumably $n$ is the count of the data $(X_i)$, right?) – whuber Apr 10 '13 at 17:51
  • @whuber I didn't get it. My data S is in $d$-dimensional space, so why am I using cubes to bin it? Am I misinterpreting it? Also, can you tell me how the formula for the density is derived? – user34790 Apr 10 '13 at 17:56
  • You're dividing the $d$-dimensional space into small $d$-dimensional hypercube regions so you can count how many elements are in each, in order to find out how dense the data is in each little region, as an estimate of the density from which the sample was drawn. – Glen_b Apr 10 '13 at 23:39
  • @whuber Are you angry with me? Please let me know. – user34790 Apr 11 '13 at 02:25
  • No, I am not angry. If you would be so kind as to indicate what parts or aspects of my comment suggested anger, I would be grateful to know, because that would show me where I might be miscommunicating and help me improve my messages in the future. – whuber Apr 11 '13 at 03:24
  • @whuber I got no sense of any problem with your comment which seemed matter of fact; sometimes these things are hard to gauge in text. – Glen_b Apr 11 '13 at 07:43

1 Answer


Your question is a bit unclear, since you just showed us a copy from some text without explaining the symbols. But it seems clear that $n$ is the total number of observations of some $d$-dimensional vector in the cube $[0,1]^d$.

Why do you divide it into cubes?

Presumably because you are calculating a histogram-based density estimate. In one dimension you divide the interval into subintervals; the analog in $d$ dimensions is to divide the cube into cubelets. Then you count the number of observations falling in each cubelet.
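To make the counting step concrete, here is a minimal Python sketch. It assumes NumPy, data already scaled to $[0,1]^d$, and $m = 1/h$ intervals per side. The function name `cubelet_counts` and the simulated sample are hypothetical and do not come from the paper.

```python
import numpy as np
from collections import Counter

def cubelet_counts(X: np.ndarray, m: int) -> Counter:
    """Count the rows of X (points in [0,1]^d) falling in each cubelet of side h = 1/m."""
    h = 1.0 / m
    # Cubelet index along each axis is floor(x / h); a coordinate equal to 1.0 would give m,
    # so clip it into the last interval.
    idx = np.minimum(np.floor(X / h).astype(int), m - 1)
    # Key each cubelet by its d-tuple of axis indices (k_1, ..., k_d), 0 <= k_i < m
    return Counter(tuple(int(k) for k in row) for row in idx)

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2))    # hypothetical sample: n = 1000 points in [0,1]^2
counts = cubelet_counts(X, m=4)    # at most 4**2 = 16 nonempty cubelets
print(sum(counts.values()))        # every observation lands in exactly one cubelet
```

Each observation is assigned to exactly one cubelet, so the counts add up to $n$.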

Why is $N=(1/h)^d$, and where does the formula for the density estimator come from?

The first part was explained in the comments. Assume $h$ comes from dividing each side of length 1 into $m = 1/h$ equal-length subintervals. Then $N = m^d = (1/h)^d$ is just the number of cubelets. $\hat{\pi}_j$ is the number of observations in cubelet $j$ divided by $n$, i.e. the fraction of the sample that falls in that cubelet. The formula for the histogram density estimator then comes from:
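A minimal sketch of $N$ and $\hat{\pi}_j$, assuming NumPy. It uses `numpy.histogramdd` with $m$ equal bins per axis over $[0,1]$. The helper name `cell_proportions` and the sample are hypothetical. The check on `h` reflects the point from the comments that an arbitrary $h$ does not tile the unit cube.

```python
import numpy as np

def cell_proportions(X: np.ndarray, h: float) -> np.ndarray:
    """pi_hat_j = (# observations in cubelet j) / n for the cubelets of side h in [0,1]^d."""
    n, d = X.shape
    m = round(1 / h)                        # intervals per side
    if abs(m * h - 1) > 1e-12:              # e.g. h = 0.3 does not tile [0,1]
        raise ValueError(f"1/h must be a whole number, got h={h}")
    # m equal bins per axis over [0,1] -> one count for each of the N = m**d cubelets
    counts, _edges = np.histogramdd(X, bins=m, range=[(0.0, 1.0)] * d)
    return counts / n

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3))              # hypothetical sample: n = 500, d = 3
pi_hat = cell_proportions(X, h=0.2)
print(pi_hat.size, pi_hat.sum())            # N = (1/0.2)**3 = 125 cubelets; sum is 1 up to rounding
```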

  1. Assuming the density estimate should be constant in each cubelet.
  2. Letting that constant value be proportional to the number of observations falling in the cubelet (reasonable since all cubelets have the same volume).
  3. Using that the integral of a density should be 1, so that $$ \idotsint_{[0,1]^d} \hat{p}(x) \; dx = \sum_{j=1}^{N} \frac{\hat{\pi}_j}{h^d}\cdot h^d = \sum_{j=1}^{N} \hat{\pi}_j = 1 $$

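where the sum is over the $N$ cubelets, each of volume $h^d$. Because the $\hat{\pi}_j$ sum to 1, normalization forces the constant on cubelet $j$ to be $\hat{\pi}_j/h^d$. That is, $\hat{p}(x) = \hat{\pi}_j/h^d$ for $x$ in cubelet $j$, which should explain the formula.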
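The three steps above can be sketched in Python as follows. This assumes NumPy and data in $[0,1]^d$. The name `histogram_density` and the simulated sample are hypothetical. The sketch stores one constant height $\hat{\pi}_j/h^d$ per cubelet and checks that height times volume sums to 1.

```python
import numpy as np

def histogram_density(X: np.ndarray, m: int):
    """Histogram estimate on [0,1]^d: p_hat(x) = pi_hat_j / h**d for x in cubelet j, h = 1/m."""
    n, d = X.shape
    h = 1.0 / m
    # Axis indices of each observation's cubelet (x = 1.0 is clipped into the last interval)
    idx = np.minimum(np.floor(X / h).astype(int), m - 1)
    counts = np.zeros((m,) * d)
    np.add.at(counts, tuple(idx.T), 1)      # counts[k_1, ..., k_d] += 1 per observation
    heights = (counts / n) / h**d           # pi_hat_j / h^d: constant value on each cubelet

    def p_hat(x) -> float:
        x = np.asarray(x, dtype=float)
        if np.any(x < 0) or np.any(x > 1):
            return 0.0                      # the estimate is zero outside [0,1]^d
        j = tuple(np.minimum(np.floor(x / h).astype(int), m - 1))
        return float(heights[j])

    return p_hat, heights, h

rng = np.random.default_rng(2)
X = rng.uniform(size=(2000, 2))             # hypothetical sample: n = 2000, d = 2
p_hat, heights, h = histogram_density(X, m=10)
# Integral over [0,1]^d = sum over cubelets of height * volume h^d; should be 1 up to rounding
print(np.sum(heights * h ** X.shape[1]))
print(p_hat([0.05, 0.95]))                  # density estimate in cubelet (0, 9)
```

The observations are binned and looked up with the same `floor` rule, so `p_hat` always reads the height of the cubelet in which a point was counted.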

kjetil b halvorsen