
This question might be a bit naive.

According to theory, the mean of a random variable is the sum of its values times the pdf.

I tried to test this in R with the following code:

    x <- rnorm(10000, mean=3, sd=1)
    hx <- dnorm(x)
    mean(x)          ## this gives me a value very close to 3, as expected
    sum(x*hx)/10000  ## this gives me 0.04

Why don't I get a value close to 3 when I run the last line of code? Am I missing something?

Thank you!

Arg
  • 3
    The theory asserts the mean of a random variable with pdf $f$ is given by *integrating,* not summing, $x f(x)\,dx.$ Notice that the crucial infinitesimal element $dx$ is not present in any sum. – whuber May 13 '19 at 21:34
  • @whuber I was going to counter that dividing by 10000 represents $dx$, but this isn't as clear when x is randomly drawn. – Kitter Catter May 13 '19 at 21:51
  • 3
    @Kitter That's because $1/10000$ is not $dx.$ That would only be the case when drawing randomly from a uniform distribution of width $1.$ The operation performed here is simply not an integration, nor is it even an approximation to a multiple of the integral. – whuber May 13 '19 at 21:59

2 Answers


I'm not sure that the process you outline here actually represents what you want it to:

    x <- rnorm(10000, mean=3, sd=1)
    mean(x)         # should be good with a mean of 3
    hx <- dnorm(x)  # this is the pdf for Normal distribution mean = 0, sd = 1 for each x
                    # note that these x are weighted by the normal distribution already
    sum(x*hx)/10000 # gives the average value of x*pdf(x) which isn't what you wanted

You probably want something closer to

    mu    <- 3                                        # parameters of the distribution
    sigma <- 1
    dx    <- sigma / 100                              # grid spacing, dx << sigma
    x     <- seq(mu - 9 * sigma, mu + 9 * sigma, dx)  # +/- 9 sds is probably overkill
    hx    <- dnorm(x, mean = mu, sd = sigma)          # pdf at each grid point
    sum(x * hx * dx)                                  # Riemann sum approximating the integral
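
As a cross-check on the Riemann sum above, base R's `integrate()` can evaluate the same integral by adaptive quadrature. This is only a minimal sketch; the names `mu`, `sigma`, and `integrand` are illustrative choices, not part of the original code.

```r
mu <- 3
sigma <- 1

# Integrand of the expectation: x times the N(mu, sigma) density
integrand <- function(x) x * dnorm(x, mean = mu, sd = sigma)

# Adaptive quadrature over the whole real line
res <- integrate(integrand, lower = -Inf, upper = Inf)
res$value  # should be close to mu
```

Unlike the sample average in the question, this does not depend on random draws at all: it integrates $x f(x)$ directly.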

Just to add one more bit:

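What you calculated is the average of $x$ times the standard normal pdf, taken over $x \sim N(3, 1)$:

$$\left\langle \frac{1}{\sqrt{2\pi}}\, x\, e^{-x^2/2} \right\rangle = \frac{3}{4 e^{9/4} \sqrt{\pi}} \approx 0.0446$$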
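
This closed form can be checked numerically. The sketch below compares the Monte Carlo average from the question with a direct integral of $x$ times the standard normal pdf, weighted by the $N(3, 1)$ density. The variable names and the seed are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Monte Carlo version of the question's last line
x <- rnorm(1e5, mean = 3, sd = 1)
mc <- mean(x * dnorm(x))

# Same expectation as an integral: x * phi(x) weighted by the N(3, 1) density
num <- integrate(function(x) x * dnorm(x) * dnorm(x, mean = 3, sd = 1),
                 lower = -Inf, upper = Inf)$value

# Closed form quoted above
exact <- 3 / (4 * exp(9 / 4) * sqrt(pi))

c(monte_carlo = mc, integral = num, closed_form = exact)
```

All three numbers should agree to within Monte Carlo and quadrature error, which suggests the 0.04 in the question is a correct estimate of a different quantity rather than a numerical glitch.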

Kitter Catter

There are some mistakes. First, the second line needs to use the same mean and sd as the distribution used to generate x:

    hx <- dnorm(x, mean = 3, sd = 1)

Second, the expected value is the integral of every possible value times its density: $\int_{-\infty}^{\infty} x f(x)\,dx$. In your computation, you used random realizations instead of the range of all possible values of x:

    x_span <- seq(-100, 100, 1)   # grid of possible values, spacing 1
    sum(x_span * dnorm(x_span, mean = 3, sd = 1))

The last line gives you the mean 3.
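
The sum above matches the integral only because the grid spacing is 1, so the factor $dx$ is implicitly 1. A hedged sketch of the same idea with an explicit spacing follows; `grid_mean` is a hypothetical helper, not part of the original answer, and its default spacing and range are arbitrary choices.

```r
# Hypothetical helper: Riemann-sum approximation of E[X] for X ~ N(mu, sigma)
grid_mean <- function(mu, sigma, dx = sigma / 100, width = 9) {
  x <- seq(mu - width * sigma, mu + width * sigma, by = dx)
  sum(x * dnorm(x, mean = mu, sd = sigma) * dx)
}

wide   <- grid_mean(3, 1)    # close to 3
narrow <- grid_mean(3, 0.1)  # still close to 3, because dx shrinks with sigma

# Without dx, on an integer grid, the narrow case breaks down
x_span <- seq(-100, 100, 1)
no_dx <- sum(x_span * dnorm(x_span, mean = 3, sd = 0.1))  # roughly 12, not 3
```

Scaling `dx` with `sigma` keeps the grid fine relative to the width of the density, which is what makes the sum a reasonable approximation to the integral.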

Chris
  • 2
    It's only an accident that you obtained a value of $3$ in your example. Setting `sd=0.1` instead--which if I read your answer as intended, *still* should produce a value of $3$--instead produces a value closer to $12.$ Something's wrong about that. For an analysis of what's going on, see procedure (8) in my post at https://stats.stackexchange.com/a/117711/919. – whuber May 13 '19 at 23:12
  • 1
    Well, it is not an accident. We are evaluating a continuous function at discrete points. Sure, it does not work the same way for sd=0.1, as the discretization is too coarse in this case. See the edit in Kitter Catter's answer, which includes the dx. I just figured it would be enough to illustrate the idea. – Chris May 14 '19 at 01:51