I would appreciate your insight on the following questions (especially B).

I am also unsure about (A), since the sample is not normally distributed. Should I ignore this, given the large sample size?

Using only base R...

Get a random sample of 1000 from diamonds$carat (ggplot2 package).

A. Test $H_0: \sigma^2 = 0.225$ versus $H_1: \sigma^2 \neq 0.225$ at the 4% significance level.

B. If the true value of the population variance (for the distribution of carats) is 0.231, calculate the power of the test.

My attempt for (A):

set.seed(1)

# load only the dataset; the analysis itself uses base R
data(diamonds, package = "ggplot2")

sp_car = sample(diamonds$carat, 1000, replace = FALSE)



alpha = 0.04
df = length(sp_car) - 1


# upper critical value
qchisq(1 - alpha / 2, df)

# lower critical value
qchisq(alpha / 2, df)


samp_var = var(sp_car)

var0 = 0.225


# test statistic: (n - 1) * S^2 / sigma0^2
Tst = (df) * samp_var / var0


# reject H0 if either comparison is TRUE
Tst > qchisq(1 - alpha / 2, df)

Tst < qchisq(alpha / 2, df)
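
The two comparisons above can be folded into one helper that also reports a two-sided p-value. This is only a minimal sketch: `var_chisq_test` is a hypothetical name, the p-value uses the equal-tailed convention (twice the smaller tail probability), and `x` is simulated stand-in data rather than `diamonds$carat`, so the block runs in base R without ggplot2. The test itself assumes a normal population.

```r
# Hypothetical helper: two-sided chi-squared test for a variance.
# Assumes the data come from a normal population.
var_chisq_test <- function(x, var0, alpha = 0.04) {
  df   <- length(x) - 1
  stat <- df * var(x) / var0
  crit <- qchisq(c(alpha / 2, 1 - alpha / 2), df)
  # equal-tailed p-value: twice the smaller tail probability, capped at 1
  p <- min(1, 2 * min(pchisq(stat, df),
                      pchisq(stat, df, lower.tail = FALSE)))
  list(statistic = stat, df = df, critical = crit,
       p.value = p, reject = stat < crit[1] || stat > crit[2])
}

set.seed(1)
# Stand-in data, NOT diamonds$carat (assumption made for illustration only)
x <- rnorm(1000, mean = 0.8, sd = sqrt(0.225))

res <- var_chisq_test(x, var0 = 0.225, alpha = 0.04)
res$reject
res$p.value
```

With this convention, `res$reject` is `TRUE` exactly when `res$p.value` falls below `alpha`, so the critical-value and p-value views of the test agree.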

My attempt for B:

var1 = 0.231

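# probability of rejecting H0 when the true variance is var1: this is the power
# (relies on (n - 1) * S^2 / var1 being chi-squared with df degrees of freedom)
power = sum(pchisq(var0 * qchisq(alpha / 2, df) / var1, df),
            1 -  pchisq(var0 * qchisq(1 - alpha / 2, df) / var1, df))

# type II error probability
beta = 1 - power

power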
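
The rejection probability under a true variance `var1` can be wrapped in a function, which makes it easy to check. This is a sketch under the normality assumption of the chi-squared test, and `var_test_power` is a hypothetical name. It relies on the fact that, when the true variance is `var1`, `df * S^2 / var1` follows a chi-squared distribution with `df` degrees of freedom, so the critical values are rescaled by `var0 / var1`.

```r
# Hypothetical helper: power of the two-sided chi-squared variance test
# when the true variance is var1 (normal population assumed).
var_test_power <- function(var1, var0, n, alpha = 0.04) {
  df <- n - 1
  lo <- qchisq(alpha / 2, df)      # lower critical value under H0
  hi <- qchisq(1 - alpha / 2, df)  # upper critical value under H0
  # P(reject) = P(T < lo) + P(T > hi), with T = df * S^2 / var0
  pchisq(lo * var0 / var1, df) +
    pchisq(hi * var0 / var1, df, lower.tail = FALSE)
}

p_null <- var_test_power(0.225, 0.225, n = 1000)  # true variance equals var0
p_alt  <- var_test_power(0.231, 0.225, n = 1000)  # alternative from (B)
c(at_null = p_null, at_alternative = p_alt)
```

A useful sanity check is that the function returns `alpha` when `var1` equals `var0`, since the rejection probability under the null hypothesis is the significance level.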

Thank you for your time!

floraa
  • You should tell your readers you are sampling from a dataset of 53,940 values. The code `hist(replicate(5e2, var(sample(diamonds$carat, 1e3))))` (which takes less than a second to run) will give you the information you need to answer your questions. – whuber Dec 20 '21 at 01:51
  • @Onyambu Re "not readily available in R:" On the contrary, see the help page for `qchisq`. This is part of the basic `R` installation. Second, the pdf is not invertible when the degrees of freedom value exceeds $1,$ nor is it applicable in this context anyway. You probably meant to refer to the inverse cdf. – whuber Jan 02 '22 at 18:48
  • @Onyambu Yes, `qchisq` inverts the cdf, which is what is needed here. It is simply incorrect that you want to invert the pdf. As I pointed out, the pdf isn't even invertible. It is *not* involved in computing two-tailed p-values (unless you actually integrate it to compute the cdf!). – whuber Jan 02 '22 at 18:55
  • @whuber In the link provided above, the value 14.6489 was computed from the inverse pdf, i.e., both 14.6489 and 15.35667 have the same density value. Is that not inverting the pdf? We also have 17 degrees of freedom. Isn't that the way to compute the p-value for a two-sided chi-squared test? Or is the link wrong? – KU99 Jan 02 '22 at 19:22
  • @Onyambu Could you indicate what "link provided above" refers to? I don't see any links in this thread. – whuber Jan 03 '22 at 02:18
  • @whuber here is the link https://stats.stackexchange.com/questions/195469/calculating-p-values-for-two-tail-test-for-population-variance – KU99 Jan 03 '22 at 02:42
  • @Onyambu That procedure is not inverting the pdf, although in a preliminary step it does indeed find an $x$ at which the pdf has a specified density (so I think I now understand why you might have characterized the test as you did). It is a test based on a shortest-length two-sided confidence interval. The actual p-value calculations invert the *cdf,* twice. – whuber Jan 03 '22 at 03:13
  • @whuber Sorry to bother you, but is there a link you could provide that would help me learn how to do that? – KU99 Jan 03 '22 at 03:15
  • @Onyambu I'm not sure what you mean by "do that," because we're discussing a fairly complex situation involving a bunch of steps. But if I'm interpreting things correctly, I believe the crux of the matter is revealed in an illustration I made for a post at https://stats.stackexchange.com/a/127541/919. It shows the thinking that underlies the post you referenced. Although my explanation is applied to a Normal distribution, it works in exactly the same way for a chi-squared distribution. – whuber Jan 03 '22 at 03:19
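
The check suggested in the first comment (looking at the sampling distribution of the sample variance) can be sketched in base R as follows. The population `pop` here is an arbitrary right-skewed stand-in of the same size as the dataset, not `diamonds$carat`, and its gamma parameters are assumptions chosen only for illustration. The idea is to compare the spread of the simulated variances with the spread that normal theory predicts, `sigma^2 * sqrt(2 / (n - 1))`.

```r
set.seed(1)

# Stand-in population (assumption): skewed, 53,940 values.
# Replace `pop` with diamonds$carat if ggplot2 is available.
pop <- rgamma(53940, shape = 3, scale = 0.25)
n   <- 1000

# Sampling distribution of the sample variance, sampling without replacement
vars <- replicate(500, var(sample(pop, n)))

# Spread of S^2 in the simulation vs. the spread normal theory predicts
sd_sim    <- sd(vars)
sd_normal <- var(pop) * sqrt(2 / (n - 1))
c(simulated = sd_sim, normal_theory = sd_normal)
```

When the population has heavier tails than a normal distribution, the simulated spread tends to exceed the normal-theory value even for large samples, and in that case the chi-squared critical values no longer give the nominal 4% level. Passing `vars` to `hist()` shows the same information graphically.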
