
A cake is specified by a 3-vector $x=(x_1,x_2,x_3)$, where $x_i$ is the proportion of the $i$-th ingredient, so that $\sum_ix_i=1$. I have a list of prices $y$ for different compositions $x$:

| $y$ | $x=(x_1,x_2,x_3)$ |
|----:|:-----------------:|
| 71 | (0.1, 0.2, 0.7) |
| 73 | (0.0, 0.3, 0.7) |
| 76 | (0.0, 0.4, 0.6) |
| 79 | (0.1, 0.6, 0.3) |
| ... | ... |
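The constraint $\sum_ix_i=1$ is why the regression in this question uses an intercept with only $x_1$ and $x_2$. A minimal NumPy sketch, using only the four rows shown (the rest of the data is elided), checks that an intercept plus all three proportions is rank-deficient, while dropping $x_3$ is not:

```python
import numpy as np

# Compositions from the four visible rows (columns: x1, x2, x3).
X = np.array([
    [0.1, 0.2, 0.7],
    [0.0, 0.3, 0.7],
    [0.0, 0.4, 0.6],
    [0.1, 0.6, 0.3],
])

# Each composition sums to 1 (up to floating-point error).
row_sums = X.sum(axis=1)

# Intercept + x1 + x2 + x3 is perfectly collinear: the intercept column
# equals x1 + x2 + x3, so the 4 columns span only a 3-dimensional space.
design_full = np.column_stack([np.ones(len(X)), X])
rank_full = np.linalg.matrix_rank(design_full)

# Dropping x3 (as in y = b0 + b1*x1 + b2*x2) gives full column rank.
design_reduced = np.column_stack([np.ones(len(X)), X[:, :2]])
rank_reduced = np.linalg.matrix_rank(design_reduced)

print(row_sums, rank_full, rank_reduced)
```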

I computed the weighted mean and weighted standard deviation of the price for each ingredient $x_i$:
$$\begin{aligned}\bar y_1&=76, & \sigma_1&=6.9,\\ \bar y_2&=76, & \sigma_2&=7.1,\\ \bar y_3&=74, & \sigma_3&=7.8.\end{aligned}$$
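The exact weighted-mean formulae behind these numbers are not shown, so the following is a minimal sketch under an assumption: for ingredient $i$, each cake $j$'s price $y_j$ is weighted by its proportion $x_{ji}$, giving $\bar y_i=\sum_j x_{ji}y_j/\sum_j x_{ji}$ and an uncorrected weighted SD. The helper `weighted_mean_sd` is hypothetical, and only the four visible rows are used, so it will not reproduce the values above.

```python
import numpy as np

# Four visible rows only; the full data set ("...") is not available.
y = np.array([71.0, 73.0, 76.0, 79.0])
X = np.array([
    [0.1, 0.2, 0.7],
    [0.0, 0.3, 0.7],
    [0.0, 0.4, 0.6],
    [0.1, 0.6, 0.3],
])

def weighted_mean_sd(y, w):
    """Weighted mean and uncorrected weighted SD of y with weights w (assumed form)."""
    mean = np.sum(w * y) / np.sum(w)
    var = np.sum(w * (y - mean) ** 2) / np.sum(w)
    return mean, np.sqrt(var)

# One (mean, sd) pair per ingredient, weighting each cake by its share
# of that ingredient.
stats = [weighted_mean_sd(y, X[:, i]) for i in range(X.shape[1])]
for i, (m, s) in enumerate(stats, start=1):
    print(f"ingredient {i}: mean={m:.2f}, sd={s:.2f}")
```

Because every cake is a mix, each $\bar y_i$ also averages over the other ingredients in the same cakes, so these summaries need not track the regression coefficients.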

To me, this suggests the ingredients do not affect the cake price very differently: $\bar y_1-\bar y_3=2$, which is small given that both $\sigma_1$ and $\sigma_3$ are around $7$. The same holds for $x_2$ and $x_3$.

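However, when I fit a simple GLM $y=\beta_0+\beta_1x_1+\beta_2x_2$ and test whether all three ingredients have the same effect ($\beta_1=\beta_2=\beta_3$, taking $\beta_3=0$ as the baseline for the omitted $x_3=1-x_1-x_2$, i.e. $\beta_1=\beta_2=0$), I get $p=8\times10^{-7}$, which tells me the ingredients affect the cake price very differently. This is counter-intuitive to me.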
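Because $x_3=1-x_1-x_2$, the intercept model is a reparametrization of the mixture model $y=\gamma_1x_1+\gamma_2x_2+\gamma_3x_3$ with $\beta_0=\gamma_3$, $\beta_1=\gamma_1-\gamma_3$, $\beta_2=\gamma_2-\gamma_3$, so "equal effects" is the hypothesis $\beta_1=\beta_2=0$. A minimal sketch of the usual nested-model $F$ test for it, assuming a Gaussian GLM with identity link (i.e. ordinary least squares) and using only the four visible rows, so the numbers are illustrative only; the helper `rss` is hypothetical:

```python
import numpy as np

# Four visible rows only (illustrative, not the question's full data).
y = np.array([71.0, 73.0, 76.0, 79.0])
x1 = np.array([0.1, 0.0, 0.0, 0.1])
x2 = np.array([0.2, 0.3, 0.4, 0.6])

def rss(design, y):
    """Residual sum of squares and coefficients of the least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), coef

n = len(y)
ones = np.ones(n)
# Full model: y = b0 + b1*x1 + b2*x2 (x3 is implied by x3 = 1 - x1 - x2).
rss_full, coef_full = rss(np.column_stack([ones, x1, x2]), y)
# Restricted model under H0: b1 = b2 = 0 (all mixture coefficients equal).
rss_null, _ = rss(ones[:, None], y)

q = 2             # number of restrictions tested
df_resid = n - 3  # residual degrees of freedom of the full model
F = ((rss_null - rss_full) / q) / (rss_full / df_resid)
print(coef_full, F)
```

The p-value is the upper tail of an $F(q,\,n-3)$ distribution, for example `scipy.stats.f.sf(F, q, df_resid)`. With a larger $n$, the same size of effect tends to produce a larger $F$.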

How may I reconcile my intuitions with the test results?

Sibbs Gambling
  • As $n$ increases the standard error of the difference decreases, even though the population sd of the data is unchanged. See the many posts explaining the difference between statistical significance and a substantive effect size; equivalently the difference between statistical significance and practical significance. There can be a world of difference between being able to distinguish your observed results from what can be generated by "no population difference + random variation" and a difference you might actually *care* about. ... ctd – Glen_b May 12 '16 at 04:56
  • ctd ... Large sample sizes will give small p-values to tiny effects. If you don't care about tiny effects you generally have no business doing ordinary hypothesis tests (you might want to look at effect sizes or perhaps examine CIs for some effect or you might consider equivalence tests ... or a number of other tools). If, on the other hand, even a small difference matters (if only you had the sample size to find it), then the usual hypothesis tests are probably the thing you want. – Glen_b May 12 '16 at 05:50
  • What is the theoretical basis for your calculations? Why should your weighted mean approach give you any information at all about the $\beta_i$? – whuber May 12 '16 at 13:49
  • @whuber The weighted mean gives me a sense of "how much the cake would cost if it were made 100% of one ingredient." So I go from "the $\bar y$'s are not so different given the large variances" to "I would not expect the ingredients to affect the price differently", and then to "I would expect the null hypothesis $\beta_1=\beta_2=\beta_3$ to be true." – Sibbs Gambling May 27 '16 at 05:08
  • Your formulas do not appear justifiable in terms of statistical or mathematical principles. Your "sense of" does not constitute theory. In the absence of any theoretical formulation, the question seems to come down to "my intuition tells me to use this formula that differs from procedures known to work well. What's wrong with my intuition?" Answering that would be a matter for psychology to consider, but it does not have any statistical content or interest. – whuber May 27 '16 at 14:24
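
Glen_b's first point can be made concrete with a deliberately simplified calculation. It assumes two independent groups of $n$ cakes each with a common SD of about 7, which is not the question's actual design: the standard error of a difference of means is $\sigma\sqrt{2/n}$, so the same difference of 2 goes from unremarkable to highly significant as $n$ grows. The helpers `z_statistic` and `two_sided_p` are hypothetical.

```python
import math

SIGMA = 7.0  # rough SD quoted in the question (assumed common to both groups)
DIFF = 2.0   # difference between weighted means quoted in the question

def z_statistic(diff, sigma, n):
    """z for a difference of two independent means, n observations per group."""
    se = sigma * math.sqrt(2.0 / n)  # standard error shrinks like 1/sqrt(n)
    return diff / se

def two_sided_p(z):
    """Two-sided p-value from the standard normal: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

results = {}
for n in (10, 100, 1000):
    z = z_statistic(DIFF, SIGMA, n)
    p = two_sided_p(z)
    results[n] = (z, p)
    print(f"n = {n:4d}: z = {z:.2f}, p = {p:.2g}")
```

The SD of the data stays near 7 throughout; only the precision of the estimated difference changes.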

0 Answers