For instance, I have hypothetical data like the following:

Month    # Households   Total revenue   Average revenue
Jan-15   113            51791           458.3274336
Feb-15   196            43819           223.5663265
Mar-15   207            85322           412.1835749
Apr-15   348            95057           273.1522989
May-15   152            18265           120.1644737
Jun-15   155            42235           272.483871
Jul-15   198            12005           60.63131313
Aug-15   246            44688           181.6585366
Sep-15   299            51006           170.5886288
Oct-15   197            54446           276.3756345
Nov-15   239            58685           245.5439331
Dec-15   326            33685           103.3282209
Jan-16   179            85471           477.4916201
Feb-16   137            33720           246.1313869
Mar-16   163            68143           418.0552147
Total    3155           778338          3939.682467

Now, the CLT suggests that because the sample size (# of households) in each month is sufficiently large, each month's average revenue will be approximately normally distributed.

But can we say the same about the distribution of the overall average revenue? In that case the sample size (which I think would now be the number of months) isn't sufficiently large, so can we not approximate the distribution of the overall average revenue by a normal distribution and use t-test-type tests?
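For reference, here is a minimal sketch of the "overall average" being asked about, assuming it means total revenue divided by total households. The variable names are illustrative only. Under that assumption it equals the household-weighted mean of the monthly averages, and it is not the 3939.68 in the Total row, which is the plain sum of the monthly averages.

```python
# Monthly figures copied from the table (Jan-15 through Mar-16).
households = [113, 196, 207, 348, 152, 155, 198, 246, 299, 197, 239, 326, 179, 137, 163]
revenue = [51791, 43819, 85322, 95057, 18265, 42235, 12005, 44688,
           51006, 54446, 58685, 33685, 85471, 33720, 68143]

# Per-month average revenue per household (the "Average revenue" column).
monthly_averages = [r / h for r, h in zip(revenue, households)]

# Overall average, assumed here to mean total revenue / total households.
overall_average = sum(revenue) / sum(households)

# The same number, written as a household-weighted mean of the monthly averages.
weighted_average = sum(h * a for h, a in zip(households, monthly_averages)) / sum(households)

print(round(overall_average, 2))
print(round(sum(monthly_averages), 2))  # sum of monthly averages, a different quantity
```

Seen this way, the overall average is a weighted combination of 15 monthly averages, so its distribution depends on how those monthly values relate to each other, not only on the number of households.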

  • @Tim Yes, not every large sample is normally distributed. I agree with you, but that's not what I'm suggesting. What I'm saying is that if a sample is sufficiently large, then the average of that sample will have a roughly bell-shaped distribution. I take this a step further and ask whether the overall average (which perhaps has months as its observation points) will also follow a roughly normal distribution. – Sahil Talwar Jul 25 '16 at 08:42

1 Answer


The CLT gives you the limiting distribution of the (suitably normalized) sum or average of random variables as the number of random variables goes to infinity. However, additional assumptions must be met, depending on which version of the CLT you use.

The Lindeberg-Lévy CLT requires independent and identically distributed samples with finite expected value and variance. Other versions, such as the Lyapunov CLT, require only independent samples plus a moment condition, and do not need the samples to be identically distributed.
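As a rough illustration of the i.i.d. case, here is a small simulation sketch. It assumes Exp(1) draws as a stand-in for a skewed revenue distribution (nothing in the question says revenue is exponential) and checks how often the sample mean falls within 1.96 standard errors of the true mean. The function names and the choice of n = 200 are hypothetical.

```python
import random
import statistics


def sample_mean(n, rng):
    """Mean of n i.i.d. Exp(1) draws (true mean 1, true sd 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


def coverage(n, reps=2000, seed=0):
    """Fraction of simulated sample means within 1.96 standard errors of 1."""
    rng = random.Random(seed)
    se = 1.0 / n ** 0.5  # standard error of the mean for Exp(1)
    hits = sum(abs(sample_mean(n, rng) - 1.0) <= 1.96 * se for _ in range(reps))
    return hits / reps


# If the normal approximation is reasonable, this should be close to 0.95.
print(coverage(200))
```

This only illustrates the independent, identically distributed setting. It says nothing about monthly figures that move together over time.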

A key assumption in both of these classical versions is independence. I think it is reasonable to suspect some nonzero covariance between the monthly values, as they depend on time.
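One informal way to look for such dependence is the lag-1 sample autocorrelation of the monthly averages. The sketch below is only a rough diagnostic under the assumption that the 15 monthly values from the question are treated as a single time series. The helper name is hypothetical, and with so few points the estimate is very noisy.

```python
def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation: sum of products of consecutive
    deviations from the mean, divided by the sum of squared deviations."""
    m = sum(xs) / len(xs)
    dev = [x - m for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


# "Average revenue" column from the question, Jan-15 through Mar-16.
avg_revenue = [458.3274336, 223.5663265, 412.1835749, 273.1522989, 120.1644737,
               272.483871, 60.63131313, 181.6585366, 170.5886288, 276.3756345,
               245.5439331, 103.3282209, 477.4916201, 246.1313869, 418.0552147]

r1 = lag1_autocorr(avg_revenue)
# Common rule of thumb: |r1| well above 2 / sqrt(n) hints at serial dependence.
print(r1, 2 / len(avg_revenue) ** 0.5)
```

A value near zero does not prove independence, and a steadily trending series (for example 1, 2, ..., 10) gives a clearly positive value under this estimator.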

rapaio
  • So, in the context of the above data, suppose the conditions are met. Then the sample mean for each month will have roughly a normal distribution, right? And can we say the same for the distribution of the total average (as in sum total / total # of households)? – Sahil Talwar Jul 25 '16 at 08:44
  • Suppose you have a business that grows, so its monthly numbers will have a growth trend. Your variables then depend on time; they are not independent. I don't think you can use the CLT to assume a normal distribution there. – rapaio Jul 25 '16 at 08:55
  • So I cannot use a t-test or ANOVA to analyze this? What other method can I use to test hypotheses on this data? Taking it one step further: if this is my treatment group of households, to which I sent, for example, a promotional e-mail, and I have data for a control group for the same months, then given that my data spans time, how can I check whether the two groups are similar or different (i.e., whether the campaign was successful)? – Sahil Talwar Jul 25 '16 at 09:09