
I am a little confused about which conclusions I can and cannot draw from such small samples.

I have been measuring the degradation of a pollutant for 9 days. I measured the remaining concentration and took only triplicates each day, which gives me a very small sample size. I want to find the point at which there is no longer a significant difference between two days. I first thought of ANOVA followed by Tukey.

But when I checked for normality with the Shapiro test, my data didn't pass. That doesn't allow me to say that my samples don't come from a normally distributed population, does it? What does it tell me, then? And would a Q-Q plot and a histogram tell me more, or are they also not very informative with small samples?

And does that rule out using ANOVA? Would a permutation test make sense? I read that the permutation equivalent of ANOVA requires a lot of computation time and isn't used much.

I would be grateful if someone could clarify the situation a little.

kjetil b halvorsen
gips
  • Related: http://stats.stackexchange.com/questions/2492/normality-testing-essentially-useless – nico Apr 02 '12 at 15:23
  • (1) Pollutant concentrations in natural media--air, soils, surface water, and groundwater--tend not to be normally distributed. A good point of departure in many circumstances is to assume their *logarithms* have normal distributions. (2) ANOVA etc. ignore the time-series nature of the data. You need specialized tests here. One of the simplest (yet powerful) approaches is regression of the (log) data against time, preferably using a model of degradation suggested by theory. – whuber Apr 02 '12 at 15:35
  • can you post some example data? – Abe Apr 02 '12 at 23:15
  • Whuber, thank you for your answer. Do you think you could give me a source where I can find this information (distribution of pollutant concentrations in natural media)? – gips Apr 12 '12 at 10:23
  • If the pollutant degrades over time, is there any real point at which the concentration stops decreasing? Would you rather fit a nonlinear or curvilinear model that models the concentration over time? You might be able to determine if the concentration follows a first-order reaction (or 2nd or 0th order), and you might be able to determine a half-life time for the pollutant for your case. – Sal Mangiafico Aug 09 '18 at 13:10

1 Answer


Partially answered in comments:

For the question about normality testing, see https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless. You seem to be confused when you write: "But then when i checked for normality, I did the Shapiro test for normality and my data didn't pass the test. That doesn't allow me to say my samples don't come from a normal distributed population, does it?" You rejected normality, so you can conclude that your data do not come from a normal distribution.
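On the permutation-test part of the question: for a comparison of two days with triplicates, the computation is tiny, because there are only 20 ways to split 6 values into two groups of 3. Here is a minimal sketch in plain Python. The function name `exact_permutation_pvalue` and the concentrations are made up for illustration; they are not the asker's data.

```python
from itertools import combinations

def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation test for a difference in group means.

    Enumerates every way of relabelling the pooled values into groups of
    the original sizes and counts how often the absolute mean difference
    is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)

    n_extreme = 0
    n_relabelings = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point ties count as "at least as large".
        if diff >= observed - 1e-12:
            n_extreme += 1
        n_relabelings += 1
    return n_extreme / n_relabelings, n_relabelings

# Hypothetical triplicates for two days (illustrative values only).
day_a = [41.2, 39.8, 40.5]
day_b = [30.1, 31.0, 29.4]

p_value, n_relabelings = exact_permutation_pvalue(day_a, day_b)
print(f"relabelings: {n_relabelings}, two-sided p = {p_value:.3f}")
```

Note the limitation this exposes: with 3 versus 3 observations, even two completely separated groups give a two-sided p-value of 2/20 = 0.1, so a pairwise permutation test on triplicates can never reach the usual 0.05 level.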

(1) Pollutant concentrations in natural media--air, soils, surface water, and groundwater--tend not to be normally distributed. A good point of departure in many circumstances is to assume their logarithms have normal distributions. (2) ANOVA etc. ignore the time-series nature of the data. You need specialized tests here. One of the simplest (yet powerful) approaches is regression of the (log) data against time, preferably using a model of degradation suggested by theory. – whuber
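To make the regression suggestion concrete, here is a minimal sketch that assumes first-order decay, C(t) = C0 * exp(-k * t), so that log concentration is linear in time. That model is an assumption for illustration only and should be replaced by whatever the degradation theory suggests. The helper `fit_first_order` and the synthetic data (C0 = 100, k = 0.3 per day, fixed replicate factors) are hypothetical.

```python
import math

def fit_first_order(times, concentrations):
    """Ordinary least squares of log(concentration) on time.

    Under the assumed model C(t) = C0 * exp(-k * t), the slope of the
    fitted line is -k and the intercept is log(C0).
    """
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    k = -slope
    return {
        "k": k,                          # decay rate per unit time
        "c0": math.exp(intercept),       # fitted initial concentration
        "half_life": math.log(2) / k,    # only meaningful when k > 0
    }

# Synthetic data, NOT real measurements: 9 days, triplicates per day,
# with fixed multiplicative factors standing in for replicate scatter.
true_c0, true_k = 100.0, 0.3
replicate_factors = (0.95, 1.00, 1.05)
times, concs = [], []
for day in range(1, 10):
    for factor in replicate_factors:
        times.append(day)
        concs.append(true_c0 * math.exp(-true_k * day) * factor)

fit = fit_first_order(times, concs)
print(f"k = {fit['k']:.3f} per day, half-life = {fit['half_life']:.2f} days")
```

All 27 observations enter a single fit, so the time ordering is used rather than ignored, and the result is a rate and a half-life instead of a series of day-by-day comparisons. Checking the residuals of the log-scale fit is a more natural place to look at normality than the raw concentrations.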

kjetil b halvorsen