Statistical data analysis - data comparison

Question

I'm having trouble deciding which statistical test to use to determine whether my three datasets are significantly different from each other.

I have the following datasets of probabilities observed in my experiment:

prob355 = [0.06672, 0.07473, 0.14813, 0.25516];

prob532 = [0.07226, 0.08718, 0.09631, 0.14276];

prob1064 = [0.05979, 0.09020, 0.10891, 0.11307];

Within each dataset, every entry corresponds to a probability obtained under a different experimental condition, so that the conditions for prob355(1) ≠ prob355(2) ≠ prob355(3) ≠ prob355(4). Across the datasets, the entries are ordered so that the conditions match: prob355(1) = prob532(1) = prob1064(1).

These datasets are accompanied by the following errors (standard deviation/noise) in the observed probabilities:

model_error_355 = [0.03255, 0.03986, 0.01321, 0.01928];

model_error_532 = [0.01370, 0.00880, 0.00856, 0.04257];

model_error_1064 = [0.03150, 0.02441, 0.02307, 0.02059];

Each entry in these error arrays corresponds to the entry at the same position in prob355, prob532 or prob1064.

Basically, I want to compare the overall datasets with each other (whilst taking the standard deviation of each value into account) and determine whether, overall, they are statistically different. If possible, I would also like to find out which of the individual values are significantly different from each other.

The goal of this comparison for the overall datasets would be to prove the hypothesis that these values are not significantly different from one another. Secondly, the goal would be to determine whether, for each of the individual conditions (e.g. prob355(1) = prob532(1) = prob1064(1)), the values are significantly different or not.
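A minimal sketch of one possible reading of these two goals, written in Python because any open tool is acceptable. It assumes that each reported error is the standard error of an independent, roughly normal estimate; the question does not say how the errors were obtained, so this is an assumption and not an established property of the data. For each condition it computes an inverse-variance weighted mean and a Cochran-style Q statistic, which under those assumptions follows a chi-square distribution with 2 degrees of freedom. Summing Q over the four conditions (8 degrees of freedom) additionally assumes the conditions are independent of each other. The helper names `homogeneity_q` and `chi2_sf_even_df` are made up for this illustration.

```python
import math

# Data copied from the question; each list holds conditions 1-4 in order.
prob = {
    "355": [0.06672, 0.07473, 0.14813, 0.25516],
    "532": [0.07226, 0.08718, 0.09631, 0.14276],
    "1064": [0.05979, 0.09020, 0.10891, 0.11307],
}
err = {
    "355": [0.03255, 0.03986, 0.01321, 0.01928],
    "532": [0.01370, 0.00880, 0.00856, 0.04257],
    "1064": [0.03150, 0.02441, 0.02307, 0.02059],
}


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


def homogeneity_q(values, sds):
    """Return the inverse-variance weighted mean and the Q statistic.

    ASSUMPTION: sds are standard errors of independent, approximately
    normal estimates of the corresponding values.
    """
    weights = [1.0 / s**2 for s in sds]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    q = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, q


results = []
for i in range(4):
    values = [prob[k][i] for k in prob]
    sds = [err[k][i] for k in prob]
    mean, q = homogeneity_q(values, sds)
    p = chi2_sf_even_df(q, df=2)  # 3 wavelengths -> 2 degrees of freedom
    results.append((mean, q, p))
    print(f"condition {i + 1}: weighted mean = {mean:.5f}, Q = {q:.3f}, p = {p:.4g}")

# Overall check: the sum of the per-condition Q values has 4 * 2 = 8 degrees of freedom.
q_total = sum(q for _, q, _ in results)
p_total = chi2_sf_even_df(q_total, df=8)
print(f"overall: Q = {q_total:.3f}, p = {p_total:.4g}")
```

A large p-value here would only mean the data are compatible with a common value given the stated errors; it would not prove that the values are equal.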

Currently, I'm using Matlab to run my analysis, but I'm not bound to this software as long as the alternative is open access :)

So, if anyone knows a good way of making this comparison, I would be very happy to hear it.

Thanks!

@Dave prob are probability values observed within the system and model_error are the errors determined in the observed probabilities. — Vincent, Mar 30 '21 at 14:21
@gung-ReinstateMonica I've ruled out ANOVA as it doesn't take into account the errors observed within my datasets, and its underlying assumptions (e.g. random independent errors that are normally distributed with zero mean and constant variance) do not seem to correspond to my observations. The same goes for Student's t-test (independent random samples from normal distributions with equal means and equal but unknown variances). — Vincent, Mar 30 '21 at 14:27
I think your situation remains too unclear to be answerable. If your data are not independent, you need to explain that. I'm not sure what you mean by the errors, but if there are different amounts of noise associated with the different points, ANOVA can handle that. In addition, you cannot "prove the hypothesis that these values are not significantly different from one another". The whole situation seems conceptually muddled to me. — gung - Reinstate Monica, Mar 30 '21 at 14:52
@gung-ReinstateMonica Basically, a better way of wording it would be to say my goal is to prove the null hypothesis prob355 = prob532 = prob1064 and thereby reject the alternative hypothesis prob355 ≠ prob532 ≠ prob1064. Correct, by errors I mean the standard deviation/noise observed within the probabilities listed in prob355, prob532 and prob1064. How would ANOVA be able to handle this? — Vincent, Mar 30 '21 at 16:20
[Why do statisticians say a non-significant result means “you can't reject the null” as opposed to accepting the null hypothesis?](https://stats.stackexchange.com/a/85914/7290) — gung - Reinstate Monica, Mar 30 '21 at 16:25
What do you mean by "standard deviation/noise observed within the probabilities"? Do you have SDs for single points? — gung - Reinstate Monica, Mar 30 '21 at 16:26
@gung-ReinstateMonica Yes. Each of these points corresponds to a probability observed in a system and corrected for certain parameters affecting it, so each is accompanied by its own standard deviation. — Vincent, Mar 30 '21 at 16:31
Probabilities cannot be observed. See https://stats.stackexchange.com/questions/1525/whats-the-difference-between-a-probability-and-a-proportion/4850#4850. It appears you might be *estimating* them based on some observations. The details are crucial because they provide information about the precision of those estimates. You will need to provide them in order to have an answerable question. — whuber, Mar 30 '21 at 17:37

BruceET · Answer 1 · 2021-03-30T14:03:36.843

The source of the data and the objectives of data analysis are not clear.

If you have four observations at random from each of three groups, and you want to know if the group population means differ, then you might consider a one-factor ANOVA with three levels and four replications per level.

x1 = prob355 =  c(0.06672, 0.07473, 0.14813, 0.25516) 
x2 = prob532 =  c(0.07226, 0.08718, 0.09631, 0.14276)
x3 = prob1064 = c(0.05979, 0.09020, 0.10891, 0.11307)

However, prob355 is much more variable than the other two groups. So I would be reluctant to use a standard ANOVA.

x = c(x1, x2, x3)
g = rep(1:3, each=4)
stripchart(x ~ g, ylim=c(.5, 3.5))
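The difference in spread can also be checked numerically. The following is a small Python sketch (not part of the original R session) that computes the sample standard deviation of each group using only the standard library; the dictionary name `groups` is chosen here for illustration.

```python
import statistics

# The three groups from the question, four values each.
groups = {
    "prob355": [0.06672, 0.07473, 0.14813, 0.25516],
    "prob532": [0.07226, 0.08718, 0.09631, 0.14276],
    "prob1064": [0.05979, 0.09020, 0.10891, 0.11307],
}

# Sample standard deviation (n - 1 in the denominator) per group.
sd = {name: statistics.stdev(values) for name, values in groups.items()}
for name, value in sd.items():
    print(f"{name}: mean = {statistics.fmean(groups[name]):.5f}, sd = {value:.5f}")

# Ratio of the largest to the smallest sample variance, a rough heteroscedasticity indicator.
variance_ratio = (max(sd.values()) / min(sd.values())) ** 2
print(f"largest / smallest variance = {variance_ratio:.1f}")
```

With only four values per group these standard deviations are themselves very imprecise, so the ratio is a rough indication and not a formal test of equal variances.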

In R, the procedure oneway.test does not require equal group variances, but this procedure finds no significant differences among the means of the three levels.

x = c(x1, x2, x3)
g = rep(1:3, each=4)
oneway.test(x ~ g)

        One-way analysis of means 
        (not assuming equal variances)

data:  x and g
F = 0.41913, num df = 2.0000, denom df = 5.4035, p-value = 0.6773
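For readers working outside R, the statistic reported above can be reproduced by hand. The following Python sketch implements the Welch one-way formula (weights n_i / s_i^2, a weighted grand mean, and the usual denominator correction), which is assumed here to be what `oneway.test` computes when equal variances are not assumed. The function name `welch_anova` is made up for this illustration, and only the standard library is used, so the p-value is not computed.

```python
import statistics


def welch_anova(groups):
    """Welch's one-way test statistic for groups with unequal variances.

    Returns (F, numerator df, denominator df).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]  # sample variances

    # Each group is weighted by the precision of its mean.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))

    f_stat = between / (1 + 2 * (k - 2) * lam / (k**2 - 1))
    df_num = k - 1
    df_den = (k**2 - 1) / (3 * lam)
    return f_stat, df_num, df_den


x1 = [0.06672, 0.07473, 0.14813, 0.25516]  # prob355
x2 = [0.07226, 0.08718, 0.09631, 0.14276]  # prob532
x3 = [0.05979, 0.09020, 0.10891, 0.11307]  # prob1064

f_stat, df_num, df_den = welch_anova([x1, x2, x3])
print(f"F = {f_stat:.5f}, num df = {df_num}, denom df = {df_den:.4f}")
```

If SciPy is available, the upper-tail probability could then be obtained with `scipy.stats.f.sf(f_stat, df_num, df_den)`. Note that this test uses only the spread of the four values within each group and ignores the per-value model errors listed in the question.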

Thanks for the response! I've edited and updated my problem statement now to hopefully make it clearer. Does this make the objectives and the goals of the analysis clearer and does the offered solution change? Thanks in advance, Vincent — Vincent, Mar 30 '21 at 14:19
