Is this the correct way to undertake Z-Tests on average user ratings to compare different versions of products?

Question

The below is a partial data set showing the mean user ratings for a number of products, each of which is available in a number of standard versions, e.g. a common feature added to each product such as adding electric windows to a car.

Each mean rating is based on circa 17 user ratings, though some are based on more. The user ratings were submitted on a scale from -100 to +100. N/A indicates a product which is not available in that particular version. I understand from a question I asked previously that I should not replace the N/A values with a 0 or an average.

The data is continuous interval type data.

             Control   V1        V2        V3        V4        V5        V6
Product 1    1.63     -5.19     -0.48     5.79      8.89      4.19      15.73
Product 2    0.60     0.84      4.47      N/A       0.52      21.17     N/A
Product 3    4.53     -15.20    -19.66    N/A       2.84      N/A       13.07
Product 4    7.30     17.53     20.25     17.04     N/A       4.60      9.28
Product 5    -4.05    -21.33    -14.00    -13.00    N/A       -23.71    -8.71
Product 6    26.27    14.53     N/A       21.24     N/A       27.25     35.18
Product 7    -3.12    N/A       N/A       N/A       N/A       7.88      17.38

Mean Ratings 4.74     -1.47     -1.88     7.77      4.08      6.90      13.66
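
As a check on the bottom row, the per-version means can be recomputed while skipping the N/A cells rather than filling them in. A minimal Python sketch, assuming the table is typed in by hand with `None` standing in for N/A (the `ratings` dictionary and the `column_mean` helper are illustrative names, not part of the original analysis):

```python
from statistics import mean

# Product-level mean ratings from the table above; None marks N/A.
ratings = {
    "Control": [1.63, 0.60, 4.53, 7.30, -4.05, 26.27, -3.12],
    "V1": [-5.19, 0.84, -15.20, 17.53, -21.33, 14.53, None],
    "V2": [-0.48, 4.47, -19.66, 20.25, -14.00, None, None],
    "V3": [5.79, None, None, 17.04, -13.00, 21.24, None],
    "V4": [8.89, 0.52, 2.84, None, None, None, None],
    "V5": [4.19, 21.17, None, 4.60, -23.71, 27.25, 7.88],
    "V6": [15.73, None, 13.07, 9.28, -8.71, 35.18, 17.38],
}

def column_mean(values):
    """Mean of the available entries, ignoring N/A (None)."""
    present = [v for v in values if v is not None]
    return mean(present)

version_means = {name: column_mean(vals) for name, vals in ratings.items()}
for name, m in version_means.items():
    n_products = sum(v is not None for v in ratings[name])
    print(f"{name}: {m:.2f} (from {n_products} products)")
```

Each of these is an unweighted mean of product-level means, so a product with more underlying user ratings counts the same as one with fewer.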

I want to compare the effect of the different standard versions compared to the control.

So I think I should be using two-tailed Z-tests, so that I can see how far above or below the control mean each version mean (or each of its individual products) is. Here is my reasoning:

The vast majority of the user ratings that make up my means are normally distributed. I checked using Kolmogorov-Smirnov and Shapiro-Wilk tests.
- I checked everything that failed either test with Q-Q plots, and those ratings are approximately normally distributed.
My sample size is > 30
- While each individual mean is based on circa 15 user ratings, the total for the Control is circa 105 and for V1 is 105. I think this is correct.
I can derive the standard deviation for
- The individual mean scores
- The combined mean scores of the Control and V1 shown below

I also received an affirmative reply to a recent question I posted on whether or not a Z test was appropriate to this situation.

My Hypothesis

H0 - Version 1 will have no effect on the mean user rating
- The mean will not be significantly different to the control
Ha - Version 1 will have an effect on the mean user rating
- The mean will be significantly different to the control

I need to test this claim at alpha = 0.05, i.e. with two-tailed critical values of -1.96 and +1.96.
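
The link between a two-tailed alpha of 0.05 and the 1.96 cut-off can be checked directly from the standard normal distribution. A minimal Python sketch using only the standard library (`reject_h0` is a hypothetical helper name):

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: split alpha equally between the two tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def reject_h0(z, z_crit=z_crit):
    """True when the test statistic falls outside [-z_crit, +z_crit]."""
    return abs(z) > z_crit

print(f"critical value: +/-{z_crit:.2f}")
```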

Z-Test

I first took the mean ratings for each of the versions (from above)

Mean Ratings 4.74     -1.47     -1.88     7.77      4.08      6.90      13.66

And used SPSS to calculate Z-Scores for each

Control   = -.13054
Version 1 = -1.42611
Version 2 = -.72721
Version 3 = .50160
Version 4 = -.26823
Version 5 = .32009
Version 6 = 1.73041

This tells me that none of the common versions had a significant effect on the mean user ratings for the products.
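
For reference, a z-score of this kind standardises each value against the mean and standard deviation of the seven version means themselves. It is a descriptive score, not a two-sample test statistic against the control. A minimal Python sketch of that calculation, assuming this is what the SPSS output above represents (sample SD with n - 1; all names are illustrative). The listed scores are reproduced to about five decimals only when Version 2 is entered as +1.88 rather than the -1.88 shown in the table, so the sign of that entry may be worth rechecking:

```python
from statistics import mean, stdev

def standardise(values):
    """Return (x - mean) / sample SD for each x."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

labels = ["Control", "V1", "V2", "V3", "V4", "V5", "V6"]
means_as_tabulated = [4.74, -1.47, -1.88, 7.77, 4.08, 6.90, 13.66]
# Assumption: V2 entered with a positive sign, which matches the listed scores.
means_v2_positive = [4.74, -1.47, 1.88, 7.77, 4.08, 6.90, 13.66]

z_tabulated = standardise(means_as_tabulated)
z_v2_positive = standardise(means_v2_positive)
for label, a, b in zip(labels, z_tabulated, z_v2_positive):
    print(f"{label:8s} tabulated: {a:8.5f}   V2 as +1.88: {b:8.5f}")
```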

My questions are:

Are Z tests suitable for this sort of data analysis? (Since answered Yes)

And if yes, are my reasoning and my attempt correct?

If anyone could point out any glaringly obvious mistakes or problems with my method it would be much appreciated

As always, any help is much appreciated.

EDIT:

I suspect that one issue is that I am including my control mean in my Z-Test calculation, which is skewing the results. But I am not sure how to undertake such a test when I am comparing it to a known mean...?

Edit 2:

In response to David C's answer, I am able to calculate the variance for each mean rating, e.g. for the Control it is 105.999:

Descriptive Statistics                      
          N   Minimum   Maximum   Mean      Std. Deviation    Variance
Control   7   -4.05     26.27     4.7371    10.29558          105.999
Valid N (listwise)  7
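
The figures in this SPSS output can be reproduced from the seven product-level Control means in the table. A minimal Python sketch (variable names are illustrative); note that N = 7 here, so this is the spread of the seven product means, not of the roughly 105 individual ratings behind them:

```python
from statistics import mean, stdev, variance

# The seven product-level Control means from the table.
control = [1.63, 0.60, 4.53, 7.30, -4.05, 26.27, -3.12]

n = len(control)
control_mean = mean(control)
control_sd = stdev(control)        # sample SD, divides by n - 1
control_var = variance(control)    # sample variance, divides by n - 1

print(n, round(control_mean, 4), round(control_sd, 5), round(control_var, 3))
```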

David C · Answer 1 · 2016-08-15T16:43:38.987

Instead of using a Z-Test, you should use the T-Test for difference between means. You should not use the Z-Test because you do not know the true variances for groups Control, V1, V2,...,V6. The T-Test uses sample estimates for variances:

Z-Statistic: $$Z = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}}$$

T-Statistic: $$T = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}$$

The two key differences are:

$\sigma^2_k$ – the true population variance for group $k$

$s_k^2 = \frac{\sum_{i=1}^{n_k} (X_{ki} - \bar{X}_k)^2}{n_k-1}$ – the sample variance from group $k$

Once you have computed your T-test statistic, you can compute the p-value from the Student's t-distribution with $$df = \min\big(n_1 - 1, n_2 - 1\big)$$
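
The T-statistic and the conservative degrees of freedom above can be sketched in Python as follows. This is only an illustration: it treats the product-level means from the question's table as the observations in each group (an assumption, since the raw user ratings are not shown), and `two_sample_t` is a hypothetical helper name. The p-value lookup is left out because Python's standard library has no Student's t-distribution.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x1, x2):
    """T = (mean2 - mean1) / sqrt(s1^2/n1 + s2^2/n2), df = min(n1, n2) - 1."""
    n1, n2 = len(x1), len(x2)
    # variance() is the sample variance (divides by n - 1), i.e. s_k^2.
    se = sqrt(variance(x1) / n1 + variance(x2) / n2)
    t = (mean(x2) - mean(x1)) / se
    df = min(n1 - 1, n2 - 1)
    return t, df

# Assumption: product-level means stand in for the raw ratings; N/A omitted.
control = [1.63, 0.60, 4.53, 7.30, -4.05, 26.27, -3.12]
v1 = [-5.19, 0.84, -15.20, 17.53, -21.33, 14.53]

t_stat, df = two_sample_t(control, v1)
print(f"T = {t_stat:.3f}, df = {df}")
```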

Or this can be done in R with the following commands:

t.test(Control,V1) 
t.test(Control,V2) 
t.test(Control,V3) 
t.test(Control,V4) 
t.test(Control,V5) 
t.test(Control,V6)

edited Aug 15 '16 at 16:43

answered Aug 15 '16 at 13:23

Thanks David, but I still think that a Z-test is the most appropriate: firstly because the sample behind each of my means is so large (circa 105), and secondly because I know the SD for each. I can also calculate the variance for each group; see Edit 2 above. However, because I was not sure myself, I also asked this question previously, and the answer there agrees that a Z-test is the most appropriate: http://stats.stackexchange.com/a/229803/39684 Thanks for the help though. I will keep T-tests for the difference between means in my back pocket in case the Z-tests do not work out. – Deepend Aug 17 '16 at 15:25
@Deepend A few things: (1) You ought to know the difference between the "true variance" and the "sample variance" of a population. The "true variance" is the **intrinsic** level of variance throughout the entire population, while the "sample variance" is the variance calculated from just your sample. This is an important distinction to make when deciding between the T-Test and Z-Test. (2) As you indicated, with such large sample sizes (~105), the Z-Test will work in this case. [Here](https://www.linkedin.com/pulse/z-test-vs-t-test-arunmozhi-ilango) is a good summary of how to choose – David C Aug 17 '16 at 15:42
Thanks David, much appreciated and I will keep that in mind in future. – Deepend Aug 17 '16 at 16:27
