I am trying to compare user ratings of various products, the majority of which come in several standard versions. However some of the products do not come in certain versions meaning that my data set has several N/A values.
The data is continuous interval type data with a range of -100 to +100
My question is, when should these values be replaced by 0?
V1 V2 V3 V4 V5 V6 V7 V8 V9
Product 1 2.63 -5.12 -0.41 5.29 9.89 4.16 14.73 9.06 -7.80
Product 2 0.60 0.94 4.47 N/A 0.12 21.47 N/A -4.63 1.29
Product 3 5.53 -16.20 -19.56 N/A 2.24 N/A 15.07 -3.47 -6.93
With N/A values included, excel tells me the average user rating given to each product is
Product 1 = 3.60
Product 2 = 3.47
Product 3 = -3.33
However if I replace the values with 0 then it changes the scores:
Product 1 = 3.60
Product 2 = 2.70
Product 3 = -2.59
I am sure others have dealt with this question before but I am not sure what to do. In future I want to undertake t-tests or z-tests on the data.
Some Research
To be honest apart from a little bit here and some questions on Research Gate I cant find a lot on this topic, I suspect I am using the wrong search terms however. The below is what I have got so far
There appears to be lots of questions about how to replace N/A values in datasets but not when or whether it should be done
There are a few other simalar questions on CV but they have not received an answer, e.g. here
A comment in reply to a question here, only slightly similar question suggests setting NA values to the minimum of the range, but this would significantly change my results...
P.S. I am really not sure what to tag this question with so if anyone could apply better tags it would be much appreciated.
Thanks