I have two datasets. Let's say the average of the first dataset is 10 and the average of the second dataset is 12. Is it acceptable to divide 12 by 10 and say that the relationship between the two datasets is 1.2? I am interpreting 1.2 as: for every event in the first dataset, 1.2 events happen in the second dataset. Is this a correct interpretation? Note: the values of the first dataset occur first, but the second dataset is not dependent on the first one.
-
No, it is wrong. The ratio of two means is not the mean of paired ratios. – Carl Dec 09 '18 at 02:35
-
@Carl what would be the best way to pair both values so that the relationship can be expressed as a single number? – Excel-lit Dec 09 '18 at 12:45
-
Well, if the situation is that one outcome could be paired with any outcome of a second population, then one method would be to randomly sample population one for an outcome with replacement, pair it with a random sample from population two with replacement, divide, and repeat that process 1000 times, then average the ratios. That is called `bootstrap`. – Carl Dec 09 '18 at 12:59
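A minimal Python sketch of the random-pairing procedure Carl describes, assuming the two datasets are plain lists and the first contains no zeros. The function name `mean_of_random_ratios` and the data are hypothetical, chosen only so the means match the question (10 and 12).

```python
import random
from statistics import mean

def mean_of_random_ratios(first, second, n_draws=1000, seed=0):
    """Average of second/first ratios over randomly drawn pairs.

    Each draw picks one value from each dataset independently and with
    replacement, divides them, and the ratios are averaged at the end.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    ratios = []
    for _ in range(n_draws):
        a = rng.choice(first)   # random outcome from dataset 1
        b = rng.choice(second)  # random outcome from dataset 2
        ratios.append(b / a)    # assumes dataset 1 contains no zeros
    return mean(ratios)

# Hypothetical data with the means from the question (10 and 12).
first = [8, 10, 12]
second = [6, 12, 18]

print(mean(second) / mean(first))            # ratio of means
print(mean_of_random_ratios(first, second))  # mean of random-pair ratios
```

With these numbers the ratio of means is exactly 1.2, while the random-pair average should land near 1.23 (the mean of all nine possible ratios), because the average of $b/a$ is not $\bar b/\bar a$.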
-
Bootstrap may or may not be what you need, depending on exactly what you want to do. – Carl Dec 10 '18 at 21:07
-
I was just hoping to understand whether it would make sense to divide two averages and use the quotient as a coefficient. – Excel-lit Dec 10 '18 at 21:23
2 Answers
Whether it is right depends on the context. Your interpretation is ambiguous.
You would be right if you mean that, at the population level, there are 1.2 events in the one dataset for every event in the other dataset.
Side note: you could make this more precise by adding a confidence interval (see How to compute the confidence interval of the ratio of two normal means).
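As a rough illustration of such an interval, here is a delta-method (first-order Taylor) approximation for the ratio of two independent sample means. This is a swapped-in technique, not necessarily the method in the linked question (Fieller's theorem is a common alternative), and it becomes unreliable when the denominator mean is close to zero. The function name and samples are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def ratio_of_means_ci(x, y, level=0.95):
    """Approximate confidence interval for mean(y) / mean(x).

    Delta-method approximation for two independent samples; it is
    unreliable when mean(x) is close to zero.
    """
    mx, my = mean(x), mean(y)
    ratio = my / mx
    # Sum of the squared relative standard errors of the two sample means.
    rel_var = stdev(x) ** 2 / (len(x) * mx ** 2) + stdev(y) ** 2 / (len(y) * my ** 2)
    se = abs(ratio) * math.sqrt(rel_var)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return ratio - z * se, ratio + z * se

# Hypothetical samples whose means are 10 and 12, as in the question.
x = [9, 10, 11, 8, 12]
y = [11, 12, 13, 10, 14]
low, high = ratio_of_means_ci(x, y)
print(f"ratio = {mean(y) / mean(x):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With only five values per sample this prints an interval of roughly (0.98, 1.42), which shows how much uncertainty can sit behind a single "1.2".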
You would be wrong if you use a paired-data interpretation (for instance, both datasets relate to the same individuals) and say that for every individual there are 1.2 events in the one dataset for every event in the other dataset (as in a linear relationship).
An example of the contrast: say one dataset contains the number of times a person watches television (say 5 times per week) and the other contains the number of times a person reads a book (say once per week). Then you can say that the population watches television 5 times more often than it reads books. But you cannot say that an individual person watches television 5 times for every time they read a book (in fact there might be a negative correlation, so that a person watches less television the more often they read).
Sometimes the context clearly indicates which of the two interpretations is meant. For instance, it is obviously a population-level comparison when the two datasets come from different populations (so that no individual relates to both datasets). An example: one dataset is the number of statisticians in a sample of a million inhabitants of the UK and the other is the number of statisticians in a sample of a million inhabitants of the USA.

Suppose I have a first group of numbers $\{2,4\}$ and a second group $\{-2,2\}$. Then the mean of the first group is $\frac{2+4}{2}=3$, and the mean of the second group is $\frac{-2+2}{2}=0$. The ratio of the mean values is $\frac{3}{0}$, which is undefined; by convention some people call this $+\infty$.
There is another way to look at this. Suppose we wanted the mean of all possible fractions. Then we would form every fraction with a numerator from the first group and a denominator from the second: $\left\{\frac{2}{-2}, \frac{2}{2}, \frac{4}{-2}, \frac{4}{2}\right\}=\left\{-1, 1, -2, 2\right\}$, the mean of which is $\frac{-1+1-2+2}{4}=\frac{0}{4}=0$.
Am I pulling a fast one? Suppose I interchanged the groups. Then in the first case I would have gotten $\frac{0}{3}=0$, which is at least a number, and in the second case I would have gotten $\left\{\frac{-2}{2}, \frac{2}{2}, \frac{-2}{4}, \frac{2}{4}\right\}=\left\{-1, 1, -\frac{1}{2}, \frac{1}{2}\right\}$, the mean of which is $0$, no different from the previous mean of all possible fractions.
Which of these four answers, $\{+\infty,0,0,0\}$, is suspect? Obviously the first one, although it may still be correct; that depends. Forming fractions and taking means do not commute: the fraction of two mean values is not the same as the mean of many fractions, and it is generally, but not always, more consistent to do the latter.
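The four calculations above can be reproduced with exact rational arithmetic. This is a minimal sketch; the helper names are hypothetical, and returning `None` for a zero denominator mean is an assumption standing in for "undefined" (some people would write $+\infty$).

```python
from fractions import Fraction
from itertools import product
from statistics import mean

def ratio_of_means(num, den):
    """mean(num) / mean(den), or None when mean(den) is 0 (undefined)."""
    den_mean = mean(Fraction(x) for x in den)
    if den_mean == 0:
        return None
    return mean(Fraction(x) for x in num) / den_mean

def mean_of_all_fractions(num, den):
    """Mean of a/b over every pairing of a in num with b in den (n*m fractions)."""
    return mean(Fraction(a) / Fraction(b) for a, b in product(num, den))

first, second = [2, 4], [-2, 2]
print(ratio_of_means(first, second))         # None  (3/0)
print(mean_of_all_fractions(first, second))  # 0
print(ratio_of_means(second, first))         # 0
print(mean_of_all_fractions(second, first))  # 0
```

Using `Fraction` avoids floating-point noise, so the zero means come out exactly zero.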
It is unclear which pairing of numbers into fractions the question implies. For example, if I were handing out right shoes of different sizes to individuals regardless of their right-foot lengths, then there would be one shoe size for each foot length, i.e., only $n$ fractions for $n$ shoes and $n$ people.
In my numerical example, the two groups had sizes $n$ and $m$ (both equal to two), which in general makes $n\times m$ fractions. There are still other possibilities, but whatever the case, we cannot work backwards from the ratio of two mean values to guess which pairing was meant. Taking an average erases any knowledge of what $n$ (or $m$) was, or whether the values were drawn with or without replacement. For example, if I calculate my odds of winning by counting cards in a poker game with some cards showing on the table, I am doing so without replacement: if a card is showing, it isn't available elsewhere. Nevertheless, after a night of poker, it is perfectly fair to say that if I am up by $\$2$, that would have been the terminal running average of my winnings, and if Jack only had $\$1$ in winnings, then I can say without fear of contradiction that I won twice what Jack did. So taking a ratio of average values is not necessarily incorrect. It is the meaning that differs between calculations that give different answers.
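To see that the pairing matters, the sketch below computes three different "ratios" for the shoe example: the ratio of means, the $n$ one-to-one fractions, and all $n\times m$ fractions. The shoe and foot lengths are made up, and the helper name `compare_summaries` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from statistics import mean

def compare_summaries(num, den):
    """Three different 'ratios' of two groups, each reduced to one number."""
    num = [Fraction(x) for x in num]
    den = [Fraction(x) for x in den]
    ratio_of_means = mean(num) / mean(den)
    # One partner per item (n fractions), e.g. one shoe per person;
    # only defined when both groups have the same size.
    paired = mean(a / b for a, b in zip(num, den)) if len(num) == len(den) else None
    # Every item paired with every other item (n * m fractions).
    all_pairs = mean(a / b for a, b in product(num, den))
    return ratio_of_means, paired, all_pairs

# Hypothetical shoe lengths and right-foot lengths (cm) for three people.
shoes = [24, 27, 30]
feet = [20, 26, 30]
for name, value in zip(("ratio of means", "paired", "all pairs"),
                       compare_summaries(shoes, feet)):
    print(f"{name:>14}: {float(value):.4f}")
```

The three values come out as about 1.0658, 1.0795 and 1.0962, and the two means alone do not tell you which of them the question intended.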
