(I am not a statistician, so this might seem a dumb question, but I need your expert help!)
I have data that looks like this:
----------+-----------+--------+------+-----
City      | Emissions | People | Area | GDP
----------+-----------+--------+------+-----
Sheffield | .         | .      | .    | .
Oxford    | .         | .      | .    | .
London    | .         | .      | .    | .
Total     | .         | .      | .    | .
I want to compare emissions between cities, normalised by people, area and GDP, which are of course all in different units.
I clearly cannot add the people, area and GDP values together, because that would mean one person was equivalent to one unit of area or GDP, which is nonsense.
The approach I've come up with is this:
1. Remove the dimensions so we can compare the different values
For each column, I divided each city's value by the column total.
That gives a unitless fraction describing each city's share of the total (emissions, people, area, GDP). I'll call these shares E, P, A, G.
2. Average these factors
Then divide each city's emissions share by each of its other three shares from (1), and average the three resulting ratios:
E × ( 1/P + 1/A + 1/G ) ÷ 3
i.e. MEAN( E/P, E/A, E/G )
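Steps 1 and 2 can be sketched in Python as below. This is only an illustration: the figures are invented (they are not real data for these cities), and the names `shares` and `mean_of_ratios` are hypothetical.

```python
# Hypothetical figures, invented for illustration only (not real city data).
cities = {
    "Sheffield": {"emissions": 20.0, "people": 10.0, "area": 30.0, "gdp": 10.0},
    "Oxford":    {"emissions": 10.0, "people": 10.0, "area": 10.0, "gdp": 20.0},
    "London":    {"emissions": 70.0, "people": 80.0, "area": 60.0, "gdp": 70.0},
}
COLUMNS = ("emissions", "people", "area", "gdp")


def shares(data):
    """Step 1: divide each value by its column total, giving unitless fractions."""
    totals = {c: sum(row[c] for row in data.values()) for c in COLUMNS}
    return {
        city: {c: row[c] / totals[c] for c in COLUMNS}
        for city, row in data.items()
    }


def mean_of_ratios(s):
    """Step 2: MEAN(E/P, E/A, E/G) for one city's shares."""
    e = s["emissions"]
    return (e / s["people"] + e / s["area"] + e / s["gdp"]) / 3


city_shares = shares(cities)
scores = {city: mean_of_ratios(s) for city, s in city_shares.items()}
for city, score in scores.items():
    print(f"{city}: {score:.3f}")
```

Each column of shares sums to 1 by construction, so a city whose shares are all equal (E = P = A = G) scores exactly 1.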
Question
To me this seems to give a sensible comparator. I've removed the units by dividing each by its total. Then I'm averaging "fraction of emissions per fraction of people, area, GDP". I'm giving equal weighting to each of these, which I'm happy with. So if the answer for one city comes out as 2, then I would say that that city is emitting twice its fair share of emissions.
However, I've had some criticism saying that this is not mathematically/statistically valid because it's an average of averages. It is an average of averages, but I think that's OK and the only way to compare the different things together.
Q. Is what I've done valid, or is there a better way to do this?
Edit
I now believe that this is better:
E ÷ ( ( P + A + G ) ÷ 3 )
i.e. E ÷ MEAN( P, A, G )
This divides each city's share of emissions by the average of its shares of people, area and GDP. I think this is better because P, A and G represent "things that it's OK to have emissions for". In the first attempt (above), I allowed a city with a low overall fraction of people, area and GDP to score the same as a city with a high overall fraction when they have the same emissions, which is clearly wrong given that I have said I value P, A and G equally.
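To make the difference between the two formulas concrete, here is a small side-by-side sketch. The shares are invented for illustration and the function names are hypothetical. For positive shares the first formula can never give a smaller value than the second (the mean of reciprocals is at least the reciprocal of the mean), and the two agree only when P = A = G.

```python
def mean_of_ratios(e, p, a, g):
    """First attempt: MEAN(E/P, E/A, E/G)."""
    return (e / p + e / a + e / g) / 3


def ratio_of_means(e, p, a, g):
    """Edited version: E / MEAN(P, A, G)."""
    return e / ((p + a + g) / 3)


# Hypothetical shares (E, P, A, G) as fractions of each column total,
# invented for illustration only.
shares = {
    "Sheffield": (0.2, 0.1, 0.3, 0.1),
    "Oxford": (0.1, 0.1, 0.1, 0.2),
    "London": (0.7, 0.8, 0.6, 0.7),
}

comparison = {
    city: (mean_of_ratios(*s), ratio_of_means(*s)) for city, s in shares.items()
}
for city, (first, second) in comparison.items():
    print(f"{city}: mean of ratios = {first:.3f}, ratio of means = {second:.3f}")
```

With these invented shares, Sheffield scores about 1.56 under the first formula and 1.20 under the second. The gap comes from its small shares of people and GDP, which produce large individual ratios E/P and E/G that dominate the first average.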