
(I am not a statistician, so this might seem a dumb question, but I need your expert help!)

I have data that looks like this:

----------+----------+-------+-----+-----
City       Emissions  People  Area  GDP
----------+----------+-------+-----+-----
Sheffield  .          .       .     .
Oxford     .          .       .     .
London     .          .       .     .
----------+----------+-------+-----+-----
Total      .          .       .     .

I want to compare emissions, but to normalise by people, area and GDP, which of course are all different units.

I clearly cannot add the people, area and GDP values together, because that would treat one person as equivalent to one unit of area or GDP, which is nonsense.

The approach I've come up with is this:

1. Remove the dimensions so we can compare the different values

For each city, I divided each value by its column total, giving that city's fraction of the total.

That gives a value that describes each city's share of the total (emissions, people, area...). I'll call these E, P, A, G

2. Average these factors

Then average the three ratios of the emissions fraction to each of the other fractions produced in (1):

E × ( 1/P + 1/A + 1/G ) ÷ 3

i.e. MEAN( E/P, E/A, E/G )
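As a minimal sketch of steps 1 and 2, assuming made-up numbers (the table above has no real values, so the figures, the `shares` helper and the `mean_of_ratios` helper are all hypothetical):

```python
# Illustrative numbers only: the question does not give actual data.
cities = {
    "Sheffield": {"emissions": 30.0, "people": 20.0, "area": 30.0, "gdp": 10.0},
    "Oxford":    {"emissions": 10.0, "people": 10.0, "area": 10.0, "gdp": 10.0},
    "London":    {"emissions": 60.0, "people": 70.0, "area": 60.0, "gdp": 80.0},
}


def shares(data):
    """Step 1: divide every value by its column total (dimensionless fractions)."""
    columns = list(next(iter(data.values())).keys())
    totals = {c: sum(row[c] for row in data.values()) for c in columns}
    return {
        city: {c: row[c] / totals[c] for c in columns}
        for city, row in data.items()
    }


def mean_of_ratios(s):
    """Step 2: MEAN(E/P, E/A, E/G) computed on the fractions from step 1."""
    e, p, a, g = s["emissions"], s["people"], s["area"], s["gdp"]
    return (e / p + e / a + e / g) / 3


index = {city: mean_of_ratios(s) for city, s in shares(cities).items()}
for city, value in index.items():
    print(f"{city}: {value:.3f}")
```

With these invented numbers Oxford holds the same fraction of every column, so its index comes out as 1, which is the "fair share" reading described below.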

Question

To me this seems to give a sensible comparator. I've removed the units by dividing each by its total. Then I'm averaging "fraction of emissions per fraction of people, area, GDP". I'm giving equal weighting to each of these, which I'm happy with. So if the answer for one city comes out as 2, then I would say that that city is emitting twice its fair share of emissions.

However, I've had some criticism saying that this is not mathematically/statistically valid because it's an average of averages. It is an average of averages, but I think that's OK and the only way to compare the different things together.

Q. Is what I've done valid, or is there a better way to do this?

Edit

I now believe that this is better:

E ÷ ( P + A + G ) × 3

i.e. E ÷ MEAN( P, A, G )

This divides the share (fraction) of emissions by the average of the fractions of people, area and GDP. I think this is better because P, A and G represent "things that it's OK to have emissions for". In the first attempt (above), a city with a low overall fraction of people, area and GDP can score the same as a city with a high overall fraction when both have the same emissions, which is clearly wrong given that I have said I value P, A and G equally.
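The difference between the two formulas can be sketched with a pair of hypothetical cities. The fractions below are invented for illustration and are not a complete dataset. Algebraically, MEAN( E/P, E/A, E/G ) is E divided by the harmonic mean of P, A, G, while the edited formula divides E by their arithmetic mean, so the first index is never smaller than the second.

```python
def mean_of_ratios(e, p, a, g):
    """First attempt: MEAN(E/P, E/A, E/G)."""
    return (e / p + e / a + e / g) / 3


def ratio_to_mean(e, p, a, g):
    """Edited version: E / MEAN(P, A, G)."""
    return e / ((p + a + g) / 3)


# Hypothetical fractions (E, P, A, G). Both cities emit the same share, 0.2.
# City x holds a larger overall share of people, area and GDP than city y.
x = (0.2, 0.1, 0.2, 0.4)        # P + A + G = 0.7
y = (0.2, 6 / 35, 6 / 35, 6 / 35)  # P + A + G is about 0.514

print("mean of ratios:", mean_of_ratios(*x), mean_of_ratios(*y))
print("ratio to mean: ", ratio_to_mean(*x), ratio_to_mean(*y))
```

Under the first formula both cities score the same; under the edited formula city x, which has more people, area and GDP overall to "justify" the same emissions, scores lower.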

artfulrobot
  • Just so I am clear about what you are asking: you are attempting to compare the emissions of people, and you want to correct for differences in the variables people, area, and GDP? – aplassard Apr 30 '14 at 15:22
  • 1
    @aplassard, thanks for reading. No, I want to compare the emissions of **cities**, but normalising for bigger/smaller cities on the basis of people, area and gdp. – artfulrobot Apr 30 '14 at 15:31
  • 3
    On its face, this procedure is invalid because $E/P$ is emissions per capita, $E/A$ is emissions per unit area, and $E/G$ is emissions per unit of economic output: the three are incommensurable, so adding them is not well defined. (You would get different results merely by changing GDP from pounds to euros, for instance.) However, incommensurable values *can* be added (with weights) in order to *rank* the cities. That's a question of *your* valuation; it cannot be answered with the information given. Please see http://stats.stackexchange.com/questions/9358 for more about valuation. – whuber Apr 30 '14 at 16:21
  • @whuber no, E/P is **not** emissions per capita. E, P etc. are normalized, dimensionless values because `E = (emissions for city) / (total emissions for all cities)` and `P = (people in city) / (total people in all cities)` As described in the paragraph that starts "First, ..." – artfulrobot Apr 30 '14 at 17:03
  • 3
    Thank you for clarifying that. Would you please edit your question to avoid confusion? Where you write "... to normalise by people, area and GDP," you play into my interpretation and seem to contradict your last comment. You are actually normalizing by *fraction* of people, etc. Although now your ratios $E/P$ etc are commensurable, there remain questions about their stability. I am referring to the fact that all these values--even on a relative basis--could change radically by introducing or omitting just a single city from your study. Regardless, your question is *still* about valuation. – whuber Apr 30 '14 at 18:01
  • @whuber thanks for your patience - I had a feeling I'd be using the wrong language somewhere in such a technical field! I've edited, hopefully clarifying that. As for valuation, I understand that I am assuming (and deciding) that a zero-based linear relationship `value = cx` holds true for the data, and that the numerator has an approximately linear relationship with the denominator. I can see the values would change if the dataset changed, but my end analysis will be on ranking the output values, which I think mitigates this. – artfulrobot May 01 '14 at 10:02
  • 1
    Any chance you would benefit by comparing actual emissions to expected emissions, based on the other 3 variables? I.e., using regression. – rolando2 May 01 '14 at 13:02
  • @rolando2 interesting. I think that expected emissions would itself have to be derived from the same dataset, though, which I think would bring us back to where we are? – artfulrobot May 01 '14 at 13:12
  • 1
    I don't see here a provision for computing expected emissions based simultaneously on all 3 predictors. That's where regression could help. This connects with @aplassard's comment. – rolando2 May 01 '14 at 17:23
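
The regression idea raised in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: the six cities and all their numbers are invented, the fit is an ordinary least-squares fit through the origin (matching the zero-based linear assumption mentioned above), and a real analysis would need far more cities than predictors. NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data for six cities; columns are people, area, GDP.
# None of these numbers come from the question.
X = np.array([
    [550.0, 370.0, 11.0],
    [150.0, 46.0, 6.0],
    [8300.0, 1570.0, 400.0],
    [1100.0, 270.0, 30.0],
    [500.0, 115.0, 20.0],
    [780.0, 550.0, 25.0],
])
emissions = np.array([3.0, 1.0, 40.0, 6.0, 3.0, 5.0])

# Least-squares fit with no intercept: emissions ~ X @ coef.
coef, *_ = np.linalg.lstsq(X, emissions, rcond=None)

# Expected emissions use all three predictors simultaneously.
expected = X @ coef

# Actual / expected plays the role of the "fair share" index:
# above 1 means more emissions than the fitted relationship predicts.
ratio = emissions / expected
print(np.round(ratio, 3))
```

The expected values are still derived from the same dataset, as noted in the comments, but the weights on people, area and GDP are now estimated from the data rather than fixed as equal.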
