Suppose we have a dataframe similar to the following (I have more than 3 columns, and all values in the dataframe are percentages):

          % of municipalities with internet connection | % of households with internet access | Phenomenon 3  
Region 1                      80%                      |                  85%                 | ...
Region 2                      95%                      |                  90%                 | ...
Region 3                      90%                      |                  95%                 | ...
Region 4                      75%                      |                  80%                 | ...

I need to build an index to determine a ranking of the regions based on the observed phenomena. Does anyone have any proposals? If I had only one phenomenon, it would be enough to sort the regions by the percentage, but with several phenomena I don't know exactly how to proceed.

LJG
  • This might help: https://stats.stackexchange.com/questions/108418/multivariate-sorting-ranking – Darren James Nov 18 '21 at 17:08
  • Region 2 has a higher municipal internet connection rate than region 3, which has a higher rate than region 1, which has a higher rate than region 4, while region 3 has a higher household internet connection rate than region 2, which has a higher rate than region 1, which has a higher rate than region 4. // You already knew how to do that, so what is it that you want to rank? – Dave Nov 18 '21 at 17:08
  • This is a FAQ appearing under many guises. It asks for a way of ranking objects based on two or more characteristics (which might be uncertain--it doesn't matter). *It has no statistical answer,* because the tradeoffs you make between the characteristics depend on how *you* value them. – whuber Nov 18 '21 at 17:10
  • @DarrenJames are you proposing the calculation of the z-score of each column? – LJG Nov 18 '21 at 17:17
  • Why calculate the z-score? – Dave Nov 18 '21 at 17:19
  • @whuber I don't understand your argument. In general if a % is high then the respective region is better placed in the ranking. – LJG Nov 18 '21 at 17:20
  • @LJG Then you have your rankings...just rank based on the percentages. – Dave Nov 18 '21 at 17:21
  • @Dave the answer proposed by Darren James proposes that. Step 1) and Step 2) – LJG Nov 18 '21 at 17:21
  • @Dave yes, but I need an overall ranking (which summarizes all variables) – LJG Nov 18 '21 at 17:22
  • And that's exactly the point: any ranking of all the variables reflects tradeoffs between values of each of the variables. Although statistical thinking has informed development of theories of ranking (and utility), it gets us only as far as having principled methods to elicit a mathematically consistent set of *your subjective values.* The answer in the link offered by @Darren is one of infinitely many solutions and thereby suffers by being *completely* arbitrary. – whuber Nov 18 '21 at 17:42
  • The example in the link would involve calculating a z-score for each row (Region). As @whuber mentions, this is only one potential strategy among many and you will have to decide whether it's useful in meeting your objectives. – Darren James Nov 18 '21 at 19:47
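
The z-score idea discussed in the comments could be sketched roughly as follows. This is only one of the many arbitrary choices whuber describes, not a recommended method. It assumes each column is standardized across regions and the standardized values are combined with subjective weights. The column names, the `composite_ranking` helper, and the weights are hypothetical, and only the two columns whose values appear in the question are used.

```python
import pandas as pd

# Values copied from the question's table. The third phenomenon is elided
# there, so it is left out here. Column names are made up for illustration.
df = pd.DataFrame(
    {
        "municipalities_pct": [80, 95, 90, 75],
        "households_pct": [85, 90, 95, 80],
    },
    index=["Region 1", "Region 2", "Region 3", "Region 4"],
)


def composite_ranking(data, weights):
    """Rank rows by a weighted sum of column z-scores (rank 1 = best).

    Hypothetical helper: `weights` maps column name -> subjective importance.
    """
    # Standardize each column across regions (population standard deviation).
    z = (data - data.mean()) / data.std(ddof=0)
    # Normalize the weights so they sum to 1, then align them with the columns.
    w = pd.Series(weights, dtype=float)
    w = w / w.sum()
    score = (z * w).sum(axis=1)
    # A higher score is better, so rank in descending order.
    return score.rank(ascending=False, method="min").astype(int).sort_values()


# Equal weights (an assumption about how the two phenomena are valued).
equal = composite_ranking(df, {"municipalities_pct": 1, "households_pct": 1})

# Weighting municipalities four times as heavily (another assumption).
muni_heavy = composite_ranking(df, {"municipalities_pct": 4, "households_pct": 1})

print(equal)
print(muni_heavy)
```

With equal weights Region 3 comes out ahead of Region 2, while weighting municipalities more heavily puts Region 2 first. The ranking is driven by the weights, which is the tradeoff the comments point to.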

0 Answers