Statistical measure of international diversity

Question

I've got a large collection of geotagged tweets, each linking to an article by a given author. For each author, I'd like to derive a number that describes how diverse the list of countries tweeting their articles is.

Of course I could just count the number of countries represented, but I'd like to normalize by the number of tweets linking to each author. For instance, an author attracting 1000 tweets from 50 countries should rank lower in geographic diversity than another author who was also tweeted from 50 countries, but by only 100 tweets.
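One standard way to compare authors with different tweet totals (borrowed from ecology, not something the question itself proposes) is rarefaction: ask how many distinct countries you would expect if each author's tweets were randomly subsampled, without replacement, down to a common size m. The Python sketch below assumes each author's data is a list of per-country tweet counts; the function name `rarefied_richness` is hypothetical.

```python
from math import comb


def rarefied_richness(counts, m):
    """Expected number of distinct countries in a random subsample of m tweets,
    drawn without replacement from one author's tweets (rarefaction).

    counts: per-country tweet counts for one author (hypothetical input format).
    """
    n = sum(counts)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and the author's total tweet count")
    total = comb(n, m)
    # A country is missed only if all m sampled tweets come from the other n - c tweets;
    # comb() returns 0 when m > n - c, i.e. the country cannot be missed.
    return sum(1 - comb(n - c, m) / total for c in counts if c > 0)
```

With hypothetical data matching the example above, an author whose 1000 tweets are spread evenly over 50 countries (`[20] * 50`) rarefies to roughly 44 countries at m = 100, while an author with 100 tweets over 50 countries (`[2] * 50`) keeps all 50, so the second author ranks as more diverse.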

A naive approach would be to use tweets per country, but this seems less useful because there is a limited number of countries to choose from: one's 150th country is less likely to show up than one's 15th, and a simple proportion doesn't reflect this.

I've got some vague ideas about using a binomial distribution, but would love to get a more experienced perspective.
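The binomial idea can be made concrete. If each tweet independently comes from country i with probability p_i, the number of tweets from country i is Binomial(n, p_i), so country i shows up at least once with probability 1 - (1 - p_i)^n, and the expected number of distinct countries is the sum of those terms. That expectation flattens out as n grows, which matches the diminishing return described for the 150th versus the 15th country. A minimal Python sketch, assuming you supply reference shares p (for example, each country's share of all tweets in the collection; that choice is an assumption). The score `richness_ratio` is a hypothetical normalization, not an established index.

```python
def expected_distinct(p, n):
    """Expected number of distinct countries among n tweets when each tweet
    independently lands in country i with probability p[i].

    Country i's count is Binomial(n, p[i]), so it appears at least once
    with probability 1 - (1 - p[i]) ** n.
    """
    return sum(1 - (1 - pi) ** n for pi in p)


def richness_ratio(observed_countries, p, n):
    """Hypothetical score: observed distinct countries divided by the number
    expected under the reference shares p for an author with n tweets."""
    return observed_countries / expected_distinct(p, n)
```

Dividing by the expectation rewards an author who reaches many countries with few tweets, and because the expectation is concave in n, each additional country counts for more once an author is already near saturation.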

See a number of answers already on the site that discuss indices of diversity, inequality or concentration, including [this one](http://stats.stackexchange.com/questions/115453/how-to-express-inequality-of-a-distribution-in-one-number/115464#115464) – Glen_b Oct 26 '14 at 07:32

score 3 · Answer 1 · answered Oct 26 '14 at 06:07

You could try applying diversity index statistics. The basics are at http://en.m.wikipedia.org/wiki/Diversity_index. Ecologists and population geneticists regularly apply these to, e.g., community ecology questions or estimates of genetic diversity.
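A minimal Python sketch of the most common indices, assuming each author's data is a plain list of country codes (a hypothetical input format). It computes Shannon entropy and the Simpson concentration (the sum of squared shares), plus their "effective number of countries" forms, exp(H) and 1/Σp², which are easier to read because both equal k when tweets are spread evenly over k countries.

```python
from collections import Counter
from math import exp, log


def diversity_indices(countries):
    """Shannon entropy, Gini-Simpson index and their effective-number forms
    for one author's list of tweet countries (hypothetical input format)."""
    counts = Counter(countries)
    n = sum(counts.values())
    shares = [c / n for c in counts.values()]
    shannon = -sum(p * log(p) for p in shares)  # natural log
    simpson = sum(p * p for p in shares)  # concentration; same formula as the Herfindahl-Hirschman index
    return {
        "shannon": shannon,
        "gini_simpson": 1 - simpson,
        "effective_shannon": exp(shannon),  # Hill number of order 1
        "effective_simpson": 1 / simpson,  # Hill number of order 2 (inverse Simpson)
    }
```

These are plug-in estimates from observed shares; with few tweets the Shannon estimate tends to be biased low, so authors with very different tweet totals are still not directly comparable without a further correction for sample size.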

answered by bob

Economists have similar (in many cases identical) measures, but often call them different names. – Glen_b Oct 26 '14 at 06:44
Excellent, this is really helpful. But as I think about it, I realize there's a problem: countries have very different populations, so even a perfect evenness score using Shannon, for instance, would be practically impossible, since it would require as many tweets from Vatican City as from China! So I think chi-squared may be a better solution... – thisismyname Oct 26 '14 at 07:41
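The chi-squared idea can be sketched as a Pearson goodness-of-fit statistic against expected counts n·s_i, where s_i is a reference share for each country. Population shares are the commenter's suggestion; which reference to use, and the names `chi_square_vs_reference`, `observed` and `reference_shares`, are assumptions for illustration. Lower values mean an author's tweets look more like the reference distribution.

```python
def chi_square_vs_reference(observed, reference_shares):
    """Pearson chi-squared statistic comparing one author's tweet counts per
    country with the counts expected from reference shares.

    observed: dict country -> tweet count for one author.
    reference_shares: dict country -> share (e.g. population share, an
        assumption); shares should be positive and sum to 1.
    Returns (statistic, degrees_of_freedom).
    """
    unknown = set(observed) - set(reference_shares)
    if unknown:
        raise ValueError(f"countries missing from reference: {sorted(unknown)}")
    n = sum(observed.values())
    stat = 0.0
    for country, share in reference_shares.items():
        if share <= 0:
            raise ValueError(f"reference share for {country} must be positive")
        expected = n * share
        o = observed.get(country, 0)  # countries with no tweets still contribute
        stat += (o - expected) ** 2 / expected
    return stat, len(reference_shares) - 1
```

Because the statistic grows linearly with n for fixed observed proportions, dividing it by n gives a size-free value for comparing authors. For a p-value, `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic; keep in mind that the chi-squared approximation becomes unreliable when many expected counts are small, which is likely with hundreds of countries and few tweets per author.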
