I have a collection of $10^5$ essays, each of which contains $10^3$ distinct words on average. There are $10^6$ distinct words in the entire collection. If I index every word, what are the mean and median sizes of the inverted index lists?
My guess is that the median would be 1, but I have no idea how I can calculate the harmonic mean without the parameters. Can anybody help?
UPDATE: I really should have mentioned this earlier: the essays are in English.
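
For what it's worth, the plain arithmetic mean seems to need no distribution parameters at all. Here is a minimal sketch, assuming that each inverted list holds one entry per (word, essay) pair, so the total number of entries is the number of essays times the average number of distinct words per essay. The function and variable names are my own, purely for illustration. The median is a different matter, since it depends on how the words are distributed across essays.

```python
def mean_posting_list_size(num_docs: int,
                           avg_distinct_words_per_doc: int,
                           vocabulary_size: int) -> float:
    """Arithmetic mean length of an inverted list.

    Assumption: one list entry per (word, document) pair, so the total
    number of entries is num_docs * avg_distinct_words_per_doc.
    """
    total_postings = num_docs * avg_distinct_words_per_doc
    return total_postings / vocabulary_size


# Figures from the question: 10^5 essays, 10^3 distinct words each
# on average, 10^6 distinct words overall.
mean_size = mean_posting_list_size(10**5, 10**3, 10**6)
print(mean_size)  # 100.0
```

So under that assumption the mean list size is $10^8 / 10^6 = 100$, whatever the shape of the word distribution.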