
I have a collection of $10^5$ essays, each of which contains on average $10^3$ distinct words. There are $10^6$ distinct words in the entire collection. If I index every word, what are the mean and median sizes of the inverted index lists?

My guess is that the median would be 1, but I have no idea how to calculate the harmonic mean without the parameters. Can anybody help?
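For the (arithmetic) mean, no distributional parameters are needed. Each (essay, distinct word) pair contributes exactly one entry to exactly one inverted list, so a short worked computation using only the figures stated above gives:

$$
\bar{L} = \frac{\text{total postings}}{\text{number of lists}} = \frac{10^5 \times 10^3}{10^6} = \frac{10^8}{10^6} = 100.
$$

The median, by contrast, depends on how those $10^8$ postings are spread across the $10^6$ words, so it cannot be obtained from the totals alone.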

UPDATE: I should have mentioned this earlier: I am talking about the English language.
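For English text, one common rough model is Zipf's law. The sketch below is only an illustration under stated assumptions, not a definitive answer: it *assumes* the document frequency of the word with rank $r$ is $df(r) = \min(C / r^s, 10^5)$, where the exponent $s$ (default $1.0$) is a guess not given in the question, each count is capped at $10^5$ because a word cannot occur in more essays than exist, and $C$ is fitted by bisection so the postings still total $10^8$. All names (`mean_list_length`, `zipf_median_list_length`, `s`) are hypothetical.

```python
import math
from itertools import accumulate

# Figures from the question.
N_DOCS = 10**5          # number of essays
AVG_DISTINCT = 10**3    # average distinct words per essay
VOCAB = 10**6           # distinct words in the whole collection


def mean_list_length(n_docs: int, avg_distinct: int, vocab: int) -> float:
    """Mean inverted-list length = total postings / number of lists."""
    return n_docs * avg_distinct / vocab


def zipf_median_list_length(n_docs: int, avg_distinct: int, vocab: int,
                            s: float = 1.0) -> float:
    """Median list length under an ASSUMED capped Zipf model:
    df(r) = min(C / r**s, n_docs), with C fitted so that sum(df) = total postings."""
    total = n_docs * avg_distinct
    # prefix[k] = sum_{r=1..k} r**-s, with prefix[0] = 0
    prefix = list(accumulate((r ** -s for r in range(1, vocab + 1)), initial=0.0))

    def postings(c: float) -> float:
        # Ranks r <= k satisfy c / r**s >= n_docs, so they are capped at n_docs.
        k = min(vocab, math.floor((c / n_docs) ** (1.0 / s)))
        return n_docs * k + c * (prefix[vocab] - prefix[k])

    # postings(c) increases with c; at hi every rank is capped (sum = n_docs * vocab).
    lo, hi = 0.0, float(n_docs) * vocab ** s
    for _ in range(200):
        mid = (lo + hi) / 2
        if postings(mid) < total:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2

    def df(r: int) -> float:
        return min(c / r ** s, n_docs)

    # df is non-increasing in rank, so the median sits at the middle rank(s).
    if vocab % 2:
        return df(vocab // 2 + 1)
    return (df(vocab // 2) + df(vocab // 2 + 1)) / 2


print(mean_list_length(N_DOCS, AVG_DISTINCT, VOCAB))
print(zipf_median_list_length(N_DOCS, AVG_DISTINCT, VOCAB, s=1.0))
```

Under this model the mean stays at exactly 100 by construction, while the median moves with the assumed exponent `s`, which is precisely the missing parameter the question refers to.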

matcheek
  • What do you mean by *inverted list*? – cardinal Apr 27 '11 at 22:55
  • I meant an inverted index, that is, for each word w I have a list of the documents in which w occurs: inverted_index[w] = {doc1, doc2, doc6}. It is similar to the index of terms at the back of a book. – matcheek Apr 27 '11 at 23:23
  • So ... ummm ... is there some relationship between the title to your question ... and the question itself? – wolfies Apr 27 '13 at 15:01
