In the book “Programming Collective Intelligence” Segaran explains the Fisher method for categorizing text as an alternative to the Naive Bayes classifier. The Fisher method uses the inverse chi-square distribution, which I do not really understand.
I watched this video (found on stats.stackexchange) about the chi-square distribution to understand at least the “forward” function: http://www.youtube.com/watch?v=dXB3cUGnaxQ
Segaran explains in his book that the inverse chi-square function is used to get a probability “that a random set of probabilities would return such a high number”. By “high number” he means that an item fitting a specific category has many features with high probabilities in that category. He also seems to rely on the fact that “if the probabilities were independent and random, the result of this calculation would fit a chi-squared distribution”. But as he mentioned before, the words are not independent (which is also a false assumption in Naive Bayes). So how does this work then?
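For reference, here is the standard result I think he is referring to (please correct me if I got this wrong): if $p_1, \dots, p_n$ are independent and uniformly distributed on $[0, 1]$, then each $-2\ln p_i$ follows a $\chi^2$ distribution with 2 degrees of freedom, so their sum

$$X = -2\ln\prod_{i=1}^{n} p_i = -2\sum_{i=1}^{n}\ln p_i$$

follows a $\chi^2$ distribution with $2n$ degrees of freedom. The “inverse chi-square function” in the book then seems to be the upper-tail probability $P(\chi^2_{2n} \ge X)$.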
And if I understand it correctly, the inverse chi-square function somehow checks whether many of my words have a high probability for the category, and only if all the words have such a high probability does it return a high overall probability?
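To check my understanding I tried to reproduce the calculation in a minimal Python sketch (`fisher_prob` is my own name; `invchi2` is modeled on the book's function of the same name, assuming I am reading it correctly):

```python
import math

def invchi2(chi, df):
    # Upper-tail probability P(X >= chi) for a chi-squared variable
    # with an even number of degrees of freedom df, using the
    # closed-form series exp(-m) * sum_{i < df/2} m^i / i!, m = chi/2.
    m = chi / 2.0
    total = term = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_prob(feature_probs):
    # Fisher's method: if the p_i were independent and uniform,
    # -2 * sum(ln p_i) would follow a chi-squared distribution
    # with 2n degrees of freedom; feed that statistic into the
    # inverse chi-square (upper tail) to get a combined probability.
    chi = -2.0 * sum(math.log(p) for p in feature_probs)
    return invchi2(chi, len(feature_probs) * 2)
```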
I’m sort of confused.
PS: The whole paragraph: “Fisher showed that if the probabilities were independent and random, the result of this calculation would fit a chi-squared distribution. You would expect an item that doesn’t belong in a particular category to contain words of varying feature probabilities for that category (which would appear somewhat random), and an item that does belong in that category to have many features with high probabilities. By feeding the result of the Fisher calculation to the inverse chi-square function, you get the probability that a random set of probabilities would return such a high number.”
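Playing with made-up numbers in the sketch above seems to match the intuition from that paragraph: uniformly high feature probabilities give a combined probability near 1, varying (random-looking) ones land in the middle, and uniformly low ones end up near 0:

```python
print(fisher_prob([0.9, 0.95, 0.85, 0.9]))  # ~0.999: every feature points to the category
print(fisher_prob([0.8, 0.3, 0.6, 0.2]))    # ~0.53:  varying, random-looking probabilities
print(fisher_prob([0.1, 0.05, 0.2, 0.1]))   # ~0.018: every feature points away from it
```

Is that the right way to think about it?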