I am working on a text corpus. Each line contains between 10 and 50 words. There are around 25 000 words in the whole text and 1 000 000 lines. I turned this corpus into its tf-idf representation.
I was wondering is there is a sense to "high" and "low" values for each tf-idf. Can I ignore, say "words when their tf-idf is lower than 1.5" ?
Or are there real-world pathological text corpuses where tf-idf can have any values ?