I'm wondering if anyone can help me out or point me to some resources for learning more about TF-IDF and document search.
I'm trying to implement a basic document search and want to better understand the differences and trade-offs of my approach.
My current approach is to parse and stem all the words in a set of documents and compute a normalized TF-IDF value for each document-word pair. When I query with keywords, I simply look up each word of the query, sum the TF-IDF values of the matching document-word pairs for each document, and rank the documents by that sum.
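To make that concrete, here is a minimal sketch of the kind of thing I mean. It is an illustration under assumptions, not my exact code: documents are assumed to be already tokenized and stemmed, "normalized" is assumed to mean L2-normalizing each document's weights, IDF is assumed to be the plain `log(N / df)` variant, and the names `build_index` and `search` are made up for this example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build {doc_id: {term: weight}} from {doc_id: [stemmed tokens]}.

    Assumptions: tf = count / doc length, idf = log(N / df),
    and each document's weights are L2-normalized.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document

    index = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        index[doc_id] = {
            term: (w / norm if norm else 0.0) for term, w in weights.items()
        }
    return index


def search(index, query_tokens):
    """Score each document by summing its weights for the query terms."""
    scores = defaultdict(float)
    for doc_id, weights in index.items():
        for term in query_tokens:
            scores[doc_id] += weights.get(term, 0.0)
    # Keep only documents that matched, best score first.
    hits = [(doc_id, s) for doc_id, s in scores.items() if s > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": ["cat", "sat", "mat"],
        "b": ["dog", "sat", "log"],
        "c": ["cat", "cat", "dog"],
    }
    print(search(build_index(docs), ["cat"]))
```

Note that in this sketch every query word counts equally, however rare or common it is in the collection; only the document side carries TF-IDF weights.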
Are there any trade-offs, differences, or mistakes in this approach? How does it compare to creating a TF-IDF vector for each document, creating a vector for the search query in the same way, and ranking documents by the cosine similarity between the two?
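For reference, this is roughly how I picture the vector-based alternative. Again, this is only a sketch under assumptions: tokens are already stemmed, weights are raw term count times `log(N / df)`, the query is weighted with the same IDF values as the documents, and the function names (`tfidf_vectors`, `cosine`, `cosine_search`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return ({doc_id: sparse tf-idf vector}, idf table).

    Assumption: weight = raw term count * log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_search(docs, query_tokens):
    """Rank documents by cosine similarity to the query vector."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the collection are dropped.
    query_vec = {
        t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf
    }
    ranked = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    hits = [pair for pair in ranked if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": ["cat", "sat", "mat"],
        "b": ["dog", "sat", "log"],
        "c": ["cat", "cat", "dog"],
    }
    print(cosine_search(docs, ["cat"]))
```

One thing visible in this sketch is that the query terms carry their own IDF weights, so for multi-word queries a rare query word pulls the ranking more than a common one. For a single-keyword query the query weight cancels out in the cosine, so the score is just that term's L2-normalized weight in the document.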