I tried many different text classification models from scikit-learn. I trained the model on posts from Personal Finance Stack Exchange, classified into four classes: "mortgage", "investing", "credit-card", and "taxes". The model generally works great on the test dataset (I carved out 20% of the data for testing). But if I try something totally unrelated to personal finance ("where is the restroom?", for example), the model still classifies it as "investing". The problem is that the model always picks one of the four classes, no matter how irrelevant the text is. The probabilities it assigns to the four classes always add up to 1.0, so one class always wins out. Is there any way to tune the classifier/model so that when the input is irrelevant to all of the classes, all four probabilities come out low (i.e. they don't add up to 1.0)?
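Here's a minimal sketch of what I mean (toy data, not my real dataset — any scikit-learn classifier with `predict_proba` behaves the same way):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for my training data, one example per class.
texts = [
    "refinance my 30 year fixed mortgage rate",
    "best index funds for long term investing",
    "credit card annual fee and cash back rewards",
    "filing federal income taxes and deductions",
]
labels = ["mortgage", "investing", "credit-card", "taxes"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Completely unrelated input still gets a full probability distribution
# over the four known classes, summing to 1.0.
probs = model.predict_proba(["where is the restroom?"])[0]
print(dict(zip(model.classes_, probs)))
print(probs.sum())
```

No matter what the input is, the four probabilities sum to 1.0 and one class comes out on top.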
Thanks, Ryan