Usually, as this site's name suggests, you'd want to separate your train, cross-validation and test datasets. As @Alexey Grigorev mentioned, the main concern is having some certainty that your model can generalize to some unseen dataset.
Put more intuitively, you'd want your model to grasp the relation between each row's features and its target, and to apply that relation later to one or more different, unseen rows.
These relations live at the row level, but they are learnt by looking at the entire training data. The challenge of generalization is, then, making sure the model grasps a general formula rather than depending on (over-fitting to) the specific set of training values.
I'd thus distinguish between two TFIDF scenarios, depending on how you view your corpus:
1. The corpus is at the row level
We have one or more text features that we'd like to TFIDF in order to capture term frequencies for that row. Usually it'd be a large text field that is important by itself, like an additional document describing the purchase contract in a house sale dataset. In this case the text features should be processed at the row level, like all the other features.
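A minimal sketch of this first scenario, assuming hypothetical pandas DataFrames train_df / test_df with a contract_text column and scikit-learn's TfidfVectorizer: the vectorizer is fitted on the training rows only, and the test rows are merely transformed, exactly like any other fitted preprocessing step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical data: train_df / test_df are pandas DataFrames with a
# free-text "contract_text" column alongside the usual tabular features.
vectorizer = TfidfVectorizer()

# Fit the vocabulary and IDF weights on the training rows only ...
X_train_text = vectorizer.fit_transform(train_df["contract_text"])

# ... and only transform the test rows, so no information leaks from them.
X_test_text = vectorizer.transform(test_df["contract_text"])
```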
2. The corpus is at the dataset level
In addition to its row context, the text feature of each row also carries meaning in the context of the entire dataset. Usually this is a smaller text field (like a single sentence).
The TFIDF idea here is to estimate some "rareness" of words, but within a larger context. That larger context might be the entire text column from the train and even the test datasets, since the more corpus knowledge we have, the better we can estimate that rareness. I'd even say you could use the text from the unseen dataset, or an external corpus altogether.
TFIDF here helps you engineer features at the row level, drawing on outside (larger, lookup-table-like) knowledge.
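A sketch of this second scenario under the same hypothetical setup (pandas DataFrames with a short title column, plus an optional external corpus): the IDF statistics are learnt from the larger combined corpus, and each split is then transformed row by row.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical data: a short "title" text column in both splits, and
# optionally extra raw sentences from an external corpus.
full_corpus = pd.concat(
    [train_df["title"], test_df["title"]], ignore_index=True
)
# full_corpus = pd.concat([full_corpus, pd.Series(external_sentences)])

vectorizer = TfidfVectorizer()
vectorizer.fit(full_corpus)  # "rareness" estimated on the larger context

X_train_text = vectorizer.transform(train_df["title"])
X_test_text = vectorizer.transform(test_df["title"])
```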
Also take a look at HashingVectorizer, a "stateless" vectorizer that is well suited to a corpus that keeps changing.
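A sketch of what that looks like in scikit-learn, reusing the hypothetical title column from above: because hashing needs no fit step, nothing is learnt from any particular split, and new rows can be vectorized the same way at prediction time (you can still chain a TfidfTransformer on top if you want IDF weighting).

```python
from sklearn.feature_extraction.text import HashingVectorizer

# Stateless: the hash function is fixed up front, so there is no fitted
# vocabulary that could go stale or leak between train and test.
hasher = HashingVectorizer(n_features=2**18, alternate_sign=False)

X_train_text = hasher.transform(train_df["title"])
X_test_text = hasher.transform(test_df["title"])
```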