It's clear that if your model is doing a couple percent better on your
training set than your test set, you are overfitting.
That is not true. Your model has learned from the training data and has never "seen" the test set before, so naturally it should perform somewhat better on the training set. The fact that it performs (a little bit) worse on the test set does not by itself mean that the model is overfitting -- only a "noticeable" difference can suggest it.
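As a quick sanity check, here is a minimal sketch (synthetic data and scikit-learn, both assumptions made purely for illustration): even a model that matches the data-generating process exactly will usually score a bit differently on the data it was fitted on than on held-out data, so a small gap alone proves nothing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=500)  # truly linear relationship + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

print("train R^2:", model.score(X_train, y_train))
print("test  R^2:", model.score(X_test, y_test))
# The two scores typically differ a little even though the model is correctly
# specified and cannot meaningfully overfit -- a small train/test gap is not
# evidence of overfitting by itself.
```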
Check the definition and description from Wikipedia:
Overfitting occurs when a statistical model describes random error or
noise instead of the underlying relationship. Overfitting generally
occurs when a model is excessively complex, such as having too many
parameters relative to the number of observations. A model that has
been overfit will generally have poor predictive performance, as it
can exaggerate minor fluctuations in the data.
The possibility of overfitting exists because the criterion used for
training the model is not the same as the criterion used to judge the
efficacy of a model. In particular, a model is typically trained by
maximizing its performance on some set of training data. However, its
efficacy is determined not by its performance on the training data but
by its ability to perform well on unseen data. Overfitting occurs when
a model begins to "memorize" training data rather than "learning" to
generalize from a trend.
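To make the "too many parameters relative to the number of observations" part concrete, here is a small sketch (hypothetical noisy-line data and plain NumPy polynomial fitting, chosen only for illustration): as the polynomial degree approaches the number of training points, the training error collapses while the error on unseen data blows up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: the underlying relationship is simply y = x plus noise.
x_train = np.linspace(0, 1, 12)
y_train = x_train + rng.normal(scale=0.1, size=x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = x_test + rng.normal(scale=0.1, size=x_test.size)

for degree in (1, 3, 11):
    coeffs = np.polyfit(x_train, y_train, degree)   # degree + 1 parameters
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

# The degree-11 polynomial has as many parameters as training points, so it
# reproduces the noisy training points (almost) exactly -- it describes the
# noise rather than the relationship -- and its error on unseen data is far larger.
```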
In the extreme case, an overfitted model fits the training data perfectly and the test data poorly. However, in most real-life examples this is much more subtle and overfitting can be much harder to judge. Finally, it can happen that the data in your training and test sets are similar, so the model seems to perform fine on both sets, but then performs poorly when used on some new dataset because of overfitting, as in the Google Flu Trends example.
Imagine you have data on some variable $Y$ and its time trend (plotted below). You observe it for times 0 to 30 and decide to use the 0-20 part of the data as a training set and 21-30 as a hold-out sample. The model performs very well on both samples and there is an apparent linear trend, yet when you make predictions on new, previously unseen data for times beyond 30, the good fit turns out to be illusory.
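Here is a minimal sketch of that scenario (the sine-shaped "true" process, the noise level, and scikit-learn's LinearRegression are all assumptions made for this example, not the data behind the original plot): a straight line fitted on times 0-20 also looks fine on the 21-30 hold-out sample, yet fails badly once the process bends after time 30.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

def true_process(t):
    # Hypothetical "truth": almost linear for t in [0, 30], bends afterwards.
    return 50 * np.sin(np.pi * t / 120)

t = np.arange(0, 31)
y = true_process(t) + rng.normal(scale=1.5, size=t.size)

train = t <= 20            # times 0-20: training set
hold = t > 20              # times 21-30: hold-out sample

model = LinearRegression().fit(t[train].reshape(-1, 1), y[train])

def mse(times, values):
    return np.mean((model.predict(times.reshape(-1, 1)) - values) ** 2)

print("train MSE:   ", mse(t[train], y[train]))   # small
print("hold-out MSE:", mse(t[hold], y[hold]))     # still small -- looks fine

# "New, unseen" data for times beyond 30, drawn from the same process:
t_new = np.arange(31, 121)
y_new = true_process(t_new) + rng.normal(scale=1.5, size=t_new.size)
print("new-data MSE:", mse(t_new, y_new))         # orders of magnitude larger
```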

This is an abstract example, but imagine a real-life one: you have a model that predicts sales of some product, it performs very well in summer, but then autumn comes and the performance drops. Your model may be overfitting to the summer data -- maybe it is good only for summer data, maybe it performed well only on this year's summer data, or maybe this autumn is an outlier and the model is fine...