I have been reading this paper for a few days. There is one section (Section 3.3) that confuses me.
We start by gathering local features from training images of a particular class into a single set. Then for every local feature, a Gaussian kernel is placed in the feature space with its mean at the feature. The probability density function (PDF) of the class is then defined as the normalized sum of all the kernels.
To simplify the discussion, assume we have $N$ features, each of dimensionality 128. I get lost where the author says
> a Gaussian kernel is placed in the feature space with its mean at the feature
This sounds like a kernel method, but it also seems the author wants to use a kernel density estimator (KDE). And what exactly does "with its mean at the feature" mean?
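For reference, here is my current reading of the construction as a code sketch, in case it helps pinpoint where I go wrong. I am assuming an isotropic Gaussian kernel with some fixed bandwidth `sigma` (the paper may use a different covariance), and that "the PDF of the class" is the average of the $N$ kernels:

```python
import numpy as np

def gaussian_kde_pdf(x, features, sigma=1.0):
    """Evaluate the class PDF at point x as a normalized sum of
    isotropic Gaussian kernels, one centered on each training feature.
    `sigma` is an assumed bandwidth; the paper presumably specifies one."""
    N, d = features.shape
    diffs = features - x                       # (N, d): x minus each kernel mean
    sq_dists = np.sum(diffs**2, axis=1)        # squared distance to each mean
    norm = (2 * np.pi * sigma**2) ** (d / 2)   # Gaussian normalizing constant
    kernels = np.exp(-sq_dists / (2 * sigma**2)) / norm
    return kernels.mean()                      # average of N kernels -> valid PDF

# toy example: N = 5 local features of dimensionality 128
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 128))
p = gaussian_kde_pdf(feats[0], feats, sigma=1.0)
```

Under this reading, "with its mean at the feature" would just mean each of the $N$ Gaussians is centered at one training feature vector, so the density is highest near regions where training features cluster.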
Any suggestions to clarify these confusions are welcome!