The literature on transfer learning and domain adaptation often describes two datasets as having different feature spaces and different distributions. For image datasets, I think I understand what a difference in feature space means: it is essentially the dimensionality of the images. (For example, a 4x4 greyscale image has 16 dimensions, and a 5x5 RGB image has $5\cdot5\cdot3=75$ dimensions.)
What confuses me is the distribution of an image dataset. If I have a dataset of 1000 greyscale images, each of size 4x4 (16 pixels), then I have 1000 points in a 16-dimensional feature space. Do I understand correctly that these points can be used to estimate the underlying distribution they were sampled from?
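To make the setup concrete, here is a minimal sketch of what I mean, assuming NumPy. The pixel values are random placeholders rather than a real dataset, and the mean and covariance at the end are only one crude way to summarize the empirical distribution, not a full density estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 1000 random 4x4 greyscale images (not a real dataset).
grey_images = rng.random((1000, 4, 4))

# Flatten each image into one point in a 16-dimensional feature space.
X = grey_images.reshape(len(grey_images), -1)
print(X.shape)  # (1000, 16)

# A single 5x5 RGB image flattens to 5 * 5 * 3 = 75 dimensions.
rgb_image = rng.random((5, 5, 3))
print(rgb_image.reshape(-1).shape)  # (75,)

# Crude summary of the empirical distribution of the 1000 points:
# per-pixel mean and pixel-by-pixel covariance.
mean = X.mean(axis=0)          # shape (16,)
cov = np.cov(X, rowvar=False)  # shape (16, 16); rows of X are samples
```

Each row of `X` is one image viewed as a single sample from whatever distribution generated the dataset.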
Secondly, how can one check whether two image datasets come from different distributions?
I appreciate your guidance.