Out-of-distribution (OOD) data and shifting data distributions are two types of dataset shift [1]. I understand what out-of-distribution means, but not what a shifting data distribution is. That blog gives the following example of OOD:
For example, consider what happens when a cat-versus-dog image classifier is shown an image of an airplane.
In the $X \rightarrow Y$ problem, the airplane is outside the distribution of $Y$, that is, $\text{airplane}\not\in \{Y\}$, but I cannot get my head around distribution shift. Does it mean that the ratio of cats to dogs differs between the training set and the test set?
After some digging, I came to think that a shifting data distribution is the same thing as covariate shift. I read this blog and found the following definition:
Covariate shift appears only in $X \rightarrow Y$ problems, and is defined as the case where $P_{tra}(y|x) = P_{tst}(y|x)$ and $P_{tra}(x)\neq P_{tst}(x)$.
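To check my reading of this definition, here is a minimal sketch with synthetic data. Everything in it is my own assumption and not from the blog: $x$ is a single number, $P(x)$ is a Gaussian whose mean differs between training and test, and the labeling rule `label` (hypothetical) is the same for both sets.

```python
import random

def label(x, rng):
    # Assumed labeling rule shared by both sets: P(y=1 | x) depends only on x.
    p_one = 0.9 if x > 0 else 0.1
    return 1 if rng.random() < p_one else 0

def draw(mean, n, rng):
    # P(x) is a unit-variance Gaussian; only its mean differs between sets.
    xs = [rng.gauss(mean, 1.0) for _ in range(n)]
    ys = [label(x, rng) for x in xs]
    return xs, ys

def rate_given_positive_x(xs, ys):
    # Empirical estimate of P(y=1 | x > 0).
    picked = [y for x, y in zip(xs, ys) if x > 0]
    return sum(picked) / len(picked)

rng = random.Random(0)
x_tra, y_tra = draw(-1.0, 20000, rng)  # training inputs centred at -1
x_tst, y_tst = draw(+1.0, 20000, rng)  # test inputs centred at +1

mean_tra = sum(x_tra) / len(x_tra)
mean_tst = sum(x_tst) / len(x_tst)
cond_tra = rate_given_positive_x(x_tra, y_tra)
cond_tst = rate_given_positive_x(x_tst, y_tst)
frac_tra = sum(y_tra) / len(y_tra)
frac_tst = sum(y_tst) / len(y_tst)

print("mean of x       (train, test):", mean_tra, mean_tst)
print("P(y=1 | x > 0)  (train, test):", cond_tra, cond_tst)
print("fraction of y=1 (train, test):", frac_tra, frac_tst)
```

If I understand correctly, the first line of output shows $P_{tra}(x)\neq P_{tst}(x)$ and the second shows $P_{tra}(y|x) = P_{tst}(y|x)$ up to sampling noise. In this toy setup the overall fraction of $y=1$ also ends up different between the two sets, even though the labeling rule never changed.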
I wonder how we can measure $P(x)$, since every sample is different from every other one. Wouldn't the empirical distribution then just look uniform over the observed samples?
To be more concrete: if we randomly draw two samples from an image corpus containing dogs and cats, the two samples follow the same distribution (although I am not certain of this). But if we don't know whether the two samples were drawn randomly from the corpus, how can we tell that the two distributions differ, given that any two images in the corpus are different?
Does detecting covariate shift amount to checking whether the two samples were drawn randomly from the same population? Otherwise the two distributions would differ, at least along some features, and we cannot be sure whether those features matter for predicting $Y$.
In this presentation, I learned three methods for detecting covariate shift: visualization, membership modeling, and uncertainty quantification, but I don't know how they relate to the distribution.
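One thing I tried, as a rough sketch: reduce each image to a single number and compare the empirical CDFs of the two samples with a two-sample Kolmogorov-Smirnov statistic. The scalar feature here is simulated (think of something like mean brightness, which is purely my assumption), and `ks_statistic` is my own helper, not a library function. I am not sure this is what is meant by "measuring" $P(x)$.

```python
import bisect
import random

def ks_statistic(a, b):
    # Largest gap between the two empirical CDFs, checked at every observed value.
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for v in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

rng = random.Random(0)
# Simulated per-image feature values for a corpus (assumption: Gaussian).
corpus = [rng.gauss(0.0, 1.0) for _ in range(10000)]

sample_1 = rng.sample(corpus, 1000)  # random draw from the corpus
sample_2 = rng.sample(corpus, 1000)  # another random draw from the same corpus
sample_shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # different source

d_same = ks_statistic(sample_1, sample_2)
d_shifted = ks_statistic(sample_1, sample_shifted)

print("KS statistic, two random draws:", d_same)
print("KS statistic, shifted sample:  ", d_shifted)
```

Every individual value is different, yet the statistic stays small for the two random draws and becomes much larger for the shifted sample. This only covers one scalar feature, though, so I don't see how it extends to whole images.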
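My current understanding of membership modeling, sketched below, is that one trains a classifier to guess which set an example came from; accuracy near 0.5 would suggest the two sets look alike. The nearest-centroid classifier, the 1-D Gaussian feature, and the name `membership_accuracy` are all my own simplifications and may not match what the presentation intends.

```python
import random

def centroid(values):
    return sum(values) / len(values)

def membership_accuracy(sample_a, sample_b):
    # Fit on the first half of each sample, evaluate on the held-out second half.
    half_a, half_b = len(sample_a) // 2, len(sample_b) // 2
    c_a = centroid(sample_a[:half_a])
    c_b = centroid(sample_b[:half_b])
    held_out = [(v, 0) for v in sample_a[half_a:]]
    held_out += [(v, 1) for v in sample_b[half_b:]]
    correct = 0
    for value, origin in held_out:
        # Guess the set whose centroid is closer to this value.
        guess = 0 if abs(value - c_a) <= abs(value - c_b) else 1
        correct += guess == origin
    return correct / len(held_out)

rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
test_same = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same P(x) as train
test_shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # shifted P(x)

acc_same = membership_accuracy(train, test_same)
acc_shifted = membership_accuracy(train, test_shifted)

print("membership accuracy, same distribution:", acc_same)
print("membership accuracy, shifted test set: ", acc_shifted)
```

If this is the right idea, the classifier never estimates $P(x)$ directly; it only reveals whether the two samples are distinguishable, which is the part I would like confirmed.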