Questions tagged [covariate-shift]

9 questions
8 votes
1 answer

Difference between distribution shift and data shift, concept drift and model drift

Lately, I am seeing both terms used interchangeably in several scenarios: Joaquin Quiñonero in MIT Press (NIPS), Dataset Shift in ML; NIPS 2021 workshop in DistShift; Model drift: Towards Data Science. Are there differences in the definitions?…
4 votes
1 answer

Why is importance-weighted empirical risk minimization finite-sample biased?

Classical risk minimization (RM) minimizes the expected loss over the training distribution $p_{\text{train}}(x)$, $$\theta^*_{RM} = \arg \min_\theta E[\ell(x, \theta)]_{p_{\text{train}}}.$$ As the distribution $p_{\text{train}}$ is usually…
2 votes
2 answers

How to intuit the covariate shift?

Out-of-distribution data and shifting data distributions are two types of dataset shift [1]. I can understand what out-of-distribution means, but not what shifting data distributions are. In that blog, an example of OOD is given as follows: For example,…
1 vote
1 answer

Covariate shift in k-means clustering

I'm trying to build a customer segmentation framework on e-commerce data. To do this, I'm using k-means clustering on variables which quantify the purchase Recency, purchase Frequency, Monetary value of the purchase (RFM segmentation) + additionally…
1 vote
1 answer

Domain adaptation under covariate shift: estimating density ratio through a classifier

In domain adaptation under covariate shift, one approach is to weight the instances from the source domain by a factor $\frac{p_T(x)}{p_S(x)}$ in the training, where $p_S(x)$ and $p_T(x)$ represent the density of $x$ in the source and target…
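For readers skimming this excerpt, here is a minimal sketch of the classifier-based density-ratio idea named in the title. It is not taken from the question itself. Everything in it is an assumption made for illustration: the 1-D Gaussian source and target samples, the sample sizes, the hand-rolled logistic regression, and the helper names `fit_logistic_1d` and `density_ratio`. The idea is to train a classifier to tell source points (label 0) from target points (label 1); its odds $P(T \mid x)/P(S \mid x)$, corrected by the sample-size ratio, then estimate $\frac{p_T(x)}{p_S(x)}$.

```python
import math
import random


def fit_logistic_1d(xs, labels, lr=0.5, steps=300):
    """Fit P(label=1 | x) = sigmoid(a*x + b) by batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - y) * x
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical toy data: source ~ N(0, 1), target ~ N(1, 1).
rng = random.Random(0)
n_source, n_target = 1000, 1000
source = [rng.gauss(0.0, 1.0) for _ in range(n_source)]
target = [rng.gauss(1.0, 1.0) for _ in range(n_target)]

# Domain classifier: label 0 = source, label 1 = target.
xs = source + target
labels = [0] * n_source + [1] * n_target
a, b = fit_logistic_1d(xs, labels)


def density_ratio(x):
    """Estimate p_T(x) / p_S(x) from the classifier odds.

    The odds P(T|x) / P(S|x) equal exp(a*x + b); multiplying by
    n_source / n_target removes the effect of the class proportions.
    """
    return (n_source / n_target) * math.exp(a * x + b)


# Reweight source points so that they mimic the target distribution.
weights = [density_ratio(x) for x in source]
weighted_mean = sum(w * x for w, x in zip(weights, source)) / sum(weights)
print(f"slope={a:.2f}, intercept={b:.2f}, weighted source mean={weighted_mean:.2f}")
```

For this toy setup the true log-ratio is $x - 0.5$, so the fitted slope and intercept should land near 1 and -0.5, and the weighted source mean should move from about 0 toward the target mean of 1. In practice the classifier has to be well calibrated, and large weights can make the weighted estimate high-variance.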
1 vote
0 answers

Should I use statistical tests when the sample size is big (over 100K)?

I'm looking for a method to identify data drift of features between two different times. Background: I'm calculating the same features, on almost the same population (for example, company employees) every month. Population size is over 100K. An…
Shay.G
0 votes
0 answers

Can two subsamples of the same dataset have different distributions (covariate shift)?

The reason for my question is that I trained a model for binary classification; once I obtained the results, I trained another model on these results, where the predicted instances are used as a training set and the unpredicted instances are used as…
0 votes
0 answers

Test data relevance to a model (covariate shift)

I am trying to design an algorithm that calculates the relevance of test data to a trained model. This can be done by checking whether the predictor variables have a different distribution in the train and test data (covariate shift). Main idea: If…
dokondr
0 votes
0 answers

What type of domain shift exists in my data?

I am trying to understand what type of shift(s) exist in my problem to get a better grasp of it. I have a dataset that comprises a deep neural network's (DNN) runtime latency ($y$), its architecture ($a$), as well as the hardware it was run on ($h$).…
saad