
In domain adaptation under covariate shift, one approach is to weight the instances from the source domain during training by the factor $\frac{p_T(x)}{p_S(x)}$, where $p_S(x)$ and $p_T(x)$ denote the densities of $x$ in the source and target domains, respectively. It can be shown that this density-ratio weighting factor is proportional to $\frac{1}{p(\delta=S|x)} - 1$, where $p(\delta=S|x)$ is the probability that an instance $x$ comes from the source domain, typically obtained by training a classifier to distinguish between the two domains.
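For context, the proportionality can be derived with Bayes' rule (here $p(\delta=S)$ and $p(\delta=T)$ denote the overall fractions of source and target instances in the pooled data; this notation is mine):

$$\frac{p_T(x)}{p_S(x)} = \frac{p(x \mid \delta=T)}{p(x \mid \delta=S)} = \frac{p(\delta=T \mid x)\,/\,p(\delta=T)}{p(\delta=S \mid x)\,/\,p(\delta=S)} = \frac{p(\delta=S)}{p(\delta=T)}\left(\frac{1}{p(\delta=S \mid x)} - 1\right),$$

using $p(\delta=T \mid x) = 1 - p(\delta=S \mid x)$; the leading factor does not depend on $x$, hence the proportionality.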

I have three questions regarding this approach.

  1. I have seen the weighting factor written simply as $\frac{1}{p(\delta=S|x)}$, with the "$-1$" part dropped (e.g., the link above, or here, page 15). Why?
  2. What if the source and target domains turn out to be the same, i.e. $p_S(x) = p_T(x)$? In this case, the weighting factor $\frac{p_T(x)}{p_S(x)}$ should be 1 for all $x$, but the classifier would be confused and return a more or less arbitrary boundary, and hence an arbitrary $p(\delta=S|x)$. Does that mean the approach fails in this case?
  3. When we train a classifier to predict $p(\delta=S|x)$, should we make sure it is calibrated?
Lei Huang

1 Answer


Not an expert, but here are my thoughts nevertheless:

  1. Dropping the $-1$ does not seem correct to me. Although the authors of the slides you linked to are certainly much more knowledgeable on this topic than I am, I would venture to claim that this equation on their slide 14 is wrong:

*(screenshot of the equation from slide 14 of the linked slides)*

Following Bayes' formula, the $\Pr_T[x]$ would need to be simply $\Pr[x]$, since the term on the left is $\Pr[x \mid \sigma=1]$. (Very happy to be corrected if I'm misunderstanding something here.) I did not follow through the rest of the derivations, but my guess would be that this is why they do not end up with the $-1$. If there is a $-1$ (as in the derivation in your other linked question), then dropping it is not valid even up to proportionality: $\frac{1}{p} - 1$ is not proportional to $\frac{1}{p}$.
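For reference, plain Bayes' rule in that notation reads

$$\Pr[x \mid \sigma = 1] = \frac{\Pr[\sigma = 1 \mid x]\,\Pr[x]}{\Pr[\sigma = 1]},$$

i.e. with the unconditional $\Pr[x]$ on the right-hand side rather than $\Pr_T[x]$.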

  2. The decision boundary may be arbitrary, but a calibrated classifier should return a confidence of 50% everywhere in this case, which gives exactly the desired weight: $\frac{1}{0.5} - 1 = 1$. So in theory this should not be a problem. I cannot comment on whether it poses practical challenges with popular classification techniques.

  3. Well - since you're using the output of the classifier as a drop-in replacement for an actual probability measure - yes, absolutely! :-)
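To illustrate, here is a minimal sketch of the whole pipeline with a calibrated domain classifier. This is my own toy example, not from the question or the slides: the Gaussian data, the logistic-regression base model, and the isotonic calibration are all placeholder choices.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: source and target domains with shifted means (synthetic).
X_source = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
X_target = rng.normal(loc=0.5, scale=1.0, size=(500, 2))

# Pool the data and label the domain: source = 1, target = 0.
X = np.vstack([X_source, X_target])
d = np.concatenate([np.ones(len(X_source)), np.zeros(len(X_target))])

# Domain classifier wrapped in probability calibration
# (isotonic regression here; "sigmoid" Platt scaling is the other option).
clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                             method="isotonic", cv=5)
clf.fit(X, d)

# p(delta = S | x) for the source instances; column 1 is class "1" (source).
p_source = clf.predict_proba(X_source)[:, 1]

# Importance weights 1/p - 1, clipped away from p = 0 for stability.
weights = 1.0 / np.clip(p_source, 1e-3, 1.0) - 1.0
# Restore the exact density ratio via the constant p(S)/p(T) = n_S/n_T.
weights *= len(X_source) / len(X_target)

# These weights would then go to the downstream learner, e.g.
# model.fit(X_source, y_source, sample_weight=weights).
```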

jhin