Questions tagged [selection-bias]

Bias introduced by non-random selection of observations, such that the sample is not representative of the underlying population.

The Wikipedia page lists different flavors and examples of this bias in action.

66 questions
12
votes
1 answer

Interpretation of coefficient of inverse Mills ratio

How do you interpret the coefficient of the inverse Mills ratio (lambda) in the two-step Heckman model?
Quirik
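For reference, a sketch of the textbook two-step setup the question refers to. The notation is assumed here, not taken from the post: selection indicator $s = 1[z'\gamma + u > 0]$, outcome $y = x'\beta + \varepsilon$, with $(u, \varepsilon)$ bivariate normal, $\mathrm{Var}(u) = 1$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\mathrm{Corr}(u, \varepsilon) = \rho$.

$$
E[y \mid x, z, s = 1] = x'\beta + \rho\sigma\,\lambda(z'\gamma),
\qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
$$

Under these assumptions, the second-step coefficient on $\lambda$ estimates $\rho\sigma$. Its sign is the sign of the correlation between the unobservables in the selection and outcome equations, and a coefficient near zero is consistent with no selection on unobservables.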
11
votes
3 answers

Abraham Wald survivorship bias intuition

During World War II, the statistician Abraham Wald took survivorship bias into account in his calculations when considering how to minimize bomber losses to enemy fire. Wald noted that the study only considered the aircraft that had survived their missions;…
Luis P.
10
votes
3 answers

What is it called when an experimenter discards results that are too unexpected?

There is a type of scientific error where an experimenter gets a result significantly different from prior researchers, assumes they made a mistake, and redoes the experiment until they get a more expected value, which they publish. I vaguely…
6
votes
2 answers

Favored methods for overcoming selection bias (special attention to healthcare fields)?

I am frequently measuring the effect of behavioral health treatment interventions on outcomes of interest. However, comparing the relative efficacy of different types of treatment is tricky - more intensive interventions may indicate clients with…
6
votes
2 answers

Is this actually an example of selection bias?

In Lesson 3, Chapter 3 of Miguel Hernán's edX course on causal diagrams, he presents a DAG that represents a study on the effect of hormone therapy on lung cancer (whether hormone therapy causes lung cancer). Among women with lung cancer in…
suckrates
5
votes
0 answers

Correcting Sample Selection Bias given actual Distribution

I have two datasets, both from the same population. The samples from the first survey are quite representative of the underlying truth. However, the second survey comes with a change in distribution due to sample selection bias. If I merge the data…
5
votes
4 answers

Bias induced from model selection

I am trying to understand the following sentence: Cross-validation and information criteria make a correction for using the data twice (in constructing the posterior and in model assessment) and obtain asymptotically unbiased estimates of predictive…
4
votes
2 answers

How would a statistician describe the problem with the figure in this publication? And the solution?

In the comments at https://www.researchgate.net/publication/344137839_SARS-CoV-2_binds_platelet_ACE2_to_enhance_thrombosis_in_COVID-19/comments I pointed out a problem with averaging values over time. How would a statistician describe the problem…
4
votes
1 answer

$\frac{P(x_1 \mid y, s = 1) \dots P(x_n \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)}$ indicates that naive Bayes learners are global learners?

I am currently studying the paper "Learning and Evaluating Classifiers under Sample Selection Bias" by Bianca Zadrozny. In Section 3, "Learning under sample selection bias", the author says the following: We can separate classifier learners into two…
3
votes
2 answers

How to avoid selection bias while updating a lead scoring (predictive) model with new data

We developed a standard lead scoring model using logistic regression on a couple of months' worth of data. The model has been working, and on that basis we have been pushing only the top third of leads to the sales team. The model is giving around 40% lift. This model is…
3
votes
1 answer

Heckman with second step probit in R

The functions selection and heckit (package sampleSelection) support a binary dependent variable in the outcome equation: The dependent variable of the selection equation (specified by argument selection) must have exactly two levels (e.g.,…
Ilaria
3
votes
2 answers

Usage of Heckman estimation for a random sample

My colleague argues for using a Heckman model in the following case (agricultural economics): I have a random sample of n farmers from a general population of size N. Within the sample, some farmers apply a given technique to improve agricultural productivity…
joaoal
2
votes
0 answers

How to accommodate endogeneity after matching?

I am working on a field experiment where assignment to treatment vs. comparison was random, but participation uptake was not. The design is pre-post, and attrition is certainly not MCAR. This is a clustered design (randomization at level 1).…
2
votes
0 answers

Which DAG would explain the lack of correlation between height and performance in NBA players?

A classic example of "selection bias" involves looking at the performance of professional basketball players. As the example goes, among NBA players there is no correlation between height and performance. Obviously that cannot be generalized to "height…
CarrKnight
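The selection effect described in this excerpt can be illustrated with a small simulation. This is a minimal sketch under one assumed data-generating mechanism, not the DAG the question asks for: height and a hypothetical `skill` variable are independent standard normals, performance is their sum, and only the top 1% by performance are "selected" into the league. The function names and parameters are illustrative.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=100_000, top_fraction=0.01, seed=0):
    """Return (population correlation, correlation among the selected)."""
    rng = random.Random(seed)
    # Assumption: height and skill are independent in the full population.
    height = [rng.gauss(0, 1) for _ in range(n)]
    skill = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: performance depends additively on both.
    performance = [h + s for h, s in zip(height, skill)]

    # Selection: keep only the top performers (the hypothetical "league").
    k = int(n * top_fraction)
    order = sorted(range(n), key=lambda i: performance[i], reverse=True)
    selected = order[:k]

    pop_corr = pearson(height, performance)
    sel_corr = pearson(
        [height[i] for i in selected],
        [performance[i] for i in selected],
    )
    return pop_corr, sel_corr


if __name__ == "__main__":
    pop_corr, sel_corr = simulate()
    print(f"height vs performance, full population: {pop_corr:.2f}")
    print(f"height vs performance, selected only:   {sel_corr:.2f}")
```

In this toy setup the correlation among the selected is noticeably weaker than in the full population, though it does not vanish. Stricter selection, or selection on a noisy mix of height and skill, would change the size of the attenuation.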
2
votes
0 answers

Using a sample of paid survey respondents to bias-correct the lower response rate among a larger non-paid sample

I'm running a tracker survey on a website that has a low response rate of about 2%. The survey is not incentivized, but the website traffic is large enough that I can always meet my sample target of 5,000 each month. I have pretty good data on…