Questions tagged [sample-weighting]
14 questions
3
votes
1 answer
Is IPTW (inverse probability of treatment weighting) legal?
When using IPTW, one can easily get weights 10 or even 20 for the observations.
For instance, in logistic regression, weight 10 for an observation means that we have not one, but 10 observations identical to this one. Thus, if we are allowed to…

Pavel Ruzankin
- 121
- 5
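A minimal sketch of the point raised in this question, using made-up numbers: with integer weights, a weighted mean matches the mean of a data set in which each observation is replicated by its weight. The helper names `weighted_mean` and `kish_effective_n` are illustrative, and Kish's effective sample size is only one common approximation of how much information the weighted sample carries.

```python
def weighted_mean(y, w):
    """Weighted mean: sum(w_i * y_i) / sum(w_i)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def kish_effective_n(w):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)


# Hypothetical data: the first observation carries a weight of 10.
y = [1.0, 0.0, 1.0, 0.0]
w = [10, 1, 1, 1]

# The same data with the first observation physically repeated 10 times.
replicated = [1.0] * 10 + [0.0, 1.0, 0.0]

mean_weighted = weighted_mean(y, w)
mean_replicated = sum(replicated) / len(replicated)
n_eff = kish_effective_n(w)
```

The two means agree, but `n_eff` is far below the 13 rows of the replicated data set, which is why standard errors that treat weights as literal counts tend to be too small.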
3
votes
1 answer
Minimize SSE function
Consider a data set in which each target $t_n$ is associated with a weighting factor $r_n > 0$, so that the sum-of-squares error function becomes
$$SE(\mathbf{w})= \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2.$$
Find an expression…

Marcel Braasch
- 215
- 1
- 9
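As an illustration of the weighted sum-of-squares error in this question, here is a sketch for the special case $\phi(x) = (1, x)^T$, which is an assumption made only to keep the example small. Setting the gradient to zero gives the weighted normal equations $(\Phi^T R \Phi)\mathbf{w} = \Phi^T R \mathbf{t}$ with $R = \mathrm{diag}(r_n)$; the function name `weighted_line_fit` and the data are hypothetical.

```python
def weighted_line_fit(x, t, r):
    """Minimize 0.5 * sum(r_n * (w0 + w1 * x_n - t_n)**2) over (w0, w1)."""
    # Entries of Phi^T R Phi and Phi^T R t for phi(x) = (1, x).
    s = sum(r)
    sx = sum(ri * xi for ri, xi in zip(r, x))
    sxx = sum(ri * xi * xi for ri, xi in zip(r, x))
    st = sum(ri * ti for ri, ti in zip(r, t))
    sxt = sum(ri * xi * ti for ri, xi, ti in zip(r, x, t))
    # Solve the 2x2 system [[s, sx], [sx, sxx]] @ [w0, w1] = [st, sxt].
    det = s * sxx - sx * sx
    w0 = (sxx * st - sx * sxt) / det
    w1 = (s * sxt - sx * st) / det
    return w0, w1


# Weight 2 on the last point ...
fit_weighted = weighted_line_fit([0.0, 1.0, 2.0], [0.0, 1.0, 3.0], [1, 1, 2])
# ... gives the same fit as listing that point twice with unit weights.
fit_replicated = weighted_line_fit(
    [0.0, 1.0, 2.0, 2.0], [0.0, 1.0, 3.0, 3.0], [1, 1, 1, 1]
)
```

This shows one reading of $r_n$: an integer weight acts like a replication count. Another common reading treats $r_n$ as an inverse noise variance for observation $n$.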
3
votes
3 answers
Weighting significance tests according to the appropriateness of their assumptions
Consider a t-test of means. One formula for computing the p-value assumes equal variances. Another formula assumes unequal variances. With small sample sizes the tests can give quite different results and one can examine the variances to see which…

Tim
- 3,255
- 14
- 24
1
vote
0 answers
Nonresponse weight adjustments in multi-stage household surveys
I have a question about nonresponse weighting in complex sample surveys with multi-stage designs, such as the US National Comorbidity Survey Replication (NCS-R), the Health and Retirement Study (HRS), or perhaps the best known, NHANES (National…

SurveyStatLearner
- 11
- 2
1
vote
0 answers
XGBoost regressor sample weight has negligible impact on performance
I am using an XGBoost regressor for a prediction problem. I split the available data (around 60K samples) 70/30 into training and validation sets. For the training portion, I used 80% to train the XGBoost model and 20% to monitor the performance for…

Bin Zhou
- 111
- 1
1
vote
1 answer
WeightIT package error: treatment and covariates must have same number of units
While using the WeightIt package in R, I encountered a strange error:
"error: treatment and covariates must have same number of units"
I checked the package source where this error is raised, and it says the number of rows for…

user207581
- 41
- 3
1
vote
1 answer
Domain adaptation under covariate shift: estimating density ratio through a classifier
In domain adaptation under covariate shift, one approach is to weight the instances from the source domain by a factor $\frac{p_T(x)}{p_S(x)}$ in the training, where $p_S(x)$ and $p_T(x)$ represent the density of $x$ in the source and target…

Lei Huang
- 756
- 6
- 13
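A small sketch of the classifier-based density ratio idea in this question, under the assumption of a discrete feature so that the "classifier" can simply be the empirical frequency $P(\text{target} \mid x)$ in the pooled data. By Bayes' rule, $\frac{p_T(x)}{p_S(x)} = \frac{P(\text{target} \mid x)}{P(\text{source} \mid x)} \cdot \frac{n_S}{n_T}$. The function name and the toy samples are hypothetical; with continuous features one would substitute a fitted probabilistic classifier.

```python
from collections import Counter


def density_ratio_via_classifier(source, target):
    """Estimate p_T(x) / p_S(x) for each x seen in the source sample."""
    n_s, n_t = len(source), len(target)
    count_s, count_t = Counter(source), Counter(target)
    ratios = {}
    for x in count_s:
        # Empirical "classifier" output: P(domain = target | x) in the pooled data.
        p_target = count_t[x] / (count_t[x] + count_s[x])
        # Posterior odds, corrected for the different sample sizes.
        ratios[x] = (p_target / (1 - p_target)) * (n_s / n_t)
    return ratios


# Toy samples: the source favours "a", the target favours "b".
source = ["a"] * 6 + ["b"] * 4
target = ["a"] * 4 + ["b"] * 16
ratios = density_ratio_via_classifier(source, target)
```

Here the estimates agree with the directly computed ratios of empirical frequencies, (4/20)/(6/10) for "a" and (16/20)/(4/10) for "b". Dropping the $n_S / n_T$ factor would rescale every weight by the same constant.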
1
vote
1 answer
How to handle different sized experiment samples
Imagine a 3 by 1 experiment. One group has 1,000 observations, one has 5,000, and the last has 4,000. I'm trying to see whether the manipulation between groups in the experiment had an effect. Do I need to do anything…

Eric Tim
- 65
- 6
1
vote
0 answers
How should one compute confidence intervals for means computed with inverse propensity weights (IPW)?
Inverse propensity weighting involves a machine learning model that takes features and outputs the predicted probability that this person is in the sample. Let $w_i$ be the inverse of the output for the $i^{th}$ person in the sample. Then the inverse…

Andrew NC
- 309
- 2
- 7
1
vote
1 answer
Weighted Survey Predicted Probabilities
I have a question about calculating prevalence using predicted probabilities from a survey weighted generalized linear model.
Say my goal was to calculate the prevalence of a binary outcome using the predicted probabilities of that outcome over some…

Molls
- 80
- 6
0
votes
0 answers
Weighting for modelling probability of selection
I want to use inverse probability weighting in some regressions and to estimate some weighted means from a non-representative sample. I plan to estimate a probit model for probability of selection into the sample, where the non-representative sample…

ADF
- 113
- 2
0
votes
1 answer
Sample weighting vs. (e.g. one-hot encoded) categories
I have seen recommendations to use sample weighting when the training dataset is not evenly balanced over known categories, so that an imbalance in the number of elements in each category does not skew the training towards the most frequently…

Julian Moore
- 141
- 3
0
votes
1 answer
What is the sample size and variance for the mean that is a simple, unweighted average of two independent groups' means?
We have two groups: N1 = 10 and N2 = 100
Their means on some measurement are: Mean1 = 4 and Mean2 = 5
Their variances are Var1 = 3 and Var2 = 2.5.
Let's further assume we have no access to the individual level data.
Some guy wants to combine the two…

dl7631
- 3
- 2
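A worked sketch of the arithmetic in this question, assuming that the two groups are independent and that Var1 and Var2 are variances of individual measurements within each group (the excerpt does not say so explicitly). Under those assumptions each group mean has variance $\mathrm{Var}_k / N_k$, and the simple average of the two means has variance $\frac{1}{4}\left(\frac{\mathrm{Var}_1}{N_1} + \frac{\mathrm{Var}_2}{N_2}\right)$.

```python
# Summary statistics from the question.
n1, n2 = 10, 100
mean1, mean2 = 4.0, 5.0
var1, var2 = 3.0, 2.5

# Simple, unweighted average of the two group means.
combined_mean = (mean1 + mean2) / 2

# Variance of that average: each mean contributes var_k / n_k, scaled by (1/2)^2.
combined_var = (var1 / n1 + var2 / n2) / 4
combined_se = combined_var ** 0.5
```

With these numbers the smaller group dominates the uncertainty: it contributes 0.3 of the 0.325 inside the parentheses.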
0
votes
1 answer
Including a weighting variable in a linear regression
I'm looking at how temperature affects length. My length variable is the mean length calculated for every year; it is derived from ~10,000 data points. Not every year had the same sampling effort (e.g. 1998 n=300 vs 2001 n=2078).
A colleague…

watermineporcupine
- 39
- 1
- 7