Questions tagged [weighted-data]

Datasets where different pieces of data can have different "weights", i.e. different importance.

When weights are put on cases/observations/rows, they may indicate different probabilities of selection or response in complex surveys, or different accuracy of the underlying observations. A useful resource describing the various types of observation weights is http://www.ats.ucla.edu/stat/stata/faq/weights.htm.

Weights are also attached to units (rows in the data) in importance sampling, where the weight of each draw is the ratio of the target density (which is difficult to sample from) to the density of a convenient sampling distribution.
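
As a rough illustration of importance-sampling weights, here is a minimal Python sketch. The specific densities are assumptions chosen only so the result is easy to check: the target is a standard normal (in practice it would be something hard to sample from), the sampling density is N(0, 2²), and the helper names `normal_pdf` and `importance_sampling_mean` are hypothetical. It uses the self-normalized form of the estimator, which is one common variant.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def importance_sampling_mean(h, n_draws=100_000, seed=0):
    """Estimate E[h(X)] under the target density using draws from the sampling density.

    Assumed setup (illustrative only): target = N(0, 1), sampling density = N(0, 2^2).
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_sum = 0.0
    for _ in range(n_draws):
        x = rng.gauss(0.0, 2.0)  # draw from the convenient sampling density
        # weight = target density / sampling density, evaluated at the draw
        w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)
        weighted_sum += w * h(x)
        weight_sum += w
    # self-normalized estimator: weighted average of h over the draws
    return weighted_sum / weight_sum


# E[X^2] under N(0, 1) is exactly 1, so the estimate should land close to 1.
estimate = importance_sampling_mean(lambda x: x * x)
print(estimate)
```

Each row of the simulated data carries its own weight, which is why such questions fit this tag rather than a model-coefficient tag.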

When weights are put on variables/columns/traits (regression weights, factor analysis weights, neuron input weights, etc.), these are model coefficients. Please tag your question with the model-specific tag in that situation, and avoid this tag when your weights are attached to variables/columns rather than observations/cases/rows.

185 questions
28 votes, 2 answers

Bias correction in weighted variance

For unweighted variance $$\text{Var}(X):=\frac{1}{n}\sum_i(x_i - \mu)^2$$ there exists the bias corrected sample variance, when the mean was estimated from the same data: $$\text{Var}(X):=\frac{1}{n-1}\sum_i(x_i - E[X])^2$$ I'm looking into weighted…
27 votes, 2 answers

Adding weights to logistic regression for imbalanced data

I want to model a logistic regression with imbalanced data (9:1). I wanted to try the weights option in the glm function in R, but I'm not 100% sure what it does. Let's say my output variable is c(0,0,0,0,0,0,0,0,0,1). Now I want to give the "1" 10…
22 votes, 1 answer

Weighted Variance, one more time

Unbiased weighted variance was already addressed here and elsewhere but there still seems to be a surprising amount of confusion. There appears to be a consensus toward the formula presented in the first link as well as in the Wikipedia article. …
confusedCoder
22 votes, 1 answer

Such thing as a weighted correlation?

I have some interesting data on the most popular musical artists streamed divided by location into about 200 congressional districts. I want to see if it's possible to poll a person on his or her musical preferences and determine whether he or she…
Chris Wilson
21 votes, 2 answers

Weighted principal components analysis

After some searching, I find very little on the incorporation of observation weights/measurement errors into principal components analysis. What I do find tends to rely on iterative approaches to include weightings (e.g., here). My question is why…
noname
15 votes, 2 answers

Can I (justifiably) train a second model only on the observations that a previous model predicted poorly?

Say I commit the following sins while building a predictive model: I take my dataset and split it into four subsets: Three for training (Train_A, Train_B, and Train_C) and one for validation. I train an initial model (Model_A) on Train_A. Because…
11 votes, 1 answer

Correct equation for weighted unbiased sample covariance

I'm looking for the correct equation to compute the weighted unbiased sample covariance. Internet sources are quite rare on this theme and they all use different equations. The most likely equation I've found is this…
gaborous
10 votes, 2 answers

How to calculate the standard error of a proportion using weighted data?

I know the "textbook" estimate of the standard error of a proportion is $SE=\sqrt{\frac{p(1-p)}{n}}$, but does this hold up when the data are weighted?
simudice
9 votes, 1 answer

Weighted least square weights definition: R lm function vs. $\mathbf W \mathbf A\mathbf x=\mathbf W \mathbf b$

Could anyone tell me why I am getting different results from R weighted least squares and manual solution by matrix operation? Specifically, I am trying to manually solve $\mathbf W \mathbf A\mathbf x=\mathbf W \mathbf b$, where $\mathbf W$ is the…
Haitao Du
8 votes, 1 answer

How is a Poisson rate regression equal to a Poisson regression with corresponding offset term?

I do not understand the role of weights in "weighted Poisson regression". What exactly is being weighted? Is it the contribution of the observation to the log-likelihood of the model, or something else? In the following two popular threads, Where…
Alex
8 votes, 3 answers

Variance of weighted mean greater than unweighted mean

A reviewer of mine is asking why I have used unweighted data instead of weighted data. I have discussed the issue with a statistician, and his response was along the lines of: If you have independent observations and you take the…
user08041991
7 votes, 2 answers

What is the distribution of the (arbitrarily) weighted Maximum Likelihood Estimator?

Suppose you observe vector $X_i$ of independent variables, and $y_i$ dependent variables, with likelihood $l\left(\theta;X_i,y_i\right)$. Assume the $y_i$ are independent. Furthermore assume you are given positive weights, $w_i$ which are arbitrary,…
7 votes, 1 answer

Two-sample Kolmogorov–Smirnov test with weights

I have to compare two datasets. The first is real data, while the second is a simulation. I want to look at just one variable in the datasets and test whether it is compatible between data and simulation. The underlying random variable is continuous.…
Ruggero Turra
7 votes, 2 answers

Questions on multiple imputation with MICE for a multigroup-SEM-analysis? (including survey weights)

I am planning to do a multigroup SEM analysis. I gathered survey data and calculated a survey weight. Some of my variables have item nonresponse (mostly around 5% missing values). I've decided to use multiple imputation to handle the missing data. First,…
6 votes, 1 answer

What does this sampling weight mean?

The data comes from agricultural market research on farming. The sample was derived based on stratification of farming industries (sheep, beef, grains, etc.) and random sampling within each stratum. We have population estimates (frequencies,…
NonSleeper