Questions tagged [weighted-sampling]

This is an ambiguous tag. Please avoid using it. Better tags are:

If you have survey data with weights, please use "survey-sampling" instead. If you need to draw Monte Carlo samples from a distribution that is intractable or inconvenient to sample from directly, so that you draw from a simpler distribution and then correct the draws with weights, please use "importance-sampling", "monte-carlo" and/or "simulation" instead.

132 questions
34
votes
5 answers

How to sample from a discrete distribution?

Assume I have a distribution governing the possible outcome from a single random variable X. This is something like [0.1, 0.4, 0.2, 0.3] for X being a value of either 1, 2, 3, 4. Is it possible to sample from this distribution, i.e. generate pseudo…
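One common way to do this is inverse-CDF sampling: accumulate the probabilities, draw a uniform number, and look up the first cumulative value that exceeds it. The sketch below uses the probabilities from the excerpt; the function name `sample_discrete` and the seed are illustrative assumptions, not something taken from the question or its answers.

```python
import bisect
import itertools
import random

def sample_discrete(values, probs, n, rng=None):
    """Draw n values by inverse-CDF lookup on the cumulative probabilities."""
    rng = rng or random.Random()
    cdf = list(itertools.accumulate(probs))  # e.g. [0.1, 0.5, 0.7, 1.0]
    total = cdf[-1]                          # tolerates probs that do not sum exactly to 1
    draws = []
    for _ in range(n):
        u = rng.random() * total
        idx = bisect.bisect_right(cdf, u)    # first index whose cumulative value exceeds u
        draws.append(values[min(idx, len(values) - 1)])  # guard against rounding at the top
    return draws

# Illustrative usage with the distribution from the question (seed is arbitrary).
draws = sample_discrete([1, 2, 3, 4], [0.1, 0.4, 0.2, 0.3], 10_000, random.Random(0))
freq = {v: draws.count(v) / len(draws) for v in [1, 2, 3, 4]}
```

With enough draws, the empirical frequencies in `freq` should come out close to the specified probabilities.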
26
votes
1 answer

Standard deviation of binned observations

I have a dataset of sample observations, stored as counts within range bins, e.g.:

min/max  count
40/44    1
45/49    2
50/54    3
55/59    4
70/74    1

Now, finding an estimate of the average from this is pretty straightforward. Simply use the…
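For the binned counts in this excerpt, one simple approximation treats every observation as if it sat at the midpoint of its bin. The sketch below computes a mean and a sample standard deviation under that assumption; the midpoint choice and the `n - 1` denominator are assumptions made here, not necessarily what the answers recommend, and the helper name `binned_mean_sd` is hypothetical.

```python
import math

# Bins from the excerpt: (lower bound, upper bound, count).
bins = [(40, 44, 1), (45, 49, 2), (50, 54, 3), (55, 59, 4), (70, 74, 1)]

def binned_mean_sd(bins):
    """Mean and sample SD, assuming all observations lie at their bin midpoint."""
    n = sum(count for _, _, count in bins)
    mids = [((lo + hi) / 2, count) for lo, hi, count in bins]
    mean = sum(mid * count for mid, count in mids) / n
    # Count-weighted sum of squared deviations from the mean.
    ss = sum(count * (mid - mean) ** 2 for mid, count in mids)
    return mean, math.sqrt(ss / (n - 1))

mean, sd = binned_mean_sd(bins)
```

The midpoint assumption ignores the spread of observations within each bin, so the result is only an approximation to the standard deviation of the underlying data.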
chezy525
  • 363
  • 1
  • 3
  • 6
14
votes
3 answers

Fitting multilevel models to complex survey data in R

I'm looking for advice on how to analyze complex survey data with multilevel models in R. I've used the survey package to weight for unequal probabilities of selection in one-level models, but this package does not have functions for multilevel…
Eric Green
  • 629
  • 9
  • 20
14
votes
0 answers

How can I measure model performance with weighted logistic regression?

I am working with some survey data that uses probability weights. A number of sources explain that likelihood-based tests and fit statistics like likelihood-ratio, AIC, and BIC are not valid in the context of the weighted MLE. Are there other tests,…
12
votes
1 answer

Defining quantiles over a weighted sample

I have a weighted sample, for which I wish to calculate quantiles. Ideally, where the weights are equal (whether = 1 or otherwise), the results would be consistent with those of scipy.stats.scoreatpercentile() and R's quantile(...,type=7). One…
Misha
  • 221
  • 1
  • 2
  • 3
10
votes
1 answer

What is a propensity weighting sampling / RIM?

I have come across the sampling method called "Propensity Weighting Sampling/RIM", but I do not have a good idea of what these survey methods are all about. What references in the literature cover this topic?
Beta
  • 5,784
  • 9
  • 33
  • 44
9
votes
1 answer

Machine learning with weighted / complex survey data

I have worked a lot with various nationally representative data. These data sources have a complex survey design, so the analysis requires the specification of stratification and weight variables. Among the data sources that are within my area of…
Brian P
  • 455
  • 1
  • 6
  • 12
8
votes
1 answer

When and how to use weights for sequence analysis in social science?

Weighting in sequence analysis: So far, I have scarcely found papers that address the issue of weighting for sequence analysis (using for example the optimal matching algorithm). Sequence analysis normally involves several steps: setting or…
7
votes
0 answers

Frequency weights, rare events and logistic regression

I'm working on a model that requires me to look for predictors for a rare event (less than 0.5% of the total of my observations). My total sample is a significant part of the total population (50,000 cases). My final objective is to obtain…
Edu
  • 521
  • 5
  • 12
7
votes
2 answers

Inverse transformation sampling for mixture distribution of two normal distributions

I am confused by the special way the inverse method has to be used in the following problem. Here is the problem: Consider a mixture distribution of two normal distributions, where the desired PDF $f(x)$ is given by: $f(x) = r\, f_a(x) + (1 - r)\,…
7
votes
2 answers

Bootstrapping a sample with unequal selection probabilities

I want to "blow up" a sample, taken with replacement, for which I know the overall sampling probability $\pi_i$ for each item $i$. Is it valid to use bootstrapping and apply inverse probability weighting during the selection (as in the…
krlmlr
  • 749
  • 1
  • 8
  • 35
7
votes
0 answers

When to use longitudinal (panel) weights vs cross-section weights in complex surveys

I'm currently working with a longitudinal dataset, the Kauffman Firm Survey. The survey tracks about 5000 firms from 2004 to 2009. Firms die out over the years. It has both cross-sectional weights and longitudinal weights. I've checked out…
Robert
  • 275
  • 3
  • 6
6
votes
1 answer

Why is Sampling Importance Resampling (SIR) better than Importance Sampling (IS)?

From what I understand, SIR is a mechanism for sampling from a distribution $p$ that works as follows: (1) Approximate a target distribution $p$ using an importance sample $S$ from a proposal distribution $q$. (2) Draw a small sample $S_\text{small}$ from…
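A minimal sketch of the two steps as SIR is usually described: draw a large sample from the proposal, weight each draw by target density over proposal density, then resample a smaller set with probabilities proportional to those weights. The excerpt is cut off before it states the second step in full, so the resampling rule here follows the standard description rather than the asker's wording. The target (an unnormalized standard normal), the uniform proposal on [-5, 5], the sample sizes, and the function name `sir` are all illustrative assumptions.

```python
import math
import random

def sir(target_density, proposal_draw, proposal_density, n_large, n_small, rng):
    """Sampling importance resampling: weight proposal draws, then resample."""
    big = [proposal_draw(rng) for _ in range(n_large)]
    # Importance weights; the target only needs to be known up to a constant.
    weights = [target_density(x) / proposal_density(x) for x in big]
    # random.choices treats the weights as relative, so no explicit normalization is needed.
    return rng.choices(big, weights=weights, k=n_small)

rng = random.Random(0)  # arbitrary seed for reproducibility
small = sir(
    target_density=lambda x: math.exp(-0.5 * x * x),  # unnormalized N(0, 1)
    proposal_draw=lambda r: r.uniform(-5.0, 5.0),
    proposal_density=lambda x: 0.1,                   # Uniform(-5, 5) density
    n_large=20_000,
    n_small=2_000,
    rng=rng,
)
sample_mean = sum(small) / len(small)
sample_sd = math.sqrt(sum((x - sample_mean) ** 2 for x in small) / (len(small) - 1))
```

The resampled values in `small` are unweighted, approximate draws from the target, so their mean and standard deviation should be roughly those of a standard normal.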
6
votes
2 answers

Particle filtering importance weights

In theory, the importance weight of a particle has to be a probability, i.e., $w_{s_t} = p(z_t|s_t)$. My question is: Since we eventually normalize the weights with their sum and get a probability distribution, do importance weights themselves have…
Zoran
  • 522
  • 4
  • 11
6
votes
2 answers

Which is the right way to handle imbalanced data in a regression problem?

I'm working on a regression problem with imbalanced data, and I would like to know if I'm weighting the errors correctly. I'll try to illustrate the concept with a simple example. Imagine I'm building a model to predict house prices in New York and…