
The short version:

I can fit a model using Weighted Least Squares, given a diagonal matrix of weights $W$, by solving $(X^TWX)\hat{\beta}=X^TWy$ for $\hat{\beta}$.

Is there a GLM analogue? If so, what is it?

There seems to be a GLM analogue, e.g. the `weights` argument in R's `glm` function. How is R using these weights?


The long version:

the situation

As a follow-up to my IPTW question, I just want to double-check that I understand how to fit a parametric model using inverse-probability-of-treatment weights (IPTW). The idea with IPTW is to simulate a dataset in which the relationship between my independent variables $(a^1,a^2,a^3)$ and dependent variable $y$ is unconfounded and therefore causal. For argument's sake, let's say I have already estimated an IPT weight $\hat{w}_i$ for each observation. These weights are, in effect, probability weights for the hypothetical simulated dataset.

the question

I now want to fit a GLM. I'd just use WLS, but I'm working with a binary outcome and an outcome truncated at zero. So I have a linear predictor $\eta_i=a_i^T\beta$, an inverse link $\mu_i=g^{-1}(\eta_i)$, and a variance $V(y_i)$ derived from my likelihood for $y$. Then the likelihood equations are $$ \sum_{i=1}^N \frac{y_i-\mu_i}{V(y_i)}\frac{\partial\mu_i}{\partial\beta_j}=\sum_{i=1}^N \frac{y_i-\mu_i}{V(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}x_{ij}\right)=0,~\forall j $$ as in Agresti, *Categorical Data Analysis* (2013), section 4.4.5.
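For concreteness, these likelihood equations are typically solved by iteratively reweighted least squares (IRLS). Here is a minimal numpy sketch for the logistic (binary-outcome) case; the data are made up purely to exercise the solver, and the function name is my own:

```python
import numpy as np

def irls_logistic(X, y, tol=1e-10, max_iter=100):
    """Solve the GLM likelihood equations for a logistic model by IRLS.

    With the canonical logit link, d(mu)/d(eta) = mu*(1-mu) = V(y_i),
    so the score collapses to sum_i (y_i - mu_i) x_ij = 0 for every j.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        v = mu * (1.0 - mu)                 # V(y_i) for a Bernoulli outcome
        z = eta + (y - mu) / v              # working response
        beta_new = np.linalg.solve(X.T @ (v[:, None] * X), X.T @ (v * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new

# made-up data just to check that the score equations hold at the solution
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p = 1.0 / (1.0 + np.exp(-(0.5 - 1.0 * X[:, 1])))
y = (rng.random(n) < p).astype(float)

beta_hat = irls_logistic(X, y)
mu_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
score = X.T @ (y - mu_hat)   # should be ~0 at the MLE
```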

So all I have to do is multiply $V(y_i)$ by the weight $\hat{w}_i$, right? The same way I might if I wanted to incorporate an overdispersion parameter? If so, is this because the variance of, say, 5 independent observations is 5 times the variance of one independent observation?

Follow-up idea: since the likelihood is the product of the likelihood for each observation, is there some weighting procedure I can use to just weight the likelihoods?

shadowtalker
  • This looks relevant, I have not tried it out though: [R package for inverse probability weighting](http://www.jstatsoft.org/v43/i13/paper) – bdeonovic Jul 07 '14 at 13:12
  • Saw that. Might end up using it but they don't go through the math of actually estimating the parameters, which I would still need. – shadowtalker Jul 07 '14 at 13:14
  • Did you also follow up on: "The use of IPW to fit an MSM was described in detail, e.g., in Robins, Hernán, and Brumback (2000), Hernán and Robins (2006) and Cole and Hernán (2008)."? – bdeonovic Jul 07 '14 at 14:14
  • Yep. Every single paper I've read on the subject either says nothing, or says to use `PROC GENMOD` in SAS, or the equivalent Stata command. I don't like fitting models that I don't understand, plus I already have everything in R. – shadowtalker Jul 07 '14 at 15:55
  • Some promise here http://eml.berkeley.edu/symposia/nsf99/papers/robins.pdf on pages 13-17, but the paper is near-unreadably dense and a bit over my head. – shadowtalker Jul 07 '14 at 15:56
  • Sorry, just a bit confused. What's wrong with the [ipw](http://www.jstatsoft.org/v43/i13/paper) package? The package allows you to estimate the parameters. And I imagine the papers they allude to will show any mathematical (theoretical) derivations. – bdeonovic Jul 07 '14 at 16:32
  • I've read the papers. They don't. Also `ipw` just gives you the estimates for the weights. I have the weights; now I want to use them. I'd run into the same problem even with the package. – shadowtalker Jul 07 '14 at 18:15
  • Ideally I'd like a general way to fit an MLE on a re-weighted data set, without simulating data sets. – shadowtalker Jul 07 '14 at 21:41
  • Ah, sorry for my ignorance. – bdeonovic Jul 08 '14 at 01:14

1 Answer


Fit an MLE by maximizing $$ l(\boldsymbol{\theta};\mathbf{y})=\sum_{i=1}^N l\left(\theta;y_i\right) $$

where $l$ is the log-likelihood. Fitting an MLE with inverse-probability (i.e. frequency) weights entails modifying the log-likelihood to:

$$ l(\boldsymbol{\theta};\mathbf{y})=\sum_{i=1}^N w_i\,l\left(\theta;y_i\right). $$

In the GLM case, this reduces to solving $$ \sum_{i=1}^N w_i\frac{y_i-\mu_i}{V(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}x_{ij}\right)=0,~\forall j $$

Source: page 119 of http://www.ssicentral.com/lisrel/techdocs/sglim.pdf, linked at http://www.ssicentral.com/lisrel/resources.html#t. It's the "Generalized Linear Modeling" chapter (chapter 3) of the LISREL "technical documents."
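As a numerical sanity check of the weighted estimating equations above, here is a Python/numpy sketch for the logistic case. The data and weights are simulated stand-ins (in practice $w_i$ would be your estimated IPT weights), and the function name is mine:

```python
import numpy as np

def irls_logistic_weighted(X, y, w, tol=1e-10, max_iter=100):
    """IRLS for the weighted likelihood equations
    sum_i w_i (y_i - mu_i) x_ij = 0 (canonical logit link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        v = mu * (1.0 - mu)                  # V(y_i) for a Bernoulli outcome
        W = w * v                            # weight enters multiplicatively
        z = eta + (y - mu) / v               # working response
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p = 1.0 / (1.0 + np.exp(-(0.5 - X[:, 1])))
y = (rng.random(n) < p).astype(float)
w = rng.uniform(0.5, 2.0, size=n)            # stand-in for estimated IPT weights

beta_hat = irls_logistic_weighted(X, y, w)
mu_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
weighted_score = X.T @ (w * (y - mu_hat))    # ~0 at the weighted MLE
```

For what it's worth, R's `glm(y ~ ., weights = w, family = quasibinomial())` solves the same weighted equations; `quasibinomial` avoids the warning `glm` emits for non-integer weights with `family = binomial()`.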

shadowtalker
  • That this is the right formulation for *frequency* weights is clear when you consider doubling or tripling, etc., individual data values: the factor $\exp(l(\theta, y_i))$ (there's no subscript on the $\theta$) in the likelihood becomes $w_i$ multiplicative factors amounting to $\prod_{j=1}^{w_i}\exp(l(\theta, y_i))=\exp(w_il(\theta,y_i))$ whose logarithm is $w_il(\theta,y_i)$, and there you have it. Different developments (based on conditional probability calculations) are needed for probability weights. – whuber Jul 08 '14 at 20:27
  • Makes sense. I just wasn't sure that the analogy held in the case when "frequencies" didn't have to be integers. And, yes, the probabilities I'm inverting are coming from estimated conditional probabilities. – shadowtalker Jul 08 '14 at 23:31
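The doubling argument in the comments is easy to verify numerically: maximizing the weighted log-likelihood with an integer frequency weight of 2 on some rows gives the same estimate as duplicating those rows. A Python/scipy sketch with made-up data (the helper names are mine):

```python
import numpy as np
from scipy.optimize import minimize

def neg_wll(beta, X, y, w):
    """Negative weighted Bernoulli log-likelihood:
    -sum_i w_i [y_i*eta_i - log(1 + exp(eta_i))]."""
    eta = X @ beta
    return -np.sum(w * (y * eta - np.logaddexp(0.0, eta)))

def neg_wll_grad(beta, X, y, w):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    return -(X.T @ (w * (y - mu)))

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 0.5).astype(float)

# give the first 10 rows frequency weight 2 ...
w = np.ones(n)
w[:10] = 2.0
fit_w = minimize(neg_wll, np.zeros(2), args=(X, y, w), jac=neg_wll_grad).x

# ... versus literally duplicating those 10 rows with unit weights
X2 = np.vstack([X, X[:10]])
y2 = np.concatenate([y, y[:10]])
fit_dup = minimize(neg_wll, np.zeros(2), args=(X2, y2, np.ones(n + 10)),
                   jac=neg_wll_grad).x
# fit_w and fit_dup agree to optimizer tolerance
```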