I'm currently doing an imaginary project where I'm investigating a possible correlation between the number of tests done for a specific disease and the number of deaths caused by said disease. I'm looking at several different areas of different population sizes separately, and the time period is divided into weeks, so that one observation = one week.
I know I should use Weighted Least Squares (WLS), but I have a really hard time grasping both the method itself and which weights I should use for the observations.
What little advice I have gotten so far says that I could use the inverse of the dependent variable (number of deaths) as the weight. The reasoning is that, because the dependent variable is a count, it is reasonable to assume that it approximately follows a Poisson distribution. In such a distribution, the mean and the variance are both equal to the expected count (estimated here by the observed count). Therefore, by weighting with the inverse of the dependent variable, I am weighting by the inverse of the estimated variance.
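To check that I at least understand the Poisson part of this reasoning, here is a minimal Python sketch using NumPy. The expected count of 2 deaths per week is a made-up value for illustration only, not something estimated from my data. It simulates Poisson counts and compares the sample mean with the sample variance.

```python
import numpy as np

# Hypothetical expected number of deaths per week (an assumption, not from my data).
expected_deaths = 2.0

rng = np.random.default_rng(seed=0)
simulated = rng.poisson(lam=expected_deaths, size=100_000)

sample_mean = simulated.mean()
sample_var = simulated.var(ddof=1)

# For a Poisson variable, both numbers should be close to expected_deaths.
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")

# Inverse-variance weight implied by the advice: 1 / variance, i.e. roughly 1 / mean.
weight = 1.0 / sample_mean
print(f"implied weight = {weight:.3f}")
```

If I read the advice correctly, the weight for a week would be one over its expected count. The suggestion I was given replaces that expected count with the single observed count for the week.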
I can't say I grasp the above advice completely, but even if I accept it and go with it, the number of deaths is 0 for many observations. The weights are then NA for those observations, since I can't compute 1/0.
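The problem can be reproduced on the six example weeks listed at the end of this post. This is only a sketch in Python with NumPy, and marking the undefined weights as NaN is my own choice for illustration.

```python
import numpy as np

# Deaths per week, copied from the example table (weeks 30 to 35).
deaths = np.array([1, 0, 1, 0, 1, 2], dtype=float)

# Start with NaN everywhere, then fill in 1/deaths only where deaths > 0.
weights = np.full_like(deaths, np.nan)
np.divide(1.0, deaths, out=weights, where=deaths > 0)

n_undefined = int(np.isnan(weights).sum())
print(weights)
print(f"{n_undefined} of {deaths.size} weeks have no usable weight")
```

In this small example, 2 of the 6 weeks end up without a weight, and in my full data the share of such weeks is large.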
I'm beginning to think I have completely misunderstood the advice I have gotten. Does anyone have an idea of how I should actually calculate the weights for my observations, and could you also explain the reasoning behind it? I really wish to grasp the logic behind the weight calculation.
Below is an example of my data:
Week | Tests | Deaths |
---|---|---|
30 | 268 | 1 |
31 | 251 | 0 |
32 | 278 | 1 |
33 | 248 | 0 |
34 | 374 | 1 |
35 | 348 | 2 |
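
For reference, this is how I currently understand the mechanics of WLS on the example rows above, as a rough Python sketch with NumPy. The helper `wls_line` is a name I made up, and the equal weights are only a placeholder (they reduce WLS to ordinary least squares), since choosing the right weights is exactly what I'm asking about.

```python
import numpy as np

# Example rows from the table above (weeks 30 to 35).
tests = np.array([268, 251, 278, 248, 374, 348], dtype=float)
deaths = np.array([1, 0, 1, 0, 1, 2], dtype=float)


def wls_line(x, y, w):
    """Fit y = a + b*x by minimising sum(w * (y - a - b*x)**2)."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    # Scaling each row by sqrt(w) turns the weighted problem into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef  # [intercept, slope]


# Placeholder weights (assumption): every week counts equally.
equal_weights = np.ones_like(deaths)
intercept, slope = wls_line(tests, deaths, equal_weights)
print(f"intercept = {intercept:.4f}, slope = {slope:.6f}")
```

Whatever weights turn out to be appropriate would replace `equal_weights` here. They would need to be finite and non-negative for every week, which is where the 1/deaths idea breaks down.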