
I'd like to modify the answer to this question to allow weighted observations. I think all I need to do is weight the inputs X and Y.

X = w * X
Y = w * Y

The other parts of the procedure should then follow. Please correct me if I am wrong.
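A minimal NumPy sketch of this idea (hypothetical function name; note that, as the comments point out, this must be componentwise row scaling rather than matrix multiplication, and the right scaling function depends on what the weights mean):

```python
import numpy as np

def scaled_lstsq(X, y, w):
    """Scale each observation (row) of X and y by its weight,
    then solve ordinary least squares on the scaled data.
    X is (n, p); y and w are length-n vectors."""
    Xw = w[:, None] * X        # componentwise row scaling, not w @ X
    yw = w * y
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta
```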

Candy Chiu
  • I don't think you want to use matrix multiplication! Provided `X` and `Y` use their *first* indexes to represent the observations and `w` is a vector corresponding to the observations, you need to use ordinary (componentwise) multiplication. – whuber Aug 13 '14 at 16:00
  • @whuber How about the idea itself? – Candy Chiu Aug 13 '14 at 16:14
  • I think it's on the right track but it depends on what the weights mean. Certainly it's the case that the rows of $X$ and $Y$ should be multiplied by the same monotonically increasing function $f$ of $w$, because that would cause the corresponding terms in the quadratic objective function to be multiplied by $f(w)^2$. – whuber Aug 13 '14 at 16:16
  • Yes, I am applying the same weight function to X and Y. Would you explain why the monotonically increasing constraint is required? Currently I am using discrete factors as weights, for example, first 10 rows use 2.0, second 10 rows use 1.5, etc. – Candy Chiu Aug 13 '14 at 16:26
  • If $f$ were not monotonically increasing it would make no sense to call the $w_i$ "weights" because higher values would have *less,* rather than more, influence in the solution. The issue concerns what these weights *mean*. For instance, if they measure precision (as a reciprocal standard deviation) then they should just multiply the data, but if they are sample weights (such as inverse probabilities) then their *square roots* should multiply the data. That is why your question can be answered only very generally and provisionally. – whuber Aug 13 '14 at 17:36
  • @whuber I reviewed some literature, and read through the comments again to understand your point. In my case, the weight is a measure of importance of a data point. If weights are supposed to be used to fulfill the OLS assumption, how can I use them to assign significance? – Candy Chiu Jan 16 '15 at 19:48
  • You would have to stipulate what you mean by "importance." We all understand that such weights would be positive and increase with greater "importance," but that would not differentiate between using the weights as given or, say, their squares or cubes. How are we to know exactly which numbers to attach to the various levels of importance for the purpose of fitting a model? That is, exactly *how much* should "importance" influence the results? You somehow need to supply answers to those questions. – whuber Jan 16 '15 at 20:11
  • @whuber Weights are sample weights, so yes, I am using the square roots. The residuals are skewed by the weights, making them unusable for diagnostics. – Candy Chiu Jan 16 '15 at 22:19
  • One ordinarily wouldn't take the square root of a sample weight. If the weights are variances, you would be using the reciprocals of their square roots. – whuber Jan 16 '15 at 22:21
  • @whuber The objective is $\sum_i w_i(\hat{y}_i - y_i)^2$. For this set of data, variance is not a problem. In this case, how do I use the residuals for diagnostics? – Candy Chiu Jan 20 '15 at 13:22
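To make the thread above concrete: for sample weights, minimizing $\sum_i w_i(\hat{y}_i - y_i)^2$ is equivalent to ordinary least squares on rows scaled by $\sqrt{w_i}$, since each squared residual then picks up a factor of $w_i$. A minimal sketch under that assumption (hypothetical names and simulated data):

```python
import numpy as np

def wls_via_sqrt_weights(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i @ beta)**2 by scaling
    rows with sqrt(w_i) and solving ordinary least squares."""
    s = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(s[:, None] * X, s * y, rcond=None)
    return beta

# Sanity check on simulated data, with discrete weight factors
# as described in the question (first 10 rows 2.0, next 10 rows 1.5, ...).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=30)
w = np.repeat([2.0, 1.5, 1.0], 10)
beta = wls_via_sqrt_weights(X, y, w)
resid = y - X @ beta
# For diagnostics one usually inspects sqrt(w) * resid, which is
# homoscedastic under the weighted model, rather than the raw residuals.
```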

1 Answer


Under the theory of Generalized Least Squares (GLS), weights should be constructed from an empirically estimated or theoretically based model of the distribution of the error terms.

This is most often applied when the absolute magnitude of the error terms is assumed to be related, for example, to the size of the predictor (also called independent) variable. In such a case, transforming both Y and X by dividing by the square root of X makes the error-term distribution 'homogeneous', which is an underlying assumption of standard least-squares theory. Note that every predictor variable is divided by the square root of the size variable $X_i$, so the intercept column, which always has the value 1, is no longer constant but takes the value $1/\sqrt{X_i}$ for the $i$th observation. As a result, the transformed model no longer has an intercept term.
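A minimal sketch of that transformation, assuming simulated data in which the error variance grows in proportion to the predictor (all names hypothetical):

```python
import numpy as np

# Model: y = a + b*x + e, with Var(e_i) proportional to x_i (x_i > 0).
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=200)
y = 3.0 + 0.5 * x + rng.normal(scale=np.sqrt(x))   # heteroscedastic errors

s = np.sqrt(x)
# Transformed design: the old intercept column becomes 1/sqrt(x_i),
# the old x column becomes sqrt(x_i); no constant column remains.
Z = np.column_stack([1.0 / s, s])
coef, *_ = np.linalg.lstsq(Z, y / s, rcond=None)   # coef ~ [a, b]
```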

Constrained least squares can be performed mechanically by minimizing the sum of squares of the actual minus fitted values subject to a linear constraint (the technique of Lagrange multipliers). Statistical estimation of the standard deviations of the resulting regression coefficients is complex; consider Bayesian regression theory, or simply run your own simulations.
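For the mechanics: minimizing $\|y - X\beta\|^2$ subject to a linear constraint $A\beta = c$ via Lagrange multipliers reduces to a single linear (KKT) system. A sketch under those assumptions:

```python
import numpy as np

def constrained_lstsq(X, y, A, c):
    """Minimize ||y - X beta||^2 subject to A beta = c via the
    Lagrange-multiplier (KKT) linear system:
        [2 X'X  A'] [beta  ]   [2 X'y]
        [A      0 ] [lambda] = [c    ]
    """
    p, k = X.shape[1], A.shape[0]
    K = np.block([[2 * X.T @ X, A.T],
                  [A, np.zeros((k, k))]])
    rhs = np.concatenate([2 * X.T @ y, c])
    sol = np.linalg.solve(K, rhs)
    return sol[:p]                     # discard the multipliers

# Example constraint: force the p coefficients to sum to 1, i.e.
# A = np.ones((1, p)); c = np.array([1.0])
```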

EDIT: In practice, I have used Box-Cox analysis of transformations as a guide to suggest the proper transformation to employ. It is best applied to a random sample of the dataset, gathering a range of suggested transformations. Linking the transformation to the suspected mechanism or error distribution tied to the nature of the data (for example, survival data) is also advised.
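SciPy's `scipy.stats.boxcox` implements this maximum-likelihood search directly; a minimal sketch on simulated positive data (Box-Cox requires strictly positive values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # skewed, positive data

# boxcox returns the transformed data and the maximum-likelihood lambda;
# lambda near 0 suggests a log transform, near 1 suggests no transform.
transformed, lam = stats.boxcox(sample)
```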

AJKOER
  • You seem not to acknowledge that weights can represent other information besides "the distribution of error terms." For instance, they can be probability weights for sampling or they can reflect replicate results. The distinction is critical because which numerical values to use in the estimation procedure will differ. – whuber Aug 13 '14 at 17:33
  • In your example, one could replace the observations with their average value on replication. The estimate of the variance of each set of replications is based on the observed variances within that particular set. Nothing new theoretically. – AJKOER Aug 13 '14 at 17:57
  • No, there's nothing new. But blind adherence to your solution would be incorrect in many circumstances (where it does not even apply, because the weights would not be related to the error variances). – whuber Aug 13 '14 at 18:24
  • If one is using probability sampling, sampling theory will provide an estimate of the parent population's mean and, more importantly, a variance estimate that one can insert into the weighting. The advantage of applying GLS theory is that you have precision estimates. Bayesian regression can incorporate prior beliefs, but again, a variance estimate on your beliefs is required. – AJKOER Aug 13 '14 at 18:56
  • Note: if the data are noisy, consider applying a transformation (such as the log, or, better, whatever the Box-Cox technique indicates) to make the variability more homogeneous and normally distributed. For regression analysis, weighting is secondary to transforming the data. – AJKOER Aug 13 '14 at 20:50