1

I am looking to compute the diagonal entries of a projection matrix

$$ P(X) = X (X' X)^{-1} X' $$

where $X$ is a design matrix that contains high dimensional fixed effects, that is, $X = [A ~~ D]$ with $A$ covariates and $D$ sets of fixed effects, $D = [D_1 ~~ D_2 ~~ \cdots~~]$. Generating and operating with the matrix $D$ can be computationally prohibitive, but I know that

$$ P(X) = P(D) + P((I - P(D)) A) $$

It's computationally feasible to obtain $(I - P(D)) A$, and thus the diagonal entries of $P((I - P(D)) A)$, without generating the matrix $D$. However, I don't know how to compute the diagonal entries of $P(D)$ without generating $D$ (unless there is only one set of fixed effects, in which case the diagonal entries of $P(D)$ would be one over the number of rows in each group).

Is there a way to compute the diagonal entries of $P(D)$ without creating the matrix $D$?

Mauricio
  • 111
  • 1
  • Referring to $D$ as "fixed effects" is kind of confusing to me... I think of fixed effect to be a modeling decision, not a property of the measured variables. So, what makes $D$ and $A$ different from each other? Are you saying that $D$ is made up of factor variables? Is there any kind of structure you can leverage to make your problem simpler (e.g., orthogonality)? – Wesley Oct 04 '21 at 22:22
  • $D$ is made up of ts of indicators for various factor variables (i.e. its columns are 0/1 indicators for each group). The computational difference between $A$ and $D$ is that each factor variable may have a large number of groups, so computing $P(D)$ is computationally prohibitive if there are, for example, thousands of columns in each sub-matrix $D_i$ and millions of rows. – Mauricio Oct 04 '21 at 22:40
  • The $i$th diagonal entry is the $i$th element of the projection of the $i$th coordinate vector, so if you can do the projection efficiently you can compute the diagonal entries one at a time without computing the whole projection matrix. And you can compute the projection efficiently, from its description as a linear regression. – Thomas Lumley Oct 04 '21 at 23:08
  • @ThomasLumley Unfortunately this is not true, since there are as many coordinate vectors as there are rows of $X$. For my data this would be millions of rows, meaning I'd have to do the projection millions of times (no longer efficient) or create a $n \times n$ identity matrix (less feasible than creating $D$ in the first place). – Mauricio Oct 04 '21 at 23:23
  • @ThomasLumley It seems that your point assumes there is a very efficient way of computing the projection for the $i$th coordinate vector. I must admit I don't know of one? The way I am computing the projection is computationally feasible but still expensive (each coordinate vector would be a $N \times 1$ vector, with $N$ the number of rows and that is in the millions). Is there a simplification I am missing since the $i$th coordinate vector has $0$s everywhere except the $i$th coordinate? – Mauricio Oct 04 '21 at 23:56
  • If I understand your question correctly, isn't it just $1/n$, with $n$ the number of fixed effects? See https://stats.stackexchange.com/questions/174243/difference-between-fixed-effects-dummies-and-fixed-effects-estimator/174267#174267 – Christoph Hanck Oct 05 '21 at 10:59
  • @ChristophHanck AFAIK this is only true when there is a single set of fixed effects (single factor variable, like de-meaning within individuals in a panel). With multiple fixed effects (e.g. time _and_ individual fixed effects) in the matrix $D$ then this wouldn't be $1/n$. – Mauricio Oct 05 '21 at 11:29
  • Indeed, my link refers to the diagonal elements of the projection matrix corresponding to the fixed effects only. Are you after the diagonal elements when further regressors are also included, then? – Christoph Hanck Oct 05 '21 at 11:56
  • Yes, specifically for the case with more than one set of fixed effects. – Mauricio Oct 05 '21 at 15:18
  • How many columns, roughly, does $A$ have? – Wesley Oct 05 '21 at 16:30
  • @Wesley $A$ would only have a few columns, 2 or 15 for the problem I am currently working on. – Mauricio Oct 05 '21 at 17:14

0 Answers0