
I have read that least angle regression (LARS) is good for high-dimensional data. I don't really understand what "high-dimensional data" means, though. Does it refer to the $p \gg n$ case?

Also, does anyone know of a good dataset with such properties on which we can compare the performance of LARS with, say, least squares?

gung - Reinstate Monica
Saurabh7

1 Answer


"High dimensional" means there are many predictor / explanatory variables. It does not have to mean that there are more variables than observations. The latter is an additional problem, but with a sufficient number of variables, there can be concerns for the analysis and/or model building process even if there are lots of observations. For example, models with lots of variables can be harder to interpret, and multicollinearity is more likely, even if pairwise correlations are relatively low. It takes surprisingly few dimensions for some of the paradoxical effects of high-dimensionality (e.g., many points becoming about equally 'close') to start showing up; see this excellent CV thread: Why is Euclidean distance not a good metric in high dimensions?
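To make the "about equally close" effect concrete, here is a minimal sketch using only the Python standard library. The setup is an assumption chosen for illustration, not something from the linked thread: points are drawn uniformly from the unit cube, and the helper `relative_contrast` is a hypothetical name for (farthest distance - nearest distance) / nearest distance, measured from one random query point.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one random query point
    to n_points other random points, all uniform in the unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast between nearest and farthest neighbor shrinks as dim grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In low dimensions the farthest point is many times farther away than the nearest one; as the dimension grows, the ratio should drift toward zero, which is why "nearest" becomes less meaningful.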

LARS is closely related to the LASSO: with a small modification, it traces out the entire LASSO solution path as $\lambda$, the regularization parameter, is progressively varied. This might be a reasonable choice any time you don't know in advance what value is appropriate for $\lambda$. LARS isn't the only option when you don't know what $\lambda$ to use, however. You could also search over possible $\lambda$ values via cross-validation.
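Both routes can be tried with scikit-learn. The following is only a sketch under assumptions: the data are synthetic (50 observations, 200 predictors, 5 of which truly matter, all made up for illustration), and note that scikit-learn calls the penalty `alpha` where this answer writes $\lambda$, with its own scaling convention.

```python
import numpy as np
from sklearn.linear_model import LassoCV, lars_path

# Synthetic data (an assumption for illustration): only 5 predictors matter.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# LARS with the LASSO modification: a single run returns the whole path.
# coefs has one column of coefficients per breakpoint in alphas.
alphas, active, coefs = lars_path(X, y, method="lasso")
print("breakpoints on the path:", len(alphas))
print("nonzero coefficients at the last breakpoint:",
      np.count_nonzero(coefs[:, -1]))

# Alternative: pick the penalty by 5-fold cross-validation over a grid.
cv_fit = LassoCV(cv=5, random_state=0).fit(X, y)
print("CV-selected penalty:", cv_fit.alpha_)
print("predictors kept:", np.count_nonzero(cv_fit.coef_))
```

The path from `lars_path` starts with all coefficients at zero and adds (or occasionally drops) predictors as the penalty shrinks; `LassoCV` instead commits to a single penalty value chosen by out-of-fold error.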

Certainly, when $p \gg n$, you will need to do something other than straight OLS regression with all of your available variables, because OLS will fail in that situation: with more variables than observations, $X^\top X$ is singular, so there is no unique least-squares solution. You don't have to use a LASSO variant (e.g., LARS), however. For instance, you could run a Principal Components Analysis (PCA), extract $k$ PCs, where $k \ll n$, and then use OLS on those. Which of these (or other) options is best will depend on the situation and your goals.
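Here is a rough NumPy sketch of both points: why OLS breaks down when $p > n$, and what the PCA-then-OLS route looks like. Everything in it is an assumption for illustration (random Gaussian data, $n = 30$, $p = 100$, $k = 5$), and PCA is done by hand via the SVD of the centered predictor matrix rather than with a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 30, 100, 5                 # p > n, and k << n components
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# X has rank at most n, and X'X has the same rank as X, so the
# p x p matrix X'X cannot be inverted.
print("rank of X:", np.linalg.matrix_rank(X), "but p =", p)

# Two different coefficient vectors that both reproduce y exactly.
beta_a = np.linalg.pinv(X) @ y       # minimum-norm solution
null_dir = np.linalg.svd(X)[2][-1]   # direction with X @ null_dir ~ 0
beta_b = beta_a + 10.0 * null_dir
print(np.allclose(X @ beta_a, y), np.allclose(X @ beta_b, y))

# Principal components regression: PCA via SVD of centered X, then OLS
# of y on an intercept plus the first k PC scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T               # n x k matrix of PC scores
design = np.column_stack([np.ones(n), scores])
gamma, *_ = np.linalg.lstsq(design, y, rcond=None)
print("number of fitted coefficients:", gamma.shape[0])
```

Because `beta_a` and `beta_b` fit the data equally well, the data alone cannot choose between them; regressing on $k$ components instead gives a well-posed problem with only $k + 1$ coefficients.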

gung - Reinstate Monica