I am reading Exercise 6.4 from *The Elements of Statistical Learning* (Hastie, Tibshirani and Friedman) and I am having difficulty interpreting exactly what is being asked in the following question:
Ex. 6.4 Suppose that the $p$ predictors $X$ arise from sampling relatively smooth analog curves at $p$ uniformly spaced abscissa values. Denote by $\operatorname{Cov}(X|Y) = \Sigma$ the conditional covariance matrix of the predictors, and assume this does not change much with $Y$. Discuss the nature of the Mahalanobis choice $A = \Sigma^{-1}$ for the metric in (6.14). How does this compare with $A = I$? How might you construct a kernel $A$ that (a) downweights high-frequency components in the distance metric; (b) ignores them completely?
Note that (6.14) is the kernel given by $$K_{\lambda, A}(x_0, x) = D\left(\frac{(x - x_0)^T A (x - x_0)}{\lambda}\right)$$
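Just to make the notation in (6.14) concrete for myself, here is a tiny numerical sketch comparing the weight produced by $A = I$ with the Mahalanobis choice $A = \Sigma^{-1}$. The Epanechnikov profile for $D$, the Gaussian-shaped $\Sigma$, and the value of $\lambda$ are placeholders I chose, not part of the exercise:

```python
import numpy as np

def epanechnikov(t):
    # Compactly supported profile D(t); any reasonable choice of D works here.
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def kernel_weight(x0, x, A, lam):
    # K_{lambda, A}(x0, x) = D( (x - x0)^T A (x - x0) / lambda )
    d = x - x0
    return epanechnikov(d @ A @ d / lam)

rng = np.random.default_rng(0)
p = 5  # p uniformly spaced abscissa values
t = np.arange(p)

# A made-up "smooth curve" covariance: neighbouring abscissae are highly correlated.
Sigma = np.exp(-0.5 * (t[:, None] - t[None, :])**2 / 2.0**2)

x0 = rng.multivariate_normal(np.zeros(p), Sigma)
x = rng.multivariate_normal(np.zeros(p), Sigma)

lam = 25.0  # chosen large enough that the weights are not trivially zero
w_identity = kernel_weight(x0, x, np.eye(p), lam)
w_mahalanobis = kernel_weight(x0, x, np.linalg.inv(Sigma), lam)
print(w_identity, w_mahalanobis)
```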
To me it sounds like there are some number of smooth analog curves and each observation $x_i$ is a sample of one of them at the $p$ abscissa values, in which case $Y$ would be a categorical variable indicating which curve a given sample has been drawn from. Then $\Sigma_j$ would be the covariance matrix of all of the observations belonging to curve $j$. I don't think this makes sense, however, since we would then have a different weight matrix $A$ for each of the analog curves.
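To make this reading concrete, here is roughly the data-generating picture I have in mind; the two curve families, the noise level, and the sample sizes are entirely made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_per_class = 20, 100
t = np.linspace(0, 1, p)  # p uniformly spaced abscissa values

def sample_curve(y):
    # Each observation is a noisy sample of a smooth curve from class y.
    phase = rng.normal(scale=0.2)
    amp = 1.0 + 0.1 * rng.normal()
    base = np.sin(2 * np.pi * (t + phase)) if y == 0 else np.cos(2 * np.pi * (t + phase))
    return amp * base + 0.05 * rng.normal(size=p)

X0 = np.array([sample_curve(0) for _ in range(n_per_class)])
X1 = np.array([sample_curve(1) for _ in range(n_per_class)])

# Under this reading, Sigma_j is the within-class sample covariance,
# and the two classes need not share the same Sigma_j.
Sigma_0 = np.cov(X0, rowvar=False)
Sigma_1 = np.cov(X1, rowvar=False)
print(Sigma_0.shape, np.linalg.norm(Sigma_0 - Sigma_1))
```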
I suspect this answer will be useful in solving the problem, but I still can't quite get a concrete interpretation of exactly what is being asked.
How exactly should this question be interpreted?