I'm looking for an advanced linear regression case study illustrating the steps required to model several complex, non-linear relationships at once using GLM or OLS. It is surprisingly difficult to find resources that go beyond basic textbook examples: most of the books I've read go no further than a log transformation of the response coupled with a Box-Cox transformation of one predictor, or a natural spline in the best case. Also, all the examples I've seen so far handle each data transformation problem in a separate model, often a single-predictor model.
I know what a Box-Cox or Yeo-Johnson transformation is. What I'm looking for is a detailed, real-life case study where the response and the relationships are not clear-cut. For example, the response is not strictly positive (so you can't use log or Box-Cox), the predictors have non-linear relationships both among themselves and with the response, and the maximum likelihood estimates of the transformation parameters don't point to a standard exponent such as 0.33 or 0.5. Also, the residual variance is found to be non-constant (it never is constant in practice), so the response has to be dealt with as well, and a choice has to be made between a regression with a non-standard GLM family and a response transformation. The researcher will likely also have to make choices to avoid overfitting the data.
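To make the transformation step concrete, here is a minimal sketch of estimating a Yeo-Johnson exponent by maximum likelihood when the response takes negative values. It is not taken from any reference; the function names (`yeo_johnson`, `profile_loglik`, `best_lambda`), the synthetic data, the single predictor, and the grid search are all assumptions made for illustration. It covers only one piece of the problem described above and ignores heteroscedasticity and non-linear predictor effects.

```python
import math
import random


def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a single value; defined for any real y."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def ols_rss(x, z):
    """Residual sum of squares of the simple OLS fit z ~ a + b * x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    return sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))


def profile_loglik(x, y, lam):
    """Gaussian profile log-likelihood of lam (up to a constant),
    including the log-Jacobian of the transformation."""
    n = len(y)
    z = [yeo_johnson(yi, lam) for yi in y]
    log_jacobian = (lam - 1.0) * sum(
        math.copysign(1.0, yi) * math.log1p(abs(yi)) for yi in y
    )
    return -0.5 * n * math.log(ols_rss(x, z) / n) + log_jacobian


def best_lambda(x, y, grid):
    """Grid value of lam that maximizes the profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))


# Synthetic data (an assumption): a linear signal with Gaussian noise,
# so the response is both positive and negative.
random.seed(0)
x = [random.uniform(-3.0, 3.0) for _ in range(300)]
y = [2.0 * xi + random.gauss(0.0, 0.5) for xi in x]

grid = [i / 20 for i in range(-20, 61)]  # lam from -1.0 to 3.0 in steps of 0.05
lam_hat = best_lambda(x, y, grid)
print("estimated lambda:", lam_hat)
```

On real data the estimate will rarely land on a convenient value, which is exactly the situation I'd like to see handled: one would also want a likelihood-based interval for the exponent, and would have to weigh the transformed model against a GLM with a suitable variance function.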
EDIT
So far I have gathered the following resources:
- Regression Modeling Strategies, F. Harrell
- Applied Econometric Time Series, W. Enders
- Dynamic linear models with R, G. Petris
- Applied Regression Analysis, D. Kleinbaum
- An Introduction To Statistical Learning, G. James/D. Witten
I have only read the last one (ISLR), and it is a very good text (five stars in my book), although it is oriented more towards ML than advanced regression modeling.
There is also this good post on CV that presents a challenging regression case.