I've been surveying the different methods for approaching linear multivariate problems (e.g., PCA, PLS, factor analysis, etc.) and want to generate a model for $Y$'s that depend nonlinearly on $X$'s via linearizations of the $X$'s. However, I have not found much about the process of linearizing variables so that one of these linear models can be used, so at the moment I am fairly blind to any pitfalls of doing this naively. Two specifics come to mind:
(1) It seems that standardizing variables is common, but it is not clear to me how to do it if I presume a nonlinear relation (it may just be that I don't understand why we standardize). Say I presume $Y = a X^2$, where I want to determine $a$. I could standardize $X$ (and $Y$?), then linearize, then fit. Or I could linearize via $U = X^2$, then standardize $U$ (and $Y$?), then fit. Intuitively, these operations don't commute, so I would expect two different fits, but there is only one value of $a$ by definition. How can I resolve this conflict?
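To make the two orderings concrete, here is a minimal sketch of what I mean (numpy only; the data and the value $a = 3$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data following Y = a * X^2 + noise, with a = 3
a_true = 3.0
X = rng.uniform(1.0, 5.0, size=200)
Y = a_true * X**2 + rng.normal(0.0, 1.0, size=200)

def standardize(v):
    return (v - v.mean()) / v.std()

# Ordering A: standardize X first, then linearize (square)
slope_A = np.polyfit(standardize(X)**2, standardize(Y), 1)[0]

# Ordering B: linearize first (U = X^2), then standardize U
slope_B = np.polyfit(standardize(X**2), standardize(Y), 1)[0]

print(slope_A, slope_B)  # the slopes differ: the two orderings do not commute

# Undoing the standardization in ordering B recovers a:
print(slope_B * Y.std() / (X**2).std())  # close to a_true = 3.0
```

Running this, the two slopes come out different, which is exactly the conflict I am asking about.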
(2) I have many $X$'s, and I presume that some of them will have a nonlinear relation with $Y$. So far, I have computed different transforms of my $X$'s ($\log(X)$, $X^2$, etc.) and allowed the fitting routine to pick out whichever it wants (say, via stepwise regression). However, I have little intuition as to how to regard $X$ and its transformations: should I allow the fitting routine to pick only one 'version' of each $X$, precluding a model like $Y = a_1 \log(X_1) + a_2 X_1^2 + a_3 X_1^{-1} + a_4 \log(X_2) + a_5 X_2^2 + a_6 X_2^{-1}$ and instead enforcing that there can be only one term with $X_1$ and one term with $X_2$?
On the one hand, these different transforms are by construction highly (though nonlinearly) correlated with $X_i$, so now I am wondering whether I have a collinearity problem. At the same time, polynomial fits are fairly standard even though $X^n$ and $X^m$ are correlated in just the same way for $n \neq m$.
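For instance, on positive-valued data the plain linear correlations between a variable and its transforms already come out close to $\pm 1$, which is what worries me (a quick check with made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=500)  # positive, so log(X) and 1/X are defined

# Pairwise correlations between X and the transforms I am considering
names = ["X", "log(X)", "X^2", "1/X"]
cols = np.array([X, np.log(X), X**2, 1.0 / X])
corr = np.corrcoef(cols)  # rows of `cols` are treated as variables
for name, row in zip(names, corr):
    print(f"{name:>7}:", np.round(row, 2))
```

Every off-diagonal entry is large in magnitude, yet (as noted above) polynomial regression routinely includes several such columns at once.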
I would note that I have tried looking for "nonlinear multivariate methods" but the only sources I have found are (at the moment) way above my head.
Thank you for any guidance!
In response to your second comment (I was character-limited in the comments):
The bare-bones problem is this: I have one measurement ($Y$) that I want to model using several other measurements (the different $X_i$'s).
Because I have several $X_i$'s, I understood this to be a multivariate problem. The simplest model I could write is something like $Y = b + \sum_i a_i X_i$. However, I don't want to presume linearity between $Y$ and each $X_i$, so I want to explore linearity between $Y$ and different transforms of the $X_i$ (i.e., $X_i^2$, $\log(X_i)$, etc.).
On the one hand, I multiply the number of variables by the number of transforms I want to consider, so computationally this gets expensive and there is an incentive not to over-reach. At the same time, I know to consider screening out $X_i$ variables that are strongly correlated with one another, which has the computational benefit of reducing how many variables go into the model. I just don't understand whether I should screen within a family of $X_i$ transforms in some way. If not, then it seems I should treat each transform of $X_i$ as a new variable in its own right. But then I worry that any screening protocol I build would reject the transforms of $X_i$, since by construction they are correlated with each other. A sketch of the kind of expand-then-select procedure I mean follows. Sorry if this isn't any clearer.
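Here is a minimal sketch of that expand-then-select procedure, with scikit-learn's LassoCV standing in for stepwise regression (the data, the transform family, and the 'true' model are all made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n, p = 300, 4
X = rng.uniform(1.0, 5.0, size=(n, p))

# Hypothetical truth: Y depends nonlinearly on only two of the X_i
Y = 2.0 * np.log(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=n)

# Expand every X_i into a family of transforms, each treated as its own column
family = [("X", lambda v: v), ("log(X)", np.log),
          ("X^2", np.square), ("1/X", lambda v: 1.0 / v)]
cols, names = [], []
for i in range(p):
    for label, f in family:
        cols.append(f(X[:, i]))
        names.append(f"{label} of X_{i + 1}")
D = np.column_stack(cols)

# Let an L1-penalized fit (in place of stepwise regression) pick terms
model = LassoCV(cv=5).fit(StandardScaler().fit_transform(D), Y)
for name, c in zip(names, model.coef_):
    if abs(c) > 1e-3:
        print(name, round(c, 3))
```

With columns this correlated, the selection can keep more than one transform of the same $X_i$ (or an unstable subset of them), which is exactly the behavior I don't know whether to allow or to forbid.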