Python's scikit-learn package has a convenient `Pipeline` utility that can chain machine learning techniques into a single model with one fit/predict interface. I was following this tutorial for chaining PCA and logistic regression, and everything works as expected, but I am having trouble describing the result in math notation because I am unsure what is really going on.
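For reference, here is roughly the model I am running; the dataset and parameters are placeholders I chose, not necessarily the tutorial's:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

# PCA reduces the features first; logistic regression is then fit
# on the reduced scores. fit()/predict() run the whole chain.
pipe = Pipeline([
    ("pca", PCA(n_components=20)),
    ("logistic", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```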
I know the PCA algorithm is as follows:
- Center or standardize the data matrix: $Y=HX$, where $H$ is the centering matrix
- Calculate covariance matrix: $S=\frac{1}{n-1} Y^T Y$
- Eigenvalue Decomposition: $S = Z \Lambda Z^{-1}$
- Find PC Scores of original data: $T_L=YZ_L$
Here $Z_L$ is the transformation matrix containing the first $L$ eigenvectors, where $L$ is the number of principal components retained. Therefore, we can compress the original data using $T_L = Y Z_L$, where $T_L$ still has the same number of rows as $Y$ but only $L$ columns, resulting in a reduced dataset.
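To check my understanding of the steps above, here is a small NumPy sketch on toy data (my own assumptions: plain column centering for $H$, and `eigh` for the decomposition since $S$ is symmetric):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # toy data matrix: n = 100, p = 5
n, L = X.shape[0], 2               # keep L = 2 principal components

Y = X - X.mean(axis=0)             # center the columns: Y = HX
S = (Y.T @ Y) / (n - 1)            # covariance matrix
eigvals, Z = np.linalg.eigh(S)     # eigendecomposition (ascending order)
order = np.argsort(eigvals)[::-1]  # reorder eigenvalues descending
Z_L = Z[:, order[:L]]              # first L eigenvectors
T_L = Y @ Z_L                      # PC scores: same rows, only L columns
print(T_L.shape)                   # (100, 2)
```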
How do I then combine the resulting $T_L$ matrix, which is my transformed data, with a multinomial ($>2$ classes) logistic regression?
I know the multinomial logistic regression model is something like the following. Suppose the response variable $G$ has $K$ levels, taking values in $\{1,2,\dots,K\}$, the set of possible classes. The probability of a particular class is then
$$\Pr(G=k \mid X=x) = \frac{e^{\beta_{0k}+\beta_k^T x}}{\sum_{l=1}^K e^{\beta_{0l}+\beta_l^T x}}$$
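Numerically I picture that formula as a softmax over the class scores; in this sketch the values of $x$, $\beta_{0k}$, and $\beta_k$ are made up for illustration, and in the chained model $x$ would presumably be a row of $T_L$:

```python
import numpy as np

def multinomial_probs(x, beta0, beta):
    """Pr(G = k | X = x) for k = 1..K, per the formula above.

    beta0 : shape (K,)   intercepts beta_{0k}
    beta  : shape (K, p) coefficient vectors beta_k
    """
    logits = beta0 + beta @ x   # beta_{0k} + beta_k^T x for each class k
    logits -= logits.max()      # stabilize the exponentials
    expl = np.exp(logits)
    return expl / expl.sum()    # normalize over l = 1..K

x = np.array([0.5, -1.0])                                # p = 2 features
beta0 = np.zeros(3)                                      # K = 3 classes
beta = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
print(multinomial_probs(x, beta0, beta))                 # sums to 1
```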
But I do not know how to relate the two in a logical manner. (Or how the cost function should be defined.)