I have a dataset that consists of an X_train table, an X_test table, a Y_train table, and a Y_test table for prediction. The feature table (X) has around 800 columns, and the label table (Y) has around 200 columns. I read in this paper that it might be possible to reduce the dimensionality of both the X and the Y tables. I used the following code to reduce the size of the X table:
```python
from sklearn.decomposition import PCA

# Keep enough principal components to explain 80% of the variance in X_train.
pca = PCA(n_components=0.8, random_state=42)
pca.fit(X_train_normalized)

# Project both splits onto the components learned from the training data.
X_train_pca = pca.transform(X_train_normalized)
X_test_pca = pca.transform(X_test_normalized)
```
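For reference, here is a self-contained version of the X-side reduction. It assumes the normalization step was `StandardScaler` and uses synthetic random data in place of the real tables (both assumptions, since the original normalization code is not shown). It illustrates that a float `n_components` keeps the smallest number of components whose cumulative explained variance exceeds that fraction, and that both the scaler and the PCA are fitted on the training rows only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the real ~800-column feature tables.
rng = np.random.default_rng(0)
X_train_raw = rng.normal(size=(500, 800))
X_test_raw = rng.normal(size=(100, 800))

# Assumed normalization: fit the scaler on training rows, reuse it for the test rows.
scaler = StandardScaler()
X_train_normalized = scaler.fit_transform(X_train_raw)
X_test_normalized = scaler.transform(X_test_raw)

# A float in (0, 1) selects as many components as needed to explain
# more than that fraction of the training variance.
pca_x = PCA(n_components=0.8, svd_solver="full")
X_train_pca = pca_x.fit_transform(X_train_normalized)
X_test_pca = pca_x.transform(X_test_normalized)

# Number of components kept and the variance they explain together.
print(pca_x.n_components_, pca_x.explained_variance_ratio_.sum())
```

Checking `n_components_` after fitting shows how many of the 800 columns' worth of directions were actually kept.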
My question is: how can I also reduce the dimensionality of the Y table? Should I concatenate the X_train and Y_train tables, run PCA on the combined table, and then split them again, or is there a better way to do it? If my approach is not correct, I would appreciate your comments on that as well. Thank you!
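One commonly used alternative to concatenation is to fit a separate PCA on Y_train, train a model that predicts the reduced targets, and map predictions back to the full target space with `inverse_transform`. A combined PCA on [X, Y] would need the Y columns to transform a row, and those are not available for test rows at prediction time. The sketch below is a minimal illustration under stated assumptions: the targets are continuous (for binary multi-label targets, the reconstructed values would still need thresholding), the data are synthetic, and `Ridge`, the 0.95 variance threshold for Y, and all array sizes are hypothetical choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

# Hypothetical synthetic data: 800 feature columns and 200 continuous target
# columns that share a low-dimensional structure (an assumption for the demo).
rng = np.random.default_rng(0)
n_train, n_test = 400, 100
X_all = rng.normal(size=(n_train + n_test, 800))  # already roughly standardized
latent = X_all[:, :10] @ rng.normal(size=(10, 5))
Y_all = latent @ rng.normal(size=(5, 200)) + 0.1 * rng.normal(size=(n_train + n_test, 200))

X_train, X_test = X_all[:n_train], X_all[n_train:]
Y_train, Y_test = Y_all[:n_train], Y_all[n_train:]

# Separate PCA models for features and targets, each fitted on training rows only.
pca_x = PCA(n_components=0.8, svd_solver="full").fit(X_train)
pca_y = PCA(n_components=0.95, svd_solver="full").fit(Y_train)

X_train_pca = pca_x.transform(X_train)
Y_train_pca = pca_y.transform(Y_train)

# Learn a multi-output mapping from reduced features to reduced targets.
model = Ridge(alpha=1.0).fit(X_train_pca, Y_train_pca)

# Prediction needs only X_test; Y_test is kept aside for evaluation.
Y_pred_pca = model.predict(pca_x.transform(X_test))
Y_pred = pca_y.inverse_transform(Y_pred_pca)  # back to all 200 target columns
```

If a joint reduction of X and Y is preferred, scikit-learn's `PLSRegression` (in `sklearn.cross_decomposition`) finds latent components for both tables together while still predicting Y from X alone.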