
I'm working with a numeric dataset where I have some data (say $X$) and need to predict a two-variable target (say $A$ and $B$). These two variables are highly correlated ($cor(A,B) > 0.95$). Initially I thought I could build two models: one to predict $A$ using $X$ and $B$, and one to predict $B$ using $X$ and $A$. Is this a good idea, given the correlation? Otherwise I'll look into multi-target regression, which is a new topic for me.

ab94

1 Answer


First of all, think about the following question: in practice, how will you run your model to predict new observations?

  • Predict $A$ and use the resulting prediction $\hat{A}$ to predict $B$ (or the other way round)?
  • Predict $A$ and $B$ simultaneously?

In the first case, keep in mind that $\hat{A}$ is itself predicted from your other features. Call them $(X_1, X_2, ..., X_n)$. Then $\hat{A} = f(X_1, X_2, ..., X_n)$, so using it to predict $B$ is definitely possible. It amounts to feature engineering done with a machine learning model, and the same trick works in either direction: $\hat{A}$ as a feature for $B$, or $\hat{B}$ as a feature for $A$.

But keep in mind that at prediction time you will be using $\hat{A}$, not $A$, to predict $B$. It is therefore advisable to compute $\hat{A}$ on your training dataset as well, and to train the model for $B$ on $\hat{A}$ rather than on the true $A$.
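To make the first option concrete, here is a minimal sketch of the chained approach. Everything in it is an assumption for illustration: the synthetic data, the choice of scikit-learn's `RandomForestRegressor`, and the names `model_a` and `model_b`. It also makes one choice that goes a bit beyond the point above: $\hat{A}$ on the training set is computed out-of-fold with `cross_val_predict`, so that the model for $B$ sees predictions of roughly the same quality as it will get on new data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (assumption): B is A plus a little noise,
# so the two targets are highly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
A = np.sin(X[:, 0]) + X[:, 1] + 0.1 * rng.normal(size=300)
B = A + 0.1 * rng.normal(size=300)

X_train, X_test, A_train, A_test, B_train, B_test = train_test_split(
    X, A, B, random_state=0
)

# Step 1: out-of-fold predictions of A on the training set.
model_a = RandomForestRegressor(n_estimators=50, random_state=0)
A_hat_train = cross_val_predict(model_a, X_train, A_train, cv=5)

# Refit on the full training set; this is the model used on new data.
model_a.fit(X_train, A_train)

# Step 2: train the model for B on X plus A_hat (not the true A).
model_b = RandomForestRegressor(n_estimators=50, random_state=0)
model_b.fit(np.column_stack([X_train, A_hat_train]), B_train)

# Prediction on new observations: first A_hat, then B_hat.
A_hat_test = model_a.predict(X_test)
B_hat_test = model_b.predict(np.column_stack([X_test, A_hat_test]))
```

Training `model_b` on the true `A_train` instead would likely make it lean too heavily on that column, which is exactly what is not available when predicting.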

In the second case, you simply cannot use $A$ or $B$ as inputs, since neither is known at prediction time. You should definitely consider multi-target regression. I suggest you have a look at this post.
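As a rough illustration of the second option, the sketch below fits a single model on both targets at once. The data is synthetic and the model choice is an assumption; it relies on the fact that scikit-learn's `RandomForestRegressor` accepts a two-dimensional target, and other estimators may need a wrapper instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): two highly correlated targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
A = np.sin(X[:, 0]) + X[:, 1] + 0.1 * rng.normal(size=300)
B = A + 0.1 * rng.normal(size=300)

# Stack the targets into one (n_samples, 2) array.
Y = np.column_stack([A, B])

# Simple holdout split: first 225 rows for training, the rest for testing.
X_train, X_test = X[:225], X[225:]
Y_train, Y_test = Y[:225], Y[225:]

# One model, fitted on both targets simultaneously.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, Y_train)

# Each row of Y_hat holds the predictions (A_hat, B_hat) for one observation.
Y_hat = model.predict(X_test)
```

Here neither $A$ nor $B$ is ever used as a feature; the model only sees $X$ and returns both predictions together.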

AshOfFire