For example, there are self-driving car companies that take the approach of training several neural nets to recognize sign posts, pedestrians, cars, road marks etc. and then compose those intermediate neural nets together.
On the other hand, there are other self driving car companies that focus on an end-to-end approach - taking the raw image data and mapping it to a particular action (go left, right, speed etc.).
How do you know when to train against several intermediate variables and composing them together vs training directly against the final target variable?
What are the advantages of each approach?