In Andrew Ng's course, I see the RNN loss calculated as the sum of the losses from each time step:

[image: loss formula from Andrew Ng's course]

In Stanford's CS224N, I see the loss calculated as the average of the individual per-step losses:

[image: loss formula from CS224N]
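To make the two conventions concrete, here is a minimal sketch. It assumes a per-step cross-entropy loss, which may not match the exact formulas in the two courses. The helper `step_losses`, the variables `probs` and `targets`, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def step_losses(probs, targets):
    """Cross-entropy loss at each time step: -log of the probability
    assigned to the true class at that step."""
    return [-math.log(p[y]) for p, y in zip(probs, targets)]


# Hypothetical predicted distributions over 3 classes for T = 4 time steps.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.25, 0.5, 0.25],
]
targets = [0, 1, 2, 1]  # hypothetical true class index at each step

losses = step_losses(probs, targets)
T = len(losses)

loss_sum = sum(losses)    # "sum over time steps" convention
loss_mean = loss_sum / T  # "average over time steps" convention

print(f"sum over steps:  {loss_sum:.4f}")
print(f"mean over steps: {loss_mean:.4f}")
```

For a fixed sequence length T, the two values differ only by the constant factor 1/T, so the gradients point in the same direction and differ only in scale.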

Why are there two different approaches? Which one is preferred?

Hank
