In Andrew Ng's course, I see the RNN loss calculated as the sum of the per-time-step losses, as seen here:
In Stanford's CS224N, I see the loss calculated as the average of the per-time-step losses, as seen here:
Why are there two different approaches? Which one is preferred?
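
To make the two conventions concrete, here is a minimal sketch of how I understand them. The per-step loss values are made up for illustration and do not come from either course; `step_losses`, `loss_sum`, and `loss_mean` are hypothetical names.

```python
import math

# Hypothetical per-time-step losses (e.g. cross-entropy) for one sequence of length T = 4.
step_losses = [0.9, 1.2, 0.4, 0.7]
T = len(step_losses)

# Convention 1: total loss is the sum over time steps.
loss_sum = sum(step_losses)

# Convention 2: total loss is the average over time steps.
loss_mean = loss_sum / T

# For a fixed sequence length, the two differ only by the constant factor 1/T.
print(loss_sum, loss_mean, math.isclose(loss_mean * T, loss_sum))
```

With a fixed `T`, the gradients of the two losses point in the same direction and differ only in scale, so my question is really about whether that scaling matters in practice.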