
There is the MSE cost function: $C = \frac{1}{2n} \sum \lVert y - a \rVert^2$

Why not just use $C = \sum \lVert y - a \rVert$ instead?

(Here $\lVert \cdot \rVert$ denotes the vector's length, $y$ is the ideal network output, and $a$ is the current network output.)
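To make the two formulas concrete, here is a minimal NumPy sketch; the example arrays are made up purely for illustration and are not part of the original question:

```python
# Minimal sketch: compare the two costs from the question on toy data.
# "y" holds the ideal outputs, "a" the network's current outputs, one row per example.
import numpy as np

y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # ideal outputs (n = 3 examples)
a = np.array([[0.8, 0.1], [0.3, 0.6], [0.9, 0.2]])   # current network outputs

n = y.shape[0]
lengths = np.linalg.norm(y - a, axis=1)    # Euclidean length of each error vector

mse_cost = np.sum(lengths ** 2) / (2 * n)  # C = (1/2n) * sum(||y - a||^2)
abs_cost = np.sum(lengths)                 # C = sum(||y - a||), the proposed alternative

print(mse_cost, abs_cost)
```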

Sycorax
Dmytro Nalyvaiko

2 Answers


You're talking about the $L_1$ norm and the $L_2$ norm. Both work for neural networks, but they behave differently.

Without more information, I can't say whether the $L_2$ norm is better (or worse) for your problem.

SmallChess

Short answer: both can be used.

Longer answer: both measures are in active use. The first measure is based on the Euclidean distance, the second on the taxi-cab distance; more formally, the $L_2$ distance and the $L_1$ distance.
Which is better depends on the context. Intuitively, the Euclidean distance prefers many small or medium errors over a few big errors, while the taxi-cab distance is more forgiving of a few large errors. Which one is preferable depends on what you are trying to achieve.
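For concreteness, the two distances for an error vector $v = y - a$ are as follows (standard definitions added here, not part of the original answer):

$$\lVert v \rVert_2 = \sqrt{\sum_i v_i^2} \quad \text{(Euclidean)}, \qquad \lVert v \rVert_1 = \sum_i \lvert v_i \rvert \quad \text{(taxi-cab)}.$$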

dimpol
  • You are talking about the length function, but I'm asking why the result of the length function should be raised to the power of 2. – Dmytro Nalyvaiko Mar 09 '17 at 11:59
  • That could be because you want to 'punish' big errors in certain training cases over the same error spread over multiple training cases. Suppose that over 2 training cases one algorithm has errors of 0 and 3 respectively, and another algorithm has an error of 2 for both training cases. Squaring the errors would make the second algorithm preferable; not squaring would have the first algorithm as better. The right choice depends on the context. – dimpol Mar 09 '17 at 12:12
  • "Squaring the errors would make the second algorithm preferable": 0^2 + 3^2 = 9; 2^2 + 2^2 = 8; why is the second algorithm preferable? – Dmytro Nalyvaiko Mar 09 '17 at 13:43
  • An error score of 8 is lower than an error score of 9, and a lower error score is preferable. – dimpol Mar 10 '17 at 22:20
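To make the arithmetic in the comment thread above concrete, here is a minimal sketch; the error values (0, 3 versus 2, 2) come from dimpol's example:

```python
# Reproduce the worked example from the comments above.
# Algorithm 1 concentrates its error in one training case; algorithm 2 spreads it out.
errors_algo1 = [0.0, 3.0]
errors_algo2 = [2.0, 2.0]

sum_abs_1 = sum(abs(e) for e in errors_algo1)   # 3.0
sum_abs_2 = sum(abs(e) for e in errors_algo2)   # 4.0 -> without squaring, algorithm 1 scores lower

sum_sq_1 = sum(e ** 2 for e in errors_algo1)    # 9.0
sum_sq_2 = sum(e ** 2 for e in errors_algo2)    # 8.0 -> with squaring, algorithm 2 scores lower

print(sum_abs_1, sum_abs_2, sum_sq_1, sum_sq_2)
```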