Solution: for some reason, I had forgotten that the non-linear activation function is applied at every layer of the neural network, not just at the output layer. Hopefully others reading my original question below will understand why I asked it. Thank you for the answers, though.
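A minimal NumPy sketch of both points (toy weights and inputs are made up for illustration): the sigmoid is monotonic, so applying it only at the output layer never changes which node is largest; but without a non-linearity between layers, stacked linear layers collapse into a single linear map, which is why the activation matters at every hidden layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Point 1: sigmoid is monotonic, so the argmax of the output
# logits is unchanged by applying it at the output layer.
logits = np.array([0.3, 2.1, -0.5])
assert np.argmax(logits) == np.argmax(sigmoid(logits))

# Point 2: with no activation between layers, two linear layers
# are equivalent to one linear layer (matrix multiplication is
# associative), so depth adds no expressive power.
W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
W2 = np.array([[2.0, 0.0], [-1.0, 1.0]])
x = np.array([1.0, -2.0])
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# Inserting the sigmoid between the layers breaks that collapse:
# the composed function is genuinely non-linear.
assert not np.allclose(W2 @ sigmoid(W1 @ x), (W2 @ W1) @ x)
```

So the sigmoid at the output layer indeed cannot change the final classification; its real job is at the hidden layers, where it is what lets the network represent non-linear decision boundaries.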
Original: Suppose I have a multilayer perceptron network of a couple of layers, with output nodes that are subject to the classic sigmoid activation function. How does this change which output node has the highest value for a given input vector (and is therefore selected as the final classification)? Namely (denoting the sigmoid function as f(x)), if x' > x then f(x') > f(x), since the sigmoid is monotonically increasing, meaning the same output node will be selected as the final classification.
I think I am missing something about its importance in gradient descent or in computing the loss, but please point out my error in thinking if you see it.