I am trying to train a neural network on a soft-target problem (note that my network has a softmax activation at the end). My labels have the form $[x_1, x_2, x_3, x_4, x_5, x_6]$ (each $x_i$ is a probability), where $\sum_{i=1}^{6} x_i = 1$. The labels are not one-hot encoded, so there is no case where one $x_i$ is 1.0 and the rest are 0.
Cross-entropy loss for hard targets: $$-\log q(y^*),$$ where $q(y^*)$ is the predicted probability of the target class $y^*$. This loss is exactly zero (its minimum) when the predicted probability of the target class is 1.0, so it directly quantifies how well the network classifies a hard-labelled example. That intuition makes sense to me, and it explains why cross-entropy is used for hard targets.
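To make that concrete for myself, here is a small NumPy sketch (my own illustration; the `hard_target_ce` helper and the prediction vectors are made up) of how the hard-target loss behaves:

```python
import numpy as np

def hard_target_ce(pred, target_idx):
    """Cross-entropy against a one-hot (hard) target:
    -log of the predicted probability assigned to the target class."""
    return -np.log(pred[target_idx])

# A confident, correct prediction drives the loss towards 0 ...
print(hard_target_ce(np.array([0.97, 0.01, 0.01, 0.005, 0.0025, 0.0025]), 0))  # ~0.03
# ... while a confident, wrong prediction is heavily penalized.
print(hard_target_ce(np.array([0.01, 0.97, 0.01, 0.005, 0.0025, 0.0025]), 0))  # ~4.6
```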
For soft targets, I found this answer, which states that cross-entropy can also be used. However, unlike with hard targets, I cannot rationalize why minimizing $$-\sum_y p(y) \log q(y)$$ would help the network learn to produce softmax scores $q$ that align with the soft targets $p$. The intuition I used for hard targets doesn't carry over here.
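For reference, this is how I am computing the soft-target version numerically (again a NumPy sketch of my own; the label vector `p` and the candidate outputs are made up):

```python
import numpy as np

def soft_target_ce(p, q, eps=1e-12):
    """Cross-entropy between a soft target distribution p and the
    network's softmax output q: -sum_y p(y) * log(q(y))."""
    return -np.sum(p * np.log(q + eps))

# Made-up soft label over 6 classes (sums to 1, no entry is exactly 1.0).
p = np.array([0.50, 0.20, 0.10, 0.10, 0.05, 0.05])

# A softmax output that matches the label exactly ...
print(soft_target_ce(p, p))  # ~1.43 (the entropy of p)
# ... versus one that puts almost all of its mass on the most likely class.
print(soft_target_ce(p, np.array([0.95, 0.01, 0.01, 0.01, 0.01, 0.01])))  # ~2.33
```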
Is there an intuitive explanation for why cross-entropy will help with soft targets as well?