According to these notes from Penn State on MLE and these notes from Stanford CS229 on logistic regression, the training data are assumed to be IID.
The Penn State notes say "...which implies by definition that the X are independent." and the Stanford notes state that "Assuming that the m training examples were generated independently, we can then write down the likelihood of the parameters as..." However, according to this Cross Validated answer, the observations do not need to be IID.
Isn't X a random variable? To write the joint probability as a product, don't the observations need to be independent? Also, is there a difference between saying that the labels (the y's) are independent and saying that the observations (the x's) are independent?
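To make the "product for the joint probability" concrete, here is a minimal sketch of the logistic regression likelihood written as a product over training examples, next to the equivalent sum of log-probabilities. The function names and the toy data are hypothetical, chosen only for illustration. The sketch takes one thing as given: it assumes the labels y_i are conditionally independent given their inputs x_i and the parameters theta. That assumption is exactly what licenses the product, and it says nothing about whether the x's themselves are independent.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """h_theta(x) = sigmoid(theta . x), the modelled P(y = 1 | x; theta)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def likelihood_product(theta, xs, ys):
    """Product of per-example conditional probabilities p(y_i | x_i; theta).

    ASSUMPTION: the labels are conditionally independent given their inputs,
    which is what allows the joint p(y_1..y_m | x_1..x_m) to factor this way.
    """
    prod = 1.0
    for x, y in zip(xs, ys):
        h = predict(theta, x)
        prod *= h if y == 1 else (1.0 - h)
    return prod


def log_likelihood(theta, xs, ys):
    """Log of the product above, computed as a sum (numerically safer)."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = predict(theta, x)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return total


# Hypothetical toy data: first feature is an intercept term.
xs = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.0]]
ys = [1, 0, 1]
theta = [0.5, -0.25]

print(likelihood_product(theta, xs, ys))
print(math.exp(log_likelihood(theta, xs, ys)))  # same value as the product
```

Note that the inputs x_i only ever enter as things being conditioned on. The factorization is over the conditional distributions of the labels, not over a joint distribution of the x's.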
Could someone explain? What am I misunderstanding?