In one of the Machine Learning lectures, where the topic was entropy and information, the professor explained that $\text{Surprisal} = s_y(c) = -\log p(y=c)$, where $p(y=c)$ is the probability that the label $y$ takes the class $c$ in a dataset where $y$ may belong to one of several classes. He also defined $\text{Entropy} = \mathbb{E}[s_y(c)]$, i.e. the expected surprisal over the class distribution. He went on to state that $s_y(c)$ is the number of bits needed to efficiently encode class $c$.
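For concreteness, here is a minimal sketch of how I currently read these definitions (the class probabilities below are made up by me, not from the lecture, and I use log base 2 so the values come out in bits):

```python
import numpy as np

# Hypothetical class distribution for the label y over 4 classes.
p = np.array([0.5, 0.25, 0.125, 0.125])

# Surprisal of each class: s_y(c) = -log2 p(y = c), measured in bits.
surprisal = -np.log2(p)          # -> [1., 2., 3., 3.]

# Entropy is the expected surprisal under the class distribution.
entropy = np.sum(p * surprisal)  # -> 1.75 bits

print(surprisal, entropy)
```

Under this distribution the most frequent class would cost 1 bit and the rarest classes 3 bits each, with the entropy of 1.75 bits being the average cost. That numerical picture is what I am trying to connect to the professor's statement.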
What exactly does "the number of bits needed to efficiently encode a class" mean here?