If your interest is prediction, then there is seldom a need to select features unless your model is in danger of overfitting. Why throw away all the information a feature provides by removing it from your prediction model?
Even if your model is in danger of overfitting because of a low ratio of cases to predictors, a good strategy can be to keep all the features while penalizing them in some way to limit overfitting. Ridge regression, which controls overfitting by shrinking the regression coefficients toward zero, is directly applicable to a multinomial logistic regression model. Methods that learn slowly, like boosted trees, serve a similar function and take advantage of all the available information.
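As a minimal sketch of what that looks like in practice, here is a ridge-penalized multinomial logistic fit with scikit-learn; the data here are random placeholders, and the penalty strength is chosen by cross-validation rather than fixed by hand:

```python
# A minimal sketch: ridge-penalized multinomial logistic regression (scikit-learn).
# X and y below are random placeholders standing in for your predictors and outcome.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))       # placeholder predictor matrix
y = rng.integers(0, 3, size=1000)     # placeholder 3-class outcome

# Standardize first so the L2 (ridge) penalty treats all coefficients comparably,
# then let cross-validation choose the penalty strength (C = 1/lambda).
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, penalty="l2", solver="lbfgs", max_iter=5000),
)
model.fit(X, y)   # with the lbfgs solver this is a true multinomial (softmax) fit
```

The point of the sketch is that nothing gets discarded: every predictor stays in the model, and the penalty does the work of keeping overfitting in check.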
For classification models like logistic or multinomial regression, a useful rule of thumb for avoiding overfitting without penalization is to have roughly 15 cases in the smallest outcome class for each predictor you are evaluating. With over 900 cases in your smallest class, you probably have no need for predictor selection or penalization at all unless you are evaluating more than about 60 predictors, counting each level of a categorical predictor beyond the first and each interaction term as a separate predictor.
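That budget is just back-of-the-envelope arithmetic; the counts below are illustrative:

```python
# Back-of-the-envelope check of the ~15-cases-per-predictor rule of thumb.
smallest_class_n = 900        # cases in the smallest outcome class (illustrative)
cases_per_predictor = 15      # rough rule of thumb

# "Predictors" here means model degrees of freedom: each dummy level of a
# categorical predictor beyond the first and each interaction term counts.
predictor_budget = smallest_class_n // cases_per_predictor
print(predictor_budget)       # -> 60; beyond this, consider penalization
```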
In situations in which you need to cut down on features rather than penalize them, it's best to use knowledge of the subject matter to pre-select features or to combine multiple related features into a single feature before looking at outcomes. Frank Harrell's class notes and book provide a wealth of information on such strategies. Of the feature-selection approaches noted in the question, Harrell does say (page 4-48, class notes):
Do limited backwards step-down variable selection if parsimony is more important than accuracy. But confidence limits, etc., must account for variable selection (e.g., bootstrap).
So in that context backward elimination is the least objectionable of the approaches noted in the question, as you take all the available information into account with a more comprehensive model before you start throwing information away. But it's often best to avoid outcome-driven feature selection in the first place.
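To make the outcome-blind data reduction mentioned above concrete, here is a minimal sketch that collapses a hypothetical block of related measurements into a single summary score without ever consulting the outcome:

```python
# A minimal sketch of outcome-blind data reduction: combine a block of related
# predictors into one summary score *before* the outcome is ever examined.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical block of four related measurements (placeholder data).
related_block = rng.normal(size=(1000, 4))

# The first principal component of the standardized block becomes one combined
# feature; the outcome plays no role here, so this is not outcome-driven selection.
reducer = make_pipeline(StandardScaler(), PCA(n_components=1))
combined_feature = reducer.fit_transform(related_block)   # shape (1000, 1)
# combined_feature then replaces the four original columns in the prediction model.
```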