If you have an experiment that is powered to show an overall effect of a categorical variable, it will not necessarily be powered to show an effect for each individual category. Thus, it is perhaps not surprising that some categories end up with adjusted or unadjusted p-values above 0.05. As a result, I would certainly not treat the p-values by themselves as an obstacle to interpreting whether there is an effect. The effect sizes for each category (presumably versus some reference category) may, however, indicate whether the ordering of the categories translates into a monotone effect on the outcome, or whether the observed effects deviate so much from a monotone pattern that such an assumption becomes questionable (or no longer credible). That would not necessarily mean questioning whether the variable matters at all, just in what functional form it matters.
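For concreteness, here is a minimal sketch (Python with statsmodels, on simulated data; the variable names and effect sizes are hypothetical) of inspecting per-level coefficients against a reference category to judge whether the ordering acts roughly monotonically on the outcome:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
# Hypothetical ordered categories "dose" with a roughly monotone true effect
dose = rng.choice(["low", "mid", "high"], size=n)
effect = pd.Series(dose).map({"low": 0.0, "mid": 0.4, "high": 0.9})
y = effect + rng.normal(scale=1.0, size=n)
df = pd.DataFrame({
    "y": y,
    "dose": pd.Categorical(dose, categories=["low", "mid", "high"]),
})

# Treat the categories as unordered dummies; "low" is the reference level
fit = smf.ols("y ~ C(dose)", data=df).fit()
print(fit.params)  # if the level coefficients increase with the ordering,
                   # a monotone (or even linear) coding may be defensible
```

If the estimated coefficients line up with the assumed ordering (allowing for sampling noise), that supports a monotone functional form; if they jump around far beyond their standard errors, the ordering assumption deserves scrutiny even if the variable clearly matters overall.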
For prediction tasks, the significance or non-significance of a variable (or of levels within a variable) is not really a good criterion for deciding whether to include it. The main risk of including too many levels is overfitting (too many levels and too little data). Model averaging across models $i=1,\ldots,M$ that include varying numbers of levels (from none to all), with weights $w_i \propto \exp(-0.5 \text{AIC}_i)$ (or AICc), would likely be a better idea.
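A minimal sketch of that idea (again Python/statsmodels on simulated data; the candidate formulas and the collapsed variable `x2` are hypothetical choices). Subtracting the minimum AIC before exponentiating leaves the normalized weights unchanged but avoids numerical underflow:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 150
df = pd.DataFrame({
    "x": pd.Categorical(rng.choice(["a", "b", "c", "d"], size=n)),
    "z": rng.normal(size=n),
})
df["y"] = df["z"] + df["x"].cat.codes * 0.3 + rng.normal(size=n)

# Candidate models range from "no levels of x" to "all levels of x";
# x2 collapses two levels as an intermediate model (a hypothetical grouping)
df["x2"] = df["x"].astype(str).map({"a": "a", "b": "b", "c": "cd", "d": "cd"})
formulas = ["y ~ z", "y ~ z + C(x2)", "y ~ z + C(x)"]
fits = [smf.ols(f, data=df).fit() for f in formulas]

# Akaike weights: w_i proportional to exp(-0.5 * AIC_i)
aic = np.array([f.aic for f in fits])
w = np.exp(-0.5 * (aic - aic.min()))
w /= w.sum()

# Model-averaged prediction: weighted sum of each model's fitted values
pred = sum(wi * f.predict(df) for wi, f in zip(w, fits))
print(dict(zip(formulas, np.round(w, 3))))
```

The averaged prediction then down-weights, rather than discards, the models whose extra levels are not supported by the data, which tends to be more stable than an all-or-nothing inclusion decision based on p-values.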