For the initial factor predictors, it is debatable whether the insignificant levels should be "merged". Note that your approach seems to simply drop the insignificant levels, which is incorrect.
For a significant factor predictor (one where at least one level is significant), the number of significant levels depends on which level is chosen as the base level, because each level's estimate is the difference between that level and the base level.
For example, suppose a significant factor has 4 levels: A, B, C, D.
If we choose level A as the base level, we may get a result like the one below (only level D is significant):
B .
C .
D ****
However, when we choose level D as the base level, we will find that all the remaining levels are significant:
A ****
B ****
C ****
This happens because levels A, B, and C are similar to one another, while level D differs from all of them.
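This base-level dependence is easy to demonstrate with a small simulation. The sketch below is not from the question; the data are simulated and the `dummy_fit` helper is a hypothetical plain-NumPy OLS fit under treatment (dummy) coding. It fits the same one-way model twice, once per base level, and reports the t-statistics of the non-base levels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: levels A, B, C share a similar mean, level D differs.
levels = np.repeat(["A", "B", "C", "D"], 50)
means = {"A": 0.0, "B": 0.1, "C": -0.1, "D": 3.0}
y = np.array([means[l] for l in levels]) + rng.normal(0, 1, size=levels.size)

def dummy_fit(y, levels, base):
    """OLS with treatment (dummy) coding relative to `base`;
    returns the t-statistic of each coefficient."""
    others = [l for l in sorted(set(levels)) if l != base]
    X = np.column_stack([np.ones(levels.size)] +
                        [(levels == l).astype(float) for l in others])
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    df = levels.size - X.shape[1]
    sigma2 = rss[0] / df
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return dict(zip(["(Intercept)"] + others, beta / se))

print(dummy_fit(y, levels, base="A"))  # D's |t| dwarfs those of B and C
print(dummy_fit(y, levels, base="D"))  # A, B and C all have large |t|
```

Only the coding changes between the two calls; the fitted model, and hence which levels look "significant", is a property of the chosen contrasts, not of the factor itself.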
As a result, simply dropping insignificant levels does not make sense. Many researchers think we should keep all the levels as long as at least one of them is significant; this is the convention R follows, and it is simple.
Researchers of another school think we can "merge" the insignificant levels to reduce the number of parameters. This idea, however, requires a more sophisticated procedure that tests the candidate merges step by step.
For the example above, we can first try merging AB, AC, and BC, fit the three resulting models, and keep the best one (say AB, which leaves the levels AB, C, D). We can then try merging AB with C and test that model, because at each step we should drop only one parameter and test it.
We should also try merging the significant levels, for the reason given at the beginning: their significance depends on the base level.
So if we follow this school, the workload increases a lot, because we have to try all candidate pairs of levels step by step, both significant and insignificant ones.
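The stepwise merging idea can be sketched as a greedy search: at each step, merge the pair of groups whose merge most improves a model-selection criterion, and stop when no merge helps. The sketch below uses AIC on a one-way means model; the simulated data, the choice of AIC, and the `aic` helper are my own illustrative assumptions, not a standard procedure from any particular package:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Simulated data as before: A, B, C are similar, D is different.
levels = np.repeat(["A", "B", "C", "D"], 50)
means = {"A": 0.0, "B": 0.1, "C": -0.1, "D": 3.0}
y = np.array([means[l] for l in levels]) + rng.normal(0, 1, size=levels.size)

def aic(y, groups):
    """AIC of a one-way means model; `groups` maps each level of
    `levels` (taken from the enclosing scope) to its merged group."""
    labels = np.array([groups[l] for l in levels])
    fitted = np.zeros_like(y)
    for g in set(labels):
        fitted[labels == g] = y[labels == g].mean()
    n = y.size
    rss = ((y - fitted) ** 2).sum()
    k = len(set(labels)) + 1          # one mean per group + error variance
    return n * np.log(rss / n) + 2 * k

# Start with every level in its own group; repeatedly merge the pair of
# groups whose merge lowers AIC the most; stop when no merge lowers it.
groups = {l: l for l in "ABCD"}
while True:
    current = aic(y, groups)
    best = None
    for g1, g2 in combinations(sorted(set(groups.values())), 2):
        trial = {l: (g1 + g2) if g in (g1, g2) else g
                 for l, g in groups.items()}
        score = aic(y, trial)
        if score < current and (best is None or score < best[0]):
            best = (score, trial)
    if best is None:
        break
    groups = best[1]

print(groups)  # D typically remains in its own group
```

Note how much work even this tiny example does: every surviving pair of groups is refitted at every step, which is exactly the extra workload described above.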
For the initial continuous variables, we may split them into factors/groups, but this also needs a sophisticated testing procedure. We should first treat the numeric variable as a one-level factor, then try splitting it into a 2-level factor at every candidate split point and choose the best point, then try splitting one of the resulting levels into two again, and so on. This idea is similar to CART (classification and regression trees), which also splits numeric variables into discrete groups/nodes in order to model non-linear effects.
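The first splitting step can be sketched as an exhaustive search over cut points, exactly what a single CART stump does. The data are simulated and the `best_split` helper is a hypothetical illustration, not code from any tree package:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data with a step in the mean at x = 4.
x = rng.uniform(0, 10, 300)
y = np.where(x < 4, 1.0, 3.0) + rng.normal(0, 0.5, 300)

def best_split(x, y):
    """Try every candidate cut point and return the one that minimises
    the residual sum of squares around the two group means."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_rss, best_cut = np.inf, None
    for i in range(1, xs.size):
        left, right = ys[:i], ys[i:]
        rss = (((left - left.mean()) ** 2).sum()
               + ((right - right.mean()) ** 2).sum())
        if rss < best_rss:
            best_rss, best_cut = rss, (xs[i - 1] + xs[i]) / 2
    return best_cut

cut = best_split(x, y)
print(cut)  # close to the true change point at 4
```

Repeating this search inside each resulting level, and testing whether each extra split is worth its extra parameter, gives the stepwise procedure described above.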
Besides, we can use splines and similar tools to model non-linear effects, which may be easier in some cases than splitting into factors.
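As a minimal sketch of the spline alternative: a cubic spline can be written as an ordinary linear model in a truncated-power basis, so no stepwise search is needed, only a fixed set of knots. The simulated data, the knot positions, and the `cubic_spline_basis` helper below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)

def cubic_spline_basis(x, knots):
    """Truncated-power basis for a cubic spline:
    1, x, x^2, x^3, and (x - k)_+^3 for each knot k."""
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Three interior knots chosen by eye; in practice knot placement matters.
X = cubic_spline_basis(x, knots=[2.5, 5.0, 7.5])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rmse = np.sqrt(((y - X @ beta) ** 2).mean())
print(rmse)  # root-mean-square residual of the spline fit
```

One least-squares fit captures the smooth non-linear effect that the factor-splitting approach would need many tested splits to approximate.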