The answer depends on the type of model.
Ordinary least squares and generalized linear regressions don't need scaling (unless you are challenging the floating-point precision of your computer). If you change the scale of a predictor variable, all that happens is that the estimated coefficient undergoes a corresponding change of scale, so the end result is the same. Tree-based models (e.g., random forest, boosted trees) use cutoffs within the range of a continuous predictor or selection of one level of a categorical predictor, so there shouldn't be any advantage to scaling with tree-based models either. And if you do scale with these approaches and then use your model to predict on new cases, you have to scale the new data in a way that matches your original scaling. So why scale to start with?
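As a quick illustration of the invariance claim, here's a minimal sketch of my own (assuming numpy and scikit-learn on simulated data; none of this is specific to your data):

```python
# Rescaling a predictor only rescales its OLS coefficient;
# the fitted values are unchanged.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # two continuous predictors
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

fit_raw = LinearRegression().fit(X, y)

X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000               # e.g., metres -> millimetres
fit_rescaled = LinearRegression().fit(X_rescaled, y)

print(fit_raw.coef_, fit_rescaled.coef_)   # first coefficient shrinks by a factor of 1000
print(np.allclose(fit_raw.predict(X),
                  fit_rescaled.predict(X_rescaled)))   # True: identical predictions
```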
Scaling is important when the modeling method effectively makes direct comparisons among the predictors in some way. For example, clustering based on multi-dimensional Euclidean distances among cases needs scaling so that a predictor whose scale leads to numerically large values doesn't overwhelm the contributions of predictors whose scales lead to numerically small values. The same holds for principal component analysis, where you need to start with similar variances among the predictors. It's also needed for approaches like ridge or LASSO regression, which put a penalty on the sum of the squares or the sum of the absolute values (respectively) of the regression coefficients. Unless all the predictors are on a common scale, you will be differentially penalizing predictors depending on their numerical scales and the corresponding scale-dependent coefficient magnitudes. I believe that is also the case for neural-net methods, although I don't have experience with them.
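To see that scale dependence concretely, here's another sketch of my own (again assuming scikit-learn and simulated data): two equally "informative" predictors, one of which dominates PCA purely because of its units until you standardize.

```python
# PCA on two independent predictors that differ only in measurement scale.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)              # scale ~1
x2 = rng.normal(scale=1000, size=200)  # same shape, scale ~1000
X = np.column_stack([x1, x2])

pca_raw = PCA(n_components=2).fit(X)
print(pca_raw.explained_variance_ratio_)   # ~[1.0, 0.0]: x2 dominates purely by units

pca_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
print(pca_std.explained_variance_ratio_)   # ~[0.5, 0.5]: predictors contribute equally
```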
For these methods that need scaling, however, don't restrict continuous values to a range of [0,1] as you seem to propose. The best way to meet these requirements for each continuous predictor is to calculate the mean and standard deviation, then for each individual value subtract the mean and divide by the standard deviation. This puts each continuous predictor into a scale with mean 0 and standard deviation 1 so that all predictors have the same empirical variance.*
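In code that's just subtracting the column mean and dividing by the column standard deviation; a sketch (assuming numpy, with scikit-learn's StandardScaler doing the same thing column-wise):

```python
# Centre and scale a continuous predictor to mean 0, SD 1.
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([12.0, 15.0, 9.0, 22.0, 18.0])

z_manual = (x - x.mean()) / x.std()                  # subtract mean, divide by SD
z_sklearn = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()

print(np.allclose(z_manual, z_sklearn))              # True
print(z_manual.mean(), z_manual.std())               # ~0.0 and ~1.0
```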
If you do have to scale for one of those latter modeling approaches, the handling of binary or multi-level categorical features is a conundrum. Scaling will give different weights among a set of binary predictors depending on the 0/1 class balance of each predictor. For multi-category predictors, the weighting will differ depending on your choice of reference level unless you take special precautions. See this page for more discussion and further links.
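A tiny numeric sketch (illustrative values only) of the class-balance issue for binary predictors:

```python
# Standardizing a 0/1 dummy gives it a weight that depends on its class
# balance, since its SD is sqrt(p * (1 - p)).
import numpy as np

balanced = np.array([0, 1] * 50)             # 50/50 split, p = 0.5
rare     = np.array([0] * 90 + [1] * 10)     # 90/10 split, p = 0.1

print(balanced.std())   # 0.5
print(rare.std())       # ~0.3 -> after standardizing, a 0->1 change on the rare
                        #         dummy moves ~1.67x further than on the balanced one
```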
So I'd say don't scale unless your modeling approach requires it. If it does, use your knowledge of the subject matter to help decide whether or how to scale categorical predictors.
*This is sometimes called "normalization" of the data even though the final values need not follow a normal distribution. Not everyone uses terms like "scaling" or "standardizing" or "normalizing" in the same way, so you have to look into just what was done when you read others' work.