I am using a tree-based method (specifically, a random forest) to model the quality of sunsets from weather measurements. One of the available features is cloud height. When there are no clouds, the value is set to 99999. My impression is that keeping these values at 99999 (or setting them to 0 or -999) will bias the predictions, because a tree will treat 99999 as a real physical height when those observations should effectively be ignored for this feature.

I've considered adding a dummy variable to indicate whether there are clouds or not. However, if I also want to include cloud height, which I think could be relevant to sunset quality, I still need to do something with the 99999s. Is there an accepted way of handling this kind of intentionally missing data with tree-based methods?
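A minimal sketch of the dummy-variable encoding mentioned above, in plain Python. It assumes the raw heights arrive as a list and that 99999 is the only sentinel; the function names and sample heights are hypothetical. The second function illustrates the concern about the sentinel: a single threshold split on the raw column sends the cloudless rows to the same side as the highest real clouds.

```python
# Hypothetical sketch: split a raw cloud-height column (99999 = "no clouds")
# into a has_clouds dummy plus a height feature.
NO_CLOUD_SENTINEL = 99999  # value used in the raw data when there are no clouds


def encode_cloud_height(raw_heights, fill_value=None):
    """Return (has_clouds, height) pairs for each raw measurement.

    has_clouds is 1 when a real height was recorded and 0 otherwise.
    For cloudless rows the height is replaced by fill_value (None by
    default), so the sentinel never appears as a physical height.
    """
    encoded = []
    for h in raw_heights:
        if h == NO_CLOUD_SENTINEL:
            encoded.append((0, fill_value))
        else:
            encoded.append((1, h))
    return encoded


def rows_above_split(raw_heights, threshold):
    """Indices that a single split 'height > threshold' sends to the upper child."""
    return [i for i, h in enumerate(raw_heights) if h > threshold]


raw = [1200, 99999, 3500, 800, 99999]  # hypothetical sample heights
print(encode_cloud_height(raw))
# [(1, 1200), (0, None), (1, 3500), (1, 800), (0, None)]
print(rows_above_split(raw, 3000))
# [1, 2, 4]  -> both cloudless rows are grouped with the 3500 cloud
```

Note that many random forest implementations will not accept `None` in a numeric feature, which is why `fill_value` is left as a parameter; choosing what to put there is exactly the open question.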
I've found a few related questions, but none of them solves my problem:
Dummy variable method for missing data in ML/predictive models
How to deal with intentionally missing data
How should I define missing values due to skip questions in SPSS?