I am working on a project with a dataset in which some people have a trait of interest, say "red hair." My goal is to estimate the probability that an individual has red hair. Normally, I'd fit a model (logistic regression, random forest, etc.) that regresses this outcome on the available covariates.
The outcome variable takes only two values: 1 and NA. A value of 1 indicates that the individual has red hair. An NA, however, only indicates that we have no data; it does not necessarily mean that the individual lacks red hair. (If it matters, an NA most likely corresponds to someone without red hair, since red hair is a relatively rare trait, but beyond that it carries no information.) I also have standard demographic information (sex, age, etc.) available as covariates that I'd ideally use to predict red-headedness.
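For concreteness, here is a minimal Python sketch of the data layout described above. The records and the field names `sex`, `age`, and `red_hair` are hypothetical, with `None` standing in for NA. It shows the two naive encodings: dropping the NA rows leaves only 1s, while recoding NA as 0 labels every unrecorded red-haired person as a negative.

```python
# Hypothetical records mirroring the setup: covariates plus a red-hair flag
# that is either 1 (observed red hair) or None (NA: no information).
records = [
    {"sex": "F", "age": 34, "red_hair": 1},
    {"sex": "M", "age": 51, "red_hair": None},
    {"sex": "F", "age": 27, "red_hair": None},
    {"sex": "M", "age": 19, "red_hair": 1},
    {"sex": "F", "age": 62, "red_hair": None},
]

# Naive encoding 1: drop the NA rows. Only positives remain.
dropped = [r["red_hair"] for r in records if r["red_hair"] is not None]

# Naive encoding 2: treat NA as "not red-haired". This mislabels any
# red-haired person whose status simply was not recorded.
na_as_zero = [1 if r["red_hair"] == 1 else 0 for r in records]

print(sorted(set(dropped)))  # [1]
print(na_as_zero)            # [1, 0, 0, 1, 0]
```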
Since most (all?) models can't be fit to a dependent variable with no variability (once the NAs are set aside, every observed outcome is 1), my question is: how would you estimate the probability that a given individual has red hair with the data described above?
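To make the "no variability" problem concrete, here is a minimal stdlib-only sketch. It assumes the NA rows are simply dropped and uses an intercept-only logistic model fitted by plain gradient ascent (the helper `fit_intercept_only` is hypothetical, not any library's API). When every outcome is 1, the maximum-likelihood intercept has no finite value, so the estimate keeps growing the longer the optimizer runs; with both classes present it settles at the sample log-odds.

```python
import math

def fit_intercept_only(ys, steps, lr=0.5):
    """Hypothetical helper: gradient ascent on the mean log-likelihood
    of an intercept-only logistic model."""
    b = 0.0
    n = len(ys)
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-b))
        # Gradient of mean(y*log(p) + (1 - y)*log(1 - p)) w.r.t. b is mean(y - p).
        grad = sum(y - p for y in ys) / n
        b += lr * grad
    return b

# After dropping NA rows, every remaining outcome is 1.
ys = [1] * 20
for steps in (10, 100, 1000):
    b = fit_intercept_only(ys, steps)
    # The intercept keeps rising: the gradient 1 - p is positive for every finite b.
    print(steps, round(b, 2), round(1.0 / (1.0 + math.exp(-b)), 4))

# Contrast: with both classes present, the estimate settles at logit(mean(y)).
mixed = [1] * 5 + [0] * 15
print(round(fit_intercept_only(mixed, 2000), 4), round(math.log(0.25 / 0.75), 4))
```

Recoding NA as 0 avoids this degenerate fit, but only by treating missing labels as true negatives.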
Any thoughts, discussions, articles, or creative approaches are welcome!