I am developing a set of medical diagnostic procedures that will be assessed using binary categorical variables. I want to assess the relative importance of these criteria so that we can focus our treatments on those that have the greatest impact on the patient's overall health.
Mock Example of Our Data
This is an example of the kind of data we would collect. On a given day, we would evaluate the patient based on a set of binary evaluation criteria (aka "metrics").
Our Interest in Machine Learning
What we want to do is start to understand correlations and relationships between the metrics so we can prioritize our treatments. The work we do is an advanced form of physical therapy. We tailor our exercise program to the improvements we see in the patient, and we experiment with different exercises to find combinations that maximize the total number of metrics the patient tests positive for. But I don't think this is the most efficient way to improve patient health, because the number of metrics they test positive for is not the most important factor. Some of the metrics are clearly more important than others, based on our theoretical understanding and training. Actually finding this in the data has proved hard to do just by eyeballing tables of 1s and 0s. Computing Pearson correlations is easy but insufficient for identifying patterns systematically. From what I have read, I think a machine learning approach would be substantially more effective at identifying effective treatments.
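For reference, here is a minimal sketch of the kind of pairwise correlation I mean. For two 0/1 variables the Pearson correlation reduces to the phi coefficient, which can be computed from counts alone. The metric names `m1` to `m4` and the values in `rows` are made up purely for illustration; they are not real patient data.

```python
from itertools import combinations
from math import sqrt

# Hypothetical mock data: one row per daily evaluation, one column per metric.
# Names and values are invented for illustration only.
metrics = ["m1", "m2", "m3", "m4"]
rows = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
]


def phi(x, y):
    """Pearson correlation of two 0/1 sequences (the phi coefficient)."""
    n = len(x)
    n11 = sum(a * b for a, b in zip(x, y))  # evaluations where both are 1
    n1x, n1y = sum(x), sum(y)               # evaluations where each is 1
    denom = sqrt(n1x * (n - n1x) * n1y * (n - n1y))
    if denom == 0:
        return float("nan")  # undefined when a metric never varies
    return (n * n11 - n1x * n1y) / denom


# Transpose the rows so that each metric becomes one column of 0/1 values.
columns = list(zip(*rows))

# Rank metric pairs by the strength of their association.
pairs = sorted(
    (
        (metrics[i], metrics[j], phi(columns[i], columns[j]))
        for i, j in combinations(range(len(metrics)), 2)
    ),
    key=lambda item: abs(item[2]),
    reverse=True,
)
for name_a, name_b, value in pairs:
    print(f"{name_a} vs {name_b}: phi = {value:+.3f}")
```

This only gives a ranked list of pairs, which is exactly the limitation: it says nothing about groups of three or more metrics that move together.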
In What Sense Is Our Problem Binary?
Although we use binary features, I don't think this is a binary classification problem. Health is not a binary category for us. The patient isn't considered healthy unless they test positive on all the metrics we use, so simply labeling them healthy or unhealthy isn't a useful problem to solve: we already have a way to diagnose this.
I think our goal is to use machine learning to help better identify degrees of "health" by clustering criteria that seem to influence each other. At the moment, we are using only binary features. In our work, numerical features (i.e. defined over $\Bbb R$) generally don't work well because it's hard to quantify attributes of the patient in numerical terms that are actually useful for predicting treatments. Graded/ordinal metrics also aren't great because it is hard to know how to define the magnitudes of the scale. So binary metrics are often the most useful.
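To make "clustering criteria" concrete, here is a rough sketch of one thing I imagine this could look like: agglomerative (average-linkage) clustering of the metrics themselves, using the Jaccard distance between their 0/1 columns. This is only an illustration under my own assumptions, not a method I know to be appropriate. The data, the metric names, and the `threshold` value are all hypothetical.

```python
from itertools import combinations

# Hypothetical mock data: each metric maps to its 0/1 results over six evaluations.
columns = {
    "m1": [1, 1, 0, 1, 0, 0],
    "m2": [1, 1, 0, 1, 0, 1],
    "m3": [0, 0, 1, 1, 0, 1],
    "m4": [1, 0, 1, 0, 1, 0],
}


def jaccard_distance(x, y):
    """1 - |both positive| / |either positive| for two 0/1 sequences."""
    both = sum(a * b for a, b in zip(x, y))
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1.0 - both / either if either else 0.0


def cluster_metrics(columns, threshold):
    """Merge the closest groups of metrics until none are within `threshold`."""
    names = list(columns)
    dist = {
        frozenset((a, b)): jaccard_distance(columns[a], columns[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break  # the closest remaining pair is too far apart to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


clusters = cluster_metrics(columns, threshold=0.5)
print(clusters)
```

The choice of distance and linkage here is arbitrary on my part; part of what I am asking is which choices make sense for binary data like ours.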
What I'm Looking For
I was thinking of testing out code for machine learning algorithms applicable to binary features. I figure if I find some examples to start with, I can experiment a bit and see which ones might be most useful for our purposes. But I'm having trouble narrowing down what my options are. When I search for "binary machine learning", I mostly get results about "binary classification", which I don't think is what I want. Decision trees look plausible, but I'm not sure which kind I should be looking at, given how many kinds there are.
Key Properties to Keep in Mind
- Binary Features
- Unsupervised learning
- Features are not independent (in the probabilistic sense), and there will be correlations between them.
- I may be looking for something related to "feature selection"
My Question
What are the most common machine learning algorithms applied to binary categorical data?
This may be too subjective, in which case I'll delete it if asked.