If my input is numeric and my output is continuous, I can use linear or nonlinear models. If one of the inputs is a factor, I can split the data by that factor.
If my input is numeric and my output is boolean, I can use a glm.
How should I approach the problem when my inputs are several columns of factors and my output is a factor too?
input:
- "in1" is "factor1a" ... "factor1k",
- "in2" is "factor2b" ... "factor2m"
- ...
output: "a", "b" ...
I can make a table of conditional rates based on sample sizes, but it seems to me that there should be a more "textbook" approach.
When I say "factor", I mean that everything is a factor: all inputs are factors and all outputs are factors. For output "k", only 3-4 of the inputs (for example) might lead to that output. When I say "conditional rates", I mean that, given desired output "a", the two recipes are "recipe_one" and "recipe_two" in a mix of n/m. Particular combinations of input factors produce particular output factors. So "k" is made from the combinations "a", "b" and "j"; or "a", "j" and "r"; or "a", "m" and "p".
What these recipes have in common is "a". I might build a tree (or hierarchical) model to describe the frequencies of ingredients in "k". By "nested" I mean that "a" is common to all recipes and "j" is common to two. I think of it this way because I would typically write nested for loops to do my counting. (Yes, I am still working on improving my use of "apply".)
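To make the counting concrete, here is a minimal sketch of the ingredient frequencies for "k", using the three recipes listed above. It is written in Python purely for illustration; the names `recipes_for_k` and `ingredient_counts` are made up, and a flat counter stands in for the nested loops.

```python
from collections import Counter
from itertools import chain

# The three recipes for output "k" from the description above.
recipes_for_k = [("a", "b", "j"), ("a", "j", "r"), ("a", "m", "p")]

def ingredient_counts(recipes):
    """Count in how many recipes each input factor level appears."""
    # set(r) guards against a level being listed twice in one recipe.
    return Counter(chain.from_iterable(set(r) for r in recipes))

counts = ingredient_counts(recipes_for_k)

# Levels that appear in every recipe sit at the root of the "tree".
common_to_all = [f for f, n in counts.items() if n == len(recipes_for_k)]

print(counts.most_common())
print(common_to_all)
```

Sorting the levels by count gives the nesting order described above: "a" first (in all three recipes), then "j" (in two), then the rest.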
I want to understand the "ingredients" needed to get a particular output, but some outputs seem to come from a mixture of different recipes of input factors. My goal is to maximize one of the outputs and minimize the others. To do that, I need to control the inputs.
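The table of "conditional rates" I have in mind could be built roughly like this. This is only a sketch in Python with invented observations; `observations` and `recipe_rates` are hypothetical names, and each row is assumed to be one sample recorded as (input factor levels, output level).

```python
from collections import Counter, defaultdict

# Invented sample data: (recipe of input factor levels, observed output level).
observations = [
    (("a", "b", "j"), "k"),
    (("a", "j", "r"), "k"),
    (("a", "b", "j"), "k"),
    (("a", "m", "p"), "k"),
    (("b", "m"), "q"),
]

def recipe_rates(rows):
    """For each output, return the share of samples produced by each recipe."""
    counts = defaultdict(Counter)
    for recipe, output in rows:
        # frozenset makes the recipe order-independent and hashable.
        counts[output][frozenset(recipe)] += 1
    rates = {}
    for output, recipes in counts.items():
        total = sum(recipes.values())
        rates[output] = {r: n / total for r, n in recipes.items()}
    return rates

rates = recipe_rates(observations)
for recipe, share in rates["k"].items():
    print(sorted(recipe), share)
```

Each inner dictionary is the "mix of n/m" for one output: the fraction of that output's samples that came from each recipe.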
Please let me know if I am using the words improperly.