output is a factor ... how do I model it

Question

If my input is numeric and my output is continuous I can use linear or nonlinear models. I can split the inputs by factors if an input is a factor.

If my input is numeric and my output is boolean I can use a glm.

How do I approach the problem when my inputs are several columns of factors and my output is also a factor?

input:

"in1" is "factor1a" ... "factor1k",
"in2" is "factor2b" ... "factor2m"
...

output: "a", "b" ...

I can make a table of conditional rates based on sample sizes, but it seems to me that there should be a more "textbook" approach.

When I say "factor" I mean that everything is a factor: all inputs are factors and all outputs are factors. For output "k", only 3-4 of the inputs (for example) might lead to that output. When I say "conditional rates" I mean that, given desired output "a", the two recipes are "recipe_one" and "recipe_two" in a mix of n/m. Particular combinations of input factors make particular output factors. So "k" is made from combinations of either "a", "b" and "j", or "a", "j" and "r", or "a", "m" and "p".
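As a rough illustration of the "table of conditional rates" idea, here is a minimal Python sketch that estimates the share of each recipe given an output. The rows, the level names, and the helper `conditional_rates` are all made up for the example; they are not taken from real data.

```python
from collections import Counter, defaultdict

# Hypothetical observations: (tuple of input factor levels, output level).
rows = [
    (("a", "b", "j"), "k"),
    (("a", "b", "j"), "k"),
    (("a", "j", "r"), "k"),
    (("a", "m", "p"), "k"),
    (("b", "m", "r"), "c"),
    (("b", "m", "r"), "c"),
    (("a", "b", "j"), "c"),
]

def conditional_rates(rows):
    """Return {output: {recipe: share}}, an estimate of P(recipe | output)."""
    counts = defaultdict(Counter)
    for recipe, output in rows:
        counts[output][recipe] += 1
    rates = {}
    for output, recipe_counts in counts.items():
        total = sum(recipe_counts.values())
        rates[output] = {r: n / total for r, n in recipe_counts.items()}
    return rates

rates = conditional_rates(rows)
for output, mix in rates.items():
    for recipe, share in mix.items():
        print(output, recipe, round(share, 2))
```

With counts this small the shares are very noisy, which is part of why a model-based approach may be preferable to a raw table.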

What these have in common is "a". I might make a tree (or hierarchical) model to describe the frequencies of ingredients in "k". By "nested" I mean that "a" is common to all recipes and "j" is common to two. I think of it this way because I would typically write nested for loops to do my counting. (Yes, I am still working on improving my use of "apply".)
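The counting described here does not need nested loops. A small sketch, using the three recipes for "k" listed above and a hypothetical helper `ingredient_shares`, counts how many recipes each ingredient appears in:

```python
from collections import Counter

# The three recipes for output "k" mentioned in the question.
recipes_for_k = [("a", "b", "j"), ("a", "j", "r"), ("a", "m", "p")]

def ingredient_shares(recipes):
    """Fraction of recipes that contain each ingredient (each counted once per recipe)."""
    counts = Counter(level for recipe in recipes for level in set(recipe))
    n = len(recipes)
    return {level: count / n for level, count in counts.most_common()}

shares = ingredient_shares(recipes_for_k)
for level, share in shares.items():
    print(level, round(share, 2))
```

Here "a" appears in every recipe and "j" in two of the three, which matches the nesting described above. Note that this weights each recipe equally; weighting by how often each recipe is observed would be a separate choice.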

I want to understand the "ingredients" to get a particular output, but it seems that for some of them, they are a mixture of different recipes of factor inputs. My goal is to maximize one of the outputs and minimize the others. To do that I need to control the inputs.

Please let me know if I am using the words improperly.

Could you explain the connection between "conditional rates" and the "factor" nature of your response? For modeling such things, the details of your response variable matter, such as whether it has a natural ordering, whether it consists of counts, and so on. Please provide some relevant information. — whuber, Feb 12 '16 at 14:05
The very similar (but more specific) question at http://stats.stackexchange.com/questions/195246 is getting some constructive attention--perhaps that answers your question? — whuber, Feb 12 '16 at 15:14
@whuber - it isn't speaking to nested relationships. Is there a general term for this breed of analysis? — EngrStudent, Feb 12 '16 at 15:32
Unfortunately it is not clear... Do you mean hierarchical models? — Tim, Feb 12 '16 at 15:50
@Tim - I might be. My question is "what is the textbook approach to the problem". You are asking "are you using the 'hierarchical model' textbook approach". I'm not sure. I need to know what the textbook approaches are and how to specify them before I can say yes or no. Thank you for the word, I'm looking for its use in articles. It seems to be synonymous with "multi-level model" and I am looking at those. — EngrStudent, Feb 12 '16 at 16:23
What you mean by "nested" is murky. I'm having difficulty understanding the description in the question, because it uses various vague (and mostly undefined) terms like "output," "recipe," "mix," "combinations," and "ingredients." — whuber, Feb 12 '16 at 16:50

score 1 · Answer 1 · edited May 23 '17 at 12:39

The phrase you were looking for is "hierarchical multinomial logistic" analysis. Though you handle primarily numeric data in your engineering role, one of your favorite tools handles this very well: the CART model. The "classification" part of "classification and regression tree" has this built in.
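To make the classification-tree idea concrete, here is a toy sketch in plain Python. It is not the CART implementation from any library: it assumes equality tests of the form "column j equals level" and Gini impurity as the split criterion, and the data and function names (`gini`, `best_split`, `grow`, `predict`) are invented for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the (column, level) equality test with the lowest weighted Gini."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for level in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] == level]
            right = [y for row, y in zip(rows, labels) if row[j] != level]
            if not left or not right:
                continue  # this test does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:  # require a strict improvement
                best, best_score = (j, level), score
    return best

def grow(rows, labels):
    """Grow a tree; a leaf is a class label, a node is (column, level, yes, no)."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority class
    j, level = split
    yes = [(r, y) for r, y in zip(rows, labels) if r[j] == level]
    no = [(r, y) for r, y in zip(rows, labels) if r[j] != level]
    return (j, level,
            grow([r for r, _ in yes], [y for _, y in yes]),
            grow([r for r, _ in no], [y for _, y in no]))

def predict(tree, row):
    """Follow the equality tests down to a leaf label."""
    while isinstance(tree, tuple):
        j, level, yes, no = tree
        tree = yes if row[j] == level else no
    return tree

# Hypothetical data: columns are in1, in2, in3; the output is "k" or "q".
X = [("a", "b", "j"), ("a", "m", "j"), ("a", "b", "r"), ("a", "m", "r"),
     ("c", "b", "j"), ("c", "m", "j"), ("c", "b", "r"), ("c", "m", "r")]
y = ["k", "k", "k", "q", "q", "q", "q", "q"]

tree = grow(X, y)
print(tree)
print(predict(tree, ("a", "m", "r")))
```

Each path from the root to a leaf labeled "k" reads as one "recipe" of input levels, which is the nested structure asked about in the question. In practice an established tree library, with pruning and validation, would be the safer choice.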

Here are some relevant links to go look at and learn from:
