0

I am trying to run a regression on non-numeric survey data. The question is: does flour type or mixing type lead to a better cake review? Dummy data is attached, and the factors are explained below.

Flour Type: White Flour, Whole Wheat Flour, Gluten-Free Flour

Mixing Type: Kitchen-Aid, Hand Mixed, Commercial Mixer

Cake Review: Horrible Cake, Bad Cake, Okay Cake, Good Cake, Great Cake

Can you code the factors into numeric data?

noah Hersch

2 Answers

4

This is done by coding the levels of each factor as binary variables (also called dummy or indicator variables). Some software packages do this for you. A linear regression with only categorical independent (explanatory) variables is equivalent to ANOVA.

This website explains pretty clearly what the binary variables look like.

Essentially, you will have a new variable for each level of your factor (e.g. White Flour). The new variable will take the value 1 if that recipe used white flour and 0 if it did not.
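As a rough illustration of the coding described above, here is a minimal pure-Python sketch. The level names come from the question; the helper `dummy_code`, its `drop_first` flag, and the three sample recipes are hypothetical and only for illustration. When the regression includes an intercept, one level is usually left out as the baseline (here White Flour), so a factor with three levels yields two dummy variables.

```python
# Levels of the flour factor, as listed in the question.
FLOUR_LEVELS = ["White Flour", "Whole Wheat Flour", "Gluten-Free Flour"]


def dummy_code(value, levels, drop_first=True):
    """Return 0/1 indicator variables for a single observation.

    Hypothetical helper: with drop_first=True the first level acts as
    the baseline and gets no column of its own.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    kept = levels[1:] if drop_first else levels
    return {level: int(value == level) for level in kept}


# Made-up recipes, not taken from the question's spreadsheet.
flours = ["White Flour", "Gluten-Free Flour", "Whole Wheat Flour"]

# Baseline coding: two columns, White Flour is the all-zero row.
coded = [dummy_code(f, FLOUR_LEVELS) for f in flours]

# Full coding: one column per level, as described in the answer.
full = [dummy_code(f, FLOUR_LEVELS, drop_first=False) for f in flours]

for flour, row in zip(flours, coded):
    print(flour, row)
```

The same helper could be applied to the mixing-type factor with its own list of levels.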

Michael Webb
  • "There will always be one fewer dummy variable than the number of levels." - ISLR, chapter 3, page 86 – julian Oct 17 '17 at 15:18
  • I took a stab at coding the variables in sheet 2 but I'm unsure where to go from there https://docs.google.com/spreadsheets/d/1h7SmlceB-JsNrInQ6aKumAbKYMth77yJBlUsqKiU0ho/edit?usp=sharing – noah Hersch Oct 17 '17 at 17:57
  • I would code your 'cake review' variable (dependent variable) as 1, 2, 3, 4, 5 and use a linear regression or an ordinal logistic regression. – Michael Webb Oct 17 '17 at 19:35
0

Yes. For instance, in R, an lm() with categorical explanatory variables is the same as an ANOVA performed directly on those categorical variables, because the program dummy codes your categorical variables for you. It's well explained here
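To make the equivalence concrete, here is a small sketch in plain Python (not R, and not a reimplementation of lm()). It fits least squares on dummy-coded flour type by solving the normal equations and compares the coefficients with the group means. The six observations are invented for illustration, and coding the review as 1 to 5 is an assumption; the `solve` helper is hypothetical.

```python
from statistics import mean

# Hypothetical data: (flour type, review coded 1-5). Not from the spreadsheet.
data = [
    ("White Flour", 4), ("White Flour", 5),
    ("Whole Wheat Flour", 3), ("Whole Wheat Flour", 2),
    ("Gluten-Free Flour", 1), ("Gluten-Free Flour", 3),
]
levels = ["White Flour", "Whole Wheat Flour", "Gluten-Free Flour"]

# Design matrix: intercept plus one 0/1 column per non-baseline level.
X = [[1.0] + [float(flour == lvl) for lvl in levels[1:]] for flour, _ in data]
y = [float(score) for _, score in data]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Normal equations: (X'X) beta = X'y
p = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(XtX, Xty)

# Mean review per flour type, which is what a one-way ANOVA compares.
group_means = {lvl: mean(s for f, s in data if f == lvl) for lvl in levels}

print("coefficients:", beta)
print("group means: ", group_means)
```

The intercept matches the mean of the baseline group (White Flour), and each remaining coefficient matches the difference between that level's mean and the baseline mean, which is why the regression and the ANOVA describe the same model.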

A.

Al3xEP