I have a dataset with the following types of predictors:

  • binary (e.g., gender),
  • nominal with 3 categories,
  • ordinal, and
  • continuous

Question:

What is the best way to set up a regression model that includes these different types of variables?

Jeromy Anglim
Edwin
  • 2
    Please provide more details. What have you tried, what are you trying to model? Also define best. – mpiktas Mar 28 '11 at 19:35
  • 2
    Edwin: Some more information would help people answer your question. What is the level of measurement of your dependent variable? What are the level(s) of measurement of your independent variable(s). How many of each I and D vars are there? What software are you using? – Brett Mar 28 '11 at 19:35
  • Also include what the actual IVs and DVs _are_. – Phillip Cloud Mar 28 '11 at 19:49
  • Neither binary nor nominal predictors pose a special problem - they are all representable as binary or other contrast forms of variables. The only real challenge is [ordinal predictors](http://stats.stackexchange.com/q/195246/3277) - they are not as easy to process. – ttnphns Jul 24 '16 at 11:58

2 Answers

1

The lm() procedure in R handles the entire range of linear models, not just multiple regression. All you have to do is make sure your predictors are set up to be of the right type.

Binary is the special case of nominal where the number of levels is two.

Nominal variables should be stored as factors. They can be coerced to factors from character variables by using factor(). Note that linear models use one of the levels as a baseline, so it does not get a coefficient of its own; with the default contrasts, the coefficients for the other levels are differences from that baseline. By default the baseline is the first in your list of levels. If you don't specify the order of the levels, they will be put in alphabetical order. You can change the baseline using relevel().
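
As a minimal sketch of the factor() and relevel() steps described above (the variable name region and its three categories are made up for illustration, not taken from the question):

```r
# Hypothetical nominal predictor with 3 categories, stored as character.
region_chr <- c("north", "south", "west", "south", "north", "west")

# Coerce to a factor; without an explicit 'levels' argument the levels
# are sorted alphabetically, so "north" becomes the baseline.
region <- factor(region_chr)
levels(region)

# Make "south" the baseline (reference) level instead.
region2 <- relevel(region, ref = "south")
levels(region2)

# The baseline level has no column of its own in the design matrix.
colnames(model.matrix(~ region2))
```

A binary predictor such as gender works the same way, just with two levels.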

For ordinal data you need them to be ordered factors. Use ordered() to coerce characters or factors to ordered factors.
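
A small sketch of the ordered() step, assuming a made-up ordinal variable edu with categories low, medium, and high. It also shows that, under R's default contrast settings, an ordered factor enters the model through polynomial contrasts rather than baseline dummies:

```r
# Hypothetical ordinal predictor, stored as character.
edu_chr <- c("low", "high", "medium", "low", "medium", "high")

# State the level order explicitly; otherwise the levels would be sorted
# alphabetically (high, low, medium), which is not the intended order.
edu <- ordered(edu_chr, levels = c("low", "medium", "high"))
is.ordered(edu)
levels(edu)

# With the default options("contrasts"), ordered factors get polynomial
# contrasts, so the terms are labelled .L (linear) and .Q (quadratic).
colnames(model.matrix(~ edu))
```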

For continuous predictors you want the predictor to be numeric (a double or an integer). Use as.numeric() or as.double() to coerce it if necessary.
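
Putting the pieces together, here is a hedged end-to-end sketch on simulated data. All variable names, categories, and the response y are invented for illustration; the question does not say what the actual variables are.

```r
set.seed(1)  # simulated data, purely for illustration
n <- 60
dat <- data.frame(
  gender = sample(c("female", "male"), n, replace = TRUE),
  region = sample(c("north", "south", "west"), n, replace = TRUE),
  edu    = sample(c("low", "medium", "high"), n, replace = TRUE),
  age    = as.character(round(runif(n, 20, 60))),  # numbers stored as text
  stringsAsFactors = FALSE
)
dat$y <- rnorm(n)  # made-up continuous response

# Set each predictor to the appropriate type.
dat$gender <- factor(dat$gender)                                 # binary
dat$region <- factor(dat$region)                                 # nominal
dat$edu    <- ordered(dat$edu, levels = c("low", "medium", "high"))  # ordinal
dat$age    <- as.numeric(dat$age)                                # continuous

# lm() builds the dummy and contrast columns from the variable types.
fit <- lm(y ~ gender + region + edu + age, data = dat)
summary(fit)
names(coef(fit))
```

One caveat on the coercion step: if the numbers are held in a factor rather than a character vector, as.numeric() returns the internal level codes, so convert with as.numeric(as.character(x)) instead.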

EdM
0

As the comments suggest, it is only by fully understanding and specifying your research design that you will establish what regression method best corresponds to your data.

In the case where your DV is a categorical variable, which seems likely if you are dealing with social data, I would recommend reading extensively from Long and Freese to make an informed choice. Long and Freese use Stata, but equivalent commands exist in both R and SPSS.

Fr.