
I have a dataset with 50 numerical input variables and 1 output column. The output is categorical and ordered, with 3 levels. I want to describe the relationship between the numerical variables and the categorical output. There are more than 50K observations in the dataset.

I'm thinking about using ordered logistic regression (the `polr` function from the MASS R package), but I'm not sure whether it works for numerical variables. Most of the examples I have seen focus on categorical input variables, not so much on numerical ones.

The goal of the analysis is to measure the correlation between the numerical variables and the output, as well as the amount of noise.

Any help regarding useful algorithms and/or implementations in R is very welcome.

Data format:

+-----------+-----------+-----------+--------+
| Variable1 | Variable2 | Variable3 | Output |
+-----------+-----------+-----------+--------+
|         4 |        27 |        87 | GOOD   |
|         1 |        43 |        56 | BETTER |
|         0 |        67 |         3 | BEST   |
+-----------+-----------+-----------+--------+
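
For data in the format above, a `polr` fit on purely numerical predictors might be sketched as follows. This is only an illustration on simulated data: the sample size, the latent-score coefficients and the cut points are made-up assumptions, not values from the real dataset.

```r
library(MASS)  # provides polr()

set.seed(1)
n <- 1000

# Simulated stand-in for the real data: three numeric predictors
d <- data.frame(
  Variable1 = rnorm(n),
  Variable2 = rnorm(n),
  Variable3 = rnorm(n)
)

# Hypothetical latent score: only Variable1 and Variable2 carry signal,
# the logistic noise term matches the model polr() fits by default
latent <- 1.0 * d$Variable1 - 0.5 * d$Variable2 + rlogis(n)

# Cut the latent score into three ordered levels (assumed cut points)
d$Output <- cut(latent,
                breaks = c(-Inf, -1, 1, Inf),
                labels = c("GOOD", "BETTER", "BEST"),
                ordered_result = TRUE)

# Proportional-odds model with numeric predictors; Hess = TRUE keeps the
# Hessian so that summary() can report standard errors
fit <- polr(Output ~ Variable1 + Variable2 + Variable3, data = d, Hess = TRUE)
summary(fit)
```

With 50 predictors the same formula interface applies, for example `Output ~ .` if the data frame contains only the predictors and the output column.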
Sorlac
  • The only way `polr` (and just about any other regression procedure) can fit categorical explanatory variables is by representing them as *numerical* vectors. You can therefore expect it to work for arbitrary explanatory variables. – whuber May 04 '20 at 20:22
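
The point made in this comment can be checked with `model.matrix()`, which builds the numeric design matrix from a formula. The sketch below uses a hypothetical toy data frame (the column names `x` and `g` are invented for illustration): the numeric column is passed through unchanged, while the factor is expanded into 0/1 indicator columns.

```r
# Hypothetical toy data: one numeric and one categorical predictor
d <- data.frame(
  x = c(4, 1, 0, 7),
  g = factor(c("a", "b", "c", "a"))
)

# Build the numeric design matrix for the formula ~ x + g
X <- model.matrix(~ x + g, data = d)
print(X)

# The numeric predictor appears as-is; the factor becomes the indicator
# columns "gb" and "gc" (level "a" is the reference under default contrasts)
is.numeric(X)
```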
