Assume that I want to predict a response with 3 classes. I have two features $X_1$ and $X_2$, where $X_1$ is continuous and $X_2$ is categorical with 5 categories. How many parameters would there be if we use a softmax parametrization?

I was thinking we would have the bias, $X_1$, and $X_2$ split into 5 different variables. Would this be possible? Or could we just keep $X_2$ as a categorical predictor? I'm a bit confused about how to handle the categorical predictor in this case.

zer0square


Your thinking is right: you need to split the categorical variable into indicator (dummy) variables, one per category. But you don't need 5, just 4.

The reason is that $X_2$ has only 4 degrees of freedom. If it's not category 1, 2, 3, or 4, then it must be category 5, so a fifth indicator adds no information: the five indicators always sum to 1, which would duplicate the bias term. (You might wonder where the weight for category 5 goes if it has no parameter. It is absorbed into the bias, and each of the other four weights then measures the difference from category 5.)
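A minimal Python sketch of this reference (dummy) coding, assuming the categories are labelled 1 to 5 and category 5 is the reference level; `dummy_encode` is a hypothetical helper name. It also shows why the full five-column coding is redundant: those five indicators sum to 1 for every category, exactly like the bias column.

```python
def dummy_encode(category: int, n_categories: int = 5, drop_reference: bool = True) -> list[int]:
    """Indicator coding for a category labelled 1..n_categories.

    With drop_reference=True, the last category is the (assumed) reference
    level and is encoded as all zeros, giving n_categories - 1 columns.
    With drop_reference=False, every category gets its own column.
    """
    n_columns = n_categories - 1 if drop_reference else n_categories
    return [1 if category == j else 0 for j in range(1, n_columns + 1)]


for c in range(1, 6):
    full = dummy_encode(c, drop_reference=False)
    reduced = dummy_encode(c)
    # The full coding always sums to 1, i.e. it reproduces the bias column.
    print(c, full, sum(full), reduced)
```

Category 5 comes out as `[0, 0, 0, 0]` in the reduced coding, so its effect lives entirely in the bias.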

The features of your model are then, for each class $k \in \{1, 2, 3\}$:

  • $f_{0,k}(\vec{x}, y) = [y = k]$ (the bias)
  • $f_{1,k}(\vec{x}, y) = x_1 \cdot [y = k]$
  • $f_{2,k}(\vec{x}, y) = [(x_2 = 1) \land (y = k)]$
  • $f_{3,k}(\vec{x}, y) = [(x_2 = 2) \land (y = k)]$
  • $f_{4,k}(\vec{x}, y) = [(x_2 = 3) \land (y = k)]$
  • $f_{5,k}(\vec{x}, y) = [(x_2 = 4) \land (y = k)]$

Here, I'm using the Iverson bracket notation: $[P]$ is 1 if $P$ is true and 0 otherwise.
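These class-specific features amount to giving each class $k$ its own weight vector over the shared design vector $(1, x_1, [x_2 = 1], [x_2 = 2], [x_2 = 3], [x_2 = 4])$. A minimal sketch of the resulting softmax, assuming category 5 of $X_2$ is the reference level; `design_vector` and `softmax_probs` are hypothetical names and the weights are made-up numbers, not fitted values.

```python
import math


def design_vector(x1: float, x2: int) -> list[float]:
    """Bias, continuous feature, and four dummies (category 5 assumed as reference)."""
    return [1.0, x1] + [1.0 if x2 == j else 0.0 for j in range(1, 5)]


def softmax_probs(weights: list[list[float]], x1: float, x2: int) -> list[float]:
    """P(y = k | x) = exp(w_k . phi(x)) / sum_j exp(w_j . phi(x))."""
    phi = design_vector(x1, x2)
    scores = [sum(w * f for w, f in zip(w_k, phi)) for w_k in weights]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: one row of six numbers per class (3 classes).
W = [
    [0.5, 1.2, -0.3, 0.0, 0.8, 0.1],
    [0.0, -0.4, 0.6, 0.2, 0.0, -0.5],
    [-0.2, 0.1, 0.0, 0.9, -0.7, 0.3],
]
probs = softmax_probs(W, x1=0.7, x2=3)
print([round(p, 3) for p in probs])
```

Each row of `W` holds the six weights for one class, which is where the per-class parameter count comes from.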

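Altogether, each class gets six parameters: the bias term (1), the weight for your continuous feature (1), and the weights for your categorical feature (4). With three classes, the softmax has one such weight vector per class, so $3 \times 6 = 18$ parameters. Because the softmax probabilities don't change when the same vector is added to every class's weights, you can fix one class as a reference (all its weights zero), which leaves $2 \times 6 = 12$ free parameters.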
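A small sketch of this arithmetic, assuming an intercept and dummy coding with one dropped level per categorical predictor; `count_parameters` is a hypothetical helper. It also checks the invariance that lets one class serve as the reference: shifting every class score by the same constant leaves the softmax output unchanged.

```python
import math


def count_parameters(n_classes: int, n_continuous: int, category_levels: list[int],
                     reference_class: bool = False) -> int:
    """Number of weights in a softmax (multinomial logistic) model with an intercept.

    Each categorical predictor with L levels contributes L - 1 dummy columns.
    With reference_class=True, one class's weight vector is fixed at zero,
    leaving n_classes - 1 free weight vectors.
    """
    per_class = 1 + n_continuous + sum(levels - 1 for levels in category_levels)
    free_classes = n_classes - 1 if reference_class else n_classes
    return per_class * free_classes


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(count_parameters(3, 1, [5]))                        # -> 18
print(count_parameters(3, 1, [5], reference_class=True))  # -> 12

# Adding the same vector to every class's weights adds the same constant
# to every class score, so the probabilities do not change.
scores = [1.3, -0.2, 0.4]
shifted = [s + 2.5 for s in scores]
print(softmax(scores))
print(softmax(shifted))
```

With two classes and a reference class the same function gives 6, the familiar count for binary logistic regression with these predictors.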

Arya McCarthy