Many machine learning approaches use one-hot vectors to represent categorical data. This is sometimes called using indicator features, indicator vectors, regular categorical encoding, dummy coding, or one-hot encoding (among other names).

I'm searching for a compact way to denote a one-hot vector within a model.

Say we have a categorical variable with $m$ categories. First, fix an arbitrary ordering of the categories. A one-hot vector $v$ is then a binary vector of length $m$ in which exactly one entry is one and all others are zero. Setting the $i^\text{th}$ entry to 1, and all others to 0, indicates that $v$ represents the categorical variable taking on its $i^\text{th}$ possible value.

One clunky attempt based on misguided set notation:

$$ v \in \{0, 1\}^m \qquad\qquad \sum_{i=1}^m v_i = 1 $$

I've also seen math-oriented people refer to a one-hot vector using the notation

$$ \mathbf{e}_i $$

But I don't understand where this notation comes from or what it is called.

Can anyone help me out? Is there a paper that does a good job of this?

Thank you,

Zen
aaronsnoswell

2 Answers

There are several ways to denote dummy (one-hot encoded) variables; one of them is the indicator function:

$$ \mathbb{1}_A(x) := \begin{cases} 1 &\text{if } x \in A, \\ 0 &\text{if } x \notin A. \end{cases} $$
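
To connect this to the one-hot vector in the question, here is one possible way to write it. This is only a sketch: $x \in \{1, \dots, m\}$ is my own notation for the index of the observed category under the chosen ordering, and each set $A$ is taken to be the singleton $\{j\}$.

$$ v = \bigl(\mathbb{1}_{\{1\}}(x), \dots, \mathbb{1}_{\{m\}}(x)\bigr), \qquad v_j = \mathbb{1}_{\{j\}}(x) \quad \text{for } j = 1, \dots, m. $$

Exactly one component equals $1$ (the one with $j = x$), so this $v$ satisfies $\sum_{j=1}^m v_j = 1$.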

As for $e_i$, it is a vector of the standard basis: $e_i$ denotes the vector with a $1$ in the $i$-th coordinate and $0$s elsewhere. For example, in $\mathbb{R}^5$, $e_3 = (0, 0, 1, 0, 0)$.
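
If it helps to see the two notations side by side, here is a minimal Python sketch. The function name `one_hot` and its 1-based index `i` are my own choices for illustration, not an established API. It builds $e_i$ by evaluating the indicator of $\{i\}$ at each coordinate $j$.

```python
def one_hot(i: int, m: int) -> list[int]:
    """Return the standard basis vector e_i of length m (1-indexed, as in the text)."""
    if not 1 <= i <= m:
        raise ValueError("i must satisfy 1 <= i <= m")
    # Coordinate j is the indicator of the set {i} evaluated at j.
    return [1 if j == i else 0 for j in range(1, m + 1)]


e3 = one_hot(3, 5)
print(e3)  # [0, 0, 1, 0, 0]
```

Note that the text counts coordinates from 1, while Python lists are 0-indexed, so the single 1 in `e3` sits at list position 2.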

Fisher

I found some threads relevant to your question. Hope this helps.

"Dummy variable" versus "indicator variable" for nominal/categorical data

What is "one-hot" encoding called in scientific literature?

abi