How to construct a Probability Mass Function from a sample of data

Question

In decision tree learning, specifically when calculating Gini impurity, I understand that probabilities are assigned to each class label based on the label proportions in the node. For example, child_node = [0,0,0,0,1,1,1,1,1,2] gives $P(X = 0) = 0.4$, $P(X = 1) = 0.5$ and $P(X = 2) = 0.1$.

But I have the following questions about how a probability mass function (PMF) would be constructed according to the definition in the link. If the sample space for child_node is $\Omega = \{0,1,2\}$, what measurable function should be used to map the values from $\Omega$ to a measurable space? What would this measurable space look like?

Lastly, I'm unsure exactly how the probability $P(X = x)$ can be determined using $P\bigl(\{\omega\in\Omega : X(\omega) = x\}\bigr)$. What does $\{\omega\in\Omega : X(\omega) = x\}$ actually mean? I understand this to mean something like:

>>> omega = {0,1,2}
>>> x = 3
>>> {om for om in omega if om in range(x)}
{0, 1, 2}

which is just the same as $\Omega$? Presumably I am wrong here?

The measurable space is $\mathbb R$ with its Borel sets. The notation $\{\omega\in\Omega\mid P(\omega)\}$ is universally understood in mathematics to refer to the subset of $\Omega$ consisting of all elements for which the logical predicate $P$ is true. Its existence is guaranteed by an axiom of set theory. It sounds like you might be foundering on the concept of random variable. See https://stats.stackexchange.com/questions/50 and https://stats.stackexchange.com/questions/199280 or [search our site](https://stats.stackexchange.com/search?tab=votes&q=%22random%20variable%22%20definition). — whuber, Aug 30 '19 at 20:17
Thanks @Whuber, the suggested links were really helpful. Your explanation is just what I needed. So in this example, $X(\omega)$ could just do nothing? If so, when $\omega = 0$, does $P\bigl(\{\omega\in\Omega : X(\omega) = x\}\bigr)$ just mean $P(\{0\})$? And based on the data, $P(\{0\}) = 0.4$? — Josmoor98, Aug 31 '19 at 16:36
Yes, that's right. You might feel a little more secure in your reasoning by viewing the class labels as strings "0", "1", and "2" rather than numbers. Indeed, the concept of random variable isn't needed at all here: in a box with four copies of "0", five copies of "1", and one copy of "2", the empirical probability function is $\Pr(\{''0''\})=4/10,$ $\Pr(\{''1''\})=5/10,$ and $\Pr(\{''2''\})=1/10.$ No maps to measurable spaces are involved: the empirical probability merely reports the *proportions* observed in the data. — whuber, Sep 01 '19 at 13:29
Yes that does help. Thank you for your help. This clears up the confusion — Josmoor98, Sep 02 '19 at 09:43
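
To make the conclusion of the comments concrete, here is a minimal Python sketch, assuming $X$ is the identity map on $\Omega = \{0,1,2\}$ as agreed above. The helper names `empirical_pmf`, `preimage` and `identity` are illustrative only and do not come from any library. The sketch reads $\{\omega\in\Omega : X(\omega) = x\}$ as "keep every $\omega$ whose image under $X$ equals $x$", which is a test of equality rather than of membership in `range(x)`.

```python
from collections import Counter
from fractions import Fraction

child_node = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2]


def empirical_pmf(sample):
    """Return the observed proportion of each label in the sample."""
    counts = Counter(sample)
    n = len(sample)
    return {label: Fraction(count, n) for label, count in counts.items()}


def preimage(omega, X, x):
    """Return the event {om in omega : X(om) == x}."""
    return {om for om in omega if X(om) == x}


def identity(om):
    """Assumed random variable: X(om) = om, i.e. X 'does nothing'."""
    return om


pmf = empirical_pmf(child_node)  # proportions for labels 0, 1 and 2
omega = set(pmf)                 # sample space {0, 1, 2}

# P(X = 0) is the probability of the event {om : X(om) = 0} = {0}.
event = preimage(omega, identity, 0)
p_x0 = sum(pmf[om] for om in event)

# Gini impurity of the node: 1 minus the sum of squared class proportions.
gini = 1 - sum(p ** 2 for p in pmf.values())

print(event, p_x0, gini)
```

With the identity map each event is a single label, so $P(X = x)$ is just the proportion of that label in the node. A value of $x$ that never occurs, such as $x = 3$, gives the empty set and probability zero.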
