Suppose I have 150 records with continuous and categorical values. Only one column is categorical, with three categories: setosa, versicolor and virginica.
How do I calculate the categorical distribution for that column?
The term categorical distribution describes probabilities of observing $k$ exclusive events that for convenience are denoted as numbers $x \in \{1,...,k\}$. A probability mass function assigns probability to each of the events
$$ \Pr(x = i) = p_i $$
with a constraint that $\sum_{i=1}^k p_i = 1$. If you want to calculate the probabilities from data, then you are possibly interested in an empirical distribution. Calculating empirical probabilities is very simple. If the number of times that $x=i$ was observed in the dataset is denoted $n_i$, then
$$ \hat p_i = \frac{n_i}{\sum_{j=1}^k n_j} $$
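As a minimal sketch of the empirical estimate above, the counts $n_i$ can be obtained with a `Counter` and divided by the total. The helper name `empirical_probabilities` and the list `species` are illustrative; the data assume a balanced iris-like column with 50 records per category, which may not match your dataset.

```python
from collections import Counter


def empirical_probabilities(labels):
    """Return n_i / sum_j n_j for every category observed in labels."""
    counts = Counter(labels)          # n_i for each observed category
    total = sum(counts.values())      # sum_j n_j
    return {category: n / total for category, n in counts.items()}


# Assumed data: 150 records, 50 per species (hypothetical, balanced).
species = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
probs = empirical_probabilities(species)

for category, p in probs.items():
    print(category, p)  # each species gets probability 1/3
```

Only categories that actually occur in `labels` appear in the result, which is exactly where the estimate runs into trouble for unobserved categories.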
Notice however that such an estimate would be obviously incorrect if you did not observe some value in your dataset, which is possible in general. In such a case the estimate of probability from your data would be zero (i.e. impossibility). This is called the zero-frequency problem and a number of work-arounds for it are possible. The simplest correction is to add some value $\alpha$ to your counts
$$ \hat p_i = \frac{n_i + \alpha}{\left(\sum_{j=1}^k n_j\right) + k\alpha} $$
Common choices for $\alpha$ are $1$ (a uniform prior, based on Laplace's rule of succession), $1/2$ (the Krichevsky-Trofimov estimate), or $1/k$ (the Schürmann-Grassberger (1996) estimator). Notice, however, that what you do here is apply out-of-data (prior) information in your model, so it gets a subjective, Bayesian flavor. With this approach you have to remember the assumptions you made and take them into consideration.
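The additive correction can be sketched as follows. The function name `smoothed_probabilities` and the small sample are hypothetical; the full list of categories has to be supplied explicitly, because a category that was never observed cannot be discovered from the data.

```python
from collections import Counter


def smoothed_probabilities(labels, categories, alpha=1.0):
    """Return (n_i + alpha) / (sum_j n_j + k * alpha) for each category."""
    counts = Counter(labels)
    k = len(categories)
    total = sum(counts[c] for c in categories)  # sum_j n_j over known categories
    return {c: (counts[c] + alpha) / (total + k * alpha) for c in categories}


# Hypothetical small sample in which virginica was never observed.
sample = ["setosa"] * 6 + ["versicolor"] * 4
categories = ["setosa", "versicolor", "virginica"]

laplace = smoothed_probabilities(sample, categories, alpha=1.0)  # 7/13, 5/13, 1/13
kt = smoothed_probabilities(sample, categories, alpha=0.5)       # 6.5/11.5, 4.5/11.5, 0.5/11.5
sg = smoothed_probabilities(sample, categories, alpha=1 / len(categories))
```

With $\alpha = 1$ the unobserved category gets probability $1/13$ instead of zero; smaller values of $\alpha$ pull the estimates less strongly toward the uniform distribution.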
Such an approach is equivalent to Bayesian estimation using a symmetric Dirichlet prior (as described in Wikipedia); the choice $\alpha = 1$ corresponds to equal parameters $(1,...,1)$. You can use a Bayesian approach even if there is no zero-frequency problem, whenever you want to include some out-of-data information in your statistical model. In this case the posterior mean estimate for $p_i$ is
$$ E(p_i) = \frac{n_i + \alpha_i}{\sum_{j=1}^k (n_j + \alpha_j)} $$
where the $\alpha_i$ can be interpreted as assumed a priori "pseudocounts" for each event. In this case, the posterior distribution of the probability vector is a Dirichlet distribution
$$ (p_1,...,p_k) \sim \mathrm{Dir}(n_1+\alpha_1,...,n_k+\alpha_k) $$
Of course, if your out-of-data knowledge suggests using some informative, non-uniform, prior you can use different values for the $\alpha_i$.
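A rough sketch of the Dirichlet update with non-uniform pseudocounts is given below. The counts and prior values are made up for illustration, and the helper names are hypothetical. The expected value $E(p_i)$ is computed from the updated parameters, and a draw from the Dirichlet distribution is generated by normalizing independent gamma variates, which is a standard construction.

```python
import random


def dirichlet_posterior(counts, prior):
    """Posterior Dirichlet parameters n_i + alpha_i for each category."""
    return {c: counts.get(c, 0) + prior[c] for c in prior}


def posterior_mean(posterior):
    """E(p_i): each posterior parameter divided by the sum of all of them."""
    total = sum(posterior.values())
    return {c: a / total for c, a in posterior.items()}


def sample_dirichlet(posterior, rng):
    """One draw of (p_1, ..., p_k) via normalized gamma variates."""
    draws = {c: rng.gammavariate(a, 1.0) for c, a in posterior.items()}
    total = sum(draws.values())
    return {c: g / total for c, g in draws.items()}


# Hypothetical observed counts and informative pseudocounts (assumptions).
counts = {"setosa": 6, "versicolor": 4, "virginica": 0}
prior = {"setosa": 2.0, "versicolor": 2.0, "virginica": 1.0}

posterior = dirichlet_posterior(counts, prior)  # parameters 8, 6, 1
mean = posterior_mean(posterior)                # 8/15, 6/15, 1/15

rng = random.Random(0)  # seeded only to make the sketch reproducible
draw = sample_dirichlet(posterior, rng)
```

Repeating `sample_dirichlet` many times gives a sense of how uncertain the estimated probabilities are, which a single point estimate does not show.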
Schürmann, T., and P. Grassberger. (1996). Entropy estimation of symbol sequences. Chaos, 6, 414-427.