I have data like this:
group length
1 5
1 5
1 2
1 3
1 5
1 5
1 3
1 2
1 5
1 3
2 3
2 3
2 3
2 3
2 5
2 2
2 5
2 3
2 3
2 3
I would like to get the probability of length being each of the values it takes on (2, 3, 5), separately for each group. I would like to get this with regression. Transformations of the data are fine if required. I am using Stata right now, but any explanation/pseudo-code is greatly appreciated.
To illustrate what I mean, here is how I would do this manually:
*1. Transform the data to get counts by length for each group, calculate the total, and calculate probabilities:
group length_2_N length_3_N length_5_N Total prob_2 prob_3 prob_5
1 2 3 5 10 .2 .3 .5
2 1 7 2 10 .1 .7 .2
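
For reference, the manual calculation above can be sketched in Python as follows. This is only an illustration of the counting step, not the regression I am after; the names `lengths` and `proportions` are my own and the data are copied from the listing at the top.

```python
from collections import Counter

# Data copied from the listing above, keyed by group.
lengths = {
    1: [5, 5, 2, 3, 5, 5, 3, 2, 5, 3],
    2: [3, 3, 3, 3, 5, 2, 5, 3, 3, 3],
}


def proportions(values):
    """Return {value: share of observations equal to value}."""
    counts = Counter(values)
    total = len(values)
    return {v: counts[v] / total for v in sorted(counts)}


# One dictionary of probabilities per group.
probs = {group: proportions(vals) for group, vals in lengths.items()}

for group, p in probs.items():
    print(group, p)
```

The resulting `probs` dictionary holds the same per-group shares as the prob_2, prob_3, and prob_5 columns in the table.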
What I want is to get the .2, .3, .5 and .1, .7, .2 from a regression. It is fine if I need to split the data by group and run two regressions. Any hints?
I think that what I basically want is to estimate $P(\text{length} = x) = \alpha$, where $x \in \{2, 3, 5\}$ (for each group). Additionally, it would be useful to estimate $P(\text{length} = x) = \alpha + \beta \cdot \text{group}$.
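
To make the last equation concrete, here is one possible reading of it, sketched in Python: for each value x, regress an indicator for length = x on a constant and a 0/1 dummy for group (a linear probability model). This is my assumption about what such a regression could look like, not a confirmed answer. The helper `ols_simple` is hypothetical and implements ordinary least squares with a single regressor by hand; the group dummy is coded 0 for group 1 and 1 for group 2.

```python
# Data copied from the listing above, keyed by group.
lengths = {
    1: [5, 5, 2, 3, 5, 5, 3, 2, 5, 3],
    2: [3, 3, 3, 3, 5, 2, 5, 3, 3, 3],
}

# Stack both groups; dummy is 0 for group 1 and 1 for group 2 (assumed coding).
group_dummy = [0] * len(lengths[1]) + [1] * len(lengths[2])
all_lengths = lengths[1] + lengths[2]


def ols_simple(x, y):
    """Least squares of y on a constant and x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# One regression per value of length: the outcome is 1 if length == value, else 0.
fits = {}
for value in (2, 3, 5):
    indicator = [1 if v == value else 0 for v in all_lengths]
    fits[value] = ols_simple(group_dummy, indicator)

for value, (alpha, beta) in fits.items():
    print(value, round(alpha, 3), round(alpha + beta, 3))
```

Because the only regressor is a single dummy, the model is saturated: $\alpha$ equals the share of length = x in group 1 and $\alpha + \beta$ equals the share in group 2, so for x = 2 the fit gives roughly $\alpha = 0.2$ and $\beta = -0.1$, matching the manual table.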