Usually, with a continuous dependent variable, we can apply linear regression and then predict values based on new data.
For instance, take loan defaults: let's say we know an individual will default on his loan, and we want to estimate how long it takes him to default (1 year, 2 years, 3 years, and so on after he took the loan).
With linear regression, we can predict for a new individual that, based on his characteristics, he will default after X years.
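To make the point-prediction case concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the data are simulated, the two features and their coefficients are made up, and `predict_years` is just a hypothetical helper name. It uses plain NumPy least squares rather than any particular regression library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 borrowers who all defaulted, two made-up features.
n = 200
X = rng.normal(size=(n, 2))
# Years until default, simulated from an arbitrary linear rule plus noise.
years = 3.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, years, rcond=None)

def predict_years(x_new):
    """Return the single predicted time to default for one borrower."""
    return float(np.concatenate([[1.0], x_new]) @ beta)

new_borrower = np.array([0.2, -1.0])
point_prediction = predict_years(new_borrower)
print(point_prediction)  # one number (X years), not a distribution over years
```

The output is a single value of X for the new individual, which is exactly the limitation: nothing here says how likely 1 year is compared with 2 years.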
But what I'm looking for is a model that gives me a probability for each possible value.
Here, that would be: for a new individual who we know is going to default, what is the probability that he defaults after 1 year, the probability that he defaults after 2 years, and so on?
One possibility would be to treat the dependent variable as categorical and fit a logit or probit model to get probabilities.
But 1) there is some loss of information. Multinomial logit does not treat the categories as related. At best, ordered logit will order them, but it still doesn't take into account that the increment between adjacent categories is always the same (1 year).
And 2) if we want to consider defaults over more than a few years, the number of categories of the dependent variable grows quickly, which will hurt the quality of the predictions.
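For reference, the categorical approach I have in mind looks roughly like the sketch below. It is only an illustration under stated assumptions: the data are simulated, the features and the helper `year_probabilities` are made up, and the multinomial logit is fitted with a hand-written gradient descent in NumPy rather than a statistics package.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 defaulted borrowers, two made-up features,
# default time rounded into the categories 1..5 (years).
n, K = 500, 5
X = rng.normal(size=(n, 2))
latent = 3.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.7, size=n)
year = np.clip(np.rint(latent), 1, K).astype(int)

A = np.column_stack([np.ones(n), X])  # design matrix with intercept
Y = np.eye(K)[year - 1]               # one-hot encoding of the year

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Multinomial logit fitted by plain gradient descent on the mean
# negative log-likelihood. W holds one coefficient column per category,
# and nothing in the model knows that year 2 lies between years 1 and 3.
W = np.zeros((A.shape[1], K))
for _ in range(2000):
    P = softmax(A @ W)
    W -= 0.5 * (A.T @ (P - Y)) / n

def year_probabilities(x_new):
    """Return the K predicted probabilities for one borrower."""
    a = np.concatenate([[1.0], x_new])[None, :]
    return softmax(a @ W)[0]

probs = year_probabilities(np.array([0.2, -1.0]))
for k, p in enumerate(probs, start=1):
    print(f"P(default in year {k}) = {p:.3f}")
```

This does give one probability per year, but the coefficient matrix `W` gains a full column for every extra year considered, and the years are treated as unrelated labels, which are the two objections above.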
So if anyone has an idea of how to tackle this problem, I'd like to hear your thoughts! I feel like I'm not approaching it the right way at the moment, and maybe I need a different kind of model altogether.
Thank you very much!