Maximum likelihood estimation involving both probabilities and probability densities

Question

Note: based on suggestions in the comments, I have rewritten this question. Please refer to the history for the original version.

In general, my question is how to compute likelihoods in mixed cases that involve both probabilities and probability densities.

Here's how I collect the data. I have a behavioral experiment involving (human) participants, who in each of $N$ trials perceive a stimulus of intensity $i\in [-1;1]$. In response to this stimulus intensity, participants in every trial make a binary choice $c$ ($0$ or $1$) and a continuous rating $r\in[0;1]$.

To account for these data, I have a model, which in each trial takes $i$ as input and internally computes a hidden variable $x=f(i,\theta)$. (*)

Based on $x$ and some to-be-optimized model parameters, I can compute in each trial the likelihood of the participant's actual choice $c$ through a sigmoid/softmax function of $x$:

$p(c)=c-{2\cdot(c-0.5)\over1+e^{-\beta \cdot x}}$
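To make the choice formula concrete, here is a minimal sketch in plain Python. The function name `choice_prob` is only illustrative. Read literally, the formula assigns the sigmoid value $1/(1+e^{-\beta x})$ to $c=0$ and its complement to $c=1$, and the code follows that convention as written.

```python
import math

def choice_prob(c, x, beta):
    """Probability of the observed binary choice c (0 or 1) given x.

    Implements p(c) = c - 2*(c - 0.5) / (1 + exp(-beta*x)) literally:
    c = 0 gives sigmoid(beta*x), c = 1 gives 1 - sigmoid(beta*x).
    """
    s = 1.0 / (1.0 + math.exp(-beta * x))
    return c - 2.0 * (c - 0.5) * s

# The two possible choices always share a total probability of 1.
print(choice_prob(0, 0.7, 2.0) + choice_prob(1, 0.7, 2.0))
```

For very large negative values of $\beta x$ the call to `math.exp` can overflow, so a numerically safer sigmoid may be needed in practice.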

The likelihood of the continuous rating is given by a normal distribution (probability density) with mean $\lambda\cdot|x|$ and standard deviation $\sigma$ for $0<r<1$. For the boundary cases $r=0$ and $r=1$, the probability is the area under the normal density over the range $]-\infty;0]$ for $r=0$ and $[1;\infty[$ for $r=1$:

\begin{equation} p(r)=\begin{cases} \mathcal{N}(r,\lambda|x|,\sigma), & \text{if}\hspace{5pt}0<r<1\\ \Phi(0,\lambda|x|,\sigma), & \text{if}\hspace{5pt}r=0\\ 1-\Phi(1,\lambda|x|,\sigma), & \text{if}\hspace{5pt}r=1 \end{cases} \end{equation}

with $\mathcal{N}($variable,mean,std$)$ being the probability density function of the normal distribution and $\Phi($variable,mean,std$)$ being its cumulative distribution function (CDF).
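The piecewise definition of $p(r)$ can be sketched as follows, using only the standard library. The helper names `normal_pdf`, `normal_cdf` and `rating_lik` are illustrative. The sketch treats the normal as censored at 0 and 1, which is my reading of the case distinction above.

```python
import math

def normal_pdf(r, mean, std):
    """Density of a normal distribution evaluated at r."""
    z = (r - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(r, mean, std):
    """CDF of a normal distribution evaluated at r, via the error function."""
    return 0.5 * (1.0 + math.erf((r - mean) / (std * math.sqrt(2.0))))

def rating_lik(r, x, lam, sigma):
    """Likelihood contribution of a rating r in [0, 1] given x."""
    mean = lam * abs(x)
    if r <= 0.0:                       # point mass at r = 0
        return normal_cdf(0.0, mean, sigma)
    if r >= 1.0:                       # point mass at r = 1
        return 1.0 - normal_cdf(1.0, mean, sigma)
    return normal_pdf(r, mean, sigma)  # density for 0 < r < 1

# Sanity check: the two point masses plus the density's mass on (0, 1) sum to 1.
mean, sigma = 0.4, 0.3
interior = normal_cdf(1.0, mean, sigma) - normal_cdf(0.0, mean, sigma)
print(rating_lik(0.0, 0.4, 1.0, sigma) + rating_lik(1.0, 0.4, 1.0, sigma) + interior)
```

The check at the end illustrates that this mixed object is a proper distribution on $[0;1]$: a density on the open interval plus point masses at the two boundaries.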

Note that $p(c)$ and $p(r)$ can be assumed independent conditional on $x$.

My question: what is the mathematically correct way to calculate the combined likelihood of $p(c)$ and $p(r)$? My goal is to use this likelihood to perform maximum likelihood estimation of the model parameters $\beta,\lambda,\sigma,\theta$, such that the model becomes maximally predictive of behavior.

(*) I don't know whether the specifics of the function $f$ are important. It is a linear function, but with a nonstationary slope (let me know if more information is required).
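For concreteness, here is a minimal sketch of how such a combined likelihood is commonly written, assuming (as stated above) that $c$ and $r$ are independent conditional on $x$ and that trials are independent. Under those assumptions each trial contributes the product $p(c)\cdot p(r)$, so the log-likelihood is a sum of logs over trials. The function `f` below is a hypothetical placeholder (linear in $i$ with a constant slope $\theta$), not the actual model, and the names `neg_log_lik` and `trials` are illustrative.

```python
import math

def f(i, theta):
    # Hypothetical placeholder for x = f(i, theta): constant slope theta.
    return theta * i

def choice_prob(c, x, beta):
    # p(c) exactly as in the formula for the binary choice.
    s = 1.0 / (1.0 + math.exp(-beta * x))
    return c - 2.0 * (c - 0.5) * s

def rating_lik(r, x, lam, sigma):
    # Density for 0 < r < 1, point masses at r = 0 and r = 1.
    mean = lam * abs(x)
    cdf = lambda v: 0.5 * (1.0 + math.erf((v - mean) / (sigma * math.sqrt(2.0))))
    if r <= 0.0:
        return cdf(0.0)
    if r >= 1.0:
        return 1.0 - cdf(1.0)
    z = (r - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def neg_log_lik(params, trials):
    """Negative log-likelihood summed over trials given as (i, c, r) tuples."""
    beta, lam, sigma, theta = params
    total = 0.0
    for i, c, r in trials:
        x = f(i, theta)
        # Conditional independence: log p(c, r | x) = log p(c) + log p(r).
        total -= math.log(choice_prob(c, x, beta)) + math.log(rating_lik(r, x, lam, sigma))
    return total

# Made-up example data: (intensity i, choice c, rating r).
trials = [(0.5, 1, 0.3), (-0.2, 0, 0.0), (0.8, 0, 1.0)]
print(neg_log_lik((2.0, 1.0, 0.3, 1.0), trials))
```

Mixing density values with probabilities in one product is legitimate here because the likelihood is only defined up to a factor that does not depend on the parameters. Rescaling $r$ would shift the log-likelihood by a constant, which leaves the maximizing values of $\beta,\lambda,\sigma,\theta$ unchanged. A function of this shape could then be passed to any numerical minimizer.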

Hello. This is not clear to me. Do you mean there are two types of observations in your data, either a $0/1$ variable assumed to be generated from a Bernoulli distribution or a continuous variable assumed to be generated from a Gaussian distribution? — Stéphane Laurent, Dec 05 '14 at 09:58
Exactly. Just as you said, there are two different behavioral observations, one binary (leading to $p$), one continuous (leading to $pd$), and both shall be optimized simultaneously. — monade, Dec 05 '14 at 11:55
Do you know the type of an observation in advance? Or is it random? — Stéphane Laurent, Dec 05 '14 at 12:06
Sorry, I wasn't precise enough: in *each* trial both types of observation are made. — monade, Dec 05 '14 at 12:33
OK, if I understand correctly, these are paired observations, one binary and the other continuous? Are they assumed to be independent? — Stéphane Laurent, Dec 05 '14 at 13:14
In order to write down the likelihood you need to specify a joint distribution for the continuous and the binary variable. — Stéphane Laurent, Dec 05 '14 at 15:19
You should state your model, formally, with equations. Then I am sure we will spot your problem! — kjetil b halvorsen, Dec 05 '14 at 17:49
A worked example of working with likelihoods involving both densities and probabilities is posted at http://stats.stackexchange.com/questions/49443/how-to-model-this-odd-shaped-distribution-almost-a-reverse-j/49456#49456. Your actual problem is unclear: what do you mean by "1st observation" and "2nd observation"? Why do they have no parameters in common? It sounds like you might be struggling with writing down the likelihood, but for us to help you with that you will need to indicate exactly what your probability model for the data is. — whuber, Dec 05 '14 at 21:58
Sorry, I thought "1st observation" and "2nd observation" would be clear from the comments above: in each trial of an experiment, I make two observations, one discrete (1st), one continuous (2nd). So 1st and 2nd are meant relative to each trial. For both observations I compute probabilities in each trial based on the model's state (here $x$) and parameters. In case this wasn't actually a misunderstanding, I am happy to provide additional information about my probability model (although currently I don't understand what could be missing). — monade, Dec 05 '14 at 22:25
What is the $x$ in these equations? Are you trying to fit a different value of $x$ for each data point, or one $x$ for your entire model? — Ben Kuhn, Dec 07 '14 at 01:28
Thanks Ben, that might have been the missing information - I'm sorry that it takes me so long to provide a minimal working example. I forgot to mention that there is an input variable $i$ to the model, which is known and provided as input in each trial. Further, $x$ is a function of $i$ ($x=f(i,\theta)$), and as such varies from trial to trial, too (I now call $x$ a *hidden* model variable; $\theta$ is an additional model parameter). The goal is to optimize the model's predictions of choices $c$ and ratings $r$ based on some input $i$, by fitting the parameters $\beta,\sigma,\lambda,\theta$. — monade, Dec 07 '14 at 07:48
Your new question made me realize that I gave a mistaken answer to one of your first questions: $p(c)$ and $pd(r)$ **can be considered independent**! Still, given my lack of statistical background, I am unable to compute the joint distribution. I am looking at http://en.wikipedia.org/wiki/Joint_probability_distribution#Mixed_case but I don't know how to apply it to my case. (Note: I deleted my erroneous comment above and updated my question.) — monade, Dec 09 '14 at 10:36
This question remains problematically flawed because it relies too heavily on trying to explain your analysis without sufficiently explaining what your data are like. You would likely get better help by scrapping what you have written, describing how you collect the data and how you are thinking about them, and asking readers directly to explain how to derive a likelihood. — whuber, Dec 09 '14 at 16:52
Your point is well taken. I have rewritten the entire question, which hopefully provides further clarification. However, since I have not really mentioned anything new, something might still be missing. In that case, please let me know exactly what information is missing. — monade, Dec 09 '14 at 17:41
You say $r \in [0,1]$ is normally distributed; there's a problem here. — Stéphane Laurent, Dec 10 '14 at 12:09
That's a very good point. I now tried to fix this, by setting the probability of $p(r=1)$ equal to the area under the normal distribution in the range $[1;\infty[$ and the probability of $p(r=0)$ equal to the area under the normal distribution in the range $]-\infty;0]$ — monade, Dec 10 '14 at 17:25
Three things: (1) most likely my fix is wrong, given that I have no expertise in such things, (2) the necessity of this fix probably should be an indicator to me that the normal distribution just won't work here, (3) I'd be glad about suggestions how to properly model the uncertainty around $r$. The general idea is that the continuous rating $r$ doesn't exactly correspond to $\lambda|x|$, but instead varies around the mean $\lambda|x|$ with a certain uncertainty given by $\sigma$. — monade, Dec 10 '14 at 17:25
