
I know the probability distribution for a parameter $\phi$. I have the empirical/statistical distribution of $X$, which depends on the parameter $\phi$ for $\phi \in [0,1]$. I treat this empirical distribution as the probability distribution of $X$.

1/ Can I do so? Are the two notions similar?

Then, knowing the distribution of $\phi$ and the 'empirical distribution' of $X$, I would like to compute the distribution of $X(\phi)$.

I first thought of using the inverse CDF (inverse transform) method. This gives a random number generator for my unknown distribution, provided I can compute the inverse CDF $F^{-1}$.

However, I cannot always compute $F^{-1}$, so I then thought of some rejection sampling method.
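For concreteness, here is a minimal sketch of the two sampling approaches I have in mind (the exponential target, the Beta(2,2) target, and all constants are illustrative assumptions, not my actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Inverse CDF (inverse transform) method ---
# Illustrative target: Exponential(rate), whose inverse CDF is known in
# closed form: F^{-1}(u) = -log(1 - u) / rate.
def inverse_cdf(u, rate=2.0):
    return -np.log(1.0 - u) / rate

u = rng.uniform(size=10_000)
samples_inv = inverse_cdf(u)      # X = F^{-1}(U) follows the target law

# --- Rejection method, for when F^{-1} is not available ---
# Illustrative target on [0, 1]: the Beta(2, 2) density 6x(1-x),
# which we only need to evaluate, not invert.
def target_pdf(x):
    return 6.0 * x * (1.0 - x)

M = 1.5                           # bound: target_pdf(x) <= M on [0, 1]

def rejection_sample(n):
    out = []
    while len(out) < n:
        x = rng.uniform()                     # proposal: Uniform(0, 1)
        if rng.uniform() * M <= target_pdf(x):
            out.append(x)                     # accept
    return np.array(out)

samples_rej = rejection_sample(10_000)
```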

2/ I wonder whether I can treat this as a composition-of-probability-densities problem, and what solutions are at hand. From a quick look, I saw some optimization approaches for this class of problems (e.g. here).

3/ Finally, why not obtain the distribution of $X$ by multiplying each value of $X(\phi)$ by the probability density of that value of $\phi$ (a simple product of the underlying probability density and the $X$ values)?
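To make question 3/ precise (my notation; I assume the conditional density $f_{X\mid\phi}$ exists): the product $f_{X\mid\phi}(x\mid\phi)\,f_\phi(\phi)$ is the joint density of $(X,\phi)$, and the density of $X$ alone should come from integrating $\phi$ out,

$$f_X(x) = \int_0^1 f_{X\mid\phi}(x\mid\phi)\, f_\phi(\phi)\, d\phi,$$

so, if I understand correctly, the simple product appears only inside the integrand, not as the final answer.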

Thanks!


EDIT

I will try to reformulate the problem in terms of Bayesian statistics.

I have a prior for $\phi$ with a uniform distribution. I then know the distribution of $X$ conditional on $\phi$, $P(X|\phi)$. From Bayes' rule, I can deduce $P(\phi|X) = \frac{P(X|\phi)\,P(\phi)}{P(X)}$.

Now, my prior is no longer uniform. In other words, I have the same parametric model for the distribution of $X$, but it is now parametrized with a non-uniform random variable $\theta$ following some known distribution with density $f_{\theta}$. I would like to know the new $P(X|\theta)$. From Bayes' rule, $P(X|\theta) = \frac{P(\theta|X)\,P(X)}{P(\theta)}$.

My problem is solved if I have a relationship between $P(\theta|X)$ and $P(\phi|X)$. Should I plug $P(\phi|X)$, which I know, into the equality $P(X|\theta) = \frac{P(\theta|X)\,P(X)}{P(\theta)}$?
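One relationship I believe I can state (standard Bayesian bookkeeping, in my notation): the parametric model fixes the conditional, so $P(X\mid\theta=a) = P(X\mid\phi=a)$ for any value $a$, whatever prior is placed on the parameter. What changes between the two situations is the marginal of $X$:

$$P_{\text{new}}(X) = \int P(X\mid\theta)\, f_{\theta}(\theta)\, d\theta,$$

with the uniform density replaced by $f_{\theta}$. If that is right, no relationship between $P(\theta\mid X)$ and $P(\phi\mid X)$ should be needed.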

I hope these tentative explanations sound clearer. If not, could you guide me towards a good formulation and solution?


EDIT - I try to clarify the first sentence after Zen's comment, and to reformulate.

With 'I have the empirical distribution/statistical distribution of $X$ that is dependent on parameter $\phi$ for $\phi \in [0,1]$', I wanted to say: I know the distribution of the data in a situation where some parameter, $\phi$, is uniformly distributed. I assume a model for the empirical distribution that is parametrized with $\phi$.

Now, I am in another situation, where this parameter, which I consider a random variable, has another distribution, with some known density $f_{\phi}$. I also assume that the empirical distribution's model holds with the new underlying distribution.

Data can be produced in this model where the parameter distribution is no longer uniform, but has density $f_{\phi}$. I want to find the distribution of these data.
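For concreteness, a minimal sketch of how I picture producing such data (the Poisson count model and the Beta form of $f_{\phi}$ are illustrative assumptions only):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
# phi ~ f_phi: an illustrative Beta(2, 5), standing in for the known
# non-uniform density of the parameter in the new situation.
phi = rng.beta(2.0, 5.0, size=n)

# X | phi: an illustrative Poisson count model with mean 10*phi,
# standing in for the parametric model fitted in the uniform situation.
x = rng.poisson(lam=10.0 * phi)

# The empirical pmf of these draws approximates the marginal distribution
# of X under the new parameter density (a mixture of conditionals over phi).
values, counts = np.unique(x, return_counts=True)
marginal_pmf = counts / n
```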

Thanks, and apologies for the naive question and awkward stats lexicon.

Best regards.

kiriloff
  • Help me understand what you are trying to say. When you talk about a probability distribution for a parameter, are you talking about a Bayesian prior? What does the notation $X(\phi)$ mean? – Michael R. Chernick Sep 07 '12 at 13:37
  • Thanks. $\phi \to X(\phi)$, $\phi \in [0,1]$, is the empirical distribution of $X$ as a function of some parameter $\phi$. But $\phi$ in nature is not uniformly distributed. What is the 'empirical distribution' of $X$, knowing that $\phi$ is not uniform? – kiriloff Sep 07 '12 at 13:53
  • In fact, when saying $\phi \to X(\phi)$, $\phi \in [0,1]$, is the empirical distribution of $X$ as a function of some parameter $\phi$, I mean that it is empirical in a synthetic experiment where the distribution of $\phi$ is taken to be uniform. – kiriloff Sep 07 '12 at 14:01
  • Do you do this without any real data? If so, it sounds like you are talking about a parametric distribution for $X$ and a uniform Bayesian prior on $\phi$. But I am having trouble because we seem to be using very different terminology, and yours is very unfamiliar to me. – Michael R. Chernick Sep 07 '12 at 15:54
  • It looks like you helped me a lot with this request for clarification. It seems to me that this is precisely a problem of marginal likelihood estimation, and that methods such as Gibbs sampling could solve it. What computational or exact methods are at hand for my problem, or how would you handle it? Thanks again. – kiriloff Sep 07 '12 at 17:56
  • A scientist (but not a statistician) told me he had an idea that the distribution for $X$ could simply be obtained by multiplying the marginal density of $X$ by the marginal density of the parameter $\phi$. By 'multiply' is meant: if $X(\phi=a)=y$ and $p(\phi=a)=p$, then the 'joint distribution' (?) $p(X)$ is $y \cdot p$ at the point $\phi=a$. It is intuitively pleasing, but is it true? How could one demonstrate it or find a counterexample? – kiriloff Sep 07 '12 at 18:38
  • You don't need to use MCMC. But it now does sound like you want to use Bayesian methods with a uniform prior on $\phi$. I think I can give a sensible answer now. – Michael R. Chernick Sep 07 '12 at 19:02
  • You should win the "Good Will" badge, Michael. – Zen Sep 08 '12 at 01:08
  • @Zen thanks. The OP is having a difficult time explaining his problem. I feel that if I can understand what the real question is, I can solve it easily. So I tried to help him explain the problem. Based on his responses to my queries, my answer is given based on my best guess about the problem. But I am still not sure I got it right. – Michael R. Chernick Sep 08 '12 at 05:02
  • @Zen Sorry for the unclear stats in my post. I tried to reformulate the problem. Do you have any ideas at this point? Regards. – kiriloff Sep 10 '12 at 11:10
  • Dear fonjibe, what is nice is that you (unlike many "askers") are giving us feedback and working hard to help us understand your problem. – Zen Sep 10 '12 at 17:05
  • That said, my advice is that you stick to standard statistical terminology. If we don't understand your terms, we can hardly give any useful help. For example, when you say that "I have the empirical distribution/statistical distribution of $X$ that is dependent on parameter $\phi$", it makes no sense. The empirical distribution depends only on the sample values. It can't depend on any parameter. So, what are you really trying to say? – Zen Sep 10 '12 at 17:19
  • @Zen Thanks, Zen. I realize how vaguely I understand the underlying notions. I will try to clarify what you mention. – kiriloff Sep 10 '12 at 18:45
  • @Zen I really feel that Michael's and your comments are very constructive toward my finding the answer, since you point out the right underlying notions. I really hope we can converge to understanding and a solution. – kiriloff Sep 10 '12 at 18:58
  • Can you describe your problem without mentioning probability and statistics first? **What do you observe?** Which questions are you trying to answer? Do you want to predict some future thing based on the information contained in your sample? – Zen Sep 10 '12 at 19:08
  • @Zen I observe the number of events w.r.t. a parameter varying in $[0,1]$ (the thing I called $\phi$ before). These data are from an experiment in world 1, where this parameter is uniformly distributed. In world 2, this parameter is not uniformly distributed. I know its distribution. But in world 2, I don't have any data. I assume that in worlds 1 and 2 the laws behind event production (the 'model') are the same. What I want is the 'behavior', the 'distribution', of data in world 2. Does it mean something? Yes: I want to 'predict' how events would occur in world 2, knowing world 1. : ) – kiriloff Sep 10 '12 at 19:29
  • What does "I observe the number of events w.r.t. a parameter varying in $[0,1]$" mean? Parameters are unobservable. Things you **can** observe: prices of stocks, weights of things, heights of people, countings of some kind of occurrence, temperatures, concentrations of chemicals, numbers of bugs in a computer program, etc. What do you observe? – Zen Sep 10 '12 at 19:36
  • @Zen thanks! I observe the number of (weather) events, in a world-1 experiment where I can tune a physical property $\phi$. I measure the number of events when $\phi = 0$, $\phi = 0.01$, ..., $\phi = 1$. This gives me a curve: 'number of events' w.r.t. $\phi$. World 2 is the real world, with $\phi$ varying without my control. I know how the property $\phi$ is distributed, but I cannot measure anything in the real world. I want to get the distribution of 'number of events' in the real world, from the lab experiment and from what I know about $\phi$ in the real world. Does it make sense? – kiriloff Sep 10 '12 at 19:48
  • As far as I understand, you are doing this: you model the counting of certain climate events as a (say) Poisson process (PP). Since you don't know the rate of the PP (which corresponds to your parameter $\phi$), you are plugging in different values of $\phi$ and simulating realizations of a PP with that value of $\phi$. After that, you show the results of the simulations to a climate expert who can construct a prior for $\phi$ based on all his knowledge, considering the simulated countings for each value of $\phi$. – Zen Sep 10 '12 at 23:41
  • Those "imaginary" countings are just a proxy to allow the expert to express his opinion about the unobservable parameter $\phi$. After all, he knows a ton about climate events, and nothing about that particular Greek letter. This is essentially a subjectivistic elicitation of the prior distribution of $\phi$. – Zen Sep 10 '12 at 23:45
  • After that you can talk about the real world, and ask, for example: what are the odds that we will have a certain number of climate events in the next year, given that last year we observed these (real) climate events (this is your data, the things that you observe). – Zen Sep 10 '12 at 23:47
  • @Zen The first two parts of your comment perfectly translate what I have such hardship to tell! I am not sure that I understand the third part. Now I have the subjective prior on $\phi$ and 'imaginary' countings at hand. I would like to give 'the odds that we will have a certain number of climate events' in the real world, but not 'given that last year we observed these (real) climate events'. I don't observe 'real world' data, and I would like to use the subjective prior and 'imaginary' countings to find out the 'odds of events' in the real world here and now. Is it a good problem? – kiriloff Sep 11 '12 at 07:23
  • Well, to be honest, I don't like the idea that you want to make predictions based only on your prior opinion. After eliciting your subjective prior, you should have some real data to update your prior, before you start making predictions. If you don't give nature a chance to change your prior opinion, the predictions will be, in a sense, weak. – Zen Sep 11 '12 at 15:35
  • But **you can** do it: just compute the so called "prior predictive pmf" as $p(x)=\int_\Theta p(x\mid\theta)\pi(\theta)d\theta$, where $p(x\mid\theta)$ is the sampling pmf, and $\pi(\theta)$ is the prior density which you've already elicited. (A numerical sketch of this computation is given after these comments.) – Zen Sep 11 '12 at 15:38
  • But **should you** do it? In my opinion, no. – Zen Sep 11 '12 at 15:40
  • Just a note: since terminology was a problem at the beginning of this conversation, "pmf = probability mass function" (remember that $x$ is discrete: number of climate events). – Zen Sep 11 '12 at 15:43
  • @Zen Zen, thanks a lot. This is what I was looking for. I have two remaining questions. 1/ How do I compute the 'prior predictive pmf'? Is there some classical computational method to compute the integral (MCMC?)? 2/ I see that you have concerns about the philosophy/approach. Do you have some reference in the literature where this problem/these concerns are mentioned? Thanks again for clarifying the whole thing. What is a good reference manual on Bayesian stats? – kiriloff Sep 11 '12 at 19:43
  • I really like Robert's "The Bayesian Choice", but take a look at this question http://stats.stackexchange.com/questions/125/what-is-the-best-introductory-bayesian-statistics-textbook/24526#24526 – Zen Sep 11 '12 at 19:56
  • @Zen thanks Zen for the discussion. This was very helpful. – kiriloff Sep 12 '12 at 10:17
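
For reference, a minimal numerical sketch of Zen's prior predictive pmf $p(x)=\int_\Theta p(x\mid\theta)\pi(\theta)\,d\theta$ via Monte Carlo, assuming for illustration a Poisson sampling pmf and a Beta prior (both are stand-ins for the actual model and the elicited prior):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative stand-ins: sampling pmf p(x | theta) = Poisson(10 * theta),
# prior density pi(theta) = Beta(2, 5).
def sampling_pmf(x, theta):
    return stats.poisson.pmf(x, mu=10.0 * theta)

theta_draws = rng.beta(2.0, 5.0, size=200_000)   # theta ~ pi(theta)

# Monte Carlo estimate: p(x) ~= (1/m) * sum_i p(x | theta_i)
xs = np.arange(0, 30)
prior_predictive = np.array(
    [sampling_pmf(k, theta_draws).mean() for k in xs]
)
# prior_predictive[k] estimates the prior probability of observing k events
```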

1 Answer


The problem, posed in language that is familiar to me, is that you want to determine the posterior distribution of $\phi$ given a prior distribution on $\phi$ that is uniform on $[0, 1]$.

$X$ is distributed according to a parametric distribution, say with density $f_\phi(x)$.

Take an iid sample of size $n$ from $f_\phi(x)$.

The posterior distribution for $\phi$ is obtained by Bayes' rule:

$g(\phi \mid x) = c\, f_\phi(x_1)\, f_\phi(x_2) \cdots f_\phi(x_n)$ for $0 \le \phi \le 1$,

and $g(\phi \mid x) = 0$ otherwise. Here $c$ is the normalization constant that makes $\int_0^1 g(\phi \mid x)\, d\phi = 1$, where the integration is over the interval $[0,1]$. The uniform prior density for $\phi$ appears as the constant $1$ when $0 \le \phi \le 1$ and is $0$ otherwise. The product of the $f_\phi$'s is the likelihood function given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$.
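A minimal numerical sketch of this computation, assuming for illustration that $f_\phi$ is the Bernoulli($\phi$) pmf (any parametric family with density $f_\phi(x)$ could be substituted):

```python
import numpy as np

# Illustrative data: iid draws from X | phi ~ Bernoulli(phi)
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# Grid over the support of the uniform prior, [0, 1]
phi_grid = np.linspace(0.0, 1.0, 1001)

# Likelihood: product of f_phi(x_i); the uniform prior contributes
# only a constant factor on [0, 1]
likelihood = np.array(
    [np.prod(p ** x * (1.0 - p) ** (1 - x)) for p in phi_grid]
)

# Normalize so the posterior integrates to 1 over [0, 1]; the divisor
# plays the role of 1/c above
dphi = phi_grid[1] - phi_grid[0]
posterior = likelihood / (likelihood.sum() * dphi)
```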

Based on your edit, it sounds like you want to infer the likelihood from the posterior and the prior. But I don't understand why you would want to do that. You are normally given data, you pick a prior and a model, and then you compute the posterior. But if by $P(X)$ you mean $\int p(X\mid\theta)\,p(\theta)\,d\theta$, then the formula you have is correct.

Michael R. Chernick
  • Thanks a lot for the time you spent already. I think that my problem is indeed one of Bayesian statistics, but not one of posterior distribution determination. I feel it is closer to a marginal likelihood computation problem, as here: http://en.wikipedia.org/wiki/Marginal_likelihood. I know the distribution of the random variable $\phi$, and I have iid data points $X=(x_{1},...,x_{n})$ with each $x_{i}$ distributed as $p(x_{i}|\phi)$. I am not sure this is clearer. I already have the distribution of $\phi$; what I need is the distribution of $X$ incorporating the information on the distribution of $\phi$. – kiriloff Sep 08 '12 at 10:26
  • Thanks a lot, and I apologize: I am not a stat guy. – kiriloff Sep 08 '12 at 10:26
  • I tried to give my question a new start: I explain how I think I can use the solution you propose in your answer for my problem. You pointed out exactly the adequate notions, and I try to reformulate my question with your words. – kiriloff Sep 10 '12 at 10:51
  • @fonjibe I have edited my answer to address your edit. – Michael R. Chernick Sep 10 '12 at 12:52
  • Thanks a lot. Yes, this is what I mean by $P(X)$. Maybe the naive explanation of the problem I tried to give to Zen above tells more about what I am looking for. I am not given data 'in world 2', to use the same naive vocabulary; I am looking for the data distribution in world 2, knowing the distribution 'in world 1' and assuming the same 'model' in both worlds. I hope I can make some progress. – kiriloff Sep 10 '12 at 19:34