
I am using KDE with a modified distance metric. The PDF is as expected (see below: color is the probability and the dot is the point used to fit the KDE). But because of the new metric, I cannot use the usual sampling methods, as they assume a Gaussian kernel based on $\|\textbf{x}-\textbf{x}_i\|$. Here I have something like $1/\|\textbf{x}-\textbf{x}_i\|$ instead, so the new kernel looks like:

$$K(\textbf{x}-\textbf{x}_i) = \exp\left(-\frac{1}{h\,\|\textbf{x}-\textbf{x}_i\|}\right)$$

This kernel does not integrate to 1, so I bound it to the unit hypercube, and I want to sample within this hypercube.
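For concreteness, here is a minimal sketch of this kernel and the resulting unnormalised KDE in Python. The bandwidth `h`, the array `points`, and the function names are my own illustrative choices; only the formula above comes from the question.

```python
import numpy as np

def inverse_distance_kernel(x, xi, h=0.1):
    """Unnormalised kernel exp(-1 / (h * ||x - xi||))."""
    d = np.linalg.norm(x - xi)
    # The kernel tends to 0 as x approaches the data point and to 1 far away.
    return np.exp(-1.0 / (h * d)) if d > 0 else 0.0

def kde_pdf(x, points, h=0.1):
    """Unnormalised KDE: average of the kernel over all data points."""
    return np.mean([inverse_distance_kernel(x, xi, h) for xi in points])
```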

  • How can I generate new samples in this context?

From what I read (Find CDF from an estimated PDF (estimated by KDE), for instance), I have to come up with the multivariate inverse CDF.

  • Is there a simple way to do this?

For now I just use a hack, which consists of:

  1. Sampling the multivariate space and getting the PDF values;
  2. Using these two pieces of information with a uniform random generator to produce a sample.

It works, but it suffers from the curse of dimensionality. Something like this: https://stackoverflow.com/q/25642599 (a sketch of the idea follows below).
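For illustration, a sketch of that hack in 2-D, reusing the hypothetical `kde_pdf` sketched above: evaluate the PDF on a grid, normalise the values into cell weights, and draw cells with a uniform random generator, which is effectively a discrete inverse CDF. The grid grows exponentially with dimension, which is exactly the curse mentioned above.

```python
import numpy as np

def sample_via_grid(points, n_samples, h=0.1, resolution=100):
    """Discretise [0,1]^2, weight cells by the KDE, and sample cells."""
    axes = np.linspace(0.0, 1.0, resolution)
    grid = np.array([[x, y] for x in axes for y in axes])
    pdf = np.array([kde_pdf(g, points, h) for g in grid])
    weights = pdf / pdf.sum()                       # normalise over the grid
    idx = np.random.choice(len(grid), size=n_samples, p=weights)
    jitter = np.random.uniform(-0.5, 0.5, (n_samples, 2)) / resolution
    return np.clip(grid[idx] + jitter, 0.0, 1.0)    # jitter within each cell
```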

Even the CDF in the multivariate space would be enough, I guess, as I would then be able to invert it numerically (with a fixed-point method, for example).

Multivariate PDF with new distance metric

EDIT

Here is the result of this experiment. Color is the probability, the black point is the point used for fitting the KDE, and the red points are samples generated using Metropolis-Hastings MCMC.

Sampling the KDE
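For reference, a minimal sketch of the kind of Metropolis-Hastings sampler that could produce the red points above, assuming the hypothetical `kde_pdf` from the earlier sketch as the unnormalised target; the step size, burn-in, and starting point are arbitrary choices of mine.

```python
import numpy as np

def mh_sample(points, n_samples, h=0.1, step=0.1, burn=500):
    """Random-walk Metropolis-Hastings on [0,1]^2, targeting kde_pdf."""
    x = np.full(2, 0.5)                            # start at the cube centre
    px = kde_pdf(x, points, h)
    samples = []
    for i in range(burn + n_samples):
        prop = x + np.random.normal(0.0, step, size=2)
        if np.all((prop >= 0.0) & (prop <= 1.0)):  # density is 0 outside the cube
            pp = kde_pdf(prop, points, h)
            if px == 0.0 or np.random.uniform() < pp / px:
                x, px = prop, pp                   # accept (symmetric proposal)
        if i >= burn:
            samples.append(x.copy())
    return np.array(samples)
```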

tupui
  • duplicate (?) https://stats.stackexchange.com/questions/56700/random-sample-using-kde-or-bootstrapping – kjetil b halvorsen Jun 05 '18 at 14:44
  • @kjetilbhalvorsen How so? It does not answer it at all. I have the PDF from the KDE, but as I said, I cannot generate new samples using the classical approach of calling random on the points of the dataset, because I changed the distance metric. So either I really have to compute the inverse CDF or find another way. – tupui Jun 05 '18 at 14:45
  • @Xi'an Why has this been put on hold? At least give some explanation of how I can rephrase this. Super welcoming community... – tupui Jun 05 '18 at 18:13
  • They probably voted to close because it is seen as a question about code (I voted to keep open). So try to reformulate without mentioning code! – kjetil b halvorsen Jun 05 '18 at 18:19
  • @kjetilbhalvorsen OK, thanks for the comment, I will remove the code stuff then. – tupui Jun 05 '18 at 18:20
  • @kjetilbhalvorsen I have modified this. Hope this gets reopened... – tupui Jun 05 '18 at 18:37
  • Could you explain what you mean by "modified metric"? After all, you describe trying to sample from a KDE--and the apparent duplicate tells you how to do this, regardless of what the kernel might be. – whuber Jun 05 '18 at 20:23
  • @whuber I am replacing the norm by a Minkowski distance. As for the duplicate, I do not understand it, if you think it answers the question. – tupui Jun 05 '18 at 21:10
  • I'm unable to see how the duplicate doesn't answer your question. Regardless of how you *develop* the KDE, by definition it represents a distributional estimate as a mixture of components; and sampling from that mixture is performed as you would from any mixture. That's why I'm probing for a better understanding of the role played by this distance. – whuber Jun 05 '18 at 21:13
  • @whuber With the KDE I get the PDF; then how do I sample from the distribution? From what I saw in the code of all the implementations, a Gaussian random generator is used on the dataset. This will only work for a Gaussian kernel with the L2 norm. In this case (the figure) it would create samples close to the star shape (there is a point there), and this is exactly the opposite of what the PDF says (high PDF values in yellow). Where am I wrong? Maybe this comes from the fact that I do not understand the R code in the other question. – tupui Jun 05 '18 at 21:18
  • When you use a non-Gaussian kernel, you sample from that kernel, that's all. Conceptually the difference is trivial, although in practice some kernels are harder to sample from than others. – whuber Jun 05 '18 at 21:25
  • @whuber Oh, really!? But why does this seem simple with a non-Gaussian kernel? I thought Gaussian was the gold standard. I find it troubling that I do not need the CDF. I will try to do this with scikit-learn. – tupui Jun 05 '18 at 21:30
  • Gaussian kernels are nice because they are infinitely differentiable--but that does not make them the "best" in any universal sense. Many other kernels are in common use, including "simple" (step function) kernels and kernels that have desirable statistical properties for robust estimation. – whuber Jun 05 '18 at 21:33
  • @whuber Sorry, but I still do not see how to do this practically. Here https://stats.stackexchange.com/a/321726/140570 they sample from a kernel. Behind the function there is a call to the Gaussian sampler (`rnorm`, and this uses the CDF, if I'm correct). But this will not work in my case, as I do not have this PDF. When I write the kernel, it's like K(x, y) = f(x, y), but nothing like K(n). If you have some simple code, that could really help, because here I just do not get it. Thanks – tupui Jun 06 '18 at 13:53
  • What exactly is your question? You seem to have an estimate already. Is the question now how to draw from this estimated PDF in principle? Or is this "just" a technical question on how to do it in code? In what shape/program/toolkit do you generate the PDF? The principle has been laid out very nicely by @whuber, especially in his last linked article. – cherub Jun 11 '18 at 14:17
  • @cherub I just want to know whether there is a direct way to sample this PDF. My solution works, but I am looking for a less expensive one, like what whuber linked. But as I already explained, I looked at the answer and the underlying code, and it only works assuming a Euclidean metric. If not, then I do not see how; if you can explain it, that would constitute a perfect answer. – tupui Jun 11 '18 at 17:51
  • @MartijnWeterings I specified that because most answers found here are explained with R code, and as I do not know R, I will not understand the answer. So I just wanted to point that out. I am not looking for a complete solution in Python. – tupui Jun 14 '18 at 12:26
  • @MartijnWeterings The question is how to sample with a different metric. I said that I am using a Gaussian kernel and that I just modified the metric. Actually, the answer from Andy Jones seems valid. – tupui Jun 14 '18 at 14:20
  • @MartijnWeterings I updated the question with the expression of the kernel. – tupui Jun 14 '18 at 14:32
  • Yeah, I know, because it just tends to infinity, as in the figure. But I am bounding this to the unit hypercube, so I still want to do it, and using the method I describe in the question, I am able to sample in this strange space with this kernel. – tupui Jun 14 '18 at 14:35
  • @MartijnWeterings It is between [0, 1] to be in the unit hypercube, but yes, that is the spirit! – tupui Jun 14 '18 at 15:00
  • @Xi'an I do not get the question. I have a multivariate Gaussian kernel and I just take the inverse in the distance computation. – tupui Jun 15 '18 at 07:45
  • If your target is the pdf proportional (in $x$) to $$\exp\left\{-\frac{1}{2}\left(\frac{1}{x-x_i}\right)^2\right\}$$ in which sense is it multidimensional? In my mathematical referential, the quantity $$\left(\frac{1}{x-x_i}\right)^2$$ is only defined for real $x-x_i$'s. – Xi'an Jun 15 '18 at 07:51
  • @Xi'an Again, I do not get it. I am not a mathematician. I can just tell you that I modified an existing multivariate kernel. – tupui Jun 15 '18 at 07:56
  • @Xi'an I just used the standard notation, but of course the distance is an L2 norm between the vectors. – tupui Jun 15 '18 at 08:01
  • Possible duplicate of [Random sample using KDE or bootstrapping](https://stats.stackexchange.com/questions/56700/random-sample-using-kde-or-bootstrapping) – ironman Jun 16 '18 at 14:05
  • @ironman I already explained why this is not a duplicate. – tupui Jun 16 '18 at 14:26
  • Unfortunately, your explanation was incorrect: the code in the duplicate doesn't require distance at all. It only requires that you be able to sample from the kernel function. – whuber Jun 19 '18 at 14:01
  • @whuber And again, how does one sample from this kernel? The question does not answer that, because if you look at the code they use, it relies on the classical Gaussian kernel and will not work in this case. But thanks to AndyJones, using Metropolis-Hastings I was able to do this. – tupui Jun 19 '18 at 14:06
  • The code I used in my answer explicitly points out that one can replace the sampling from a Gaussian by sampling from any other kernel. This is a perfectly trivial substitution, which is why you are getting so many baffled comments inquiring what you're trying to ask. – whuber Jun 19 '18 at 14:15
  • @whuber I disagree; in your answer the sampling algorithm is `rkernel`. – tupui Jun 19 '18 at 14:32
  • This is not mathematics--it's coding. Replace `rnorm` by any sampling function you like, period. It's that simple. `R` itself offers numerous examples and threads on this site describe literally hundreds more ways to sample from distributions. – whuber Jun 19 '18 at 14:50
  • @whuber Behind `rnorm` there is a call to the inverse CDF of the normal distribution. In this case the distribution is not normal. So no, there is no off-the-shelf solution, and I have to do rejection sampling. – tupui Jun 19 '18 at 14:53
  • Finally--after some 30 comments--the nature of your question is revealed! Would it be possible to edit your question so that it explicitly describes the kernel? *The solution depends on the dimension of your space.* Currently it appears that you are attempting to describe a truncated version of an [inverse gamma distribution](https://en.wikipedia.org/wiki/Inverse-gamma_distribution). If this guess is correct, then there are efficient, simple, out-of-the-box methods to sample from it (which might obviate any need to truncate it in the first place). – whuber Jun 19 '18 at 14:59
  • @whuber As I have said a million times, taking the inverse of the distance (which leads to the figure) is only an example. In practice I want to be able to put whatever I want in place of the Euclidean norm: Minkowski, Canberra, whatever norm. So there will not be an inverse distribution for everything. By the way, this is not a gamma at all for the example in the figure. Look at the probability: it is 0 where there is a point. – tupui Jun 19 '18 at 15:06
  • (1) If your question is about sampling from *any* kernel, then the duplicate is the most general answer you will get from this site to such a broad (and therefore off-topic) question. (2) Your last point isn't correct. As I said, it depends on the dimensionality--which hugely affects the probability density at zero--and this is an *inverse* gamma distribution, not a gamma distribution. – whuber Jun 19 '18 at 16:51
  • @whuber No, your answer does not do the job. But both answers from AndyJones and Ben are perfect! They both understood the issue here and gave working solutions, as opposed to you just complaining about my word choices. Remember, this is a broad-audience Q&A forum. We don't all have your statistical/mathematical background. Stop showing off; it only makes the experience poor. Check other Stack Exchange topics, people are way more chill. – tupui Jun 19 '18 at 17:29
  • I am sorry that I could not guess your intentions as well as others and I am sorry that you are so militantly resisting so many requests, by so many people, to make yourself understood. The job I am carrying out here as moderator is to make our threads *uniquely* intelligible to readers, because ambiguity leads to error. The voting on your question is a good indicator of what readers are thinking. Lashing out at me will not fix that--only you can clarify what your question really was. – whuber Jun 19 '18 at 17:39
  • @whuber Well, -2, so you and Xi'an... But whatever. I tried to answer all your questions every time, so you cannot say I did not try. Read this again and you will see I provided answers for everything. You, on the contrary, always pushed the same conclusions even though I said from the first time that it was not helping. I even said in the question itself that this did not answer it, so... You are not making it understandable. You are pushing your answers, that's all. – tupui Jun 19 '18 at 17:44

2 Answers


UPDATED: If I understand your question correctly, you have generated a kernel density function from a set of data $x_1,...,x_n \in [0,1]^m$ using a non-standard kernel, and now that you have generated the "density", you want to be able to sample from it in a simple way. You are using a Gaussian kernel, but with an inverse substitution of the distance; your kernel function is:

$$K(x_i-x) \propto \exp \Bigg( -\frac{\lambda}{2 \, \|x_i-x\|^2} \Bigg) \quad \text{for all }x \in [0,1]^m.$$


Sampling from a KDE: For a set of data $x_1,...,x_n$ and an (un-normalised) kernel function $K$, the kernel density estimate (KDE) $\hat{p}$ is given by:

$$\hat{p}(x) \propto \frac{1}{n} \sum_{i=1}^n K(x_i-x).$$

(This might also have some other estimated parameters for the kernel, like a bandwidth, but that does not affect this question.) Since the KDE is literally just a uniform mixture distribution, using kernels placed at the data values, the simplest way to sample from this density is via the following two-step process:

  • Step 1: Generate a random value $i \sim \text{U}\{1, ..., n\}$ representing a data point;

  • Step 2: Now sample from the density with kernel $K$ centered at $x_i$. This can be done by a variety of methods, such as rejection sampling, Metropolis-Hastings, or transformation methods.

It is trivial to show that this gives you a random value from the density $\hat{p}$, since the latter is just a mixture distribution of the kernels centered at the data points. This means that the problem of generating a random variable from a KDE reduces to the problem of generating a random variable from the chosen kernel. In your case, I would recommend using the present algorithm, with rejection sampling to sample from your kernel (see the sketch below).
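A minimal sketch of this two-step scheme in Python, using rejection sampling over the unit hypercube for Step 2. The names are my own assumptions, and the envelope constant `M` must satisfy $K \leq M$; for the kernel in the question, $K \leq 1$, so `M = 1` works.

```python
import numpy as np

def sample_kde(data, kernel, n_samples, M=1.0):
    """Step 1: pick a data point uniformly; Step 2: rejection-sample the kernel."""
    dim = data.shape[1]
    out = []
    while len(out) < n_samples:
        xi = data[np.random.randint(len(data))]       # Step 1: random component
        x = np.random.uniform(0.0, 1.0, size=dim)     # uniform proposal on the cube
        if np.random.uniform() < kernel(x - xi) / M:  # Step 2: accept/reject
            out.append(x)
    return np.array(out)
```

Because the proposal lives on $[0,1]^m$, this samples from the kernel truncated to the hypercube, which matches the bounding described in the question.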


Ben
  • As I explained in the comments, I do not see how to "sample" from the kernel in this context. OK, for example, let's say I have a Gaussian kernel with the metric $\frac{1}{x_i - x}$ instead of $(x_i - x)$, and the multivariate distribution is totally unknown. – tupui Jun 14 '18 at 08:11
  • As I said in the answer, if you give full details of your kernel and explain how this is linked to your metric, I will try to make sense of this. With sporadic details spread across comments, I don't know what you're doing. – Ben Jun 14 '18 at 10:54
  • Well, the explanation is just above: a Gaussian kernel with an inverted distance measure. That is all. – tupui Jun 14 '18 at 13:01
  • Just wanted to say this answer is also perfect. I cannot accept 2, but I gave a +1. Thanks for the maths and details, it really helps. – tupui Jun 19 '18 at 17:32

Rejection sampling!

Suppose you've got a distribution $P$ that you'd like to sample from, but can't, and a proposal distribution $Q$ that is kinda like $P$ and which you can sample from. Then, if you can find a number $M$ which bounds $p(x)/q(x)$, to sample from $P$:

  • Draw a sample $x$ from $Q$
  • Calculate a threshold $T =\frac{p(x)}{Mq(x)}$
  • Draw a random number $t \sim U(0, 1)$.
  • If $t \leq T$, return $x$
  • Else go back to the beginning

You might have to bound the support of $P$ and $Q$ in order for $M$ to exist, and the speed of this whole process hinges on how close your proposal $Q$ is to $P$. In your case, a good proposal would be the uniform distribution over the unit cube (a sketch follows below).
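For instance, a sketch of this procedure in Python with $Q$ uniform over the unit cube (so $q(x) = 1$); `p` is the unnormalised target and `M` is a bound on $p(x)$ over the cube, both supplied by the caller.

```python
import numpy as np

def rejection_sample(p, M, dim, n_samples):
    """Rejection sampling with Q = Uniform([0,1]^dim)."""
    out = []
    while len(out) < n_samples:
        x = np.random.uniform(0.0, 1.0, size=dim)  # draw x from Q
        t = np.random.uniform()                    # draw t ~ U(0, 1)
        if t <= p(x) / M:                          # accept if t <= p(x) / (M q(x))
            out.append(x)
    return np.array(out)
```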

If that isn't fast enough, the next places to look are at adaptive rejection sampling, Metropolis-Hastings MCMC, and then maybe an advanced MCMC scheme like NUTS.

Andy Jones
  • This is actually the first answer which does answer my question! Will try this right away :) – tupui Jun 14 '18 at 10:42