Questions tagged [density-estimation]

Estimation of probability density functions, whether by kernel density estimation, log-spline estimation or other methods.

Wikipedia has an article https://en.wikipedia.org/wiki/Density_estimation with further references.

294 questions
36
votes
2 answers

Can you explain Parzen window (kernel) density estimation in layman's terms?

Parzen window density estimation is described as $$ p(x)=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{h^2} \phi \left(\frac{x_i - x}{h} \right) $$ where $n$ is the number of elements in the vector, $x$ is a vector, $p(x)$ is the probability density of $x$, $h$ is…
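To make the formula in this excerpt concrete, here is a minimal sketch of a Parzen window estimate. It assumes one-dimensional data and a Gaussian window function $\phi$, so the normalization is $1/h$; the $1/h^2$ in the excerpt would correspond to two-dimensional $x$. The names `gaussian_kernel` and `parzen_density` are illustrative and do not come from the question.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the window function phi (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_density(x, samples, h):
    """Parzen window estimate at x for one-dimensional samples with bandwidth h."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)
    # Average of kernels centered on each observation, scaled by 1/h in one dimension.
    return sum(gaussian_kernel((xi - x) / h) for xi in samples) / (n * h)

samples = [1.0, 1.5, 2.0, 4.0]
print(parzen_density(1.5, samples, h=0.5))
```

In layman's terms, each observation contributes a small bump of width $h$, and the estimate at $x$ is the average height of those bumps at $x$.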
15
votes
3 answers

Where is density estimation useful?

After going through some slightly terse mathematics, I think I have a slight intuition of kernel density estimation. But I am also aware that estimating multivariate density for more than three variables might not be a good idea, in terms of the…
12
votes
2 answers

Kernel Density Estimate for Cauchy

As far as I understand, kernel density estimation does not make any assumptions on the moments of the underlying density, and just requires smoothness. The Cauchy density function is quite smooth. Even still, when I try to do KDE using density() in…
Greenparker
12
votes
4 answers

How can I draw a value randomly from a kernel density estimate?

I have some observations, and I want to mimic sampling based on these observations. Here I consider a non-parametric model; specifically, I use kernel smoothing to estimate a CDF from the limited observations. Then I draw values at random from the…
emberbillow
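One common way to draw from a kernel density estimate avoids inverting the smoothed CDF: pick an observation uniformly at random, then add noise drawn from the kernel scaled by the bandwidth. The sketch below assumes a Gaussian kernel and a bandwidth `h` chosen by the user; `sample_from_kde` is a hypothetical helper name, not something from the question.

```python
import random

def sample_from_kde(samples, h, size, rng=None):
    """Draw `size` values from a Gaussian-kernel KDE built on `samples`.

    Each draw picks one observation at random and perturbs it with
    N(0, h^2) noise, which is a draw from the equal-weight mixture
    of kernels that defines the KDE.
    """
    rng = rng or random.Random()
    return [rng.choice(samples) + h * rng.gauss(0.0, 1.0) for _ in range(size)]

observations = [2.1, 2.4, 3.0, 3.3, 5.8]
draws = sample_from_kde(observations, h=0.3, size=5, rng=random.Random(0))
print(draws)
```

This works because the KDE is a mixture with one component per observation, so sampling a component and then sampling from it reproduces the estimated density.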
12
votes
0 answers

Help me understand the Bayesian kernel density estimation (Sibisi and Skilling, 1996)

Sibisi and Skilling (1996, also mentioned in the 1997 paper) define Bayesian kernel density as $$ f(x) = \int dx' \,\phi(x')\, K(x, x') \tag{2} $$ Here the kernel $K$ is an assigned smooth function, possibly having a few width and shape…
Tim
11
votes
4 answers

Is overfitting a problem in unsupervised learning?

Consider the density estimation problem for some training set $(x_1, \dots, x_N)$. A Gaussian mixture model consisting of $N$ normal distributions centered on each $x_i$ with very small variances will "overfit": the likelihood will be very high on the…
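The effect described in this excerpt can be checked numerically. The sketch below assumes an equal-weight mixture of $N$ Gaussians with a shared standard deviation `sigma`, one centered on each training point, and compares the average log-likelihood on the training set with that on a separate held-out set. The data values and the function name `avg_log_likelihood` are made up for illustration.

```python
import math

def avg_log_likelihood(points, centers, sigma):
    """Average log-density of `points` under an equal-weight Gaussian mixture
    with one component of standard deviation `sigma` per entry in `centers`."""
    n = len(centers)
    total = 0.0
    for x in points:
        exps = [-0.5 * ((x - c) / sigma) ** 2 for c in centers]
        m = max(exps)
        # Log-sum-exp keeps the computation stable when sigma is tiny.
        log_sum = m + math.log(sum(math.exp(e - m) for e in exps))
        total += log_sum - math.log(n * sigma * math.sqrt(2.0 * math.pi))
    return total / len(points)

train = [0.1, 0.9, 1.7, 2.4, 3.3]
held_out = [0.5, 1.3, 2.0, 2.9]
for sigma in (1.0, 0.1, 0.001):
    print(sigma,
          avg_log_likelihood(train, train, sigma),
          avg_log_likelihood(held_out, train, sigma))
```

As `sigma` shrinks, the training log-likelihood keeps rising while the held-out log-likelihood collapses, which is the sense in which the model overfits.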
10
votes
2 answers

Kernel density estimation and boundary bias

What sort of kernel density estimator does one use to avoid boundary bias? Consider the task of estimating the density $f_0(x)$ with bounded support and where the probability mass is not decreasing or going to zero as the boundary is approached. To…
9
votes
0 answers

Density estimation/approximation from MCMC samples

I'm looking to accurately describe the density function of a multivariate posterior probability distribution based on samples from MCMC. As far as I know, in most cases this is done either with a simple parametric fit (e.g. fitting or updating a…
8
votes
2 answers

Density estimation for large dataset

I have a unidimensional data set with more than 1000000 observations. Assuming that those observations are independent realizations of the same random variable, I need to estimate the underlying density function. This estimated density function will…
8
votes
2 answers

KDE for censored data

I have a sample of observations where about $30\%$ of the observations are right-censored. I want to fit a kernel density estimator to this sample but I have not found a standard method to do so. Is there any widely accepted methodology for fitting…
Dual
7
votes
1 answer

Why do we use parametric distributions instead of empirical distributions?

The probability density function (pdf) is the first derivative of the cumulative distribution (cdf) for a continuous random variable. I take it that this only applies to well-defined distributions like the Gaussian, t-distribution, Johnson SU, etc,…
7
votes
2 answers

Does a density forecast add value beyond a point forecast when the loss function is given?

Density forecasts are more universal than point forecasts; they provide information on the whole predicted distribution of a random variable rather than on a concrete function thereof (such as predicted mean, median, quantile, etc.). Availability of…
Richard Hardy
7
votes
2 answers

Estimating the gradient of log density given samples

I am interested in estimating the gradient of the log probability distribution $\nabla\log p(x)$ when $p(x)$ is not analytically available but is only accessed via samples $x_i \sim p(x)$. There seem to be various possible solutions utilizing…
6
votes
2 answers

Calculating the area under two overlapping distributions

I have two overlapping frequency distributions: one is the buyers' demand, or willingness to pay, and the other is the sellers' reservation price. The two distributions overlap and I'd like to estimate the overlapping area. What…
kms
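One way to read "overlapping area" is the integral of the pointwise minimum of the two density estimates, $\int \min(f(x), g(x))\,dx$, which is 1 for identical densities and 0 for disjoint ones. The sketch below assumes Gaussian-kernel density estimates with a shared bandwidth `h` and a midpoint-rule integral over a user-chosen range; the sample values and the names `kde` and `overlap_area` are illustrative only.

```python
import math

def kde(x, samples, h):
    """Gaussian-kernel density estimate at x (one-dimensional)."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

def overlap_area(a, b, h, lo, hi, steps=2000):
    """Approximate the integral of min(f, g) on [lo, hi] with the midpoint rule,
    where f and g are the KDEs of samples a and b."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(kde(x, a, h), kde(x, b, h)) * dx
    return total

willingness_to_pay = [8.0, 9.5, 10.0, 11.0, 12.5]   # made-up values
reservation_price = [10.5, 11.5, 12.0, 13.0, 14.5]  # made-up values
print(overlap_area(willingness_to_pay, reservation_price, h=1.0, lo=0.0, hi=25.0))
```

The integration range should extend a few bandwidths beyond both samples so that the tails of the estimates are not cut off.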
6
votes
2 answers

Error Bars for Histogram with Uncertain Data

Context: I have a set of data points $\{x_1, \dots, x_N \}$ along with the respective measurement uncertainties $\{\epsilon_1, \dots, \epsilon_N\}$ in them ($N \approx 100$). These data are basically the measured distances to the occurrences of some…
AstroK