Questions tagged [distributions]

Overview

A distribution is a mathematical description of probabilities or frequencies. It can be applied to observed frequencies, estimated probabilities or frequencies, and theoretically hypothesized probabilities or frequencies. Distributions can be univariate, describing outcomes written with a single number, or multivariate, describing outcomes requiring ordered tuples of numbers.

Two devices are in common use to present univariate distributions. The cumulative form, or "cumulative distribution function" (CDF), gives, for every real number $x$, the chance (or frequency) of a value less than or equal to $x$. The "density" form, or "probability density function" (PDF), is the derivative (rate of change) of the CDF. A PDF need not exist in this restricted sense (for instance, when the CDF has jumps), but a CDF always exists. The CDF of a set of observations is called the "empirical distribution function" (EDF): its value at any number $x$ is the proportion of observations in the dataset that are less than or equal to $x$.
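
As a small illustration of the EDF definition, here is a Python sketch. The helper name `ecdf` is hypothetical, not a library function; it sorts the data once and uses binary search to count the observations less than or equal to $x$.

```python
from bisect import bisect_right

def ecdf(data):
    """Return the empirical distribution function of `data` as a callable."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(xs, x) / n

    return F

F = ecdf([3, 1, 4, 1, 5])
print(F(0), F(1), F(3.5), F(5))  # 0.0 0.4 0.6 1.0
```

The EDF is a step function: it jumps by $1/n$ at each observation (by $2/n$ at the repeated value 1 here) and is flat in between.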

References

The following questions contain references to resources about probability distributions:

8590 questions

What is the intuition behind beta distribution?

Disclaimer: I'm not a statistician but a software engineer. Most of my knowledge in statistics comes from self-education, thus I still have many gaps in understanding concepts that may seem trivial for other people here. So I would be very thankful…

When (and why) should you take the log of a distribution (of numbers)?

Say I have some historical data e.g., past stock prices, airline ticket price fluctuations, past financial data of the company... Now someone (or some formula) comes along and says "let's take/use the log of the distribution" and here's where I go…
PhD
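
One common reason to take logs is that data produced by multiplicative changes (compounding returns are a typical example) are right-skewed on the raw scale but roughly symmetric on the log scale. The Python sketch below assumes simulated lognormal values stand in for such data; `skewness` is a hypothetical helper computing the moment-based sample skewness.

```python
import math
import random

def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Assumption: lognormal draws as a stand-in for multiplicatively generated data
rng = random.Random(7)
values = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]
logs = [math.log(v) for v in values]

print(skewness(values), skewness(logs))  # strongly positive vs. close to 0
```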

In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?
d_2

How to determine which distribution fits my data best?

I have a dataset and would like to figure out which distribution fits my data best. I used the fitdistr() function to estimate the necessary parameters to describe the assumed distribution (e.g. Weibull, Cauchy, Normal). Using those parameters I…
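
One common approach (a swapped-in technique, not the R `fitdistr()` workflow from the question, which comes from the MASS package) is to fit each candidate by maximum likelihood and compare the Akaike information criterion, AIC $= 2k - 2\ln\hat{L}$, where smaller is better. The sketch below only uses candidates whose maximum-likelihood estimates have closed forms (normal, exponential, lognormal), so Weibull and Cauchy are left out. The data are simulated exponential values and all function names are hypothetical.

```python
import math
import random

def fit_normal(xs):
    """Maximized normal log-likelihood (MLE variance divides by n) and parameter count."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2

def fit_exponential(xs):
    """Maximized exponential log-likelihood with rate = 1 / sample mean."""
    n = len(xs)
    rate = n / sum(xs)
    return n * math.log(rate) - n, 1

def fit_lognormal(xs):
    """Normal fit on log(x), minus sum(log x) for the change of variables."""
    logs = [math.log(x) for x in xs]
    loglik, k = fit_normal(logs)
    return loglik - sum(logs), k

def best_by_aic(xs):
    candidates = {"normal": fit_normal, "exponential": fit_exponential,
                  "lognormal": fit_lognormal}
    scores = {}
    for name, fit in candidates.items():
        loglik, k = fit(xs)
        scores[name] = 2 * k - 2 * loglik  # AIC
    return min(scores, key=scores.get), scores

rng = random.Random(42)
data = [rng.expovariate(1.5) for _ in range(2000)]  # simulated, truly exponential
best, scores = best_by_aic(data)
print(best, scores)
```

AIC only ranks the candidates you supply; a goodness-of-fit check (for example a Q-Q plot) is still useful to see whether the winner fits well in absolute terms.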

Can a probability distribution value exceeding 1 be OK?

On the Wikipedia page about naive Bayes classifiers, there is this line: $p(\mathrm{height}|\mathrm{male}) = 1.5789$ (A probability distribution over 1 is OK. It is the area under the bell curve that is equal to 1.) How can a value $>1$ be OK? I…
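
A density is not a probability; only its integral is constrained to equal 1. A normal density with standard deviation $\sigma$ peaks at $1/(\sigma\sqrt{2\pi})$, which exceeds 1 whenever $\sigma < 1/\sqrt{2\pi} \approx 0.399$. The Python sketch below, with a hand-written `normal_pdf`, checks this for $\sigma = 0.1$ and approximates the total area with a midpoint Riemann sum.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma**2) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

sigma = 0.1
peak = normal_pdf(0.0, 0.0, sigma)  # 1 / (0.1 * sqrt(2*pi)), about 3.989

# Midpoint Riemann sum over [-1, 1], i.e. ten standard deviations either side
n_steps = 20_000
h = 2.0 / n_steps
area = sum(normal_pdf(-1.0 + (i + 0.5) * h, 0.0, sigma) for i in range(n_steps)) * h
print(peak, area)  # density near 4, area essentially 1
```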

Help me understand Bayesian prior and posterior distributions

In a group of students, there are 2 out of 18 that are left-handed. Find the posterior distribution of left-handed students in the population assuming uninformative prior. Summarize the results. According to the literature 5-20% of people are…
Bob
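
Taking the uniform Beta(1, 1) prior as the "uninformative" prior (an assumption; Jeffreys' Beta(1/2, 1/2) is another common choice), the binomial likelihood for 2 left-handed students out of 18 gives a Beta(1 + 2, 1 + 16) = Beta(3, 17) posterior by conjugacy. The Python sketch below computes its mean and mode and approximates the posterior probability of the 5-20% range mentioned in the question. The helper names are hypothetical.

```python
import math

def beta_binomial_update(successes, trials, a_prior=1.0, b_prior=1.0):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a_prior + successes, b_prior + (trials - successes)

def beta_pdf(x, a, b):
    """Beta(a, b) density at 0 < x < 1, computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

# 2 left-handed students out of 18, uniform Beta(1, 1) prior (assumed)
a, b = beta_binomial_update(2, 18)
post_mean = a / (a + b)            # 3 / 20 = 0.15
post_mode = (a - 1) / (a + b - 2)  # 2 / 18, about 0.111

# Posterior probability that the proportion lies in [0.05, 0.20] (midpoint rule)
lo, hi, m = 0.05, 0.20, 10_000
h = (hi - lo) / m
prob_5_20 = sum(beta_pdf(lo + (i + 0.5) * h, a, b) for i in range(m)) * h
print(a, b, post_mean, post_mode, prob_5_20)
```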

Why does the Cauchy distribution have no mean?

From the distribution density function we could identify a mean (=0) for the Cauchy distribution, just as the graph below shows. But why do we say the Cauchy distribution has no mean?
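
The standard Cauchy density $f(x) = 1/(\pi(1+x^2))$ is symmetric about 0, so 0 is its median. A mean, however, requires $\int |x| f(x)\,dx < \infty$, and here $\int_0^T x f(x)\,dx = \ln(1+T^2)/(2\pi)$ grows without bound, so the integral for the mean takes the undefined form $\infty - \infty$. The Python sketch below evaluates that truncated integral and prints running sample means of simulated Cauchy draws, which typically keep jumping around instead of converging. The sampler uses the inverse CDF $\tan(\pi(u - 1/2))$; the helper name is hypothetical.

```python
import math
import random

def positive_half_integral(T):
    """Integral of x * f(x) over [0, T] for the standard Cauchy density
    f(x) = 1 / (pi * (1 + x**2)); equals log(1 + T**2) / (2 * pi)."""
    return math.log1p(T * T) / (2 * math.pi)

# Keeps growing (logarithmically) as the cutoff T increases
for T in (10, 1e3, 1e6, 1e12):
    print(T, positive_half_integral(T))

# Running sample means of standard Cauchy draws (inverse-CDF sampling)
rng = random.Random(0)
draws = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100_000)]
for n in (100, 1_000, 10_000, 100_000):
    print(n, sum(draws[:n]) / n)
```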

Assessing approximate distribution of data based on a histogram

Suppose I want to see whether my data is exponential based on a histogram (i.e. skewed to the right). Depending on how I group or bin the data, I can get wildly different histograms. One set of histograms will make it seem that the data is…
guestoeijreor

Understanding "variance" intuitively

What is the cleanest, easiest way to explain the concept of variance to someone? What does it intuitively mean? If one were to explain this to their child, how would one go about it? It's a concept that I have difficulty articulating, especially when…
PhD
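
A minimal sketch of the usual definition, variance as the average squared distance from the mean, using two made-up datasets that share the mean 10 but differ in spread. This is the population variance (dividing by $n$); the sample variance divides by $n-1$ instead.

```python
def variance(xs):
    """Population variance: the average squared distance from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

tight = [9, 10, 10, 11]   # values huddle around 10
spread = [0, 5, 15, 20]   # same mean, values far from 10
print(variance(tight), variance(spread))  # 0.5 62.5
```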

Relationship between Poisson and exponential distribution

The waiting times for a Poisson distribution follow an exponential distribution with parameter lambda. But I don't understand it. Poisson models the number of arrivals per unit of time, for example. How is this related to the exponential distribution? Let's say…
user862
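
If the waiting times between arrivals are independent Exponential($\lambda$) variables, the number of arrivals in any unit-length interval is Poisson($\lambda$): the exponential describes the gaps, the Poisson describes the counts. The Python sketch below simulates arrivals by adding up `random.expovariate` waiting times and compares the counts per unit interval with two Poisson($\lambda$) facts, the mean $\lambda$ and $P(N=0) = e^{-\lambda}$. The rate $\lambda = 3$ and the helper name are arbitrary choices for illustration.

```python
import math
import random

def counts_per_unit_interval(rate, n_intervals, seed=0):
    """Simulate arrivals with Exponential(rate) gaps and count how many
    fall in each unit interval [k, k + 1) for k = 0 .. n_intervals - 1."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)          # time of the first arrival
    while t < n_intervals:
        counts[int(t)] += 1
        t += rng.expovariate(rate)     # add the next exponential waiting time
    return counts

lam = 3.0
counts = counts_per_unit_interval(lam, 100_000)
mean = sum(counts) / len(counts)
p0 = counts.count(0) / len(counts)
print(mean, lam)              # average count per interval vs. Poisson mean
print(p0, math.exp(-lam))     # share of empty intervals vs. Poisson P(N = 0)
```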

Calculating the parameters of a Beta distribution using the mean and variance

How can I calculate the $\alpha$ and $\beta$ parameters for a Beta distribution if I know the mean and variance that I want the distribution to have? Examples of an R command to do this would be most helpful.
Dave Kincaid
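
Matching moments gives a closed form. For a Beta($\alpha, \beta$) distribution with mean $\mu$ and variance $\sigma^2$, set $\nu = \alpha + \beta = \mu(1-\mu)/\sigma^2 - 1$; then $\alpha = \mu\nu$ and $\beta = (1-\mu)\nu$, which requires $0 < \sigma^2 < \mu(1-\mu)$. The question asks for R; this sketch is in Python instead, and the function name is hypothetical.

```python
def beta_params_from_moments(mean, var):
    """Method-of-moments alpha and beta for a Beta distribution."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0 < var < mean * (1 - mean):
        raise ValueError("variance must satisfy 0 < var < mean * (1 - mean)")
    nu = mean * (1 - mean) / var - 1   # nu = alpha + beta
    return mean * nu, (1 - mean) * nu

alpha, beta = beta_params_from_moments(0.25, 0.0375)
print(alpha, beta)  # approximately 1 and 3, i.e. Beta(1, 3)
```

Round-tripping is a useful check: Beta(1, 3) has mean $1/4$ and variance $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1)) = 3/80 = 0.0375$.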

What's so 'moment' about 'moments' of a probability distribution?

I KNOW what moments are and how to calculate them and how to use the moment generating function for getting higher order moments. Yes, I know the math. Now that I need to get my statistics knowledge lubricated for work, I thought I might as well ask…
PhD

Intuition on the Kullback–Leibler (KL) Divergence

I have learned about the intuition behind the KL Divergence as how much a model distribution function differs from the theoretical/true distribution of the data. The source I am reading goes on to say that the intuitive understanding of 'distance'…
cgo
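
One reason the "distance" intuition breaks down is that KL divergence is not symmetric: $D_{KL}(P\|Q)$ and $D_{KL}(Q\|P)$ generally differ. The Python sketch below computes $D_{KL}(P\|Q) = \sum_i p_i \ln(p_i/q_i)$ in nats for two made-up two-outcome distributions; `kl_divergence` is a hypothetical helper, not a library function.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue               # 0 * log(0 / q) is taken to be 0
        if qi == 0:
            return math.inf        # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two directions come out at about 0.511 and 0.368 nats, so swapping the roles of "model" and "truth" changes the answer, unlike a true distance.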

How is the minimum of a set of IID random variables distributed?

If $X_1, ..., X_n$ are independent identically-distributed random variables, what can be said about the distribution of $\min(X_1, ..., X_n)$ in general?
Simon Nickerson
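
In general, if the $X_i$ share the CDF $F$, independence gives $P(\min_i X_i > x) = P(X_1 > x)\cdots P(X_n > x) = (1-F(x))^n$, so the minimum has CDF $1 - (1 - F(x))^n$. The Python sketch below applies this to exponential variables, where the minimum of $n$ IID Exponential($\lambda$) draws is Exponential($n\lambda$), and adds a Monte Carlo check. The values $\lambda = 2$, $n = 5$ and the helper names are illustrative choices.

```python
import math
import random

def min_cdf(F, n):
    """CDF of min(X_1, ..., X_n) for IID X_i with common CDF F."""
    def F_min(x):
        # P(min <= x) = 1 - P(every X_i > x) = 1 - (1 - F(x))**n
        return 1 - (1 - F(x)) ** n
    return F_min

lam, n = 2.0, 5

def exp_cdf(x):
    """CDF of an Exponential(lam) variable."""
    return 1 - math.exp(-lam * x) if x > 0 else 0.0

F_min = min_cdf(exp_cdf, n)

# General formula vs. the Exponential(n * lam) CDF
for x in (0.05, 0.1, 0.3):
    print(x, F_min(x), 1 - math.exp(-n * lam * x))

# Monte Carlo estimate of P(min <= 0.1)
rng = random.Random(1)
trials = 50_000
hits = sum(min(rng.expovariate(lam) for _ in range(n)) <= 0.1 for _ in range(trials))
print(hits / trials, F_min(0.1))
```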

What are the advantages of the Wasserstein metric compared to Kullback-Leibler divergence?

What is the practical difference between Wasserstein metric and Kullback-Leibler divergence? Wasserstein metric is also referred to as Earth mover's distance. From Wikipedia: Wasserstein (or Vaserstein) metric is a distance function defined between…
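
One practical difference shows up when supports do not overlap: KL divergence is infinite for every such pair, while the Wasserstein-1 distance still grows with how far the mass has to move. The Python sketch below shifts a two-point distribution further and further away. `wasserstein_1d` relies on the fact that, for equal-size, equally weighted samples in one dimension, optimal transport pairs the sorted values. Both helpers are hypothetical, not library functions.

```python
import math

def wasserstein_1d(xs, ys):
    """W1 (earth mover's distance) between two equal-size, equally weighted samples.
    In one dimension the optimal transport plan pairs sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as {value: probability} dicts."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf  # P has mass where Q has none
        total += px * math.log(px / qx)
    return total

base = [0, 1]
for shift in (1, 10, 100):
    moved = [v + shift for v in base]
    p = {v: 0.5 for v in base}
    q = {v: 0.5 for v in moved}
    print(shift, kl_divergence(p, q), wasserstein_1d(base, moved))
```

This prints `inf` for the KL divergence at every shift, while the Wasserstein distance equals the shift (1.0, 10.0, 100.0), so it still tells "slightly apart" from "far apart".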