
Usually, when the difference of a statistic between models is discussed, that discussion is presented in the context of the significance of that difference. When self-entropy, i.e., information content, is examined, especially, but not only, when non-nested models are compared, we take the lower value of the AIC, AICc, BIC, or another information-content index to suggest which model is better. However, more generally, entropy varies case-wise, i.e., with the particular data.

Question: With what certainty do we know, based on a comparison of information-content indices for a particular data set, that the lower index value properly identifies the correct model more generally, i.e., for a less limited data set?

I feel that comparing the information content of non-nested models is not relevant in all circumstances; for example, see this Q/A. Models are nested when all of the models tested can be derived from a parent model by eliminating parameters. They are non-nested when their parameters do not have this set-with-subset(s) structure.

I would really appreciate any insight into the variability of information-content comparisons, for either nested or non-nested models, in the context of subset data.

Carl
  • Same problem here: https://stats.stackexchange.com/questions/361065/when-is-the-aic-a-good-model-selection-criterion-for-forecasting-and-when-is-it/361279#361279 – Skander H. Oct 10 '18 at 20:37
  • @Alex I saw that question and answer. The person answering was not entirely satisfied with the answer given, and although the discussion is somewhat related to the topic here, it is not as focused. For example, the mechanics of what AIC/BIC etc. are extracting from the data are not explored, and that is the focus here. – Carl Oct 10 '18 at 20:46

2 Answers


The difference in AIC (or BIC) for two models is twice the log-likelihood ratio minus a constant: it follows immediately that in any particular case selecting the model with the lower AIC corresponds to performing a likelihood-ratio test, but that in different cases it corresponds to tests at different significance levels.
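To make that explicit (notation mine, assuming the usual definition $\text{AIC}_j = 2k_j - 2\hat\ell_j$, with $\hat\ell_j$ the maximized log-likelihood and $k_j$ the number of free parameters of model $j$):

$$\text{AIC}_2 - \text{AIC}_1 = 2\left(\hat\ell_1 - \hat\ell_2\right) - 2\left(k_1 - k_2\right),$$

so preferring model 1 whenever $\text{AIC}_1 < \text{AIC}_2$ amounts to rejecting model 2 whenever the log-likelihood-ratio statistic $2(\hat\ell_1 - \hat\ell_2)$ exceeds the fixed threshold $2(k_1 - k_2)$.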

With nested models, the null hypothesis has to be that the smaller model holds. Given some regularity conditions, Wilks' theorem applies; so if $p$ is the difference in the number of free parameters between the models, asymptotically the probability of AIC's selecting the larger model when the smaller one in fact holds is the probability that a chi-squared r.v. with $p$ degrees of freedom exceeds $2p$. For $p=1$ the significance of the test is 0.157; for $p=2$, 0.135; & so on. When exact tests are possible the distribution of the log-likelihood ratio of course depends on precisely what the models are.
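These tail probabilities are easy to check numerically; a one-liner (my addition, in the same Mathematica used later in the thread) might be:

(* significance level implied by AIC selecting the larger model: P(chi-squared with p d.f. > 2p) *)
Table[{p, N[1 - CDF[ChiSquareDistribution[p], 2 p]]}, {p, 1, 3}]
(* approximately {{1, 0.157}, {2, 0.135}, {3, 0.112}} *)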

With non-nested models, even finding an asymptotic distribution for the log-likelihood ratio involves the calculation of rather complicated expectations under the null (see Cox's or Vuong's papers referenced in Generalized log likelihood ratio test for non-nested models & Comparison of log-likelihood of two non-nested models). I doubt much can be said in general about the significance of a difference in AIC.
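That said, for strictly non-nested models fitted by maximum likelihood, Vuong's z-statistic is at least straightforward to compute once the fits are in hand; the following is a rough sketch (my addition, glossing over the degrees-of-freedom correction and the regularity conditions discussed in the linked threads):

(* Vuong's z-statistic: normalized sum of pointwise log-likelihood differences;
   approximately standard normal under the null that the two models are equally close to the truth *)
vuongZ[dist1_, dist2_, data_] := Module[{d, n, omega},
  d = Log[PDF[dist1, #]] - Log[PDF[dist2, #]] & /@ data;
  n = Length[data];
  omega = Sqrt[Mean[d^2] - Mean[d]^2];
  Total[d]/(Sqrt[n] omega)]
(* usage (illustrative): vuongZ[EstimatedDistribution[dat, NormalDistribution[m, s]],
   EstimatedDistribution[dat, UniformDistribution[{a, b}]], dat] *)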

The moral has already been given, pithily, by @RichardHardy:

How do you define what is a "correct model"? Is it the data generating process (DGP)? If so, why would you be using AIC trying to identify the DGP? The question AIC is answering is not "Which of the models is the DGP?". Try asking a different question, such as "Which model will give better predictions under a certain type of loss (associated with the likelihood being used)?", and you might find that AIC is answering "correctly" (or perhaps not?). That is, use a hammer for hammering nails.

Problems with the AIC include its accuracy in small samples (the bias-correction term is only to first order) & when neither model is particularly close (in the sense of Kullback–Leibler divergence) to the true model; but it can't fairly be criticized for not doing what it wasn't made for: there are hypothesis tests for that.

Scortchi - Reinstate Monica
  • Shucks, I wanted to award you the bounty, so as to not waste the points. I thought that would happen automatically, but, it didn't. – Carl Oct 18 '18 at 00:18
  • However, I can upvote your answer, not because I believe it (or disbelieve it), but because it contributes substantially to the discussion. (+1). – Carl Oct 18 '18 at 03:28
  • From [Information Criterion for Minimum Cross-Entropy Model Selection](https://arxiv.org/pdf/1704.04315) "To simplify the model complexity penalty term in the AIC, Akaike (1974) makes the strong assumption that the true distribution of data belongs to the parametric distribution family being considered." Thus, AIC assumes that the correct model is being considered. Asking me to define it is beside the point; it is already assumed. Rather, to defend AIC as correct, the burden of proof is on its proponents. – Carl Oct 18 '18 at 17:16
  • Moreover, I do not see how in completely non-nested models, like ND and UD, one can make a correct model assumption from "within a parametric distribution family." Once again, the burden of proof for the non-nested case is on its proponents. All I have shown in my answer is that "squaring the circle" does not necessarily have a "Happy ending." – Carl Oct 18 '18 at 17:28
  • @Carl: Yes, the trace of a matrix is assumed to be the no. of free parameters - only exactly true when the true model's in the model family, else as an approximation which gets worse as the best-fit model gets further from the true one. Check Pawitan (2001), or another account of the derivation of AIC, & perhaps ask a new question if it inspires one. – Scortchi - Reinstate Monica Oct 18 '18 at 17:31
  • Entropy is not a basis for comparing different physical systems, except in a self-information content comparison context, because the information encoding efficiency ([Kolmogorov complexity](https://en.wikipedia.org/wiki/Kolmogorov_complexity)) is different for different physical systems. I am having trouble suspending disbelief with respect to the AIC assumptions, or stated otherwise, it looks increasingly to me to be a method of mathematical convenience as contrasted with a physically general approach. I am working on trying to understand this, but it is a slog. – Carl Oct 18 '18 at 18:56
  • @Carl: There's something to that, as K-L divergence is hardly the only way to quantify how far a fitted model is from the truth - I probably wouldn't have heard of it if, *per impossibile*, it weren't what's minimized by ML estimation. Nevertheless, even when you know the true model for a fact, bar *p* parameters, there's a question remaining of whether, when you estimate those parameters from a particular data-set, you'll end up with a fit that's closer than if you estimate the parameters of another, simpler, model; & that's the question AIC's designed to answer. – Scortchi - Reinstate Monica Oct 19 '18 at 16:02
  • Suppose we have a model consisting of exponential decay of a single decay scheme radioisotope, e.g., tritium. Then if we fit the data with a [Lagrange polynomial](https://en.wikipedia.org/wiki/Lagrange_polynomial), R$^2$, adjusted R$^2$, AIC, BIC, and the like will select the perfectly fit polynomial over the exponential distribution generated by the decay. And it will be totally wrong. That is, without modelling noise, we have no yardstick for measuring goodness of fit in a noisy system, and current methods seem too naive for that rather simple problem. – Carl Apr 24 '21 at 23:28

Suppose one generates values from a standard normal distribution, $\mathcal{N}(0,1)$. If we have generated only two values, $n=2$, then we have a discrete uniform distribution, not a convincing discrete approximation of a normal distribution. Indeed, this is true for any two values: no matter which generating distribution gave rise to them, a discrete uniform distribution is the default result. Normal and uniform distributions are non-nested with respect to each other; indeed, they have very different shapes. If we generate $\mathcal{N}(0,1)$ values for increasing $n$ and examine the AIC for fitting a normal distribution versus a uniform distribution, then even though we know that our generating function is $\mathcal{N}(0,1)$, the AIC will not always be smaller for the normal-distribution fit than for the uniform-distribution fit. The plot below shows how many times out of 1000 repetitions the AIC for a normal-distribution model was better than (less than) the AIC for a uniform-distribution model, for $n$ varying from $n=5$ to $n=100$.

[Plot: number of times out of 1000 repetitions that the normal-distribution AIC was lower than the uniform-distribution AIC, versus $n$ from 5 to 100.]

As can be seen in the plot, the AIC for a normal distribution (i.e., the correct answer) was selected as better than that for a uniform distribution only 395 times out of 1000 trials, or 39.5% of the time, for $n=5$. This increased to 949 times out of 1000 trials for $n=100$, still an error rate of slightly more than 5%. AIC is said to be asymptotically correct, and that appears to be the case. (Incidentally, BIC makes the same choices as AIC for these two two-parameter models.) But is that useful for small to moderately sized values of $n$?

The above is an example of the observed probability of model selection. The claimed likelihood interpretation of AIC differences is as follows:

The quantity $\exp\left(\frac{\text{AIC}_{\min} - \text{AIC}_i}{2}\right)$ is known as the relative likelihood of model $i$. It is closely related to the likelihood ratio used in the likelihood-ratio test. Indeed, if all the models in the candidate set have the same number of parameters, then using AIC might at first appear to be very similar to using the likelihood-ratio test. There are, however, important distinctions. In particular, the likelihood-ratio test is valid only for nested models, whereas AIC (and AICc) has no such restriction.

Now note that the relative likelihood above can be reciprocated: if model A is twice as likely as model B, then model B is half as likely as model A. In the current context, however, we are not dealing with likelihoods; we ran a Monte Carlo simulation with truth data, so that we could observe the probability of making the correct choice. We observed in this simulation that the probability of making the correct choice is heavily influenced by $n$, the number of observations, and that unless $n$ is large we did not seem to get reliable answers.
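As a small numerical illustration (the AIC values here are hypothetical, not taken from the simulation), the relative likelihood and its reciprocal might be computed as:

(* relative likelihood of model i versus the minimum-AIC model, and its reciprocal *)
relLik[aicMin_, aicI_] := Exp[(aicMin - aicI)/2];
{N[relLik[100, 102]], N[1/relLik[100, 102]]}
(* {0.367879, 2.71828}: a model whose AIC is 2 units higher is exp(-1) times as likely as the better model *)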

About the program: lists are initialized for the normal-distribution (nd) AIC (ndAlist), nd BIC (ndBlist), and uniform-distribution (ud) AIC and BIC (udAlist, udBlist). Two Do loops are used. The outer loop increments $n$ from 5 to 100 in steps of 5. The inner loop (1) creates $n$ random variates (named dat) from $\mathcal{N}(0,1)$; (2) creates an empirical CDF, named edistdata, from dat; (3) defines the functions cdfn and cdfu, the CDFs of the nd and ud, for fitting; and (4) fits cdfn and cdfu to edistdata by varying their parameters. Note: fitting to CDFs rather than PDFs markedly decreases noise and is a common procedure. This is done, rather than, for example, using the mean and variance to calculate the nd or the min and max to calculate the ud, because the fitting then uses a single algorithm for both nd and ud, and because the NonlinearModelFit routine outputs AIC and BIC for the models, as well as the parameters, as properties of the fit objects nlmn and nlmu, e.g., nlmn["AIC"]. Note: it is assumed that the AIC and BIC are calculated from maximum-likelihood fit parameters, as the contrary case would be meaningless.

(*Mathematica Program*)
(* per-n tallies of which model AIC and BIC select: nd = normal distribution, ud = uniform distribution *)
ndAlist = {};
ndBlist = {};
udAlist = {};
udBlist = {};
Do[
 AICndlist = {};
 BICndlist = {};
 AICudlist = {};
 BICudlist = {};
 Do[
  (* (1) draw n standard-normal variates *)
  dat = RandomVariate[NormalDistribution[0, 1], n, WorkingPrecision -> 40];
  (* (2) empirical CDF evaluated at the data points *)
  edistdata = Table[{x, CDF[EmpiricalDistribution[dat], x]}, {x, dat}];
  (* (3) candidate model CDFs: normal and continuous uniform *)
  cdfn[a1_, a2_, x_] := CDF[NormalDistribution[a1, a2], x];
  cdfu[b1_, b2_, x_] := CDF[UniformDistribution[{b1, b2}], x];
  (* (4) fit each candidate CDF to the empirical CDF by varying its parameters *)
  nlmn = NonlinearModelFit[edistdata, cdfn[a1, a2, x], {{a1, 0}, {a2, 1}}, x];
  nlmu = NonlinearModelFit[edistdata, cdfu[b1, b2, x], {{b1, -2}, {b2, 2}}, x];
  AppendTo[AICndlist, nlmn["AIC"]];
  AppendTo[BICndlist, nlmn["BIC"]];
  AppendTo[AICudlist, nlmu["AIC"]];
  AppendTo[BICudlist, nlmu["BIC"]],
  {i, 1, 1000}];
 (* tally which model had the lower AIC (and BIC) in each of the 1000 repetitions *)
 ndA = 0.; udA = 0.; ndB = 0.; udB = 0.;
 Do[If[AICndlist[[j]] < AICudlist[[j]], ndA = ndA + 1, udA = udA + 1], {j, 1, 1000}];
 Do[If[BICndlist[[j]] < BICudlist[[j]], ndB = ndB + 1, udB = udB + 1], {j, 1, 1000}];
 Print["n: ", n, "\nAIC nd/1000: ", ndA, "\tAIC ud/1000: ", udA, "\nBIC nd/1000: ", ndB, "\tBIC ud/1000: ", udB];
 AppendTo[ndAlist, {n, ndA}];
 AppendTo[ndBlist, {n, ndB}],
 {n, 5, 100, 5}]
Print[ndAlist]
ListPlot[ndAlist, AxesLabel -> {"n", "AIC ND < AIC UD"}, PlotRange -> {{0, 100}, {0, 1000}}, PlotRangePadding -> {{0, 1}, {0, 0}}]

(Numerical Output)

  n    AIC nd/1000   AIC ud/1000
  5        395           605
 10        572           428
 15        684           316
 20        725           275
 25        769           231
 30        777           223
 35        811           189
 40        841           159
 45        848           152
 50        848           152
 55        877           123
 60        886           114
 65        900           100
 70        901            99
 75        914            86
 80        932            68
 85        935            65
 90        946            54
 95        952            48
100        949            51

(The BIC counts, nd/1000 and ud/1000, were identical to the AIC counts at every n.)

{{5,395.},{10,572.},{15,684.},{20,725.},{25,769.},{30,777.},{35,811.},{40,841.},{45,848.},{50,848.},{55,877.},{60,886.},{65,900.},{70,901.},{75,914.},{80,932.},{85,935.},{90,946.},{95,952.},{100,949.}}

Plot of output was shown above.

The answer here echoes the results of a recent paper by Yafune et al., "A Note on Sample Size Determination for Akaike Information Criterion (AIC) Approach to Clinical Data Analysis", which unfortunately is behind a paywall. Those authors state in their discussion that "AIC is generally used without paying attention to the probabilities corresponding to the power of statistical tests. Since AIC is usually used for exploratory analysis, it is often difficult to determine the sample sizes in advance. For such cases, it is desirable to investigate afterwards whether the sample sizes are large enough by checking the probabilities corresponding to the power of statistical tests. If the sample sizes are not large enough, it is possible that the AIC approach does not lead us to the conclusions which we seek."

To this we would only add that the sample sizes indicated in that paper can easily exceed 100 for a power of 0.8.

Carl
  • An interesting thread. I am not sure if my comment is relevant here, but let me try. How do you define what is a "correct model"? Is it the data generating process (DGP)? If so, why would you be using AIC trying to identify the DGP? The question AIC is answering is not "Which of the models is the DGP?". Try asking a different question, such as "Which model will give better predictions under a certain type of loss (associated with the likelihood being used)?", and you might find that AIC is answering "correctly" (or perhaps not?). That is, use a hammer for hammering nails. – Richard Hardy Oct 11 '18 at 13:51
  • @RichardHardy For $n=2$ realizations within the support range of any two parameter model, the loss function from a proper model fit should be {0,0} meaning that AIC is indeterminate, and any other goodness-of-fit measurement would be as well. The loss function for $n>2$ realizations and a uniform distribution model is asymptotically a normal distribution with an asymptotic to zero constant subtracted from those ($y$-axis) realizations. I think the post shows that the hammer has a hole in its head for $n$ small to medium sized. How this relates to other hammers, I do not know. – Carl Oct 11 '18 at 21:05
  • @RichardHardy Also, yes, the DGP is what we are trying to reconstruct. I think that there is a generic problem, even when considering the asymptotic case, related to when the support space is covered in a fine enough mesh of values to [reduce the difference between a random variate of a pdf and the pdf itself enough that they tend to the same function](https://stats.stackexchange.com/q/273185/99274). – Carl Oct 11 '18 at 21:17
  • Regarding *the loss function from a proper model fit should be {0,0}*: perhaps we understand differently what a loss function is in prediction context. What I meant by a "loss function" is a convex, scalar-valued function $f$ with a minimum at zero, the argument of which is the difference between the prediction and the corresponding true value. – Richard Hardy Oct 12 '18 at 05:38
  • Given the argument (the difference between the prediction and the corresponding true value) of 0, the loss would normally be zero. Loss functions would normally be well defined at 0, so that should not be a problem. – Richard Hardy Oct 12 '18 at 09:39
  • Recall that AIC's used to compare pre-specified models with unknown parameters fit to observations by maximum likelihood. If I've read this right, you're defining a discrete uniform distribution on a support determined by the observations. – Scortchi - Reinstate Monica Oct 13 '18 at 09:35
  • @Scortchi I am not sure. AIC implies a lot of assumptions, I have not sorted them all out, but for certain things, there is no need to do that, for example, to produce a [possible counter example](https://stats.stackexchange.com/q/369850/99274) for non-nesting utility of AIC. Where do you want to go with this? – Carl Oct 13 '18 at 10:08
  • There are some regularity conditions that determine whether AIC has various properties or not (please don't ask me about them!); but AIC is *defined* as twice the negative log-likelihood at the maximum-likelihood parameter estimates plus twice the number of free parameters estimated, so your simulation has to be of fitting two different models to the same data by maximum likelihood, else it makes no sense. For example, you could fit the rate of an exponential distribution, & fit the log-location & log-scale of a log-normal distribution. – Scortchi - Reinstate Monica Oct 13 '18 at 11:33
  • @Scortchi I fitted the models to the same data, noted which model had the lower AIC, varied the data, noted again which model had the lower AIC, repeated that 1000 times, and observed what the total number of lower AIC values was for each model. When AIC is used to compare two models only once on a single data set, it is interpreted as an odds ratio, and an odds ratio is not a definitive outcome, just a propensity of having an outcome. Does that help? Do you want the Mathematica code? – Carl Oct 13 '18 at 19:46
  • @Scortchi In other words, the odds ratio is *post hoc*. It is predictive if and only if the first model's AIC is less than the second model's AIC. That is, iff $\text{AIC}_1 < \text{AIC}_2$. – Carl Oct 13 '18 at 20:37
  • @Scortchi Gee, a lot of discussion here. Let's take a hypothetical. Suppose the odds ratio is 2 for a given outcome. That means that it is twice as likely that $\text{AIC}_1 < \text{AIC}_2$ as the converse. – Carl Oct 13 '18 at 20:55
  • What would help is explaining precisely what the models are, what their free parameters are, how you're estimating those, & how you calculate AIC when the observations are discrete under one & continuous under the other. You could illustrate the explanation with code. – Scortchi - Reinstate Monica Oct 14 '18 at 13:16
  • @Scortchi I cannot explain everything about the routines without also posting on the Mathematica site and asking, because the documentation for Mathematica is not as explicit as the explanations you are asking for. Briefly, I echoed how fitting for such functions is done on the Mathematica site. More later. – Carl Oct 14 '18 at 14:21
  • Akaike weights? – EngrStudent Oct 15 '18 at 18:34
  • @EngrStudent What about weights? In this case, the weights are not revealing as both models have two free parameters. Same for BIC. As $n$ is the same for both models, the choice of $n$ does not alter which BIC is the lesser value. It is the fit itself that alters the AIC/BIC selected model and there is no difference between AIC and BIC selection of the better model in this case. – Carl Oct 15 '18 at 19:19
  • @Scortchi OK, Mathematica program now included in answer. – Carl Oct 15 '18 at 20:20
  • Thanks! I see, I think: you're estimating the two boundaries of a *continuous* uniform distribution through fitting a straight line by least squares to the empirical distribution function; estimating the mean & std deviation of a normal distribution through fitting a normal ogive by least squares to the EDF; & assuming normally, independently, distributed errors when calculating AICs. Taking this Byzantine approach means the actual statistical models you're fitting & comparing are so wildly different from those you state an interest in that any quantitative comparison using ... – Scortchi - Reinstate Monica Oct 17 '18 at 10:56
  • ... likelihood or information criteria should be taken with a very large pinch of salt. Furthermore, for ML estimation to minimize the Kullback-Leibler divergence from the fitted model to the true one, there are some regularity conditions, & I'm pretty sure that the support doesn't depend on free parameters is among them. And finally, to hark back to @RichardHardy's comment, AIC doesn't purport to aid in finding the true model, so calculating error rates from using it to do so is rather beside the point when evaluating its utility. – Scortchi - Reinstate Monica Oct 17 '18 at 11:03
  • @Scortchi No straight lines involved. The fit is of the CDFs, which are continuous sigmoidal functions, to the CDF of data, which latter is technically a continuous step function, this time of only approximately sigmoidal shape. The fitting is not "Byzantine" it is as near to superlative as it gets, with almost no noise error of fitting because the noise is cumulatively damped. Run some error statements if you do not believe me. It is a widely used algorithmic approach, possibly the best available, and is perfectly general. It yields the same parameters as those of a PDF. – Carl Oct 17 '18 at 11:41
  • @Scortchi Because CDFs are much more linear than PDFs, the fitting process is typically not problematic; good luck doing it otherwise. Regarding Richard Hardy's comment, I agree that AIC cannot find a model properly; I think the unsuspected new information here is that AIC, for small to moderate size $n$, is certainly not trustworthy. I could be wrong, but do not see how, and it is hard to discount truth data like the above without having a very good reason to do so. Let me ask you this: how does one verify the theoretical likelihood calculations? What evidence is there for their correctness? – Carl Oct 17 '18 at 12:03
  • The distribution function of a continuous uniform random variable is most certainly a straight line; one disadvantage of using a linear least-squares fit to estimate the boundary parameters is that there's no guarantee that the estimate of the upper boundary is greater than or equal to the sample maximum or that the estimate of the lower boundary is less than or equal to the sample minimum. And I'll eat my hat if you can find a real-life example of anyone estimating the parameters of a normal distribution other than through the sample mean & variance. – Scortchi - Reinstate Monica Oct 17 '18 at 14:42
  • But my main point, again: a necessary condition for comparing models by AIC is that you estimate their parameters directly by maximum-likelihood. And it's not even news that selection by AIC can still fail to pick the true model from a candidate set (assuming it's in the set) as sample sizes get larger. Pawitan (2001), *In all Likelihood*, Ch. 13 gives the motivation for & derivation of AIC. – Scortchi - Reinstate Monica Oct 17 '18 at 14:43
  • @Scortchi Example of [CDF usage](https://mathematica.stackexchange.com/a/61731/42558). I can run a direct comparison of Mean[list] and Variance[list] against the ND values from NonlinearModelFit, and Min[list] and Max[list] against the fit results for UD. Furthermore, I can test whether the AIC results are ML or not. To think that they are not ML results is a punt, where the bank is Wolfram. These are likely low yield tests, and I will do these tests only if you insist. – Carl Oct 17 '18 at 20:05
  • (1) The Mathematica SE site doesn't count as real life. (2) To what end? (3) I'm sure Mathematica works out AIC properly (so don't test on my account), but you've changed the statistical model. Estimate the parameters *directly* by maximum likelihood. – Scortchi - Reinstate Monica Oct 17 '18 at 20:24 [a sketch of what that might look like appears after these comments]
  • @Scortchi The parameters are tested for directly and much more reliably from their CDFs. Give me a reason to use a more error prone technique other than not being familiar with it, please. – Carl Oct 17 '18 at 22:44
  • @Scortchi [CDF usage](https://www.mathworks.com/help/stats/examples/fitting-a-univariate-distribution-using-cumulative-probabilities.html) Actually, I would assume that the fit routine uses ML to fit the CDFs to yield valid AIC, or BIC, so this is not an either/or CDF/ML, it is both CDF and ML. Also, CDF of UD is a sigmoidal function of line segments, not exactly a straight line, although the difference is small. – Carl Oct 17 '18 at 23:04
  • The CDF of the uniform distribution is $$F(x)=\begin{cases} \dfrac{x-\min}{\max-\min} & \min\leq x\leq\max \\ 1 & x>\max \\ 0 & \text{otherwise} \end{cases}$$ – Carl Oct 17 '18 at 23:11
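For completeness, here is a minimal sketch (mine, not from either answer) of the direct maximum-likelihood fitting that @Scortchi recommends in the comments above: EstimatedDistribution fits by maximum likelihood by default, and the AIC is then assembled by hand from LogLikelihood. The symbol names are illustrative.

(* Sketch: fit both candidate distributions to the same data by maximum likelihood
   and compute AIC = 2 k - 2 logL directly, rather than via a CDF regression *)
dat = RandomVariate[NormalDistribution[0, 1], 20];
ndFit = EstimatedDistribution[dat, NormalDistribution[m, s]];      (* ML estimates: sample mean, sd *)
udFit = EstimatedDistribution[dat, UniformDistribution[{a, b}]];   (* ML estimates: sample min, max *)
aic[dist_, data_, k_] := 2 k - 2 LogLikelihood[dist, data];
{aic[ndFit, dat, 2], aic[udFit, dat, 2]}   (* the lower AIC suggests the better-fitting model *)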