I am fitting distributions on observation data to make generalizations about the frequencies of different types of natural events. Right now I am focused on the generalized extreme value (GEV) distribution.
The PDF of a standardized GEV (loc = 0, scale = 1), copied from the SciPy documentation, is:
$$ f(x, c) = \begin{cases} \exp(-\exp(-x)) \exp(-x) &\text{for } c = 0\\ \exp(-(1-c x)^{1/c}) (1-c x)^{1/c-1} &\text{for } x \le 1/c, c > 0 \end{cases}$$
A fit for the GEV can be obtained using Maximum Likelihood Estimation (MLE) or Method of Moments (MM) in SciPy or the R extRemes package.
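For reference, this is roughly how I obtain the MLE fit in SciPy. It is only a minimal sketch: the data are synthetic and the "true" parameter values are arbitrary numbers I picked for illustration, not my real observations. Note that SciPy's shape parameter `c` has, as far as I understand, the opposite sign to the shape parameter used in most extreme-value references.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic stand-in for the observations; these "true" values are
# arbitrary illustration values, not real measurements.
rng = np.random.default_rng(0)
true_c, true_loc, true_scale = 0.1, 10.0, 2.0
data = genextreme.rvs(true_c, loc=true_loc, scale=true_scale,
                      size=2000, random_state=rng)

# Maximum likelihood fit; returns (shape c, loc, scale).
c_hat, loc_hat, scale_hat = genextreme.fit(data)
print(c_hat, loc_hat, scale_hat)
```

With a couple of thousand samples the estimates should land close to the values used to generate the data.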
I have noticed that TensorFlow can also be used to model the GEV distribution (with methods like `experimental_fit`, currently not implemented for the GEV subclass). I was curious whether this implementation could be used to fit a distribution to data in the same way as SciPy or extRemes, and I also found the following:
- Maximum likelihood estimation with Tensorflow explains how to estimate a probability distribution with MLE, and implements an optimization using the Adam optimizer (generally used for deep learning)
- Tensorflow with Custom Likelihood Functions uses the BFGS minimize optimizer (under Maximum Likelihood Estimation in TensorFlow) and an MCMC technique (under MCMC in TensorFlow)
- Optimizers in TensorFlow Probability walks through using the BFGS and L-BFGS optimizers in TensorFlow Probability
- Fitting a normal distribution with tensorflow probability demonstrates how the Keras sequential implementation can be used to fit the normal distribution (again, using the Adam optimizer)
I am assuming that the second link's use of the BFGS optimizer, together with TensorFlow's guide on quasi-Newton methods, is the approach most similar to what SciPy's `genextreme` and extRemes do, though I am unsure.
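To make the comparison concrete, here is a minimal sketch of what I understand the BFGS approach to amount to, written with SciPy only so that TensorFlow is not involved. The function name `neg_log_likelihood`, the log-scale parameterization, the penalty value, the starting point, and the synthetic data are all my own assumptions for illustration. As far as I can tell, `genextreme.fit` itself defaults to a derivative-free Nelder-Mead search (`scipy.optimize.fmin`), so the two routes share an objective but not an optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

# Synthetic data; the generating parameters are arbitrary illustration values.
rng = np.random.default_rng(1)
data = genextreme.rvs(0.1, loc=10.0, scale=2.0, size=2000, random_state=rng)

def neg_log_likelihood(params, x):
    """Mean negative log-likelihood of a GEV (hypothetical helper)."""
    c, loc, log_scale = params
    # Optimize log(scale) so the scale stays positive without constraints.
    nll = -np.mean(genextreme.logpdf(x, c, loc=loc, scale=np.exp(log_scale)))
    # Outside the support the log-density is -inf; return a large finite
    # penalty (an assumption) so the line search can back off.
    return nll if np.isfinite(nll) else 1e10

# Crude starting point: Gumbel shape (c = 0), sample mean and std.
start = np.array([0.0, np.mean(data), np.log(np.std(data))])
result = minimize(neg_log_likelihood, start, args=(data,), method="BFGS")
c_bfgs, loc_bfgs = result.x[0], result.x[1]
scale_bfgs = np.exp(result.x[2])

# Reference fit from SciPy's built-in MLE routine.
c_ref, loc_ref, scale_ref = genextreme.fit(data)
print("BFGS :", c_bfgs, loc_bfgs, scale_bfgs)
print("fit():", c_ref, loc_ref, scale_ref)
```

If both optimizers converge, I would expect the two sets of estimates to agree to within optimizer tolerance, since they maximize the same likelihood.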
Will TensorFlow's BFGS optimizer, combined with its GEV implementation, behave the same way as the SciPy and extRemes fits? And why would somebody use the Adam optimizer to converge a distribution's parameters towards the maximum likelihood estimate?