First of all, it's not data fitting, but model fitting, since you fit a model to the data, not the other way around.
Fitting a model to the data means finding the parameters of the model that best align it with the data it is meant to approximate. There are many ways to do this: for example, you can minimize some loss function between the model's predictions and the data. With a model defined in terms of a probability distribution, however, the more natural approaches are maximum likelihood estimation and the Bayesian approach.
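To make the loss-minimization idea concrete, here is a minimal sketch in Python (the toy linear model, the data, and the use of `scipy.optimize.minimize` are my illustrative choices, not something prescribed by the problem): fitting amounts to searching for the parameter values that make the squared error between predictions and data as small as possible.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: a noisy linear relationship (made up purely for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

def loss(params):
    """Squared-error loss between the model's predictions and the data."""
    slope, intercept = params
    predictions = slope * x + intercept
    return np.sum((predictions - y) ** 2)

# "Fitting the model" = finding the parameters that minimize the loss
result = minimize(loss, x0=[0.0, 0.0])
slope_hat, intercept_hat = result.x
```

The recovered `slope_hat` and `intercept_hat` should land close to the true values (2 and 1) used to generate the data.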
If the probability density function of your distribution is $f(\mathbf{X}, \theta)$, where $\theta$ is a parameter (or vector of parameters) of the distribution, then with maximum likelihood you would use an optimizer to find
$$
\underset{\theta}{\operatorname{arg\,max}} \;\sum_i \,\log f(\mathbf{X}_i, \theta)
$$
Technically, this is as simple as passing the log-likelihood function to an optimization routine such as R's optim. You can find an in-depth explanation and many worked examples at the link above.