First of all, it's not data fitting, but model fitting, since you fit a model to the data, not the other way around.
Fitting a model to the data means finding the parameters of the model that best align it with the data it is meant to approximate. There are many ways to do this: for example, you can minimize some loss function between the model's predictions and the data. With a model defined in terms of a probability distribution, however, the more natural approaches are maximum likelihood estimation and the Bayesian approach.
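To make the loss-minimization idea concrete, here is a minimal sketch in Python (the toy linear model, the data, and the use of `scipy.optimize.minimize` are my illustrative choices, not something prescribed by the problem): fitting amounts to searching for the parameter values that make the squared error between predictions and data as small as possible.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: a noisy linear relationship (made up purely for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

def loss(params):
    """Squared-error loss between the model's predictions and the data."""
    slope, intercept = params
    predictions = slope * x + intercept
    return np.sum((predictions - y) ** 2)

# "Fitting the model" = finding the parameters that minimize the loss
result = minimize(loss, x0=[0.0, 0.0])
slope_hat, intercept_hat = result.x
```

The recovered `slope_hat` and `intercept_hat` should land close to the true values (2 and 1) used to generate the data.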
If the probability density function of your distribution is $f(\mathbf{X}, \theta)$, where $\theta$ is a parameter (or vector of parameters) of the distribution, then with maximum likelihood you would use an optimizer to find
$$
\underset{\theta}{\operatorname{arg\,max}} \;\sum_i \,\log f(\mathbf{X}_i, \theta)
$$
Technically, this is as simple as passing the log-likelihood function to an optimization routine such as R's optim. You can find an in-depth explanation and many worked examples at the link above.