
We have fitted a nonlinear function to observed data. The next step is to assess the goodness of fit of this function (analogous to $R^2$ for linear models).

What are the usual ways to measure this?

Edit 1:

The fitting was performed as follows:

  1. Perform a linear regression with independent variables A and B.
  2. Calculate the distribution's parameters from the regression parameters. (The distribution is nonlinear and takes variable C as an input.)
  3. Assess goodness of fit of nonlinear distribution by comparing estimated to observed data.

Edit 2:

Examples of the steps mentioned above:

  1. Regression model: $\log(y) = \beta_0 + \beta_1 \cdot \log(a) + \beta_2 \cdot \log(b)$
  2. $\rho = -\frac{\beta_0}{\beta_1}$ and $\theta = \beta_2$ for the following nonlinear distribution: $f(a) = \rho \cdot a^{-\theta}$
  3. Assess the goodness of fit of $f(a)$ with a given set of $(a, f(a))$ observations.
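
For concreteness, a minimal sketch of steps 1 and 2 in Python; the synthetic data and the use of ordinary least squares via numpy are illustrative assumptions, not the actual setup:

import numpy as np

# purely illustrative synthetic data standing in for the observed y, a, b
y, a, b = np.random.rand(3, 100) + 1.0

# Step 1: linear regression of log(y) on log(a) and log(b)
X = np.column_stack([np.ones_like(a), np.log(a), np.log(b)])
(b0, b1, b2), *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

# Step 2: transform the regression coefficients into the parameters
# of the nonlinear function f(a) = rho * a**(-theta)
rho, theta = -b0 / b1, b2
f = lambda a: rho * a ** (-theta)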
Marco
    "Goodness of fit" depends on how the fit was performed. For instance, the appropriate GoF measure for a maximum likelihood estimator ought to differ from the GoF measure for a least squares estimator when the random component is not an additive Normal variable. If you have a specific problem in mind, then you might therefore wish to indicate how you performed the fitting. – whuber Sep 08 '14 at 15:58
  • You might be interested in: Magee, L. (1990). $R^{2}$ measures based on Wald and likelihood ratio joint significance tests. *The American Statistician*, 44(3):250–253, and Pesaran, M. H. and Smith, R. J. (1994). A generalized $R^{2}$ criterion for regression models estimated by the instrumental variables method. *Econometrica*, 62(3):705–710. – Alexis Sep 08 '14 at 16:59
  • @whuber I have added a description of the steps performed to the question above. – Marco Sep 09 '14 at 07:16
  • @Alexis Thanks for the references, I will have a look at them. – Marco Sep 09 '14 at 07:16
  • I cannot figure out what your description means. What is the connection between the "observed data," the linear regression, the "distribution," and the "estimated ... data"? A small example or illustration might help clarify what you're doing. – whuber Sep 09 '14 at 14:03
  • @whuber Thanks for your answer. I hope my second edit clarifies it. – Marco Sep 09 '14 at 14:43
  • Thank you. It is interesting that your estimates appear to be derived from one set of data but are applied to predicting another set of data. If that is the case, there is no implicit loss function, so you would need to come up with one. In other words, exactly how would you (*quantitatively*) like to assess lack of fit? Another way to put this is to ask for which ranges of $a$ (or even $f(a)$) you seek highest accuracy and to what degree you would penalize differing amounts of inaccuracy. But why do you refer to $f$ as a "distribution" when you appear to use it only as a functional form? – whuber Sep 09 '14 at 14:52
  • @whuber Yes, there are two sets of data (I abstracted from the exact linkage). I would like to assess the goodness of fit for $a \geq 1$, with a focus on $1 \leq a \leq 400$. I interpret $f(a)$ as the density function of a distribution, but probably the word *function* is more appropriate. – Marco Sep 09 '14 at 15:05
  • @whuber Would it be a feasible way to calculate the correlation between $f(a)$ and the given observations to assess the goodness of fit? – Marco Sep 11 '14 at 09:31

2 Answers


There may be more to it, but it seems to me that you just want to determine goodness-of-fit (GoF) for a function $f(a)$ fitted to a particular set of $(a, f(a))$ observations. So, the following answers only your third sub-question (I don't think the first and second are directly relevant to it).

Usually, GoF can be assessed parametrically (if you know the distribution's parameters) or non-parametrically (if you don't). You may well be able to figure out the parameters of the function here, as it appears to be exponential or gamma/Weibull (assuming the data are continuous); nevertheless, I will proceed as if you didn't know them. In that case, it's a two-step process. First, you estimate the distribution's parameters from your data set. Second, you perform a GoF test for the resulting distribution. To avoid repeating myself, at this point I will refer you to my earlier answer to a related question, which contains some helpful details. Obviously, that answer can easily be applied to distributions other than the one mentioned within.
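
To make the two-step process concrete, here is a minimal Python/scipy sketch; the gamma candidate and the synthetic sample are assumptions chosen purely for illustration:

import numpy as np
from scipy import stats

data = np.random.gamma(shape=2.0, scale=3.0, size=500)  # illustrative sample

# Step 1: estimate the distribution's parameters from the data
params = stats.gamma.fit(data)

# Step 2: GoF test against the fitted distribution; note that estimating
# the parameters from the same sample makes the nominal K-S p-value
# optimistic (Lilliefors-type corrections address this)
ks_stat, p_value = stats.kstest(data, 'gamma', args=params)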

In addition to the GoF tests mentioned there, you may consider another test: the chi-square GoF test. Unlike the K-S and A-D tests, which are applicable only to continuous distributions, the chi-square GoF test is applicable to both discrete and continuous ones. It can be performed in R using one of several packages: the built-in stats package (function chisq.test()) or the vcd package (function goodfit(), for discrete data only). More details are available in this document.
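
For completeness, a rough Python/scipy equivalent of this test (the R functions above behave analogously); the gamma candidate and the choice of ten equal-count bins are arbitrary assumptions for illustration:

import numpy as np
from scipy import stats

data = np.random.gamma(shape=2.0, scale=3.0, size=500)  # illustrative sample
params = stats.gamma.fit(data)

# bin the data (10 equal-count bins) and compute observed vs. expected counts
edges = np.quantile(data, np.linspace(0, 1, 11))
observed, _ = np.histogram(data, bins=edges)
probs = np.diff(stats.gamma.cdf(edges, *params))
expected = observed.sum() * probs / probs.sum()  # rescale so the totals match

# ddof accounts for the three parameters estimated by stats.gamma.fit
chi2_stat, p_value = stats.chisquare(observed, f_exp=expected, ddof=3)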

Aleksandr Blekh
  • The link to "my earlier answer" no longer works, as the question has probably been removed. – Amonet Mar 17 '20 at 17:19
  • @Amonet Thank you for letting me know. I was able to get access to the deleted Q&A and recovered them as a public Gist. Please see https://gist.github.com/ablekh/d1aedfb324cb9dab74a8f39c6952024c#gistcomment-3216073. Hope this helps. – Aleksandr Blekh Mar 17 '20 at 18:58

Well, in machine learning, cross-validation is performed quite often for model testing (to check whether that type of model, with those hyper-parameters - like the number of degrees of freedom or whatever - fits your problem): you split your data several times into training and test sets, run the optimization over each training set, and compute whatever quality measure you like over the corresponding test set. A reliable way is to run so-called "QxT-fold cross-validation". The code could look like:

import numpy as np

def random_split(data, number_of_parts):
    # shuffle the data, then split it into roughly equal parts
    return np.array_split(np.random.permutation(data), number_of_parts)

cv_values = []
for t in range(T):  # repeat the random split T times
    split = random_split(data, number_of_parts=Q)
    for test_id in range(Q):
        # train on every part except test_id, then score the held-out part
        model.fit(np.concatenate(split[:test_id] + split[test_id + 1:]))
        cv_values.append(model.test(split[test_id]))

np.mean(cv_values)  # the cross-validated estimate of model quality
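
(For real models, this loop is essentially what library implementations provide; scikit-learn's `cross_val_score` from `sklearn.model_selection`, for instance, performs the inner Q-fold pass for you.)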
MInner
  • Thanks for your answer. So, what does your *model.test(...)* function look like? – Marco Sep 09 '14 at 12:57
  • @Marco The `model.fit` function depends on what your specific model looks like. The code can be used, however, for any kind of model. – C.K. Jan 04 '21 at 02:49