
I have circular data -- observations each of which falls between -180 degrees and +180 degrees -- divided into a 15-bin histogram.

I'd like to see how well a continuous PDF -- specifically a mixture of a von Mises distribution and a uniform distribution, with particular parameters -- fits the observed histogram.

And to determine this fit, I'd like to use the r-squared statistic. I'd like to leave aside the question of whether I should be using r-squared or something else. My choice of r-squared is based on Zhang & Luck, Nature, 2008 and Zhang & Luck, Psychological Science, 2009, work I'm trying to replicate. (These papers did exactly what I'm describing I want to do -- compute the r-squared between a 15-bin histogram of circular data and the mixture model.) But if you'd like to suggest a better method, and can describe it clearly, I'd be happy to try it out.

My question is, how should I compute the r-squared? Should I bin the continuous function, and then compare the PDF bin heights to the observed bin heights? Should I take the mean of the PDF over the range spanned by each bin of the observed data? Should I compare the bin centers to the corresponding points in the continuous function?

  • You may feel obliged to use R-squared but that does not oblige us to feel that compulsion. My own view is that R-squared makes no sense unless you have observed and expected values of a variable (not a density). If anything it makes even less sense here without modifying predictions modulo 360 degrees. This is not an attack and you should not feel that you have to inject personal comments in defence or justification. Nevertheless the onus really is on you to show that the idea makes sense. – Nick Cox Jul 26 '13 at 19:03
  • Nick, thanks for the reply. I have two responses: 1) I think it's legitimate for me to ask *how* something should be done, without regard for *whether* it should be done. Those are two separate questions: People can disagree about which method is best, but agree on the proper way to perform a certain method, whether or not they think it's the best one. 2) You say "R-squared makes no sense unless you have observed and expected values of a variable." This is exactly what I have. I have observed frequencies in my histogram, and my PDF provides expected values for these frequencies. –  Jul 26 '13 at 19:12
  • Your 1) is your rule about what you think is a fair question, but others may have their own ideas about what is fair technical comment. On 2), a density is continuously defined, and is only in a very weak sense analogous to a response with a finite set of measurements for which $R^2$ can be calculated. You'll have to hope that others are more inclined to answer your question in your own terms. Certainly you can bin into frequencies, but any fit measure will depend arbitrarily on those bins, and needs to take the circular scale into consideration. – Nick Cox Jul 26 '13 at 19:20
  • "Certainly you can bin into frequencies, but any fit measure will depend arbitrarily on those bins." So if I bin my data -- which I need to do due to the limited number of observations -- I have no hope of obtaining a useful fit measure? If this is not what you meant to say, can you suggest a useful fit measure? If I could change the problem, I would -- say by not having circular data, or by not needing to bin -- but these issues are fundamental to my research -- they're not my whims. –  Jul 26 '13 at 19:24
  • I'd guess what you're asking for is (most likely) a Kuiper-type measure comparing the fitted distribution function and the empirical distribution function, but I doubt that's a canned calculation in any software, because it requires some numerical integration. I don't approach circular distribution fitting in this way at all: I just compare two or more model fits and see which is closer, taking systematic departure as a bad sign. Trying to wrap the comparison into one overall measure does not help very much in my experience, quite apart from my earlier comments. – Nick Cox Jul 26 '13 at 19:32
  • I didn't read the Zhang work you linked. Just provide a brief description of the methods in your question. – AdamO Jul 26 '13 at 19:39
  • Consider the case where you have equal counts in all 15 bins, assumed of equal widths. Obviously the uniform distribution is a perfect fit. But then (upon comparing the counts to their expected values with the uniform distribution) the R^2 is not even defined. Worse, the slightest variation in the counts *will* allow R^2 to be calculated, but its value will be practically arbitrary, somewhere between -1 and 1. This alone shows that using R^2 is inappropriate and likely to lead to erroneous results. – whuber Jul 26 '13 at 19:41
  • **This question appears not to characterize its references correctly.** [Zhang & Luck 2008](http://media.wix.com/ugd/049fb3_99f64b76bc92288d138dc59105b36794.pdf) actually use *chi-squared and Kolmogorov-Smirnov* tests to assess the goodness of fit; the r^2 relates to a different question. (See the Supplementary Notes p. 1.) Thus, if the concern truly is about goodness-of-fit, then referring to r^2 is irrelevant (and actually misleading as argued earlier in these comments). – whuber Jul 26 '13 at 21:07
  • @whuber: From Zhang & Luck, 2008: "As evidence that our mixture model with three parameters provides an adequate description of the data, we computed the adjusted $r^2$ statistic (which reflects the proportion of variance explained by the model)." –  Jul 26 '13 at 21:19
  • That is *not* the goodness of fit test! – whuber Jul 26 '13 at 21:19
  • My apologies. I did not realize explanation of variance and goodness of fit were such radically different measures. I've updated the question statement. But given your critiques of $r^2$, I'm confused about how it can be a good measure of how much variance is explained. –  Jul 26 '13 at 21:22
  • But really, I don't know that I should be apologizing, because if you read Zhang & Luck, 2008, they specifically refer to "demonstrating goodness of fit" when referring to their $r^2$ analysis. And when $r^2$ is low for certain subjects, they refer to the "fits" as being "worse." Any imprecision in my statement of the question is borrowed from the paper I cited, so your accusation that I've mischaracterized my references is unfair. –  Jul 26 '13 at 21:33
  • More to the point: Supplementary Table 1 in Zhang & Luck (2008): "Adjusted $r^2$ values indicating the goodness of fit." Please remove your comment that I've not characterized my references correctly. –  Jul 26 '13 at 21:43
  • Given that r-squared is widely considered a measure of goodness of fit, and given that Zhang & Luck did calculate it as one of their measures of goodness of fit, I've put the original language back in my question. –  Jul 27 '13 at 04:14
  • I believe Z&L are referring to something other than a distributional fit. They are assessing a different relationship when they are trying to "explain variance." Using $r^2$ to assess a fit of a *distribution* would be a gross, elementary error, one that I doubt would get past the reviewers or readers of *Nature*. – whuber Jul 28 '13 at 21:35
  • Statistical errors are actually quite common in high-impact journals like *Nature*. See, for example, the report of errors in [this paper](http://www.nature.com/neuro/journal/v14/n9/abs/nn.2886.html). Some of you may be tempted only to glance at the paper under the assumption that this is enough to make you an authority on it. I urge you to resist this temptation. –  Jul 29 '13 at 06:04

3 Answers

2

You can plot the empirical data against your distribution with the fitted parameters. See the answer at this link, and plot a histogram of your original data together with a smooth histogram of your fitted data, where data1 is the original data and data2 is the data fitted using the MLE estimates.

SAAN
  • The idea of visualizing the data is good, but comparing histograms is inadvisable: there are much better ways to compare distributions. Please see http://stats.stackexchange.com/questions/51718/assessing-approximate-distribution-of-data-based-on-histogram/51753#51753 for a critical evaluation. – whuber Jul 28 '13 at 21:32
  • I'm also a strong believer in the importance of visualizing data, but was hoping for a quantitative means of assessing the goodness of fit. I should have made that explicit in my question. –  Jul 29 '13 at 01:14
  • We should use both statistics (goodness of fit, BIC, AIC, etc.) and visualization, because there are problems in which a statistic indicates a good fit but the plot of observed versus expected values is not close. Together they give strong evidence for a good fit. @whuber, my answer applies after a distribution has been selected, whereas the link you provided is about the earlier step of finding an approximate distribution. – SAAN Jul 29 '13 at 06:13
  • Good point, Azeem. On the other hand, you wouldn't want to weight visual fit too highly, given the concern about over-fitting the data. The beauty of BIC and AIC is that they weight models by the number of parameters, and so can deem a model with a worse fit (discerned visually) but fewer parameters the more likely source of the data. –  Jul 30 '13 at 19:01
2

When computing the $r^2$ between the histogram and the continuous PDF, one should use the density-normalized values of the histogram (scaled so that the total area of the histogram is 1) and the mean of the PDF within each bin (the integral of the PDF over the bin's range, divided by the bin's width). See this webpage for a description of how to obtain a continuous function's mean within a given range.
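
For concreteness, here is a minimal sketch of that calculation in Python/SciPy. It only illustrates the binning arithmetic, not the code from the cited papers; the parameter values (`mu`, `kappa`, `g`) and the generated data are placeholders.

```python
# Sketch: r^2 between a density-normalized 15-bin histogram of circular data
# (in degrees) and the bin-averaged PDF of a von Mises + uniform mixture.
import numpy as np
from scipy import stats, integrate

def mixture_pdf(theta, mu, kappa, g):
    """PDF of (1 - g)*vonMises(mu, kappa) + g*Uniform on [-pi, pi); theta in radians."""
    return (1.0 - g) * stats.vonmises.pdf(theta, kappa, loc=mu) + g / (2.0 * np.pi)

def r_squared(obs_deg, mu, kappa, g, n_bins=15):
    obs = np.deg2rad(obs_deg)                                 # degrees -> radians
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    heights, _ = np.histogram(obs, bins=edges, density=True)  # histogram area = 1
    # mean of the PDF over each bin: integral over the bin divided by the bin width
    expected = np.array([
        integrate.quad(mixture_pdf, lo, hi, args=(mu, kappa, g))[0] / (hi - lo)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    ss_res = np.sum((heights - expected) ** 2)
    ss_tot = np.sum((heights - heights.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Example with placeholder parameters (not fitted values)
data_deg = np.rad2deg(stats.vonmises.rvs(2.0, size=200))      # fake data in degrees
print(r_squared(data_deg, mu=0.0, kappa=2.0, g=0.3))
```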

Note, however, that other statistics may be more appropriate for assessing goodness of fit between the observed distribution and the PDF. Two suggested in the comments to my question are the $\chi^2$ statistic (thanks, @AdamO) and the Kolmogorov-Smirnov (KS) test. Zhang and Luck (2008, 2009) calculated these in addition to $r^2$. In particular, $r^2$ is inappropriate when the PDF being compared to the data is a uniform distribution, for the reasons @whuber states in one of his comments. @NickCox claims in one of his comments that all three of these measures are inappropriate for circular data.

@NickCox offers the Kuiper statistic as the best measure for testing the goodness of fit in the present scenario (thanks for that!), but claims it is unlikely to be available in software. This pessimism is unfounded: The KS test is available in the SciPy package for Python as scipy.stats.kstest (see the documentation here), and the results from this function can be used to calculate the Kuiper statistic (see this webpage for the latter statistic's definition). It appears at least one user of Python has written code that directly calculates the Kuiper statistic.
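
As a rough illustration (my own sketch, not the code I found, and assuming the same mixture model and placeholder parameters as above), the Kuiper statistic $V = D^+ + D^-$ can be computed directly by comparing the empirical CDF of the raw observations with the fitted mixture CDF obtained by numerical integration:

```python
# Sketch: Kuiper statistic V = D+ + D- for the von Mises + uniform mixture.
import numpy as np
from scipy import stats, integrate

def mixture_pdf(theta, mu, kappa, g):
    return (1.0 - g) * stats.vonmises.pdf(theta, kappa, loc=mu) + g / (2.0 * np.pi)

def mixture_cdf(theta, mu, kappa, g):
    # CDF on [-pi, pi], obtained by numerically integrating the mixture PDF
    return integrate.quad(mixture_pdf, -np.pi, theta, args=(mu, kappa, g))[0]

def kuiper_statistic(obs_deg, mu, kappa, g):
    x = np.sort(np.deg2rad(obs_deg))                 # degrees -> radians, sorted
    n = len(x)
    cdf = np.array([mixture_cdf(xi, mu, kappa, g) for xi in x])
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)   # ECDF above fitted CDF
    d_minus = np.max(cdf - np.arange(n) / n)         # fitted CDF above ECDF
    return d_plus + d_minus

data_deg = np.rad2deg(stats.vonmises.rvs(2.0, size=200))   # placeholder data
print(kuiper_statistic(data_deg, mu=0.0, kappa=2.0, g=0.3))
```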

As most reviewers of scientific papers are unlikely to be satisfied with the use of a single measure of goodness of fit, I recommend using several, perhaps all, of the suggested statistics. Another option, vaguely alluded to by @NickCox, is to compare the fits of different models with something like the Bayesian Information Criterion (see van den Berg et al. [PNAS, 2012] for an example of its application) or the Akaike Information Criterion (see Fougnie et al. [Nat Commun, 2012]).
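
For example, given the maximized log-likelihoods from the MLE fits of two candidate models, both criteria are simple to compute; the log-likelihood values below are purely illustrative, not real results:

```python
# Sketch: comparing candidate models by AIC and BIC from their maximized
# log-likelihoods (placeholder numbers, not actual fits).
import numpy as np

def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    return n_params * np.log(n_obs) - 2 * log_lik

n_obs = 200
models = {  # model name: (maximized log-likelihood, number of free parameters)
    "von Mises + uniform": (-310.4, 3),
    "von Mises only":      (-318.9, 2),
}
for name, (ll, k) in models.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n_obs):.1f}")
```

Lower values indicate the preferred model under both criteria.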

AdamO
  • I was referring to a Kuiper-type calculation comparing the observed distribution function and the _continuous_ fitted distribution function integrated from the fitted density function. Is that what the Python code you've unearthed does? Using the means of observed and fitted density functions is at best only an indirect approximation to that. I am at a loss to see why you think any of my comments allude, vaguely or otherwise, to BIC or AIC; for the record, that was not being suggested by me. Other people replying or commenting on your question may want to add their own remarks. – Nick Cox Jul 28 '13 at 00:38
  • "Using the means of observed and fitted density functions is at best only an indirect approximation to that." Please re-read my answer and you'll see that I never suggest calculating the mean of anything when computing the Kuiper statistic. My description of how to calculate the mean refers specifically to the calculation of $r^2$ per my original question. In an earlier comment you said "I just compare two or more model fits and see which is closer." The BIC and AIC are specific ways of making such a comparison. I don't think I have any more patience for this thread! –  Jul 28 '13 at 03:25
  • You can follow through by accepting your answer formally; you will gain reputation points thereby. I've not studied the Python code you cite to see what it does to calculate a Kuiper statistic; I do understand that your discussion of calculations with bins refers only to an $r^2$ calculation. It appears that no discussant could accept all your premises here, let alone what was off-limits in discussion, but you get to be judge and jury on what answers your question. When I say "which is closer" I meant graphically; as said I don't favour single measures of fit much. – Nick Cox Jul 28 '13 at 09:19
1

This is just a very simple test of calibration. Using the binning approach, you need not do any more than calculate the $\chi^2$ fit statistic and be done with it. It is very limited in the scope of what it addresses, but, based on your problem description, you have not indicated that you're interested in anything else.

To calculate the expected frequencies, you integrate the continuous density function for your referent population distribution model over the bounds you've defined by binning. The choice of which values to bin is very important indeed and should be guided not by $p$-values but by meaningful cut points. The $\chi^2$ statistic is then $\chi^2_{(k-1)} = \sum_{i=1}^k \frac{\left(O_i - E_i\right)^2}{E_i}$, with $k$ being the number of bins.
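
As a sketch of this calculation (assuming the von Mises + uniform mixture from the question, with placeholder parameter values), integrate the fitted density over each bin to obtain the expected counts and then form the statistic:

```python
# Sketch: chi-square goodness of fit with expected counts obtained by
# integrating the fitted mixture PDF over each bin (placeholder parameters).
import numpy as np
from scipy import stats, integrate

def mixture_pdf(theta, mu, kappa, g):
    return (1.0 - g) * stats.vonmises.pdf(theta, kappa, loc=mu) + g / (2.0 * np.pi)

def chi_square_gof(obs_deg, mu, kappa, g, n_bins=15, df_lost=1):
    obs = np.deg2rad(obs_deg)
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    observed, _ = np.histogram(obs, bins=edges)               # O_i: counts per bin
    probs = np.array([
        integrate.quad(mixture_pdf, lo, hi, args=(mu, kappa, g))[0]
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    expected = len(obs) * probs                               # E_i = n * P(bin i)
    chi2 = np.sum((observed - expected) ** 2 / expected)
    df = n_bins - df_lost               # k - 1 as above; see the comments on k - 3
    return chi2, stats.chi2.sf(chi2, df)

data_deg = np.rad2deg(stats.vonmises.rvs(2.0, size=200))      # placeholder data
print(chi_square_gof(data_deg, mu=0.0, kappa=2.0, g=0.3))
```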

AdamO
  • This is the right approach, but some consideration of the degrees of freedom is in order. A mixture of a Von Mises distribution and a uniform has three parameters; if all three are fit from the data and the bins are determined independently of the data, then the DF should be close to $k-3$ instead of $k-1$. I say "close to" instead of "equal to" because of the considerations described at http://stats.stackexchange.com/questions/16921/how-to-understand-degrees-of-freedom/17148#17148. – whuber Jul 26 '13 at 19:46
  • @whuber, I've just read through the explanation at the link you posted, and I'm thinking it's important that I mention that I've arrived at my PDF parameters via maximum likelihood estimation, using the original observations, not binned counts. My goal is to get a measure of how well the MLE-derived parameters fit my distribution of observations, and I've chosen to construct the histogram and measure the fit of the model to the histogram because it's something I saw done in the papers I cited above. But maybe this is a bad idea? –  Jul 26 '13 at 21:11
  • It's a good idea: use either the chi-squared test proposed by AdamO here or the K-S test. – whuber Jul 26 '13 at 21:20
  • The Kuiper procedure mentioned in my comments is the extension of Kolmogorov-Smirnov to circular set-ups. K-S is wrong in spirit. Chi-square is popular but pays no attention to circular scale. – Nick Cox Jul 26 '13 at 21:27
  • @whuber why would the df be determined using the number of bins? – AdamO Jul 26 '13 at 22:49
  • Roughly, each term $(O-E)^2/E$ behaves (asymptotically in the maximum likelihood theory) like the square of a Standard Normal distribution. There are $k$ of them, so *if they were independent* their sum would approximately have a $\chi^2_{(k)}$ distribution (because that's what a Chi-squared distribution is!) and if they were subject to $p$ linearly independent linear restrictions, they would have a $\chi^2_{(k-p)}$ distribution (because such sums act like sum of $k-p$ *independent* squared Standard Normals). So, asymptotically, it is obvious what the degrees of freedom need to be. – whuber Jul 28 '13 at 21:30