
This bears some explaining:

I have a set of data from a psychophysics experiment where participants selected a response from a discrete set of 8 possible responses. These responses were actually colours, but they are equivalent to angles (taken from a circle in colour space), so the response set is effectively:

$\{\pi/4, \pi/2, 3\pi/4, \pi, 5\pi/4, 3\pi/2, 7\pi/4, 2\pi\}$
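Suppose each trial also records which of the eight colours was the target. That is an assumption, since the question does not say how the data are stored. A small helper can then turn a response and a target into a signed angular error. Working in whole steps of $\pi/4$ keeps the arithmetic exact and gives errors in $\{-3\pi/4, \dots, \pi\}$, the same set used in the comments below. The names `response_angle` and `signed_error` are hypothetical, chosen for illustration.

```python
import math

STEP = math.pi / 4  # spacing between adjacent response colours on the circle


# Hypothetical helpers for illustration; assumes responses and targets are indices 1..8.
def response_angle(k):
    """Angle of response k (k = 1..8) in the question's coding: pi/4, ..., 2*pi."""
    return k * STEP


def signed_error(response, target):
    """Signed circular error, in radians, between two response indices (1..8).

    The index difference is reduced modulo 8 into whole steps {-3, ..., 4},
    so the result is one of -3pi/4, ..., pi with no floating-point wrap-around.
    """
    d = (response - target) % 8
    if d > 4:
        d -= 8
    return d * STEP


# Example: target colour 1 (pi/4), response colour 8 (2*pi) is one step below it.
print(signed_error(8, 1))  # → -0.7853981633974483
```

Reducing in integer steps first avoids having to wrap floating-point angles into $(-\pi, \pi]$ and makes ties at $\pm\pi$ unambiguous.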

In my field (visual working memory), there are several competing models to fit behavioural data like mine. However, most other tasks use a continuous report scheme in which participants can select any of 360 unique colours (angles). There is a common toolbox for testing the fit of such models, but it seems to work only for continuous data. For example, when I try to fit a simple von Mises distribution, I get results like this:

[Figure: Von Mises Fit]

It seems to me that this fit doesn't capture the distribution (variance/width) of my data. Am I doing something fundamentally wrong by trying to fit this distribution? I'm not sure how to compare model performance on my data if these fitting methods only work for continuous values.

Somebody made an offhand comment recently that I could "try adding x degrees of Gaussian noise" to improve the fit. Is this a valid strategy? I have a weak statistical background (this is an undergraduate research project), and I'm not sure what to do.

Stephan Kolassa
BC1
  • Nice question. On general principles, adding noise is a poor idea: why degrade your data in order to apply some procedure that might not be appropriate for those data? That's Procrustean. Could you explain why you think you need to fit some continuous distribution to these data? What will it accomplish for your analysis? – whuber Jan 26 '18 at 00:27
  • For example, it is common to fit a mixture model to behavioural data like this, which assumes that the responses come from two components: (1) "guesses", where the participant had no idea which angle was correct and chose at random (a uniform distribution), and (2) noisy responses (a normal distribution with its mean at the correct angle). People draw insights about the underlying system by observing how the parameters change. For instance, people might say that the variance of the normal distribution indicates the resolution of the memory that was probed for the response. – BC1 Jan 26 '18 at 02:29
  • Can't you fit a discretized von Mises? It might be computationally heavy, but not impossible. – Olivier Jan 26 '18 at 13:41
  • The principle that must be respected and which is biting you is that the probability density integrates to 1. Given discrete data, you can't get a peak density matching your peak that is also consistent with your mean and variability. What justifies your bin width, as your histogram should be spikes rather than bars? – Nick Cox Jan 26 '18 at 13:50
  • I've seen the von Mises called the _circular_ normal, although this terminology also seems to be fading. – Nick Cox Jan 26 '18 at 13:59
  • I would also suggest that the von Mises might not be the best fit for this data; it seems like the data is strongly peaked with heavy tails. However, this is indeed hard to see considering the discrete nature of the data. The easiest way to deal with this is using the corrections for grouped circular data that have been developed. – Kees Mulder Jan 26 '18 at 14:07
  • I agree with @Kees, but would like to suggest--upon staring at the plot a little harder--that a good fit to a mixture of a von Mises and a uniform distribution might be possible. It would be routine to estimate the parameters using maximum likelihood (treating these as interval data). Whether your data are rich enough to accommodate fitting three parameters (location, spread, and mixture coefficient) is a matter to consider. In effect, the uniform distribution component models "noise" arising from phenomena like guessing or erroneous recording of observations. – whuber Jan 26 '18 at 14:33
  • I think you can assume that participants chose the discrete response that is closest to what they would have reported with a continuous response method. For example, if a participant chooses $\pi$, you can assume that the continuous value they remembered (and would have reported) lies in the interval $\left[\pi - \frac{\pi}{8}, \pi + \frac{\pi}{8}\right)$. Integrating the probability density over this interval then gives the likelihood of that particular response. – matteo Jan 26 '18 at 14:42
  • Wow, thanks for all of the insightful responses! 1) Olivier: can you clarify what you mean by a "discretized von Mises"? I can't find any real reference to such a method. 2) Nick Cox: that's a good point, but I think I agree with Matteo that the intervals allow for a total integral of 1. 3) Kees: Google is failing me; what corrections are you referring to? 4) @whuber: I'm going ahead with a mixture model from scratch (the plot above was generated by a toolbox). Can you clarify how I would go about treating these as interval data? I assume that this toolbox does not, and that this causes my issue? – BC1 Jan 26 '18 at 22:44
  • I have shown how to do this in answers at https://stats.stackexchange.com/questions/56015 and https://stats.stackexchange.com/questions/49443. It's no different with circular data. – whuber Jan 26 '18 at 23:20
  • @whuber: I'm sorry to keep bugging you, but I think I'm getting close. I've been adapting your implementation (from the first link), and I've stumbled across an issue. Basically, I take every observation (an angle from the set `{-3pi/4:pi/4:pi}`) and set `left = observations - pi/8` and `right = observations + pi/8`. The problem with circular data is that some of these intervals now cross the limits of [-pi, pi]. Indeed, when I did an MLE fit of a simple von Mises, I ended up with distributions that are all skewed rightward (means well above 0). Is there a nifty solution to this interval issue? – BC1 Jan 28 '18 at 04:25
  • There's no problem. The von Mises distribution function uniquely determines a probability for any interval, regardless of how it happens to be described. How you actually go about computing that probability depends on your software. – whuber Jan 28 '18 at 18:55
  • The code formatting in comments is terrible, but essentially this is what I've done (in MATLAB) to fit **only a von Mises**. **(1)** Created a von Mises function `vm_pdf(mu, kappa, x)`. **(2)** `LL_vm = sum(log(vm_pdf(mu, kappa, right)) - log(vm_pdf(mu, kappa, right)))`. **(3)** Used the circular mean and 1/(circular variance) for my `x0` values. **(4)** Created a third function `f(x) = -LL_vm(x(1), x(2), left, right)`. **(5)** Used `fminsearch()` to find the _minimum_ of `f(x)` (which is the negative `LL_vm`). The function `f(x)` approaches -infinity, but the fit given by those parameters is comically bad. Is there an error? – BC1 Jan 28 '18 at 20:07
  • I've posted another question following up my current issues [here](https://stats.stackexchange.com/questions/325517/maximum-likelihood-estimation-to-fit-von-mises-to-grouped-interval-circular-da) – BC1 Jan 28 '18 at 20:34
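Putting matteo's interval idea and whuber's mixture suggestion together, the grouped-data likelihood can be written down directly. The notation here is chosen for illustration: $\theta_1, \dots, \theta_8$ are the response angles, $n_j$ is the number of responses at $\theta_j$, $w$ is the weight of the uniform guessing component, and $I_0$ is the modified Bessel function of the first kind of order zero.

$$
f(x \mid \mu, \kappa) = \frac{\exp\{\kappa \cos(x - \mu)\}}{2\pi I_0(\kappa)}, \qquad
P_j = (1 - w) \int_{\theta_j - \pi/8}^{\theta_j + \pi/8} f(x \mid \mu, \kappa)\, dx + \frac{w}{8},
$$

$$
\ell(\mu, \kappa, w) = \sum_{j=1}^{8} n_j \log P_j .
$$

Setting $w = 0$ gives a discretized von Mises. Because the eight arcs tile the circle, $\sum_j P_j = 1$. Because $f$ is $2\pi$-periodic, an arc such as $[7\pi/8, 9\pi/8)$ that crosses $\pm\pi$ has a well-defined probability; only a CDF tabulated on $[-\pi, \pi)$ forces that arc to be split into $[7\pi/8, \pi)$ and $[-\pi, -7\pi/8)$. This likelihood also differs from step (2) of the MATLAB comment above, which evaluates the density rather than integrating it, and does so at `right` in both terms, so the two logarithms cancel. The interval likelihood instead needs the logarithm of the probability mass between `left` and `right`.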
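One crude way to maximize the grouped likelihood of a von Mises plus uniform mixture (whuber's suggestion above) is a brute-force grid search. The Python sketch below illustrates this under stated assumptions and is not the toolbox's method. Responses are assumed to be coded on the eight angles $-3\pi/4, \dots, \pi$ used in the comments, and each response stands for the arc of width $\pi/4$ around it. Arc probabilities come from Simpson's rule and are normalized by their sum, which sidesteps $I_0$. The grid ranges are arbitrary, and the function names (`arc_mass`, `vm_bin_probs`, `mixture_probs`, `log_lik`, `fit_mixture`) are hypothetical.

```python
import math

# Response angles as coded in the comments: -3pi/4, -pi/2, ..., pi.
CENTRES = [-3 * math.pi / 4 + k * math.pi / 4 for k in range(8)]
HALF_WIDTH = math.pi / 8  # each response stands for the arc [c - pi/8, c + pi/8)

# Hypothetical search grids; the ranges are arbitrary illustrative choices.
MUS = [math.radians(d) for d in range(-175, 181, 5)]  # 5-degree steps
KAPPAS = [0.5 * k for k in range(1, 41)]               # 0.5 .. 20
WEIGHTS = [k / 20 for k in range(20)]                  # 0 .. 0.95


def arc_mass(mu, kappa, lo, hi, steps=40):
    """Simpson's rule for the unnormalized von Mises density on [lo, hi].

    exp(kappa * (cos(x - mu) - 1)) avoids overflow for large kappa; the
    constant factor exp(-kappa) cancels when the arcs are normalized.
    The integrand is 2*pi-periodic, so arcs crossing +/-pi need no wrapping.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        coef = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += coef * math.exp(kappa * (math.cos(lo + i * h - mu) - 1.0))
    return total * h / 3.0


def vm_bin_probs(mu, kappa):
    """Discretized von Mises: probability mass of each of the 8 arcs."""
    masses = [arc_mass(mu, kappa, c - HALF_WIDTH, c + HALF_WIDTH) for c in CENTRES]
    total = sum(masses)  # the arcs tile the circle, so this normalizes
    return [m / total for m in masses]


def mixture_probs(mu, kappa, w):
    """Von Mises (weight 1 - w) plus uniform guessing (weight w, 1/8 per arc)."""
    return [(1 - w) * p + w / 8 for p in vm_bin_probs(mu, kappa)]


def log_lik(counts, probs):
    """Grouped-data log-likelihood: sum over arcs of n_j * log(P_j)."""
    return sum(n * math.log(max(p, 1e-300)) for n, p in zip(counts, probs))


def fit_mixture(counts):
    """Brute-force maximum likelihood over the (mu, kappa, w) grids."""
    best, best_ll = None, -math.inf
    for mu in MUS:
        for kappa in KAPPAS:
            vm = vm_bin_probs(mu, kappa)  # reused for every w
            for w in WEIGHTS:
                ll = log_lik(counts, [(1 - w) * p + w / 8 for p in vm])
                if ll > best_ll:
                    best, best_ll = (mu, kappa, w), ll
    return best, best_ll


if __name__ == "__main__":
    # Synthetic check: expected counts from known parameters, then refit.
    counts = [round(1000 * p) for p in mixture_probs(0.0, 4.0, 0.2)]
    (mu, kappa, w), ll = fit_mixture(counts)
    print(counts)
    print(f"mu={mu:.3f} rad, kappa={kappa}, w={w}, logL={ll:.2f}")
```

For real data, `counts` would hold the number of responses at each of the eight angles, in the order of `CENTRES`. If responses are coded relative to the correct angle, as in the mixture model described in the comments, $\mu$ could be fixed at 0 and only $\kappa$ and $w$ searched. A grid this coarse is only a starting point; its optimum could seed a proper optimizer such as MATLAB's `fminsearch`.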

0 Answers