Coefficient of variation of circular data

Question

The definition of coefficient of variation is as follows:

coefficient of variation = standard deviation / mean.

I am using circular (directional) statistics to find the mean and standard deviation. My question is: can I use the same definition of the coefficient of variation (given above) for circular data as well? Thanks in advance

Nick Cox · Answer 1 · 2021-06-30T12:28:25.030

This answer argues that there is no need for any version of coefficient of variation for circular data. Worse, attempts to define or calculate one lead to absurdity.

For a self-contained introduction to the topic setting out some notation and definitions this Wikipedia article will do fine.

Circular data take values on an outcome space that is a circle. Direction as a bearing relative to North or some other standard direction is a common example. Let's suppose for concreteness that the units of measurement are degrees, ranging from $0$ to $360^\circ$, noting that the key point is that $0$ and $360$ degrees are identical directions. Nothing depends on a choice of units: radians or time of day or time of year could make as much or more sense for particular applications.

Let's back up and consider that any kind of coefficient of variation is the ratio of a standard deviation, or at least some measure of scale or variability, and a mean, or at least some measure of level or location.

The standard definition of a mean for circular data is the vector mean. An ordinary arithmetic mean is very often a poor choice: for example, if you have directions of $1^\circ$ and $359^\circ$, feeding them to any ordinary mean routine will return $180^\circ$. The vector mean is the arctangent of the sum of sines of directions divided by the sum of cosines, or more plainly the direction of the resultant from adding directions as vectors geometrically, namely end to end. The vector mean of $1^\circ$ and $359^\circ$ is $0^\circ$ or $360^\circ$, which, as a diagram will suggest, is a natural solution. More generally, the vector mean is well defined in all but a few pathological cases (such as data that are two opposite directions, which cancel).
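As a minimal sketch of the contrast just described, the following Python compares an ordinary mean with a vector mean for the two directions in the example. The function names are illustrative assumptions, not from any particular library, and `atan2` is used in place of a plain arctangent of the ratio so that the quadrant is resolved correctly.

```python
import math

def arithmetic_mean(degrees):
    """Ordinary mean, which ignores that 0 and 360 degrees coincide."""
    return sum(degrees) / len(degrees)

def vector_mean(degrees):
    """Vector (circular) mean in degrees, mapped to [0, 360).

    Illustrative helper: sums sines and cosines of the directions and
    takes the direction of the resultant with atan2.
    """
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(s, c)) % 360.0

directions = [1.0, 359.0]
print(arithmetic_mean(directions))  # 180.0, pointing the wrong way
print(vector_mean(directions))      # within rounding error of 0 (equivalently 360)
```

Because of floating-point rounding, the vector mean here may print as a value a hair above 0 or a hair below 360; both describe the same direction.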

Variability of directions on a circle is most simply measured as mean resultant length. (That name is the most popular among several in use; in my view the rarely used name consistency is much more evocative.) The mean resultant length is at its largest when all directions are identical (hence the alternative name consistency) and at its smallest when directions are equally common in opposite directions (a circular uniform distribution is one but not the only possibility). Hence mean resultant length is an inverse measure of variability, and some have defined circular variance as its complement (that is, 1 or 100% minus the mean resultant length). This is likely to seem confusing at first sight, as either measure is reported on a scale from 0 to 1 (or 0 to 100%). But any convention about using degrees, radians or other units for the original data does not bite, as those units wash out of the definition and calculation. Further, a circular standard deviation has been defined but, again confusingly at first sight, not as the square root of the variance. For details, see for example the article cited above.
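To make these quantities concrete, here is a small Python sketch under stated assumptions: the helper names are hypothetical, the circular variance is taken as one minus the mean resultant length, and the circular standard deviation uses the formula sqrt(-2 log r) quoted in the comments on this answer, which is one common definition rather than the only one.

```python
import math

def mean_resultant_length(degrees):
    """Length of the mean of the unit vectors, between 0 and 1."""
    n = len(degrees)
    s = sum(math.sin(math.radians(d)) for d in degrees) / n
    c = sum(math.cos(math.radians(d)) for d in degrees) / n
    return math.hypot(s, c)

def circular_variance(degrees):
    """Complement of the mean resultant length (assumed definition)."""
    return 1.0 - mean_resultant_length(degrees)

def circular_sd(degrees):
    """sqrt(-2 ln R), conventionally read as an angle in radians."""
    r = min(mean_resultant_length(degrees), 1.0)  # guard against rounding above 1
    if r == 0.0:
        return math.inf  # resultant cancels exactly; SD is unbounded
    return math.sqrt(-2.0 * math.log(r))

identical = [10.0, 10.0, 10.0]
opposite = [0.0, 180.0]
print(mean_resultant_length(identical))  # close to 1
print(mean_resultant_length(opposite))   # close to 0

# Rotating every direction by the same amount leaves the measure unchanged.
print(mean_resultant_length([10.0, 20.0, 30.0]))
print(mean_resultant_length([110.0, 120.0, 130.0]))
```

The last two lines illustrate that the mean resultant length does not depend on where the reference direction is placed.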

To the point: Dividing any measure of circular variability by any mean is neither needed nor even justifiable. The vector mean could be zero, i.e. it could coincide with the reference direction, whether say North, or South, or midnight, or the start of a year. If the absurdity of dividing by zero is avoided by rotation, then one absurdity becomes another as now the result of dividing any measure of variability by any vector mean is utterly dependent on a convention about the reference direction.
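The dependence on the reference direction can be sketched numerically. The Python below is only an illustration with made-up bearings and hypothetical helper names: it rotates the same three directions by different amounts and divides a circular standard deviation (computed as sqrt(-2 log r) and converted to degrees) by the vector mean in degrees.

```python
import math

def vector_mean_deg(degrees):
    """Vector mean in degrees, mapped to [0, 360)."""
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(s, c)) % 360.0

def circular_sd_deg(degrees):
    """Circular SD sqrt(-2 ln R), converted from radians to degrees."""
    n = len(degrees)
    s = sum(math.sin(math.radians(d)) for d in degrees) / n
    c = sum(math.cos(math.radians(d)) for d in degrees) / n
    r = min(math.hypot(s, c), 1.0)  # guard against rounding above 1
    return math.degrees(math.sqrt(-2.0 * math.log(r)))

def naive_cv(degrees):
    """The ratio under discussion: SD divided by vector mean."""
    return circular_sd_deg(degrees) / vector_mean_deg(degrees)

bearings = [80.0, 90.0, 100.0]  # made-up data for illustration
for shift in (0.0, 90.0, 180.0):
    rotated = [(d + shift) % 360.0 for d in bearings]
    # The SD is the same for every shift; the ratio is not.
    print(shift, circular_sd_deg(rotated), naive_cv(rotated))
```

The spread of the data is identical in all three cases, yet the ratio shrinks as the mean moves from 90 to 180 to 270 degrees, purely because of where zero was placed.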

Coefficients of variation have often been oversold in statistical applications, but their merit when they are well defined and useful is as measures of relative variability of counts or measurements of variables with positive mean and an unequivocal zero point. See for example this thread on CV. There is no analogue of that set-up in circular statistics, which concerns a quite different outcome space.

Thanks for your detailed answer, Nick. I have a quick question to clarify something for myself: we are calculating our circular standard deviation using the formula S = sqrt(-2*log(r)), where r is the mean resultant vector length. What is the unit of this standard deviation? Or is it unit-less? – Aep, Jun 29 '21 at 18:51
I searched further and found that the circular standard deviation has units (it can be in degrees or radians). You can see it in the following paper, for instance: "Cell directional spread determines accuracy, precision, and length of the neuronal population vector". Am I missing something? – Aep, Jun 30 '21 at 00:16
I've simplified my answer to remove an assertion that was over-stated and indeed incorrect. – Nick Cox, Jun 30 '21 at 00:26
This is interesting, Nick, & it's a bit outside of my wheelhouse. But consider a counterexample: The number of new patients arriving at the ER by hour of the day. Hour should be circular, but the number of patients is a count & it's not too outrageous to imagine that the SD scales w/ the mean. – gung - Reinstate Monica, Jun 30 '21 at 00:47
@gung But it is only a convention to have a zero at midnight. If your data are patient counts in time bins, then that's a different outcome space -- and in practice one analysis could be in terms of sine and cosine of hour of day as covariates, together with any others. Oddly, this territory is usually ignored in texts on circular statistics. But however you parametrise your set-up, it is not what is generally considered as circular data. Although I disagree, partly or wholly, with the other answers to date, neither answer seems to understand the question in that way. – Nick Cox, Jun 30 '21 at 07:34

score 2 · Answer 2 · edited Jun 28 '21 at 15:40

This type of coefficient of variation is generally only useful if the mean is not 'arbitrary', that is, the data means something very different if the mean is artificially raised. An example would be count data, but there are many practical situations where the coefficient of variation could be sensible with continuous data as well.

For circular data, however, we rarely have this situation; in most cases we expect to be able to rotate the data without changing its meaning too much. That is, the point where we put $0^\circ$ is arbitrary. So, in most cases, a coefficient of variation is not useful; on the flip side, the standard deviation is more directly interpretable, as it has a 'maximum' if the data is circular uniform.

An exception could be 'accuracy' style data, where $0^\circ$ is 'hitting the target'. In these cases, the $0^\circ$ point is not arbitrary, and the coefficient of variation can be useful. The exception to this exception is if the mean is very close to $0^\circ$ most of the time; this would make the coefficient of variation unstable because of the division by a small number.

edited Jun 28 '21 at 15:40 by Nick Cox; answered May 28 '21 at 10:50 by Kees Mulder

Thanks for your response @keesMulder. But I have a question: why are you saying that our mean is arbitrary when we use circular statistics (which take the periodicity of the data into account)? In this case, can't we use the coefficient of variation (when the mean is not zero) the way that I mentioned to show the variability of a specific data set? – Aep May 31 '21 at 02:50
It's arbitrary in the sense that it is not 'natural' but a choice made by the analyst. A good example is the compass; there is no reason why we would set north to be 0 degrees, or west, but we have to choose one. Therefore, the coefficient of variation would depend on this arbitrary choice of where the zero is. As I mentioned, this is not always the case, but it quite often is with circular data. – Kees Mulder May 31 '21 at 07:29
Thanks for your explanation @keesmulder. I see what you mean by arbitrary mean. One thing that comes to my mind is that this arbitrary mean may happen in conventional (not circular) statistics as well. For example, in many problems the choice of origin is arbitrary. So maybe it is not only a problem in circular statistics. Is this correct? – Aep Jun 20 '21 at 16:01
Also, I think the standard deviation shows how close to (or far from) its mean our data are (is this also true of the circular standard deviation?). Therefore, the coefficient of variation can show the variability in relation to the mean of the population. Is my interpretation correct? – Aep Jun 20 '21 at 16:02
Yes, you are right, this happens in non-circular statistics as well. And your interpretation of the coefficient of variation is correct as well. – Kees Mulder Jun 20 '21 at 20:03
This starts well but I think even your exception strains credulity. Please see my comment on a later answer. I can't see that there is ever a meaning to dividing by a vector mean using scalar arithmetic. The point is that measures of variability of circular data are already relative to the entire circle; there is no further scaling either needed or sensible. Even when 0 deg is a natural reference point, the CV is absurdly unstable. – Nick Cox Jun 28 '21 at 15:43
I agree, @NickCox, thanks for your answer below extending this side of the story. I still think that in bounded settings (e.g. I once ran across a vet measuring the range of movement of horse joints) it might be possible to carefully employ a CV-like measurement on angles, but I think we would both agree that those situations are barely circular statistics settings at all. So I think the conclusion that CV measurements should not be used in circular settings holds. – Kees Mulder Jul 06 '21 at 09:54

score -2 · Answer 3 · answered Jun 28 '21 at 15:08

Given that you estimate the mean and the standard deviation using circular statistics, I suppose that you can estimate the CV with the same formula, sd/mean. For example, in the case of wind direction, the well-known formula works fine.

answered Jun 28 '21 at 15:08 by Marz

Not so. Suppose your mean direction is 0 deg N, except that it could be 359 or 1 deg. Dividing a standard deviation by 359, 0, 1 will give you variously a very small number, no quotable result at all, or a very large number, which is absurd. The same problem applies with directions in radians, or any other units. This is even supposing that the SD of circular data has been calculated in a way that doesn't depend on the origin, as you do state. – Nick Cox Jun 28 '21 at 15:37
Yes, but in this case you can convert the degrees to negative values; for instance, 359, 0, 1 could be written as -1, 0, 1, and hence you can estimate the CV without a problem. – Marz Jun 28 '21 at 20:54
You make my point for me, but more strongly. Not only does your version of the CV depend on your convention about origin; it can also be positive, negative or not determined. – Nick Cox Jun 28 '21 at 21:08
I understand your point and thanks for the answer, but I think with a simple code the problem could be resolved. – Marz Jun 28 '21 at 21:13
