Is binning of continuous data always bad for statistical tests?

Question

I have always thought that binning naturally continuous data is bad. However, here is a case that makes me doubt it. The goal of the study was to find out whether there is an association between a biomarker and disease progression.

I have 40 data points, measured in percent (disease progression). I have one biomarker, which is a binary factor (yes or no). I ran a Wilcoxon rank-sum test comparing disease progression between the groups with and without the biomarker and got a p-value of 0.07.

The doctors have clinically significant thresholds (symptoms decrease, symptoms remain more or less the same, or symptoms increase strongly). They divided the data into these 3 categories and counted how often the biomarker occurs within each category, which gives a $2\times 3$ contingency table. They performed an exact chi-squared test on these binned data (this is what was actually done) and got a p-value of 0.03. The expected counts in the contingency table are not very low: the 40 data points are nicely distributed across the 6 cells.

So, 2 questions: 1) In this particular case, does it mean that the exact chi-squared test on binned data is a more powerful test than the Wilcoxon test, or is it just a coincidence? (I am not surprised that the tests gave different p-values; I am surprised by the size of the difference.) Or is it likely a violation of an assumption of this "exact chi-squared test" (I am unfamiliar with it, but I believe this is the test that was used)? Of course I understand that a Wilcoxon test between groups and a test of proportions across 3 categories answer different questions, but the initial goal is to find out whether there is an association between the biomarker and disease progression, and it seems that both tests address that.

2) Is binning preferred over continuous data in some situations? In which?

@FransRodenburg To me it is quite a different question. I am not breaking up a continuous predictor variable. I could actually answer the question you referred to, but I cannot answer the one I've asked. – German Demidov, Aug 13 '19 at 06:12

score 1 · Answer 1 · answered Aug 13 '19 at 06:19

Related answer here from just yesterday. If the distribution you observe is the distribution you wish to test, then binning forfeits information and will thus on average reduce your ability to (in this case) reject the hypothesis that the distributions are different between the two groups. There are 3 important caveats, though.

1) The tests you use matter. For example, if you bin and do a chi-square test, you might still expect more power to reject the null of independence than if you don't bin and test for stochastic dominance, because it is very difficult to reject the null hypothesis of no stochastic ordering. However, it should be easy to see that if you bin and do a t test, you'll get worse power than with a t test on the original data. I suspect that the first of these effects is what's occurring in your example, since it seems easier to reject independence than to reject the bizarre Wilcoxon null (though I don't have a proof of this; maybe someone else can jump in).

2) Finite samples can occasionally yield anomalous results.

3) If you believe there is significant measurement error and the binned data are "closer" to the actual thing you'd like to have measured, then binning might be a good idea. If the percents you have are imprecise or ad hoc, then the doctors might feel that the bins are a more faithful representation of the disease's progression. However, they might just feel more comfortable with the chi-square test, which by itself is not a good reason.

Try a t test on the original percents. Your sample size is small enough that it shouldn't be completely trusted unless the percentages look very normally distributed, but I'll bet it gets a lower p-value than the chi-square.

I see what you mean, yes, thanks, these questions are definitely related. A t-test cannot be applied here: normality is badly violated and there are only about 20 points per group, so no CLT sanctuary to hide in =( – German Demidov, Aug 13 '19 at 06:27

score 1 · Answer 2 · answered Aug 13 '19 at 06:21


My initial reaction is that 0.03 vs. 0.07 is quite a small difference for such a noisy random variable as a p-value, even if we assume the particular categorization was pre-specified. (Shopping around for the optimal cutpoints will certainly produce a lower "p-value", but that p-value will be invalid.)

In any case, power is an average long-run property, not something one can judge from a single dataset. With a single pre-specified set of cut-offs, even if they are the optimal choice, you would expect more power with the continuous variable. You can of course check this with a simulation.
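A minimal sketch of such a simulation, under assumptions that are mine and not taken from the question: two groups of 20, normally distributed outcomes, a true shift of 0.8 standard deviations, and hypothetical cutpoints at -0.5 and 0.5. To keep it dependency-free it uses the large-sample versions of both tests (normal approximation for the rank-sum test, Pearson chi-square for the $2\times 3$ table) rather than the exact test the doctors ran, so treat the numbers as indicative only.

```python
import math
import random


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no ties assumed)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks belonging to the first group.
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(w - mean) / sd / math.sqrt(2))


def binned_chi2_p(x, y, cuts):
    """Pearson chi-square p-value after binning both groups at two cutpoints."""
    def counts(values):
        c = [0, 0, 0]
        for v in values:
            c[(v > cuts[0]) + (v > cuts[1])] += 1  # bin index 0, 1 or 2
        return c

    table = [counts(x), counts(y)]
    n = len(x) + len(y)
    stat, used_cols = 0.0, 0
    for j in range(3):
        col_total = table[0][j] + table[1][j]
        if col_total == 0:
            continue  # an empty category contributes nothing and costs one df
        used_cols += 1
        for i, row_total in enumerate((len(x), len(y))):
            expected = row_total * col_total / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = used_cols - 1
    if df == 2:
        return math.exp(-stat / 2)              # chi-square survival function, 2 df
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))   # chi-square survival function, 1 df
    return 1.0


def simulate(n_sim=2000, n=20, shift=0.8, cuts=(-0.5, 0.5), alpha=0.05, seed=1):
    """Estimate the power of both tests on the same simulated datasets."""
    rng = random.Random(seed)
    reject_w = reject_c = binned_smaller = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]    # group without the biomarker
        y = [rng.gauss(shift, 1.0) for _ in range(n)]  # group with the biomarker
        p_w = rank_sum_p(x, y)
        p_c = binned_chi2_p(x, y, cuts)
        reject_w += p_w < alpha
        reject_c += p_c < alpha
        binned_smaller += p_c < p_w
    return {
        "power_wilcoxon": reject_w / n_sim,
        "power_binned_chi2": reject_c / n_sim,
        "share_binned_p_smaller": binned_smaller / n_sim,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

The third number is the one to show to colleagues: it is the share of simulated datasets in which the binned test happened to give the smaller p-value, even though the continuous test rejects more often on average. Changing `shift`, `cuts`, or the data-generating distribution to something closer to the real percentages would make the comparison more relevant.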


Björn



That's what I thought, and I am satisfied with this explanation, but it sounds very weak to my doctor colleagues, who will ALWAYS bin data from now on, since they have seen how well it worked in this particular example. I was looking for an "explain-to-granny" explanation so that they will not adopt binning as a general practice... Unfortunately the thresholds were pre-specified: these clinical thresholds are standard, they just took them without playing around, and voilà – German Demidov Aug 13 '19 at 06:24
The "granny" explanation is probably just "throwing away information is usually a bad idea--look, we can do a t test with enough continuous data that will be powerful and interpretable." – Sheridan Grant Aug 13 '19 at 06:28

I cannot do a t-test =( The data are highly variable and non-normal. log(data) may make them more normal, but then the negative percents drop out XD Other transformation games lead to outliers – German Demidov Aug 13 '19 at 06:30
And yes, throwing away information is bad, but they got something < 0.05 and that's it. They don't care about anything else; they have a result now, which they didn't have when I analysed the data with the Wilcoxon test – German Demidov Aug 13 '19 at 06:31
