I have a data set of over three million numeric values. Close to 20% of them are either 0 or 1, and the maximum is nearly 18500, so the data is heavily positively skewed.
I am trying to categorize some of this data by putting it into bins of equal width, so that I can use the Chi-square test and Cramér's V to look for associations between this variable and a categorical one. To do that I tried to find the optimal number of bins. The Freedman-Diaconis rule gave me 126044.0262335108, which is clearly a ridiculous number of bins for this data.
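For concreteness, here is a minimal sketch of the rule in Python with numpy (`x` is only a placeholder standing in for the real data, so the printed numbers will differ):

```python
import numpy as np

# Placeholder for the real variable: ~3 million skewed values capped near 18500.
rng = np.random.default_rng(0)
x = np.minimum(np.round(rng.pareto(1.0, size=3_000_000) * 2), 18_500)

# Freedman-Diaconis bin width: h = 2 * IQR * n^(-1/3)
q25, q75 = np.percentile(x, [25, 75])
h = 2 * (q75 - q25) * len(x) ** (-1 / 3)
print((x.max() - x.min()) / h)   # number of bins; explodes because IQR << range

# numpy applies the same rule through the 'fd' estimator
print(len(np.histogram_bin_edges(x, bins="fd")) - 1)
```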
Splitting the data at the deciles also proved fruitless, giving cut points of [0, 1, 1, 2, 3, 5, 8, 17, 47]; the duplicated cut point at 1 means some of the bins collapse, and 90% of the values fall below 47 even though the maximum is nearly 18500.
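To illustrate why, a rough sketch of the decile binning on the same placeholder data (the cut points quoted above come from the real data set):

```python
import numpy as np
import pandas as pd

# Same placeholder data as in the previous snippet
rng = np.random.default_rng(0)
x = np.minimum(np.round(rng.pareto(1.0, size=3_000_000) * 2), 18_500)

# Decile cut points; on my real data these come out as [0, 1, 1, 2, 3, 5, 8, 17, 47]
print(np.percentile(x, np.arange(10, 100, 10)))

# Duplicated cut points force equal-frequency binning to drop bins
binned = pd.qcut(pd.Series(x), 10, duplicates="drop")
print(binned.value_counts().sort_index())
```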
Reading elsewhere, I saw the square root of the sample size suggested; this gave 1732.05081, which is more reasonable, but the method is quite crude.
I also looked into Doane's formula, given here. But reading up on this method, it seems to have been based on an incorrect hypothesis.
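For reference, both the square-root rule and Doane's formula are available as built-in estimators in numpy's `histogram_bin_edges`, so they can be compared on the same placeholder data:

```python
import numpy as np

# Same placeholder data as above
rng = np.random.default_rng(0)
x = np.minimum(np.round(rng.pareto(1.0, size=3_000_000) * 2), 18_500)

# Square-root rule: roughly sqrt(n) bins; sqrt(3,000,000) is the 1732.05 figure above
print(np.sqrt(len(x)))
print(len(np.histogram_bin_edges(x, bins="sqrt")) - 1)

# Doane's formula (Sturges' rule adjusted for skewness) is also built in
print(len(np.histogram_bin_edges(x, bins="doane")) - 1)
```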
How should I deal with this level of skew in the data?
What is the best way to categorize this data?