Classification of real values

Question

I have values of attribute between 0 and 1 which i want to predict. The distribution of values is shown in fig.

I want to predict this attribute. The problem is there are around 15 classes in this attribute . I want to label 15 classes to 6 like value of attribute greater than 0.9 describe as excellent. Between 0.7 and 0.9 very good and between 0.5 and 0.7 is good etc. But the problem is i am unable to find the standard way of doing this type of labeling. Like why i have given 0.9 a excellent label etc. Someone told me it has something to do with mean and stddev. But i am clueless about it. Also mean and stddev of column is shown in fig.

I'm having trouble understand the question. It sounds like you're talking about binning continuous values in $[0, 1]$ into discrete categories like "good", "excellent", etc. based on pre-specified ranges for each bin. Are you asking what the the justification is for this procedure? — user20160, May 30 '16 at 06:30
There isn't any inherent justification. If you have some downstream system that requires discrete categories, that could be a valid reason. Absent something like that, you'll be throwing away information, and many people actually recommend against it. Search for "discretization" to read more. This post may be useful: http://stats.stackexchange.com/questions/104402/what-is-the-justification-for-unsupervised-discretization-of-continuous-variable — user20160, May 30 '16 at 06:58
Why not using **regression** and round the prediction value to the nearest **class** (which doesn't throw away information trough binning)? — sascha, May 30 '16 at 10:24
@sascha how should i go about it? Could you please elaborate it more? — james, May 30 '16 at 11:06

Classification of real values

0 Answers0