I have been wondering how to choose the number of bins for a dataset. I know there are several methods for choosing the number of bins for a histogram, but how do I choose the number of bins when working with a dataframe?

The dataset I am trying to work on is:

128 face encodings sample

The values in the dataframe are face encodings. I want to discretise these values row by row (each row of 128 values is the face encoding of a different person). Without knowing the number of bins, I can't use the pandas pd.cut function to discretise the values, so what approach should I take to work out how many bins I need?
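
To make the question concrete, here is a rough sketch of one possible approach, not a recommendation: borrow a histogram bin-count rule for each row and pass the resulting edges to pd.cut. It assumes NumPy's `histogram_bin_edges` with the Freedman-Diaconis rule (`"fd"`); the function name `discretise_rows` and the random stand-in data are hypothetical, not the real encodings.

```python
import numpy as np
import pandas as pd


def discretise_rows(df: pd.DataFrame, rule: str = "fd") -> pd.DataFrame:
    """Bin each row separately, choosing the bin edges with a histogram rule.

    `rule` is any estimator name accepted by numpy.histogram_bin_edges,
    e.g. "fd" (Freedman-Diaconis), "sturges", "scott" or "auto".
    """
    binned = {}
    for idx, row in df.iterrows():
        values = row.to_numpy(dtype=float)
        # The rule decides how many equal-width bins span [row min, row max].
        edges = np.histogram_bin_edges(values, bins=rule)
        # labels=False returns integer bin codes instead of interval labels;
        # include_lowest=True keeps the row minimum inside the first bin.
        binned[idx] = pd.cut(values, bins=edges, labels=False, include_lowest=True)
    return pd.DataFrame.from_dict(binned, orient="index", columns=list(df.columns))


# Hypothetical stand-in data: 3 people, 128 encoding values each.
rng = np.random.default_rng(0)
encodings = pd.DataFrame(rng.normal(0.0, 0.1, size=(3, 128)))
codes = discretise_rows(encodings)
```

Because the edges are computed per row, the number of bins can differ from one person to the next; whether that is acceptable depends on what the discretised values are used for.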

  • [Don't bin your continuous data](https://stats.stackexchange.com/q/68834/1352). Feed them into your algorithm as-is; potentially transform them using (e.g.) restricted cubic splines (see, e.g., Frank Harrell's *Regression Modeling Strategies*) to capture any nonlinearity. – Stephan Kolassa Sep 29 '20 at 09:54
  • Thank you @StephanKolassa – Shikhar Ghimire Sep 29 '20 at 13:39
