I understand how binning a numerical feature can help the model capture the relationship between the feature and the target. For example:
For a regression problem, we can bucketize the "population" feature into the following 3 buckets (for instance):
bucket_0 (< 5000): corresponding to less populated blocks
bucket_1 (5000 - 25000): corresponding to mid populated blocks
bucket_2 (> 25000): corresponding to highly populated blocks
Given the preceding bucket definitions, the following population vector:
[[10001], [42004], [2500], [18000]]
becomes the following bucketized feature vector:
[[1], [2], [0], [1]]
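As a minimal sketch of the bucketization above, the same mapping can be reproduced with NumPy's `np.digitize`. The variable names are my own, and I am assuming a value exactly on a boundary (e.g. 5000) falls into the higher bucket, which the bucket definitions above do not specify.

```python
import numpy as np

# Bucket boundaries from the example: < 5000, 5000 - 25000, > 25000.
# Assumption: a value equal to a boundary goes to the higher bucket.
boundaries = np.array([5000, 25000])

# Population vector from the example, one row per block.
population = np.array([[10001], [42004], [2500], [18000]])

# np.digitize returns, for each value, the index of the bucket it falls in.
bucket_ids = np.digitize(population, boundaries)

print(bucket_ids.tolist())  # [[1], [2], [0], [1]]
```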
I took the example from here; it suggests that, in the same setting, if we create bins for 3 different features such as latitude, longitude, and roomsPerPerson, then we enable the model to learn nonlinear relationships within every single feature!
Three separate binned features:
[binned latitude], [binned longitude], [binned roomsPerPerson]
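To make the quoted claim concrete, here is a small sketch for one binned feature; the same idea would apply to each of the three features separately. The data is a made-up toy example (a U-shaped target), and the bin boundaries and variable names are assumptions for illustration only. It compares a linear fit on the raw feature with a linear fit on the one-hot encoded bins, where each bin gets its own weight estimated from all training examples that fall into it.

```python
import numpy as np

# Toy data (made up): a U-shaped target that a straight line cannot follow.
x = np.arange(1, 10, dtype=float)   # 1, 2, ..., 9
y = (x - 5.0) ** 2                  # 16, 9, 4, 1, 0, 1, 4, 9, 16

# Hypothetical bin boundaries: x < 4, 4 <= x < 7, x >= 7.
boundaries = [4.0, 7.0]
bin_ids = np.digitize(x, boundaries)

# One-hot encode the bin index: each row has a single 1 in its bin's column.
one_hot = np.eye(len(boundaries) + 1)[bin_ids]

# Linear model on the one-hot columns: one weight per bin.
bin_weights = np.linalg.lstsq(one_hot, y, rcond=None)[0]

# Linear model on the raw feature (slope and intercept) for comparison.
raw = np.column_stack([x, np.ones_like(x)])
raw_weights = np.linalg.lstsq(raw, y, rcond=None)[0]

mse_binned = np.mean((one_hot @ bin_weights - y) ** 2)
mse_raw = np.mean((raw @ raw_weights - y) ** 2)

print("weight per bin:", bin_weights)
print("MSE binned:", mse_binned, "MSE raw:", mse_raw)
```

In this sketch each single example activates only one bin, but the weights are fitted over the whole training set, so the three bin weights together form a step function of the feature rather than a straight line.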
I can't understand how the model learns "nonlinear relationships within every single feature": for any one training example, only one bin per feature is active, so the learning algorithm sees only that bin and not the whole vector of bins, since the vector represents the input space. Correct me if I am wrong!