I have a large data set with many discrete and continuous variables, all of which are present in every observation. I want to explain (the log of) one continuous variable using all the other variables I have selected. For this purpose I wish to divide the independent continuous variables into bins so as to maximize the between-bin variation in the dependent variable relative to the within-bin variation, subject to the constraint that the break-points of each binned variable must be the same for all observations. Within- and between-bin variation should be given a multivariate interpretation, i.e. single bins are formed as the cross product of all the binning cuts. I'd also like to ensure that every bin includes some minimal number of observations, but I am guessing that I will have to do this "by hand," e.g. by setting a maximum number of bins for each variable individually.
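To pin down what I mean by the criterion, here is one way to write it, assuming plain unweighted sums of squares (the notation is my own, introduced just for this post). Let $y_i$ be the log of the dependent variable, let $c = 1, \dots, C$ index the non-empty cells formed by the cross product of the cuts, with $n_c$ observations and mean $\bar{y}_c$ in cell $c$, and let $\bar{y}$ be the overall mean:

$$
\max_{\text{cuts}} \; \frac{\sum_{c=1}^{C} n_c \, (\bar{y}_c - \bar{y})^2}{\sum_{c=1}^{C} \sum_{i \in c} (y_i - \bar{y}_c)^2}
\quad \text{subject to} \quad n_c \ge n_{\min} \ \text{for all } c .
$$

Since the numerator and denominator add up to the fixed total sum of squares, this is the same as minimizing the within-cell sum of squares.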
Can anyone recommend an algorithm or package for this purpose? I expect to do the work in R.
I should be clear that I am not requiring that bin widths are equal.
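In case it helps, here is a rough sketch of how I would score one candidate set of break-points in base R. It is only meant to illustrate the criterion, not to search for the best cuts; the function name `score_breaks`, its arguments, and the simulated data are all made up for this example, and it uses unweighted sums of squares.

```r
# Score one candidate set of break-points (hypothetical helper, base R only).
# y      : numeric vector, the (log) dependent variable
# x      : data frame of continuous predictors
# breaks : named list of interior cut points, one numeric vector per variable
score_breaks <- function(y, x, breaks) {
  # Bin each variable with the same cut points for every observation;
  # the -Inf/Inf end points keep every observation inside some bin.
  binned <- lapply(names(breaks), function(v) {
    cut(x[[v]], breaks = c(-Inf, sort(breaks[[v]]), Inf))
  })
  # Cells are the cross product of the per-variable bins; empty cells are dropped.
  cell <- interaction(binned, drop = TRUE)
  n_c    <- tapply(y, cell, length)
  mean_c <- tapply(y, cell, mean)
  ss_between <- sum(n_c * (mean_c - mean(y))^2)
  ss_within  <- sum((y - mean_c[as.character(cell)])^2)
  list(ratio = ss_between / ss_within, min_n = min(n_c), n_cells = length(n_c))
}

# Simulated data, purely for illustration; the bins are of unequal width.
set.seed(1)
x <- data.frame(a = runif(500), b = rnorm(500))
y <- 2 * (x$a > 0.5) + x$b + rnorm(500, sd = 0.5)
res <- score_breaks(y, x, list(a = 0.5, b = c(-1, 0, 1)))
```

The `min_n` element is what I would check against the minimum cell size; what I am missing is a sensible way to search over the break-points themselves.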
I don't know if this makes any difference, but my purpose in doing the binning is to set up pseudo-strata for a complex survey where the stratification is not published for confidentiality reasons. I have replicate weights for recent years, but I want to come up with something I can use in every year and check whether the resulting variance estimates maintain a constant ratio to the replicate variances.