I'm trying to understand the gradient boosting regression example that uses the Boston housing data (http://scikit-learn.org/stable/modules/ensemble.html), hoping to apply it to a different task, but I'm a novice Python user and a beginner at ML. I have a general understanding of what the program is doing, but I would like to know what the following lines of code and their arguments are doing:
- Line 2: Why do we have to shuffle the data, and what is random_state=13 doing?
- Lines 4-6: The train and test variables are obvious, as are the brackets (i.e., subsetting of a list/dictionary), but I don't understand the significance of creating the offset integer or the arguments used to create it.
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target, random_state=13)
X = X.astype(np.float32)
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]
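
To make the question concrete, here is a minimal sketch of what these lines appear to do, run on a small made-up array rather than the Boston data. It uses plain NumPy only: `shuffle_together` is a hypothetical helper standing in for `sklearn.utils.shuffle`, so it illustrates the assumed behavior (one seeded permutation applied to both arrays) and is not the library's implementation.

```python
import numpy as np

# Toy data (made up): 10 samples with 2 features; y[i] is the target for row X[i].
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

def shuffle_together(X, y, seed):
    """Hypothetical stand-in for sklearn.utils.shuffle:
    apply one seeded random permutation to both X and y."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(X.shape[0])
    return X[order], y[order]

X_a, y_a = shuffle_together(X, y, seed=13)
X_b, y_b = shuffle_together(X, y, seed=13)

# The same seed gives the same order on every run (reproducibility).
assert np.array_equal(y_a, y_b)
# Rows and targets stay paired: row i of X was built as [2*i, 2*i + 1].
assert np.array_equal(X_a[:, 0], 2 * y_a)

# offset is the row index where the first 90% of the samples ends.
offset = int(X_a.shape[0] * 0.9)
X_train, y_train = X_a[:offset], y_a[:offset]   # first 90% of rows
X_test, y_test = X_a[offset:], y_a[offset:]     # remaining 10% of rows

print(offset, X_train.shape, X_test.shape)
```

If that reading is right, the shuffle matters because the slices simply take the first 90% and last 10% of rows in whatever order they are stored, and the fixed seed only makes the split repeatable.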