The official documentation states that "The learning rate shrinks the contribution of each regressor by learning_rate."
Thus, we basically need to understand three concepts:
1. Weak Classifier
A model whose error rate is only slightly better than random guessing, i.e., whose accuracy is only slightly above 50% (for binary classification).
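For intuition, here is a minimal sketch (assuming scikit-learn and a synthetic dataset, both illustrative choices) of a classic weak classifier: a depth-1 decision tree, or "stump", whose accuracy is typically only modestly above chance.

```python
# A decision stump (depth-1 tree) as a weak classifier.
# Dataset and max_depth=1 are illustrative assumptions, not from the original text.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
# Expect accuracy only modestly above 0.5 on this kind of data.
print("stump accuracy:", stump.score(X_test, y_test))
```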
2. Boosting
This technique sequentially applies a model $K$ times to modified versions of the data. So, suppose that at each iteration $i \in \{1, 2, \dots, K\}$ you update the current tree model $T_{i}$ by adding a new term:
\begin{align}
T_{i+1}(x) = T_{i}(x) + \alpha M(x),
\end{align}
where $$M(x) = \sum_{j=1}^{J} t(x, \theta_{j})$$
is a sum of $J$ trees with different parameters $\theta_{j}$, and $\alpha$ is the learning rate, between 0 and 1.
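To make the update $T_{i+1}(x) = T_{i}(x) + \alpha M(x)$ concrete, here is a rough sketch of least-squares boosting, assuming scikit-learn regression trees as the base learner; the synthetic data, $\alpha = 0.1$, $K = 100$, and `max_depth=2` are illustrative assumptions.

```python
# Sketch of the additive update T_{i+1}(x) = T_i(x) + alpha * M(x)
# for squared loss, with shallow regression trees as base learners.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=500)

alpha, K = 0.1, 100
prediction = np.full_like(y, y.mean())    # T_0: constant model
trees = []

for i in range(K):
    residuals = y - prediction             # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)
    prediction += alpha * tree.predict(X)  # T_{i+1} = T_i + alpha * M

print("training MSE:", np.mean((y - prediction) ** 2))
```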
3. Learning Rate
This parameter controls how much the new model contributes to the existing one. Normally there is a trade-off between the number of iterations $K$ and the value of $\alpha$: the smaller you make $\alpha$ ($\alpha \approx 0$), the more iterations $K$ you should consider, so that the combined (boosted) model keeps improving. Following Jerome Friedman's recommendation, $\alpha$ is usually set to small values ($\alpha < 0.1$), as the small experiment below illustrates.
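As a quick check of this trade-off, here is a small experiment assuming scikit-learn's `GradientBoostingRegressor` (the estimator whose documentation is quoted above); the synthetic data and the particular `(learning_rate, n_estimators)` pairs are just illustrative.

```python
# Illustrates the trade-off: a smaller learning_rate usually needs more
# boosting iterations (n_estimators) to reach a comparable test error.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for lr, n in [(1.0, 100), (0.1, 100), (0.1, 1000), (0.05, 2000)]:
    model = GradientBoostingRegressor(learning_rate=lr, n_estimators=n,
                                      random_state=0).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"learning_rate={lr:<5} n_estimators={n:<5} test MSE={mse:.1f}")
```

Typically the small-learning-rate runs only catch up with (or beat) the large-learning-rate run once they are given enough iterations, which is the trade-off described above.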