I expect the training time of a random forest to be linear in the number of trees (obviously) and linear (or square-root) in the number of columns (depending on how the columns are subsampled). But when it comes to the maximum depth of the trees... I am lost.
I thought that finding a split (inside a leaf) was linear in the number of elements in that leaf.
Therefore, increasing the maximum depth by one level amounts to finding $n$ more splits, where $n$ is the current number of leaves. Writing $n_i$ for the size of leaf $L_i$ and $N=\sum_{i=1}^n n_i$ for the total number of samples at that level, the additional work is $$\sum_{i=1}^n \text{split}(L_i)=\sum_{i=1}^n\theta(n_i)=\theta(N),$$
which is more or less the cost of finding the first split.
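To make that explicit, here is my back-of-the-envelope cost model (under the same linear-split assumption, with $d$ denoting the maximum depth): summing the level-by-level cost over all $d$ levels gives
$$\text{train}(d)\;\approx\;\sum_{\ell=0}^{d-1}\theta(N)\;=\;\theta(dN).$$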
Therefore, I thought that the training time of a random forest was linear in the maximum depth. However, the numerical experiment below (performed with scikit-learn) shows that I am completely wrong (hopefully I will be able to post the depth-20 timings):
| n_estimators | max_depth | time (1-fold) |
|---|---|---|
| 10 | 5 | 0.15 min |
| 50 | 5 | 0.60 min |
| 10 | 10 | 0.69 min |
| 50 | 10 | 3.18 min |
| 10 | 15 | 3.85 min |
| 50 | 15 | 15.59 min |
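For reference, the timing loop looks roughly like the sketch below. The dataset here is synthetic (`make_classification` with made-up sizes, for illustration only); it is not the data used for the timings above, and the exact grid/CV setup is simplified.

```python
# Minimal sketch of the kind of timing experiment reported above.
# Data sizes are hypothetical; replace with the real dataset.
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical problem size, chosen only for illustration.
X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

for max_depth in (5, 10, 15):
    for n_estimators in (10, 50):
        clf = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            random_state=0,
            n_jobs=1,  # single core, roughly comparable to a 1-fold timing
        )
        start = perf_counter()
        clf.fit(X, y)
        elapsed = (perf_counter() - start) / 60  # minutes
        print(f"TIME {elapsed:.2f}m  n_estimators={n_estimators}, max_depth={max_depth}")
```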