I have implemented the tree building part of CART, but I'm wondering if it can be optimized. According to this quora answer, the complexity should be $\mathcal O(v \cdot n \log n)$ where $v$ is the number of features and $n$ the number of records.
Now my implementation looks as follows (in python, with some implementation details left out):
best_cost = float('inf')
best_split = None
for i in range(n_features):
for features, target in data:
left, right = list(), list()
for x, y in data:
if x[i] < features[i]:
left.append((x, y))
else:
right.append((x,y))
gini = 0
for node in [left, right]:
p = sum([1 if target == 1 else 0 for (feature, target) in node]) / len(node)
gini += calc_gini(p)
if gini < best_cost:
best_cost = gini
best_split = (left, right)
If I'm not mistaken the first loop (for i...) is the $v$, the second loop (for feature, target in...) is $n$ but then the third loop is identical to the second, making it $n^2$. So together $\mathcal O(vn^2)$.
I have a feeling that by sorting the data in some way, some loops can be stopped early. But to do that you have to order the data first which is at best $\mathcal O(vn \log n)$. Perhaps by sorting the data once, the whole tree can be build in the $nv \log n$ time order, tuhs the whole tree is built in $vn \log n + mvn \log n = (vn \log n)(1+m)$ where $m$ is the total resulting number of nodes, but this is pure speculation on my part.
How does the CART algorithm find the greedy best split in only $\mathcal O(v\cdot n \log n)$ time?