Questions tagged [cart]

'Classification And Regression Trees'. CART is a popular machine learning technique, and it forms the basis for methods such as random forests and common implementations of gradient boosting machines.

CART stands for Classification And Regression Trees. This is a technique for developing a tree model (T) to predict categories (C) and/or continuous values (R) by recursive partitioning. It does not make restrictive parametric assumptions.

(Note that "CART" is a synecdoche for the general data mining technique of using decision trees to predict outcomes. Strictly speaking, "CART" refers to a specific algorithm for forming trees that was popularized by the work of Leo Breiman. However, CART is commonly used to refer to any predictive tree algorithm, and the tag may be used similarly on Cross Validated.)

1203 questions
162 votes • 3 answers

Gradient Boosting Tree vs Random Forest

Gradient tree boosting as proposed by Friedman uses decision trees as base learners. I'm wondering whether we should make the base decision trees as complex as possible (fully grown) or simpler. Is there any explanation for the choice? Random Forest is…
FihopZz • 1,923
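A hedged sketch of how the two methods expose base-learner depth in R; the parameter values are illustrative assumptions, not recommendations:

    library(gbm)
    library(randomForest)

    # Boosting typically uses shallow trees (stumps when interaction.depth = 1).
    boost <- gbm(medv ~ ., data = MASS::Boston, distribution = "gaussian",
                 n.trees = 500, interaction.depth = 2, shrinkage = 0.05)

    # Random forests typically grow deep, low-bias trees and average them.
    rf <- randomForest(medv ~ ., data = MASS::Boston, ntree = 500, nodesize = 5)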
139 votes • 9 answers

Obtaining knowledge from a random forest

Random forests are considered to be black boxes, but recently I was wondering what knowledge can be obtained from a random forest. The most obvious thing is the importance of the variables; in the simplest variant, this can be done just by calculating…
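Relatedly, a sketch of the standard inspection tools in R's randomForest package (iris is an assumed example dataset):

    library(randomForest)

    rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
    importance(rf)   # permutation and Gini importance per variable
    varImpPlot(rf)   # dot chart of the importance measures
    partialPlot(rf, iris, Petal.Width, "versicolor")  # marginal effect of one predictor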
106 votes • 1 answer

Conditional inference trees vs traditional decision trees

Can anyone explain the primary differences between conditional inference trees (ctree from the party package in R) and more traditional decision tree algorithms (such as rpart in R)? What makes CI trees different? Strengths and…
B_Miner • 7,560
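A minimal side-by-side sketch, fitting the same classification problem with both packages (iris is an assumed example):

    library(rpart)
    library(party)

    # CART-style: impurity-based splits plus cost-complexity pruning.
    cart_fit <- rpart(Species ~ ., data = iris, method = "class")

    # Conditional inference: permutation-test-based splits with a built-in
    # stopping rule, so no separate pruning step.
    ci_fit <- ctree(Species ~ ., data = iris)
    plot(ci_fit)  # ctree objects come with a rich default plot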
74 votes • 2 answers

Practical questions on tuning Random Forests

My questions are about Random Forests. The concept of this beautiful classifier is clear to me, but there are still a lot of practical usage questions. Unfortunately, I failed to find any practical guide to RF (I've been searching for something like…
lithuak • 993
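A hedged sketch of the knobs people usually tune, with illustrative (not recommended) values; tuneRF searches over mtry, typically the most influential parameter:

    library(randomForest)

    tuned <- tuneRF(x = iris[, -5], y = iris$Species,
                    ntreeTry = 500, stepFactor = 1.5, improve = 0.01)

    rf <- randomForest(Species ~ ., data = iris,
                       ntree = 1000,  # more trees mostly costs time, rarely hurts
                       mtry = 2,      # default for classification is sqrt(p)
                       nodesize = 1)  # minimum size of terminal nodes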
72 votes • 3 answers

How to actually plot a sample tree from randomForest::getTree()?

Anyone got library or code suggestions on how to actually plot a couple of sample trees from getTree(rfobj, k, labelVar=TRUE)? (Yes, I know you're not supposed to do this operationally, RF is a black box, etc. etc. I want to visually sanity-check a…
smci • 1,456
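Worth noting that getTree() returns a plain data frame rather than a plottable tree object, so a minimal sanity check is simply to inspect it; rendering it graphically needs an add-on such as the reprtree package (on GitHub, not base R):

    library(randomForest)

    rf <- randomForest(Species ~ ., data = iris, ntree = 100)
    tree_df <- getTree(rf, k = 1, labelVar = TRUE)
    head(tree_df)  # left/right daughter, split var, split point, status, prediction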
56 votes • 5 answers

Training a decision tree against unbalanced data

I'm new to data mining and I'm trying to train a decision tree against a data set which is highly unbalanced. However, I'm having problems with poor predictive accuracy. The data consists of students studying courses, and the class variable is the…
chrisb • 715
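Two standard rpart levers for imbalance, sketched with a hypothetical data frame my_data holding a two-class outcome Class; the prior and loss values are illustrative assumptions:

    library(rpart)

    # Option 1: re-balance the classes through the prior.
    fit_prior <- rpart(Class ~ ., data = my_data, method = "class",
                       parms = list(prior = c(0.5, 0.5)))

    # Option 2: penalize misclassifying the minority class more heavily.
    # Rows of the loss matrix are true classes, columns are predicted classes;
    # here misclassifying a true class-2 case costs 5x the reverse error.
    fit_loss <- rpart(Class ~ ., data = my_data, method = "class",
                      parms = list(loss = matrix(c(0, 5, 1, 0), nrow = 2)))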
52 votes • 3 answers

What is Deviance? (specifically in CART/rpart)

What is "Deviance," how is it calculated, and what are its uses in different fields in statistics? In particular, I'm personally interested in its uses in CART (and its implementation in rpart in R). I'm asking this since the wiki-article seems…
Tal Galili • 19,935
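One concrete place to look: rpart stores a per-node dev column in the fitted frame, but what it measures depends on the method, which is part of the confusion. For method = "anova" it is the within-node sum of squares, while for method = "class" under the default loss it is the count of misclassified observations at the node:

    library(rpart)

    fit <- rpart(Species ~ ., data = iris, method = "class")
    fit$frame[, c("var", "n", "dev", "yval")]  # per-node deviance column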
46 votes • 3 answers

How are Random Forests not sensitive to outliers?

I've read in a few sources, including this one, that Random Forests are not sensitive to outliers (in the way that Logistic Regression and other ML methods are, for example). However, two pieces of intuition tell me otherwise: Whenever a decision…
makansij • 1,919
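A tiny simulation sketch of the usual intuition: a gross outlier in the response moves a linear fit globally, while a random forest's prediction is an average of within-leaf means, so the damage tends to stay local (all values here are arbitrary assumptions):

    library(randomForest)
    set.seed(1)

    d <- data.frame(x = runif(200))
    d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.1)
    d$y[1] <- 50  # inject one extreme response outlier

    rf <- randomForest(y ~ x, data = d)
    ols <- lm(y ~ x, data = d)
    predict(rf, data.frame(x = 0.5))   # barely affected
    predict(ols, data.frame(x = 0.5))  # visibly shifted by the outlier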
43 votes • 6 answers

Why do I get a 100% accuracy decision tree?

I'm getting a 100% accuracy for my decision tree. What am I doing wrong? This is my code:

    import pandas as pd
    import json
    import numpy as np
    import sklearn
    import matplotlib.pyplot as plt
    data =…
Nadjla • 441
42 votes • 1 answer

Relative variable importance for Boosting

I'm looking for an explanation of how relative variable importance is computed in Gradient Boosted Trees that is not overly general/simplistic like: The measures are based on the number of times a variable is selected for splitting, weighted by the…
Antoine • 5,740
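For reference, the definition usually cited (from Hastie, Tibshirani and Friedman, The Elements of Statistical Learning): for a single tree $T$ with $J - 1$ internal nodes, the squared importance of variable $j$ is

$$\hat{I}_j^2(T) = \sum_{t=1}^{J-1} \hat{i}_t^2 \, \mathbf{1}\big(v(t) = j\big), \qquad \hat{I}_j^2 = \frac{1}{M} \sum_{m=1}^{M} \hat{I}_j^2(T_m),$$

where $v(t)$ is the variable split on at node $t$, $\hat{i}_t^2$ is the improvement in the fitting criterion from that split, and the second expression averages over the $M$ trees of the ensemble.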
40 votes • 3 answers

Why are Decision Trees not computationally expensive?

In An Introduction to Statistical Learning with Applications in R, the authors write that fitting a decision tree is very fast, but this doesn't make sense to me. The algorithm has to go through every feature and partition it in every way possible…
matt_js • 451
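A short note on why the excerpt's worry about "every way possible" doesn't apply: for a numeric feature, the greedy CART split only considers the $n - 1$ thresholds between consecutive sorted values, so a single split over $n$ rows and $p$ features costs roughly

$$O(p \, n \log n) \quad \text{(one sort per feature, then one pass over the candidate thresholds)},$$

rather than anything exponential in the number of possible partitions.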
34 votes • 4 answers

What is the weak side of decision trees?

Decision trees seem to be a very understandable machine learning method. Once created, they can be easily inspected by a human, which is a great advantage in some applications. What are the practical weak sides of decision trees?
Łukasz Lew • 1,312
33 votes • 1 answer

What are some useful guidelines for GBM parameters?

What are some useful guidelines for testing parameters (e.g. interaction depth, minchild, sample rate) using GBM? Let's say I have 70-100 features, a population of 200,000 and I intend to test interaction depth of 3 and 4. Clearly I need to do…
Ram Ahluwalia • 3,003
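A hedged sketch of a common gbm workflow: fix a small shrinkage, let cross-validation choose n.trees, and grid over interaction.depth separately; all values below are illustrative assumptions:

    library(gbm)

    fit <- gbm(medv ~ ., data = MASS::Boston, distribution = "gaussian",
               n.trees = 3000, shrinkage = 0.01,
               interaction.depth = 4, n.minobsinnode = 10,
               bag.fraction = 0.5, cv.folds = 5)
    best_iter <- gbm.perf(fit, method = "cv")  # CV-chosen number of trees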
33 votes • 5 answers

Are decision trees almost always binary trees?

Nearly every decision tree example I've come across happens to be a binary tree. Is this pretty much universal? Do most of the standard algorithms (C4.5, CART, etc.) only support binary trees? From what I gather, CHAID is not limited to binary…
Michael McGowan • 4,561
30 votes • 4 answers

How to measure/rank "variable importance" when using CART? (specifically using {rpart} from R)

When building a CART model (specifically a classification tree) using rpart (in R), it is often interesting to know the importance of the various variables introduced to the model. Thus, my question is: what common measures exist for…
Tal Galili • 19,935
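For what it's worth, rpart already exposes one such measure: the fitted object carries a variable.importance vector that sums each variable's goodness-of-split contributions, including credit for surrogate splits (iris is an assumed example):

    library(rpart)

    fit <- rpart(Species ~ ., data = iris, method = "class")
    fit$variable.importance  # named numeric vector, larger = more important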