Questions tagged [scikit-learn]

A machine-learning library for Python. Use this tag for any on-topic question that (a) involves scikit-learn either as a critical part of the question or the expected answer, and (b) is not just about how to use scikit-learn.

A machine learning framework for Python.

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib. The project is open source and commercially usable (BSD license).

1653 questions
75
votes
1 answer

How to split the dataset for cross validation, learning curve, and final evaluation?

What is an appropriate strategy for splitting the dataset? I ask for feedback on the following approach (not on the individual parameters like test_size or n_iter, but on whether I used X, y, X_train, y_train, X_test, and y_test appropriately and whether the…
tobip
  • 1,450
  • 4
  • 14
  • 11
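A minimal sketch of the splitting strategy the question describes, with a generic estimator and the bundled iris data as placeholders (neither is from the question): hold out a test set once for the final evaluation, and run cross-validation only on the remaining training portion.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Final evaluation set: held out once, never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Model selection and learning curves use only the training portion.
model = SVC(kernel="linear", C=1.0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Only after settling on the model: refit on all training data, score once on the test set.
final_score = model.fit(X_train, y_train).score(X_test, y_test)
print(cv_scores.mean(), final_score)
```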
73
votes
3 answers

One-hot vs dummy encoding in Scikit-learn

There are two different ways of encoding categorical variables. Say one categorical variable has n values: one-hot encoding converts it into n variables, while dummy encoding converts it into n-1 variables. If we have k categorical variables, each…
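For concreteness, a small sketch of the two encodings on an invented color column; in scikit-learn, OneHotEncoder(drop="first") reproduces dummy coding (the sparse_output argument assumes scikit-learn >= 1.2; older releases use sparse=False instead).

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

one_hot = pd.get_dummies(df["color"])                  # n columns
dummy = pd.get_dummies(df["color"], drop_first=True)   # n - 1 columns

# scikit-learn equivalent: drop=None gives one-hot, drop="first" gives dummy coding.
enc = OneHotEncoder(drop="first", sparse_output=False)
dummy_sk = enc.fit_transform(df[["color"]])
print(one_hot.shape, dummy.shape, dummy_sk.shape)
```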
60
votes
5 answers

How does one interpret SVM feature weights?

I am trying to interpret the variable weights given by fitting a linear SVM (I'm using scikit-learn): from sklearn import svm; svm = svm.SVC(kernel='linear'); svm.fit(features, labels); svm.coef_. I cannot find anything in the documentation that…
Austin Richardson
  • 928
  • 1
  • 8
  • 10
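A minimal sketch of pulling out and ranking the weights on synthetic data: for a binary problem, coef_ holds one weight per feature, and it is the magnitude (not the raw value) that indicates how strongly a feature influences the decision, with the sign indicating which class it pushes towards.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="linear")
clf.fit(X, y)

# coef_ has shape (1, n_features) for a binary problem: one weight per feature.
weights = clf.coef_.ravel()
ranking = np.argsort(np.abs(weights))[::-1]  # most influential features first
print(weights, ranking)
```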
56
votes
3 answers

Logistic Regression: Scikit Learn vs Statsmodels

I am trying to understand why the logistic regression output from these two libraries gives different results. I am using the dataset from the UCLA idre tutorial, predicting admit based on gre, gpa and rank. rank is treated as a categorical variable,…
hurrikale
  • 853
  • 1
  • 8
  • 7
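The usual culprit is regularization: scikit-learn's LogisticRegression applies an L2 penalty by default, while statsmodels' Logit fits the plain maximum-likelihood model. A small sketch on synthetic data (not the UCLA set from the question) showing the coefficients line up once the penalty is effectively switched off:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Effectively unregularized scikit-learn fit (very large C weakens the L2 penalty).
sk_coefs = LogisticRegression(C=1e9).fit(X, y).coef_.ravel()

# statsmodels needs the intercept added explicitly; skip it when comparing slopes.
sm_coefs = sm.Logit(y, sm.add_constant(X)).fit(disp=0).params[1:]

print(np.round(sk_coefs, 4))
print(np.round(sm_coefs, 4))
```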
53
votes
2 answers

Pandas / Statsmodels / Scikit-learn

Are Pandas, Statsmodels and Scikit-learn different implementations of machine learning/statistical operations, or are these complementary to one another? Which of these has the most comprehensive functionality? Which one is actively developed…
Nik
  • 1,279
  • 2
  • 13
  • 19
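In practice the three are complementary rather than competing: pandas supplies the data structures, statsmodels the inferential summaries, and scikit-learn the prediction-oriented API. A tiny illustrative sketch with invented data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"x": np.arange(20, dtype=float)})
df["y"] = 2.0 * df["x"] + np.random.default_rng(0).normal(size=20)

# statsmodels: coefficients with standard errors, p-values, R^2, etc.
print(smf.ols("y ~ x", data=df).fit().summary())

# scikit-learn: the same fit, but oriented towards predict/score on new data.
model = LinearRegression().fit(df[["x"]], df["y"])
print(model.coef_, model.score(df[["x"]], df["y"]))
```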
44
votes
1 answer

What do the numbers in sklearn's classification report mean?

Below is an example I pulled from sklearn's sklearn.metrics.classification_report documentation. What I don't understand is why there are f1-score, precision and recall values for each class, where I believe class is the predictor label. I…
jxn
  • 749
  • 2
  • 7
  • 15
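A short sketch with made-up labels, for reference: the report prints one row per true class (not per predictor), with precision, recall, F1 and support computed one-vs-rest for that class.

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2, 1, 0, 2]
y_pred = [0, 0, 2, 2, 1, 1, 0, 2]

# For each class c: precision = TP / (TP + FP), recall = TP / (TP + FN),
# f1 = harmonic mean of the two, support = number of true samples of class c.
print(classification_report(y_true, y_pred,
                            target_names=["class 0", "class 1", "class 2"]))
```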
43
votes
2 answers

Mean absolute percentage error (MAPE) in Scikit-learn

How can we calculate the mean absolute percentage error (MAPE) of our predictions using Python and scikit-learn? From the docs, we have only these 4 metric functions for regression: metrics.explained_variance_score(y_true,…
Nyxynyx
  • 885
  • 3
  • 9
  • 15
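Older scikit-learn releases indeed ship no MAPE metric, so it is usually written by hand; recent versions (>= 0.24) also provide sklearn.metrics.mean_absolute_percentage_error. A hand-rolled sketch:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no zero targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(mape([100, 200, 300], [110, 190, 330]))  # ~8.33
```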
43
votes
2 answers

Area under Precision-Recall Curve (AUC of PR-curve) and Average Precision (AP)

Is Average Precision (AP) the Area under the Precision-Recall Curve (AUC of PR-curve)? EDIT: here is some comment about the difference between PR AUC and AP. The AUC is obtained by trapezoidal interpolation of the precision. An alternative and usually…
mrgloom
  • 1,687
  • 4
  • 25
  • 33
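A small sketch contrasting the two quantities on synthetic scores: average_precision_score uses the step-wise summation described in the docs, while auc(recall, precision) applies trapezoidal interpolation, so the two numbers are close but generally not identical.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = y_true * 0.5 + rng.normal(scale=0.5, size=200)

# Step-wise summation over the PR curve.
ap = average_precision_score(y_true, y_score)

# Trapezoidal area under the same curve.
precision, recall, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)

print(ap, pr_auc)
```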
40
votes
4 answers

Polynomial regression using scikit-learn

I am trying to use scikit-learn for polynomial regression. From what I read, polynomial regression is a special case of linear regression. I was hoping that maybe one of scikit's generalized linear models can be parameterised to fit higher order…
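Since polynomial regression is linear regression on expanded features, the usual recipe is PolynomialFeatures followed by LinearRegression; a minimal sketch with an arbitrary degree and toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X.ravel() ** 3 - X.ravel() + rng.normal(scale=0.5, size=100)

# PolynomialFeatures builds [1, x, x^2, x^3]; the model stays linear in those columns.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X, y)
print(model.predict([[2.0]]))
```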
35
votes
4 answers

Ensemble of different kinds of regressors using scikit-learn (or any other python framework)

I am trying to solve a regression task. I found out that 3 models are working nicely for different subsets of data: LassoLARS, SVR and Gradient Tree Boosting. I noticed that when I make predictions using all three models and then make a table of…
Maksim Khaitovich
  • 658
  • 1
  • 7
  • 12
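A minimal sketch of one way to combine heterogeneous regressors: let a linear meta-model learn how much to trust each base model via StackingRegressor (available in scikit-learn >= 0.22; VotingRegressor or a plain average of predictions are simpler alternatives). The data and base models are placeholders echoing the question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LassoLarsCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The meta-model (RidgeCV) is trained on out-of-fold predictions of the base models.
ensemble = StackingRegressor(
    estimators=[("lasso", LassoLarsCV()),
                ("svr", SVR()),
                ("gbt", GradientBoostingRegressor(random_state=0))],
    final_estimator=RidgeCV())
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```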
32
votes
2 answers

PCA in numpy and sklearn produces different results

Am I misunderstanding something? This is my code using sklearn: import numpy as np; import matplotlib.pyplot as plt; from mpl_toolkits.mplot3d import Axes3D; from sklearn import decomposition; from sklearn import datasets; from sklearn.preprocessing…
aceminer
  • 813
  • 1
  • 9
  • 20
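A common source of the mismatch is centering and the arbitrary sign of each component. A minimal sketch, using iris as a stand-in dataset, that reproduces scikit-learn's components from numpy's SVD up to sign flips:

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data
Xc = X - X.mean(axis=0)          # PCA assumes centered data

# numpy route: SVD of the centered matrix; rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
np_components = Vt[:2]

# scikit-learn route.
sk_components = PCA(n_components=2).fit(X).components_

# Equal up to a per-component sign flip.
print(np.allclose(np.abs(np_components), np.abs(sk_components)))
```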
28
votes
3 answers

XGBoost vs Python Sklearn gradient boosted trees

I am trying to understand how XGBoost works. I already understand how gradient boosted trees work in Python sklearn. What is not clear to me is whether XGBoost works the same way, but faster, or if there are fundamental differences between it and the…
Fairly Nerdy
  • 877
  • 1
  • 8
  • 16
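Both implement gradient boosted trees; XGBoost adds explicit regularization on the leaf weights and a second-order approximation of the loss, on top of heavy engineering for speed. A side-by-side sketch with roughly analogous parameters, assuming the separate xgboost package is installed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Roughly comparable settings for the two implementations.
sk_gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)

for model in (sk_gbt, xgb):
    model.fit(X_train, y_train)
    print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))
```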
27
votes
4 answers

Multilabel classification metrics on scikit

I am trying to build a multi-label classifier to assign topics to existing documents using scikit. I am processing my documents by passing them through the TfidfVectorizer and the labels through the MultiLabelBinarizer, and created a…
mobius
  • 271
  • 1
  • 3
  • 7
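A compact sketch of the evaluation step with invented documents and topics: binarize the label sets, predict an indicator matrix, then choose an averaging mode (micro, macro, samples) for the metrics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, hamming_loss
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

docs = ["price of oil rises", "new python release",
        "oil companies adopt python scripts", "stock markets fall"]
labels = [["economy"], ["tech"], ["economy", "tech"], ["economy"]]

X = TfidfVectorizer().fit_transform(docs)
Y = MultiLabelBinarizer().fit_transform(labels)   # documents x topics indicator matrix

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
Y_pred = clf.predict(X)

# "micro" pools all label decisions; "macro" averages per-label scores.
print(f1_score(Y, Y_pred, average="micro"),
      f1_score(Y, Y_pred, average="macro"),
      hamming_loss(Y, Y_pred))
```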
27
votes
3 answers

How to systematically remove collinear variables (pandas columns) in Python?

Thus far, I have removed collinear variables as part of the data preparation process by looking at correlation tables and eliminating variables that are above a certain threshold. Is there a more accepted way of doing this? Additionally, I am aware…
orange1
  • 557
  • 1
  • 4
  • 9
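One more systematic option than eyeballing a correlation table is to iteratively drop the column with the highest variance inflation factor (VIF), computed here with statsmodels; the threshold of 5 is a common but arbitrary choice, and the DataFrame is invented.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100)})
df["b"] = df["a"] * 0.95 + rng.normal(scale=0.1, size=100)   # nearly collinear with a
df["c"] = rng.normal(size=100)

def drop_collinear(X, threshold=5.0):
    """Iteratively drop the column with the largest VIF until all are below threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns)
        if vifs.max() <= threshold:
            break
        X = X.drop(columns=vifs.idxmax())   # drop the worst offender and re-check
    return X

print(drop_collinear(df).columns.tolist())
```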
26
votes
2 answers

Why is Python's scikit-learn LDA not working correctly and how does it compute LDA via SVD?

I was using the Linear Discriminant Analysis (LDA) from the scikit-learn machine learning library (Python) for dimensionality reduction and was a little bit curious about the results. I am wondering now what the LDA in scikit-learn is doing so that…
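One quick sanity check, sketched below on iris as a stand-in dataset: the 'svd' and 'eigen' solvers should project onto the same subspace, with each component agreeing up to sign and scale, which is often where apparently "incorrect" results come from.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

Z_svd = LinearDiscriminantAnalysis(solver="svd", n_components=2).fit_transform(X, y)
Z_eig = LinearDiscriminantAnalysis(solver="eigen", n_components=2).fit_transform(X, y)

# Per-component correlation should be close to 1 even if sign and scale differ.
for i in range(2):
    print(abs(np.corrcoef(Z_svd[:, i], Z_eig[:, i])[0, 1]))
```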