Questions tagged [python]

Python is a programming language commonly used for machine learning. Use this tag for any *on-topic* question that (a) involves `Python` either as a critical part of the question or of the expected answer, and (b) is not *just* about how to use `Python`.

Python (Wikipedia page) is a general-purpose programming language designed for ease of use, and it is a commonly used platform for machine learning. Two very popular threads concerned with using Python for statistics and machine learning are:

Be aware that Python-based questions are frequently migrated between Cross Validated (CV) and Stack Overflow (SO). CV fields questions with statistical or machine learning content, and SO fields questions about programming and implementation. Python questions can be on topic here when they are centrally about statistics or ML while involving Python either as a critical part of the question or of the expected answer. However, questions that are just about how to use Python, why it works a certain way, and so on are off topic here. Many such questions can be on topic on SO if they include a reproducible example.

We maintain a list of Python resources available on the internet in our Internet Support for Statistics Software meta.CV thread.

There is an extensive wiki for Python on SO here.

4198 questions

376

votes

26 answers

Python as a statistics workbench

Lots of people use a main tool like Excel or another spreadsheet, SPSS, Stata, or R for their statistics needs. They might turn to some specific package for very special needs, but a lot of things can be done with a simple spreadsheet or a general…

r spss stata python

asked Aug 12 '10 at 10:46

Fabian Fagerholm

270

votes

6 answers

What is batch size in neural network?

I'm using the Python Keras package for neural networks. This is the link. Is batch_size equal to the number of test samples? From Wikipedia we have this information: However, in other cases, evaluating the sum-gradient may require expensive evaluations…
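As a rough illustration of the term in this question (a sketch, not Keras's own implementation): in mini-batch training, `batch_size` is the number of training samples consumed per gradient update, so it is not tied to the number of test samples. The helper `iter_batches` and the sizes below are hypothetical, chosen only for the example.

```python
import math


def iter_batches(n_samples, batch_size):
    """Yield (start, stop) index ranges that cover n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


n_train = 1050    # hypothetical number of training samples
batch_size = 100  # hypothetical mini-batch size

batches = list(iter_batches(n_train, batch_size))

# One gradient update per batch, so one epoch needs ceil(n_train / batch_size) updates.
updates_per_epoch = math.ceil(n_train / batch_size)

# The final batch is smaller when batch_size does not divide n_train evenly.
last_batch_size = batches[-1][1] - batches[-1][0]
```

With these made-up numbers an epoch consists of 11 updates, the last of which sees only 50 samples.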

neural-networks python terminology keras

asked May 22 '15 at 09:15

user2991243

113

votes

6 answers

What loss function for multi-class, multi-label classification tasks in neural networks?

I'm training a neural network to classify a set of objects into n-classes. Each object can belong to multiple classes at the same time (multi-class, multi-label). I read that for multi-class problems it is generally recommended to use softmax and…

neural-networks python loss-functions keras cross-entropy

asked Apr 17 '16 at 14:28

aKzenT

111

votes

2 answers

What is an embedding layer in a neural network?

In many neural network libraries, there are 'embedding layers', like in Keras or Lasagne. I am not sure I understand its function, despite reading the documentation. For example, in the Keras documentation it says: Turn positive integers (indexes)…

machine-learning neural-networks python word-embeddings

asked Nov 20 '15 at 16:43

Francesco

votes

9 answers

What algorithm should I use to detect anomalies on time-series?

Background: I work in a Network Operations Center, where we monitor computer systems and their performance. One of the key metrics to monitor is the number of visitors/customers currently connected to our servers. To make it visible we (Ops team) collect…

machine-learning time-series python computational-statistics anomaly-detection

asked May 16 '15 at 21:10

Ilya Khadykin

votes

1 answer

How to split the dataset for cross validation, learning curve, and final evaluation?

What is an appropriate strategy for splitting the dataset? I ask for feedback on the following approach (not on the individual parameters like test_size or n_iter, but if I used X, y, X_train, y_train, X_test, and y_test appropriately and if the…

machine-learning cross-validation python scikit-learn

asked Apr 30 '14 at 10:44

tobip

votes

7 answers

Why is the validation accuracy fluctuating?

I have a four layer CNN to predict response to cancer using MRI data. I use ReLU activations to introduce nonlinearities. The train accuracy and loss monotonically increase and decrease respectively. But, my test accuracy starts to fluctuate wildly.…

machine-learning python deep-learning

asked Jan 08 '17 at 02:37

Raghuram

votes

5 answers

How does one interpret SVM feature weights?

I am trying to interpret the variable weights given by fitting a linear SVM. (I'm using scikit-learn): `from sklearn import svm; svm = svm.SVC(kernel='linear'); svm.fit(features, labels); svm.coef_` I cannot find anything in the documentation that…

svm feature-selection python scikit-learn

asked Oct 11 '12 at 20:48

Austin Richardson

votes

9 answers

How do R and Python complement each other in data science?

In many tutorials or manuals the narrative seems to imply that R and Python coexist as complementary components of the analysis process. To my untrained eye, however, it seems that both languages sort of do the same thing. So my question is if there…

r python software

asked Oct 06 '16 at 08:57

BioHazZzZard

votes

3 answers

Logistic Regression: Scikit Learn vs Statsmodels

I am trying to understand why the output from logistic regression of these two libraries gives different results. I am using the dataset from UCLA idre tutorial, predicting admit based on gre, gpa and rank. rank is treated as categorical variable,…

regression logistic python scikit-learn statsmodels

asked Mar 25 '16 at 22:01

hurrikale

votes

10 answers

Machine Learning using Python

I am considering using Python libraries for doing my Machine Learning experiments. Thus far, I had been relying on WEKA, but have been pretty dissatisfied on the whole. This is primarily because I have found WEKA to be not so well supported (very…

machine-learning python

asked Mar 27 '11 at 04:00

Andy

votes

2 answers

Pandas / Statsmodel / Scikit-learn

Are Pandas, Statsmodels and Scikit-learn different implementations of machine learning/statistical operations, or are these complementary to one another? Which of these has the most comprehensive functionality? Which one is actively developed…

machine-learning python scikit-learn statsmodels pandas

asked Jan 17 '13 at 01:02

Nik

votes

7 answers

Survival Analysis tools in Python

I am wondering if there are any packages for Python that are capable of performing survival analysis. I have been using the survival package in R but would like to port my work to Python.

python survival mortality

asked Aug 16 '10 at 12:10

MarkSAlen

votes

2 answers

How to interpret p-value of Kolmogorov-Smirnov test (python)?

I have two samples and want to test (using Python) whether they are drawn from the same distribution. To do that I use the function ks_2samp from scipy.stats. It returns two values, and I have difficulty interpreting them. Help please!
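For orientation, the two values returned by `ks_2samp` are the KS statistic and a p-value; a small p-value is evidence against the hypothesis that both samples come from the same distribution. The statistic itself is the largest vertical gap between the two empirical CDFs. The following is a minimal standard-library sketch of that statistic only (it is not SciPy's implementation and computes no p-value); the function name `ks_statistic` and the toy samples are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed points, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


identical = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])      # CDFs coincide
disjoint = ks_statistic([1, 2, 3, 4], [11, 12, 13, 14])   # no overlap at all
overlapping = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])    # partial overlap
```

The statistic ranges from 0 (identical empirical CDFs) to 1 (completely separated samples); how large a value counts as significant depends on the sample sizes, which is what the p-value accounts for.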

python

asked May 02 '13 at 09:16

meri

votes

1 answer

What do the numbers in the classification report of sklearn mean?

I have below an example I pulled from sklearn's sklearn.metrics.classification_report documentation. What I don't understand is why there are f1-score, precision and recall values for each class where I believe class is the predictor label? I…
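As a hedged illustration of why such a report has one row per class: each class (a value of the target label, not a predictor) is treated in turn as the "positive" class, and precision, recall and F1 are computed one-vs-rest. The sketch below uses only the standard library; `per_class_metrics` and the toy labels are hypothetical, and it is not scikit-learn's code.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


y_true = [0, 0, 1, 1, 2, 2]  # toy true labels
y_pred = [0, 1, 1, 1, 2, 0]  # toy predictions
report = per_class_metrics(y_true, y_pred)
```

In this toy case class 1 has perfect recall (both true 1s are found) but a precision of 2/3, because one of the three predictions of class 1 is wrong.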

machine-learning python scikit-learn precision-recall

asked Oct 02 '14 at 18:26

jxn
