Questions tagged [python]

Python is a programming language commonly used for machine learning. Use this tag for any *on-topic* question that (a) involves `Python` either as a critical part of the question or expected answer, and (b) is not *just* about how to use `Python`.

Python (Wikipedia page) is a general-purpose programming language designed for ease of use. It is a commonly used platform for machine learning. Two very popular threads concerned with using Python for statistics and machine learning are:

Be aware that Python-based questions are frequently migrated between Cross Validated (CV) and Stack Overflow (SO). CV fields questions with statistical / machine learning content, and SO fields questions of programming and implementation. Python questions can be on topic here when they are centrally about statistics / ML while involving Python either as a critical part of the question or expected answer. However, questions that are just about how to use Python / why it works a certain way, etc., are off topic here. Many such questions can be on topic on SO if they have a reproducible example.

We maintain a list of Python resources available on the internet in our Internet Support for Statistics Software meta.CV thread.

There is an extensive wiki for Python on SO here.

4198 questions
376
votes
26 answers

Python as a statistics workbench

Lots of people use a main tool like Excel or another spreadsheet, SPSS, Stata, or R for their statistics needs. They might turn to some specific package for very special needs, but a lot of things can be done with a simple spreadsheet or a general…
Fabian Fagerholm
  • 215
  • 3
  • 6
  • 7
270
votes
6 answers

What is batch size in neural network?

I'm using the Python Keras package for neural networks. This is the link. Is batch_size equal to the number of test samples? From Wikipedia we have this information: However, in other cases, evaluating the sum-gradient may require expensive evaluations…
user2991243
  • 3,621
  • 4
  • 22
  • 48
113
votes
6 answers

What loss function for multi-class, multi-label classification tasks in neural networks?

I'm training a neural network to classify a set of objects into n classes. Each object can belong to multiple classes at the same time (multi-class, multi-label). I read that for multi-class problems it is generally recommended to use softmax and…
aKzenT
  • 1,231
  • 2
  • 8
  • 5
111
votes
2 answers

What is an embedding layer in a neural network?

In many neural network libraries, there are 'embedding layers', like in Keras or Lasagne. I am not sure I understand its function, despite reading the documentation. For example, in the Keras documentation it says: Turn positive integers (indexes)…
Francesco
  • 1,213
  • 2
  • 9
  • 8
79
votes
9 answers

What algorithm should I use to detect anomalies on time-series?

Background: I work in a Network Operations Center, where we monitor computer systems and their performance. One of the key metrics to monitor is the number of visitors/customers currently connected to our servers. To make it visible we (Ops team) collect…
75
votes
1 answer

How to split the dataset for cross validation, learning curve, and final evaluation?

What is an appropriate strategy for splitting the dataset? I ask for feedback on the following approach (not on the individual parameters like test_size or n_iter, but on whether I used X, y, X_train, y_train, X_test, and y_test appropriately and if the…
tobip
  • 1,450
  • 4
  • 14
  • 11
65
votes
7 answers

Why is the validation accuracy fluctuating?

I have a four-layer CNN to predict response to cancer using MRI data. I use ReLU activations to introduce nonlinearities. The training accuracy and loss monotonically increase and decrease, respectively. But my test accuracy starts to fluctuate wildly.…
Raghuram
  • 763
  • 1
  • 6
  • 10
60
votes
5 answers

How does one interpret SVM feature weights?

I am trying to interpret the variable weights given by fitting a linear SVM. (I'm using scikit-learn): `from sklearn import svm`, `svm = svm.SVC(kernel='linear')`, `svm.fit(features, labels)`, `svm.coef_`. I cannot find anything in the documentation that…
Austin Richardson
  • 928
  • 1
  • 8
  • 10
56
votes
9 answers

How do R and Python complement each other in data science?

In many tutorials or manuals the narrative seems to imply that R and Python coexist as complementary components of the analysis process. To my untrained eye, however, it seems that both languages sort of do the same thing. So my question is if there…
BioHazZzZard
  • 319
  • 1
  • 4
  • 5
56
votes
3 answers

Logistic Regression: Scikit Learn vs Statsmodels

I am trying to understand why the output from logistic regression of these two libraries gives different results. I am using the dataset from the UCLA IDRE tutorial, predicting admit based on gre, gpa and rank. rank is treated as a categorical variable,…
hurrikale
  • 853
  • 1
  • 8
  • 7
53
votes
10 answers

Machine Learning using Python

I am considering using Python libraries for doing my Machine Learning experiments. Thus far, I had been relying on WEKA, but have been pretty dissatisfied on the whole. This is primarily because I have found WEKA to be not so well supported (very…
Andy
  • 1,583
  • 3
  • 21
  • 19
53
votes
2 answers

Pandas / Statsmodel / Scikit-learn

Are Pandas, Statsmodels and Scikit-learn different implementations of machine learning/statistical operations, or are these complementary to one another? Which of these has the most comprehensive functionality? Which one is actively developed…
Nik
  • 1,279
  • 2
  • 13
  • 19
49
votes
7 answers

Survival Analysis tools in Python

I am wondering if there are any packages for Python that are capable of performing survival analysis. I have been using the survival package in R but would like to port my work to Python.
MarkSAlen
  • 2,559
  • 5
  • 24
  • 25
46
votes
2 answers

How to interpret p-value of Kolmogorov-Smirnov test (python)?

I have two samples and want to test (using Python) whether they are drawn from the same distribution. To do that I use the statistical function `ks_2samp` from `scipy.stats`. It returns two values and I find it difficult to interpret them. Help please!
meri
  • 461
  • 1
  • 4
  • 3
44
votes
1 answer

What do the numbers in the classification report of sklearn mean?

Below is an example I pulled from sklearn's sklearn.metrics.classification_report documentation. What I don't understand is why there are f1-score, precision and recall values for each class where I believe class is the predictor label? I…
jxn
  • 749
  • 2
  • 7
  • 15