Questions tagged [statsmodels]

A Python module for data exploration, statistical testing, and model estimation. Do not use this tag for general statistical modeling questions! NB: questions only about the module itself, Python, or coding are likely to be off topic.

Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests (http://www.statsmodels.org/).

383 questions
56 votes · 3 answers

Logistic Regression: Scikit Learn vs Statsmodels

I am trying to understand why logistic regression in these two libraries gives different results. I am using the dataset from the UCLA IDRE tutorial, predicting admit based on gre, gpa and rank. rank is treated as a categorical variable,…
hurrikale
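A frequently cited source of this discrepancy is regularization: scikit-learn's `LogisticRegression` applies an L2 penalty by default (`C=1.0`), whereas statsmodels' `Logit` maximizes the unpenalized likelihood. Using a very large `C` (or, in recent scikit-learn versions, `penalty=None`) should bring the two closer. The sketch below illustrates the effect on simulated data with a hypothetical helper `fit_logistic` (plain NumPy Newton-Raphson, not either library's code); the penalty `lam` plays the role of `1/C` and, as an assumption, leaves the intercept unpenalized.

```python
import numpy as np

def fit_logistic(X, y, lam=0.0, n_iter=50):
    """Newton-Raphson for logistic regression.

    X must already contain an intercept column in position 0.
    lam > 0 adds an L2 penalty (lam / 2) * ||beta[1:]||^2, leaving the
    intercept unpenalized; lam = 0 gives the plain maximum-likelihood fit.
    """
    beta = np.zeros(X.shape[1])
    pen = np.full(X.shape[1], lam)
    pen[0] = 0.0                                   # do not penalize the intercept
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        grad = X.T @ (y - p) - pen * beta          # gradient of penalized log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None]) + np.diag(pen)
        beta = beta + np.linalg.solve(hess, grad)  # Newton step
    return beta

# Simulated data (hypothetical, not the UCLA dataset)
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])               # add intercept column
true_beta = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_mle = fit_logistic(X, y)                      # unpenalized, like statsmodels Logit
beta_l2 = fit_logistic(X, y, lam=1.0)              # L2 penalty, analogous to C=1
print(beta_mle)
print(beta_l2)                                     # penalized slope vector is shrunk toward zero
```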
53 votes · 2 answers

Pandas / Statsmodel / Scikit-learn

Are Pandas, Statsmodels and Scikit-learn different implementations of machine learning/statistical operations, or are these complementary to one another? Which of these has the most comprehensive functionality? Which one is actively developed…
Nik
25 votes · 3 answers

Analyse ACF and PACF plots

I want to see if I am on the right track analysing my ACF and PACF plots: Background: (Ref: Philip Hans Franses, 1998) As both ACF and PACF show significant values, I assume that an ARMA model will serve my needs. The ACF can be used to estimate…
Peter Knutsen
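The sample ACF shown in such plots can be computed directly with the usual estimator r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)². For an AR(1) process with coefficient φ the theoretical ACF is φ^k, so the sample values should decay roughly geometrically (while the theoretical PACF cuts off after lag 1). A minimal NumPy sketch, where the helper `sample_acf` and the simulated series are illustrative assumptions:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags using the full-sample variance."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom for k in range(nlags + 1)])

# Simulate an AR(1) series x_t = phi * x_{t-1} + e_t
rng = np.random.default_rng(42)
phi, n = 0.7, 5000
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

r = sample_acf(x, nlags=5)
print(np.round(r, 3))                      # sample ACF
print(np.round(phi ** np.arange(6), 3))    # theoretical ACF phi**k
```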
23 votes · 4 answers

Difference between statsmodel OLS and scikit linear regression

I have a question about two methods from different libraries that seem to do the same job. I am trying to build a linear regression model. Here is the code in which I use the statsmodels library with OLS: X_train, X_test, y_train, y_test =…
Batuhan B
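A common reason the two disagree is the intercept: statsmodels' `OLS` fits exactly the columns it is given, so a constant has to be added explicitly (e.g. with `sm.add_constant`), whereas scikit-learn's `LinearRegression` fits an intercept by default (`fit_intercept=True`). The sketch below uses simulated data and `np.linalg.lstsq` in place of either library, to show how much the slope changes when the constant column is left out.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
y = 5.0 + 2.0 * x + rng.normal(scale=1.0, size=n)   # true intercept 5, slope 2

# Analogous to statsmodels OLS(y, x) with no constant: regression through the origin
X_no_const = x[:, None]
slope_only, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# Analogous to OLS(y, sm.add_constant(x)) or LinearRegression(): explicit intercept column
X_const = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X_const, y, rcond=None)

print("no intercept:   slope =", slope_only[0])
print("with intercept: intercept =", coef[0], "slope =", coef[1])
```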
19 votes · 2 answers

Ordinal logistic regression in Python

I would like to run an ordinal logistic regression in Python - for a response variable with three levels and with a few explanatory factors. The statsmodels package supports binary logit and multinomial logit (MNLogit) models, but not ordered logit.…
Hadi
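For reference, the ordered (proportional-odds) logit specifies cumulative probabilities P(Y ≤ j | x) = σ(θ_j − xᵀβ) with increasing cut points θ_j, and each category probability is the difference of adjacent cumulative probabilities. Newer statsmodels releases also provide an `OrderedModel` class (in `statsmodels.miscmodels.ordinal_model`), which the sketch below does not use. This NumPy sketch only evaluates the probabilities and the log-likelihood one would maximize; the helper names, coefficients, and data are hypothetical.

```python
import numpy as np

def ordered_logit_probs(X, beta, cuts):
    """Category probabilities for an ordered logit with len(cuts) + 1 levels.

    P(Y <= j | x) = sigmoid(cuts[j] - x @ beta); cuts must be strictly increasing.
    X has no intercept column: the cut points play that role.
    """
    eta = X @ beta                                                 # linear predictor, shape (n,)
    cdf = 1.0 / (1.0 + np.exp(-(cuts[None, :] - eta[:, None])))   # shape (n, J-1)
    n = X.shape[0]
    # Pad with P(Y <= lowest - 1) = 0 and P(Y <= highest) = 1, then difference
    cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
    return np.diff(cdf, axis=1)                                    # shape (n, J)

def ordered_logit_loglik(X, y, beta, cuts):
    """Log-likelihood for integer-coded responses y in {0, ..., len(cuts)}."""
    probs = ordered_logit_probs(X, beta, cuts)
    return np.sum(np.log(probs[np.arange(len(y)), y]))

# Hypothetical example: three response levels, two explanatory variables
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]])
beta = np.array([0.8, -0.5])
cuts = np.array([-0.5, 1.0])
P = ordered_logit_probs(X, beta, cuts)
print(np.round(P, 3))                      # each row sums to 1
print(ordered_logit_loglik(X, np.array([0, 1, 2]), beta, cuts))
```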
14 votes · 2 answers

Logistic regression with binomial data in Python

This is probably trivial but I couldn't figure it out. I want to fit a logistic regression model, where my dependent variable is not a Bernoulli variable, but a binomial count. Namely, for each $X_i$, I have $s_i$, the number of successes, and…
R S
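With grouped data, writing n_i for the (assumed) number of trials at X_i, the log-likelihood is Σ_i [s_i log p_i + (n_i − s_i) log(1 − p_i)], so only the success and trial counts are needed. In statsmodels this is usually done by passing a two-column response (successes, failures) to `sm.GLM` with `family=sm.families.Binomial()`. The NumPy sketch below instead maximizes that likelihood directly with iteratively reweighted least squares; the helper `fit_binomial_logit` and the simulated data are illustrative assumptions.

```python
import numpy as np

def fit_binomial_logit(X, successes, trials, n_iter=25):
    """Newton / IRLS fit of a logistic model to binomial counts.

    X includes an intercept column; row i has successes[i] out of trials[i].
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        mu = trials * p                        # expected successes
        w = trials * p * (1.0 - p)             # binomial variance weights
        grad = X.T @ (successes - mu)          # score vector
        hess = X.T @ (X * w[:, None])          # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated grouped data: 50 covariate values, 200 trials each
rng = np.random.default_rng(3)
x = np.linspace(-2, 2, 50)
X = np.column_stack([np.ones_like(x), x])
trials = np.full(50, 200)
true_beta = np.array([0.3, 1.2])
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
successes = rng.binomial(trials, p_true)

beta_hat = fit_binomial_logit(X, successes, trials)
print(beta_hat)   # should be close to true_beta
```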
13 votes · 4 answers

Statsmodels says ARIMA is not appropriate because series is not stationary, how is it testing that?

I have a time series that I am trying to model with Python's statsmodels ARIMA API. When I apply the following: from statsmodels.tsa.arima_model import ARIMA model = ARIMA(data['Sales difference'].dropna(), order=(2, 1, 2)) results_AR =…
Skander H.
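The stationarity condition for an ARMA model concerns its AR part: all roots of 1 − φ_1 z − … − φ_p z^p must lie outside the unit circle. As far as I know, the older `statsmodels.tsa.arima_model.ARIMA` checks the AR coefficients it computes as starting values against a condition of this kind and raises an error when they fail it; this is different from a formal unit-root test such as ADF. The sketch below is not statsmodels' code, just the textbook root check with a hypothetical helper `is_stationary_ar`.

```python
import numpy as np

def is_stationary_ar(phi):
    """True if AR coefficients phi_1..phi_p give a stationary process.

    Checks that all roots of 1 - phi_1 z - ... - phi_p z^p lie outside
    the unit circle.
    """
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = np.concatenate([-phi[::-1], [1.0]])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary_ar([0.5]))        # True: AR(1) with |phi| < 1
print(is_stationary_ar([1.2]))        # False: explosive AR(1)
print(is_stationary_ar([0.5, 0.3]))   # True: roots about 1.17 and -2.84
print(is_stationary_ar([0.6, 0.5]))   # False: phi_1 + phi_2 > 1
```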
13 votes · 1 answer

Cause of a high condition number in a python statsmodels regression?

I'm pretty new to regression analysis, and I'm using Python's statsmodels to look at the relationship between GDP/health/social services spending and health outcomes (DALYs) across the OECD. Just to give an idea of the data I'm using, this is a…
pst0102
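The condition number in a statsmodels OLS summary measures how close the design matrix is to being rank-deficient. Large values typically come from (near-)collinear regressors or from regressors on very different scales, such as raw GDP figures next to proportions. The NumPy sketch below (simulated, hypothetical variables) uses `np.linalg.cond` on the design matrix to show both effects; it is an illustration, not a reproduction of statsmodels' exact computation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
a = rng.normal(size=n)
b = rng.normal(size=n)
ones = np.ones(n)

X_ok = np.column_stack([ones, a, b])                                      # unrelated regressors
X_collinear = np.column_stack([ones, a, a + 0.01 * rng.normal(size=n)])  # third column almost equals a
X_scaled = np.column_stack([ones, a, 1e6 * b])                            # one regressor in huge units

print("independent: ", np.linalg.cond(X_ok))
print("collinear:   ", np.linalg.cond(X_collinear))
print("rescaled:    ", np.linalg.cond(X_scaled))

# Standardizing the non-constant columns removes the pure scale effect
Z = X_scaled.copy()
Z[:, 1:] = (Z[:, 1:] - Z[:, 1:].mean(axis=0)) / Z[:, 1:].std(axis=0)
print("standardized:", np.linalg.cond(Z))
```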
12 votes · 2 answers

The identity link function does not respect the domain of the Gamma family?

I am using a gamma generalized linear model (GLM) with an identity link. The independent variable is the compensation of a particular group. Python's statsmodels summary is giving me a warning about the identity link function ("DomainWarning:…
12 votes · 1 answer

Why does statsmodels.api.OLS over-report the r-squared value?

I am using statsmodels.api.OLS to fit a linear regression model with 4 input features. The shape of the data is: X_train.shape, y_train.shape Out[]: ((350, 4), (350,)) Then I fit the model and compute the r-squared value in 3 different…
dhrumeel
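A likely explanation is a missing constant. The statsmodels documentation describes `rsquared` as 1 − ssr/centered_tss when the model includes a constant and 1 − ssr/uncentered_tss when it does not, i.e. measured against Σy² rather than Σ(y − ȳ)². The uncentered version can look very high even when the fit is worse than simply predicting the mean. The NumPy sketch below (simulated data; the helper `r_squared` is illustrative) computes both versions for the same no-constant fit and compares them with a fit that includes an intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 350
X = rng.uniform(0, 10, size=(n, 4))                   # 4 positive-valued features
y = 20.0 + X @ np.array([1.0, 0.5, 0.0, 0.2]) + rng.normal(scale=3.0, size=n)

def r_squared(y, fitted, centered=True):
    """1 - SSR / TSS, with TSS about the mean (centered) or about zero."""
    ssr = np.sum((y - fitted) ** 2)
    tss = np.sum((y - y.mean()) ** 2) if centered else np.sum(y ** 2)
    return 1.0 - ssr / tss

# No constant column, analogous to OLS(y, X) without add_constant
b0, *_ = np.linalg.lstsq(X, y, rcond=None)
fit0 = X @ b0
r2_uncentered = r_squared(y, fit0, centered=False)        # no-constant convention
r2_centered_noconst = r_squared(y, fit0, centered=True)   # same fit, usual definition

# Same data with an explicit intercept column
Xc = np.column_stack([np.ones(n), X])
b1, *_ = np.linalg.lstsq(Xc, y, rcond=None)
r2_with_const = r_squared(y, Xc @ b1, centered=True)

print(r2_uncentered, r2_centered_noconst, r2_with_const)
```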
10 votes · 1 answer

Why is forecasting of ARMA models performed by Kalman filter

What are the advantages of expressing an ARMA model as a state-space-model and do forecasting using a Kalman filter? This methodology is for example used in the SARIMAX implementation of…
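One practical advantage of the state-space form is that the Kalman filter handles missing observations simply by skipping the update step, and the same recursions give multi-step forecasts (and, via the prediction-error decomposition, the exact Gaussian likelihood). The NumPy sketch below applies this to a hypothetical AR(1)-plus-noise model; the helper `ar1_kalman_forecast` and its parameters are assumptions for illustration, not the SARIMAX implementation.

```python
import numpy as np

def ar1_kalman_forecast(y, phi, q, r, steps):
    """Kalman filter for a hypothetical AR(1)-plus-noise model, then forecast.

    State:       x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, q),  |phi| < 1
    Observation: y_t = x_t + v_t,            v_t ~ N(0, r)
    NaN entries in y are treated as missing: only the prediction step runs.
    Returns forecasts of y_{T+1}, ..., y_{T+steps}.
    """
    x_pred = 0.0                          # predicted state mean (stationary mean)
    p_pred = q / (1.0 - phi ** 2)         # predicted state variance (stationary)
    for obs in y:
        if np.isnan(obs):
            x_filt, p_filt = x_pred, p_pred           # missing: nothing to update with
        else:
            k = p_pred / (p_pred + r)                 # Kalman gain
            x_filt = x_pred + k * (obs - x_pred)      # update mean with the innovation
            p_filt = (1.0 - k) * p_pred               # update variance
        x_pred = phi * x_filt                         # predict next state
        p_pred = phi ** 2 * p_filt + q
    # x_pred is now E[x_{T+1} | data]; each further step multiplies by phi
    return x_pred * phi ** np.arange(steps)

y = np.array([0.3, 1.1, np.nan, 0.8, 1.5])
print(ar1_kalman_forecast(y, phi=0.8, q=1.0, r=0.5, steps=3))
```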
10 votes · 1 answer

Assessing the Contribution of each Predictor in Linear Regression

Say I build a linear regression model to identify linear dependencies between variables in my data. Some of these variables are categorical variables. If I want to evaluate the contribution of a given predictor, how do I evaluate it? Can I compare…
Amelio Vazquez-Reina
9 votes · 2 answers

Dummy/baseline models for time series forecasting

I am working on an evaluation of time series forecasting models in Python, more specifically with statsmodels, scikit-learn and tensorflow. I think it makes sense to first compare the model performance to a set of "trivial" models. What are examples…
clstaudt
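Commonly used trivial baselines include the mean forecast, the naive (last-value) forecast, and the seasonal naive forecast, which repeats the last observed season. A minimal NumPy sketch of these three, with hypothetical function names, an assumed season length `m`, and a toy series scored by mean absolute error:

```python
import numpy as np

def mean_forecast(y, h):
    """Forecast every future step as the historical mean."""
    return np.full(h, np.mean(y))

def naive_forecast(y, h):
    """Forecast every future step as the last observed value."""
    return np.full(h, y[-1])

def seasonal_naive_forecast(y, h, m):
    """Repeat the last full season of length m (e.g. m=12 for monthly data)."""
    last_season = np.asarray(y[-m:])
    reps = int(np.ceil(h / m))
    return np.tile(last_season, reps)[:h]

def mae(actual, predicted):
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

# Toy series with period 4: train on all but the last 4 points, score on those
y = np.array([10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44], dtype=float)
train, test = y[:-4], y[-4:]
for name, fc in [
    ("mean", mean_forecast(train, 4)),
    ("naive", naive_forecast(train, 4)),
    ("seasonal naive", seasonal_naive_forecast(train, 4, m=4)),
]:
    print(name, mae(test, fc))   # MAE: mean 10.0, naive 14.0, seasonal naive 2.0
```

A candidate model that cannot beat the seasonal naive baseline on a strongly seasonal series is usually not adding much.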
9 votes · 2 answers

Can we say 50% of data will be between 25th-75th percentile?

Let's say we have the following dataframe:

     TY_MAX
141  1.004622
142  1.004645
143  1.004660
144  1.004672
145  1.004773
146  1.004820
147  1.004814
148  1.004807
149  1.004773
150  1.004820
151  1.004814
152  1.004834
153  1.005117
154  …
Don Coder
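For continuous data without ties, roughly 50% of the observations fall between the 25th and 75th percentiles; the exact share depends on the sample size and on the interpolation rule used for the percentiles. With repeated values, whole blocks of tied observations can sit on a quartile, and the share can then differ noticeably from 50%. A NumPy sketch on simulated data (not the question's data), with a hypothetical helper `share_within_iqr`:

```python
import numpy as np

def share_within_iqr(x):
    """Fraction of observations in the closed interval [Q1, Q3]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    return np.mean((x >= q1) & (x <= q3))

rng = np.random.default_rng(0)
continuous = rng.normal(size=10_001)
print(share_within_iqr(continuous))   # very close to 0.5

tied = np.array([1, 1, 1, 1, 2])      # heavy ties: Q1 = Q3 = 1
print(share_within_iqr(tied))         # 0.8
```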
9 votes · 1 answer

Proving similarities of two time series

Let's assume an analytical model predicts an epidemic trend over time, i.e. the number of infections over time. We also have computer simulation results over time to verify the performance of the model. The goal is to prove the simulation results and…
Moe