Questions tagged [scipy]

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages:

  • NumPy: Base N-dimensional array package
  • SciPy library: Fundamental library for scientific computing
  • Matplotlib: Comprehensive 2D Plotting
  • IPython: Enhanced Interactive Console
  • SymPy: Symbolic mathematics
  • pandas: Data structures & analysis
290 questions
21
votes
2 answers

Difference between scikit-learn implementations of PCA and TruncatedSVD

I understand the relation between Principal Component Analysis and Singular Value Decomposition at an algebraic/exact level. My question is about the scikit-learn implementation. The documentation says: "[TruncatedSVD] is very similar to PCA, but…
drake
19
votes
1 answer

Beta distribution fitting in SciPy

According to Wikipedia the beta probability distribution has two shape parameters: $\alpha$ and $\beta$. When I call scipy.stats.beta.fit(x) in Python, where x is a bunch of numbers in the range $[0,1]$, 4 values are returned. This strikes me as…
Peter Smit
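For context on the four returned values: in SciPy, `beta.fit` estimates the two shape parameters plus a location and a scale, in that order. The sketch below uses synthetic data (the sample `x` and the seed are assumptions for illustration, not the asker's data) and shows how the `floc` and `fscale` keywords can pin the support to $[0,1]$ so that only $\alpha$ and $\beta$ are estimated.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only: 2000 draws from Beta(2, 5).
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=2000)

# Unconstrained fit: returns (a, b, loc, scale), i.e. four values.
params_free = stats.beta.fit(x)

# Fix loc=0 and scale=1 so only the two shape parameters are estimated.
a_hat, b_hat, loc_hat, scale_hat = stats.beta.fit(x, floc=0, fscale=1)

print(len(params_free))
print(a_hat, b_hat, loc_hat, scale_hat)
```

With the location and scale fixed, the first two returned values correspond to the textbook $\alpha$ and $\beta$.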
13
votes
2 answers

Kolmogorov–Smirnov test: p-value and ks-test statistic decrease as sample size increases

Why do p-values and ks-test statistics decrease with increasing sample size? Take this Python code as an example: import numpy as np from scipy.stats import norm, ks_2samp np.random.seed(0) for n in [10, 100, 1000, 10000, 100000, 1000000]: x =…
Oliver Angelil
13
votes
2 answers

Chi-squared test with scipy: what's the difference between chi2_contingency and chisquare?

I'd like to run a chi-squared test in Python with scipy. I've created code to do this, but I don't know if what I'm doing is right, because the scipy docs are quite sparse. Background first: I have two groups of users. My null hypothesis is that…
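As a rough orientation for this question: `chi2_contingency` tests independence between the rows and columns of a contingency table, while `chisquare` is a goodness-of-fit test of observed counts against expected counts. The sketch below uses made-up counts (the table and the `observed` vector are hypothetical, not taken from the question).

```python
import numpy as np
from scipy.stats import chi2_contingency, chisquare

# Hypothetical 2x2 table: rows are two user groups, columns are outcome counts.
table = np.array([[20, 80],
                  [35, 65]])

# Test of independence between group and outcome.
# For 2x2 tables SciPy applies Yates' continuity correction by default.
stat, p_value, dof, expected = chi2_contingency(table)

# Goodness of fit: do three category counts match equal expected frequencies?
observed = np.array([18, 22, 20])
gof_stat, gof_p = chisquare(observed)

print(stat, p_value, dof)
print(gof_stat, gof_p)
```

For comparing two groups on a categorical outcome, the contingency-table form is usually the natural fit; `chisquare` suits the case where expected frequencies are known in advance.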
11
votes
4 answers

Fitting log-normal distribution in R vs. SciPy

I've fitted a lognormal model to a data set using R. The resulting parameters were: meanlog = 4.2991610, sdlog = 0.5511349. I'd like to transfer this model to SciPy, which I've never used before. Using SciPy, I was able to get a shape and scale…
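A minimal sketch of the parameter mapping, assuming the usual two-parameter lognormal (location fixed at 0): SciPy's `lognorm` takes the shape `s` equal to R's `sdlog` and a `scale` equal to `exp(meanlog)`. The variable names below are illustrative.

```python
import numpy as np
from scipy import stats

meanlog = 4.2991610  # from the R fit quoted above
sdlog = 0.5511349    # from the R fit quoted above

# SciPy's lognorm uses shape s = sdlog and scale = exp(meanlog), with loc = 0.
dist = stats.lognorm(s=sdlog, scale=np.exp(meanlog))

print(dist.median())  # equals exp(meanlog)
print(dist.mean())    # equals exp(meanlog + sdlog**2 / 2)
```

When fitting in SciPy rather than transferring parameters, passing `floc=0` to `lognorm.fit` keeps the location at zero so the result is comparable to R's two-parameter fit.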
8
votes
0 answers

ks_2samp test in Python scipy - low D statistic, low p-value?

As the heading says, I'm getting both a low D statistic and a low p-value from the ks_2samp test. More specifically: Ks_2sampResult(statistic=0.049890046265079313, pvalue=0.0011365796735152277) I think these two results seem kind of contradictory. If the…
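One way to see why a small D and a small p-value can coexist is to hold D fixed and vary the sample size. The sketch below uses the large-sample Kolmogorov approximation via `scipy.stats.kstwobign`; the value of `D` and the sample sizes are assumptions for illustration, since the question does not state them here, and this is not necessarily the exact computation `ks_2samp` performs.

```python
import numpy as np
from scipy.stats import kstwobign

D = 0.05  # a fixed, small KS distance (illustrative value)

# Equal sample sizes n = m; these sizes are assumptions for illustration.
sizes = np.array([100, 1000, 3000, 10000])

# Large-sample approximation: sqrt(n*m/(n+m)) * D follows the Kolmogorov distribution.
effective_n = sizes * sizes / (sizes + sizes)
p_values = kstwobign.sf(np.sqrt(effective_n) * D)

for n, p in zip(sizes, p_values):
    print(n, p)
```

The same distance that is unremarkable for 100 points per sample becomes strong evidence against the null once each sample has thousands of points.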
8
votes
2 answers

Can we really sample from a Continuous distribution (Scipy function) and what does it mean?

I've seen this answer: How is it logically possible to sample a single value from a continuous distribution?, but it's still not very clear to me. In Scipy, there's a function scipy.stats.norm.rvs() which samples from a normal distribution. I was…
8
votes
1 answer

Why does scipy.stats.anderson_ksamp give a p-value of over a million for these data?

When I run from scipy.stats import anderson_ksamp a = [-1.8, -2.4, -2.4, -0.0, -1.5, -2.7, -1.8, -3.0, -1.8, -1.2, -3.0, -3.0, -2.8, -3.0, -2.1, -0.0, 0.6, -2.5, -2.4, -0.0, -2.7, -0.0, -2.5, -2.1, -0.9, -3.0, -0.6, -0.6, -1.5, -2.2, -1.2, -2.4,…
8
votes
2 answers

How do I force the L-BFGS-B to not stop early? Projected gradient is zero

I'm trying to use the SciPy implementation of the fmin_l_bfgs_b algorithm using the following code: imgOpt, cost, info = fmin_l_bfgs_b(func, x0=img, args=(spec_layer, spec_weight, regularization), approx_grad=1, bounds=constraintPairs, iprint=2) The…
pir
8
votes
1 answer

Understanding scipy Kolmogorov-Smirnov test

I'm trying to understand the Kolmogorov-Smirnov test using a very simple example. I generate a set of random, uniform values between 0 and 1.0. I then test that these values are from a uniform distribution by using the scipy kstest function. I'm…
user17426
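The setup described in this question can be sketched as follows. The sample size and seed are arbitrary choices, and the second call (testing against the wrong range) is added only for contrast.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(42)  # seed chosen arbitrarily
x = rng.uniform(0.0, 1.0, size=500)

# Null hypothesis: x comes from Uniform(0, 1).
result = kstest(x, "uniform")
print(result.statistic, result.pvalue)

# The same data tested against a different (wrong) range: Uniform(0, 2).
# args are (loc, scale) of scipy.stats.uniform.
wrong = kstest(x, "uniform", args=(0.0, 2.0))
print(wrong.statistic, wrong.pvalue)
```

A large p-value in the first call means the data are compatible with the null; it does not prove the data are uniform.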
7
votes
1 answer

Interpretation of weights in non-linear least squares regression

I am conducting a non-linear least squares regression fit using the python scipy.optimize.curve_fit function, and am trying to better understand the weights that go into this method. I have a distribution of raw data points that I wish to fit to a…
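As a hedged illustration of how weights enter `curve_fit`: the `sigma` argument supplies per-point standard deviations, and each residual is divided by its `sigma` before squaring, so noisier points contribute less. The model, the true parameters, and the noise levels below are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Hypothetical model for illustration: exponential decay.
    return a * np.exp(-b * x)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 50)
sigma = 0.05 + 0.05 * x                      # assumed per-point standard deviations
y = model(x, 2.5, 1.3) + rng.normal(0.0, sigma)

# sigma weights each residual as (y - f(x)) / sigma, so noisier points count less.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0], sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))                # one-standard-deviation parameter errors

print(popt, perr)
```

With `absolute_sigma=True` the values in `sigma` are treated as real measurement errors; with the default `False` only their relative sizes matter and the covariance is rescaled.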
7
votes
1 answer

How to compare the data distributions of two datasets?

I'm having trouble understanding how to compare two sets of data by their distributions. For example, how can I tell that column X100 has the same distribution as column Y1? Also, is there a way to express the distribution comparison of all…
Sahar Millis
7
votes
2 answers

What exactly does scipy.stats.ttest_ind test?

From the description: "This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values." Taken literally, this seems to be saying that we're testing $H_0: \bar{x} = \bar{y}$, but since we know…
jeremy radcliff
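For reference, the null hypothesis concerns the population means, $H_0: \mu_x = \mu_y$; the sample means only enter through the test statistic. The sketch below, on synthetic data, checks that the default `ttest_ind` statistic matches the pooled-variance formula $t = (\bar{x} - \bar{y}) / \sqrt{s_p^2 (1/n + 1/m)}$.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=30)   # synthetic sample 1
y = rng.normal(0.5, 1.0, size=40)   # synthetic sample 2

t_scipy, p_scipy = ttest_ind(x, y)  # default: equal_var=True (pooled variance)

# Manual pooled-variance t statistic.
n, m = len(x), len(y)
pooled_var = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
t_manual = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1.0 / n + 1.0 / m))

print(t_scipy, t_manual, p_scipy)
```

Passing `equal_var=False` switches to Welch's test, which does not assume equal population variances.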
7
votes
1 answer

When should I use `scipy.stats.wilcoxon` instead of `scipy.stats.ranksums`?

I've been using http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html but then I realized that there is http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ranksums.html They sound like they are pretty much the…
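The distinction can be sketched on synthetic data: `wilcoxon` is the signed-rank test for paired measurements, while `ranksums` is the rank-sum test for two independent samples. All data below are made up for illustration.

```python
import numpy as np
from scipy.stats import wilcoxon, ranksums

rng = np.random.default_rng(3)

# Paired design: the same 25 subjects measured twice (synthetic data).
before = rng.normal(10.0, 2.0, size=25)
after = before + rng.normal(0.5, 1.0, size=25)
paired = wilcoxon(before, after)          # signed-rank test on the differences

# Independent design: two unrelated groups, sizes may differ (synthetic data).
group_a = rng.normal(10.0, 2.0, size=25)
group_b = rng.normal(11.0, 2.0, size=30)
independent = ranksums(group_a, group_b)  # rank-sum test

print(paired.pvalue, independent.pvalue)
```

The paired test requires equal-length inputs because it works on element-wise differences; the rank-sum test has no such requirement.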
7
votes
0 answers

Goodness-of-fit for Discrete Distributions

I've been doing some data analysis with Scipy. So far I accomplished this with continuous distributions: I can fit a probability distribution to a set of data points using a maximum likelihood fit. For example using stats.chi2.fit(data_points). I…