Questions tagged [numpy]

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
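
As an illustration of the "arbitrary data-types" point, here is a minimal sketch of a structured dtype used as a record container. The field names and values are hypothetical and chosen only for the example.

```python
import numpy as np

# Hypothetical record layout (not from the tag description).
record_dtype = np.dtype(
    [("name", "U10"), ("age", np.int32), ("weight", np.float64)]
)

people = np.array(
    [("Alice", 31, 58.5), ("Bob", 45, 82.0)],
    dtype=record_dtype,
)

# Each field is addressed by name and behaves like an ordinary array.
ages = people["age"]
mean_weight = people["weight"].mean()
```

Rows of such an array map naturally onto rows of a database table, which is one way to read the integration claim above.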

146 questions
18
votes
1 answer

How does NumPy solve least squares for underdetermined systems?

Let's say that we have X of shape (2, 5) and y of shape (2,). This works: np.linalg.lstsq(X, y). We would expect this to work only if X was of shape (N, 5) where N >= 5. But why and how? We do get back 5 weights as expected but how is this problem…
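
A small sketch of the behaviour described in this excerpt, assuming random data in place of the asker's X and y. For an underdetermined system, np.linalg.lstsq is documented to return the solution with the smallest 2-norm, which coincides with the pseudoinverse solution.

```python
import numpy as np

# Random stand-ins for the asker's data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
y = rng.normal(size=2)

# Least-squares solve; with 2 equations and 5 unknowns there are
# infinitely many exact solutions, and the minimum-norm one is returned.
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# The same solution obtained through the Moore-Penrose pseudoinverse.
w_pinv = np.linalg.pinv(X) @ y

# Adding a null-space direction of X gives another exact solution,
# but one with a larger norm.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]          # X @ null_vec is (numerically) zero
w_other = w + null_vec
```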
13
votes
5 answers

How to calculate a Gaussian kernel effectively in numpy

I have a numpy array with m columns and n rows, the columns being dimensions and the rows datapoints. I now need to calculate kernel values for each combination of data points. For a linear kernel $K(\mathbf{x}_i,\mathbf{x}_j) = \langle…
Peter Smit
  • 2,030
  • 3
  • 23
  • 36
12
votes
4 answers

Best way to seed N independent random number generators from 1 value

In my program I need to run N separate threads each with their own RNG which is used to sample a large dataset. I need to be able to seed this entire process with a single value so I can reproduce results. Is it sufficient to simply sequentially…
EricR
  • 223
  • 2
  • 4
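
One possible approach to the seeding question, sketched under the assumption that NumPy's newer Generator API is acceptable: np.random.SeedSequence can spawn child seed sequences from a single root seed. The helper name make_rngs is hypothetical.

```python
import numpy as np

def make_rngs(seed, n):
    """Return n generators derived from one root seed (hypothetical helper)."""
    root = np.random.SeedSequence(seed)
    # spawn() derives n child seed sequences from the root.
    children = root.spawn(n)
    return [np.random.default_rng(child) for child in children]

# One generator per worker thread; the whole set is reproducible from 12345.
rngs = make_rngs(12345, 4)
draws = [rng.random(3) for rng in rngs]
```

Re-running with the same root seed reproduces every stream, which is the property the question asks for.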
11
votes
4 answers

Fitting log-normal distribution in R vs. SciPy

I've fitted a lognormal model using R with a set of data. The resulting parameters were: meanlog = 4.2991610, sdlog = 0.5511349. I'd like to transfer this model to SciPy, which I've never used before. Using SciPy, I was able to get a shape and scale…
10
votes
1 answer

How to calculate mutual information?

I am a bit confused. Can someone explain to me how to calculate mutual information between two terms based on a term-document matrix with binary term occurrence as weights? $$ \begin{matrix} & 'Why' & 'How' & 'When' & 'Where' \\ Document1…
user18075
  • 617
  • 1
  • 6
  • 14
7
votes
2 answers

Mean centering for PCA in a 2D array...across rows or cols?

I'm pretty new at this and I'm picking my way through the steps for running PCA on a 2D numpy array. Each subarray represents all pixels of an image (all rows & cols flattened). Example: a = np.array([ [1,2,3], [4,5,6], [7,8,9] ]) # so, a[0],…
vulture
  • 213
  • 1
  • 2
  • 4
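
A minimal sketch of the centering step, assuming the usual convention that each row is one sample (here, one flattened image) and each column is one feature (one pixel position). Under that assumption the mean is taken over rows, so each column ends up with zero mean.

```python
import numpy as np

# The example array from the excerpt, as floats.
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)

# axis=0 averages over samples (rows), giving one mean per feature (column).
feature_means = a.mean(axis=0)

# Broadcasting subtracts the per-column mean from every row.
centered = a - feature_means
```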
6
votes
1 answer

Are 1-dimensional numpy arrays equivalent to vectors?

I'm new to both linear algebra and numpy, so please bear with me. I'm taking a course on linear regression, where I learned that we can express our hypothesis as $\theta^TX$ where $\theta$ is our coefficient vector (written in math notation as a…
user153009
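
A short sketch of how 1-D arrays differ from explicit row or column vectors. The values of theta and x are made up for the example.

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])   # 1-D array, shape (3,)
x = np.array([4.0, 5.0, 6.0])

# Transposing a 1-D array is a no-op: there is no row/column distinction.
same_shape = theta.T.shape

# The inner product theta^T x is just the dot product of two 1-D arrays.
inner = theta @ x

# An explicit column vector needs a second axis.
theta_col = theta[:, np.newaxis]    # shape (3, 1)
theta_row = theta_col.T             # shape (1, 3)
```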
5
votes
1 answer

PCA principal components in sklearn not matching eigen-vectors of covariance calculated by numpy

I was trying to replicate sklearn's PCA using numpy, but PCA in numpy and sklearn produces different results. I noticed that: eigenvalues are the same as the PCA object's explained_variance_ attribute, along with the order; eigenvectors are…
Piyush Singh
  • 188
  • 1
  • 8
5
votes
1 answer

Log Transformation Instead of Z-Score Normalization For Machine Learning

I almost always used scikit-learn's StandardScaler to normalize my data for machine learning. I noticed however that simply taking the log of the variables that I wanted to normalize often resulted in better accuracy compared to when I used the…
5
votes
3 answers

How is the poisson distribution a distribution? It seems more like a formula

I just watched this video: https://www.youtube.com/watch?v=Fk02TW6reiA It shows a formula to calculate an answer for the following problem: There are 2 customers expected every 3 minutes in a store. Therefore there are 6 customers expected every 9…
4
votes
5 answers

Neural network based on twitter followers, what would be my features?

I was thinking of training a neural network that would be able to classify twitter users according to their followers. For example, I would like to know if a user is "gamer" or not by the people they follow (not the number, but the list of the…
Sharki
4
votes
1 answer

QR Factorization to Solve Least Squares Without Using an Inverse

I'm playing around with different ways to solve least squares, and am using numpy to derive values for $\beta$ in a regression problem. I know that if you do a $QR$ factorization of $X$ such that $X = QR$ where $Q$ is an $m \times n$ orthonormal…
4
votes
0 answers

James Stein Estimator for more than one Sample

I have a hard time understanding the James-Stein Estimator. I'll show how I tried to understand it using a Python example. I take a normally distributed random vector with mean $(0.1, 0.2, 0.3, 0.15, 0.11, 0.87)$ and variance 1 for each vector…
4
votes
1 answer

Implementing Lasso Regression in Numpy

I'm doing a little self study project, and am trying to implement OLS, Ridge, and Lasso regression from scratch using just Numpy, and am having problems getting this to work with Lasso regression. To check my results I'm comparing my results with…
4
votes
2 answers

python computing likelihood causing exp overflow

I am using numpy to compute the likelihood of a variable $Z$. $Z$ is a Bernoulli random variable which has two outcomes $[0,1]$. I compute the log likelihood of observing $Z$ given the parameter is $x=[ -3146,-1821]$. These numbers are…
JYY
  • 697
  • 5
  • 13
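
The excerpt is cut off before the model is stated, so the following is only a sketch under a loud assumption: that the Bernoulli probability comes from a logistic sigmoid of x. In that case the log-probability can be computed with np.logaddexp, which never forms exp of a large number. The function name log_sigmoid is hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    """Stable log(1 / (1 + exp(-x))), assuming a logistic model (assumption)."""
    # log sigmoid(x) = -log(exp(0) + exp(-x)); logaddexp evaluates this
    # without overflowing, even when -x is in the thousands.
    return -np.logaddexp(0.0, -np.asarray(x, dtype=float))

# The parameter values quoted in the excerpt.
x = np.array([-3146.0, -1821.0])
log_p = log_sigmoid(x)
```

Computing np.exp(3146.0) directly would overflow to inf in double precision, which is the kind of failure the question title describes.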