Most Popular
1500 questions
54
votes
3 answers
Interpretation of log transformed predictor and/or response
I'm wondering if it makes a difference in interpretation whether only the dependent, both the dependent and independent, or only the independent variables are log transformed.
Consider the case of
log(DV) = Intercept + B1*IV + Error
I can…
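For the log(DV) model quoted above, one common reading is that a one-unit increase in IV multiplies DV by exp(B1), all else held fixed. The sketch below turns that into a percent change. It assumes natural logs; the helper name `percent_change_per_unit` and the coefficient 0.05 are illustrative, not taken from the question.

```python
import math


def percent_change_per_unit(b1: float) -> float:
    """Percent change in DV per one-unit increase in IV when
    log(DV) = Intercept + B1*IV + Error (natural log assumed)."""
    return 100.0 * (math.exp(b1) - 1.0)


# Hypothetical coefficient, chosen only for illustration.
print(round(percent_change_per_unit(0.05), 2))
```

For small coefficients the result is close to 100*B1, which is why B1 is often read directly as an approximate percent change.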

upabove
54
votes
1 answer
How large should the batch size be for stochastic gradient descent?
I understand that stochastic gradient descent may be used to optimize a neural network using backpropagation by updating each iteration with a different sample of the training dataset. How large should the batch size be?
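To make the role of the batch size concrete, here is a minimal sketch of how one epoch can be split into shuffled mini-batches. The function name `minibatches` and the toy data are assumptions for illustration; the sketch does not recommend any particular batch size.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def minibatches(data: Sequence[T], batch_size: int,
                rng: random.Random) -> Iterator[List[T]]:
    """Yield shuffled mini-batches that together cover `data` exactly once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]


# batch_size=1 updates on one example at a time;
# batch_size=len(data) is full-batch gradient descent.
for batch in minibatches(list(range(10)), batch_size=4, rng=random.Random(0)):
    print(batch)
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.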

Simon Kuang
54
votes
8 answers
How to tell the probability of failure if there were no failures?
I was wondering if there is a way to tell the probability of something (a product) failing if we have had 100,000 products in the field for 1 year with no failures. What is the probability that one of the next 10,000 products sold fails?
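One possible approach, sketched here under the assumption that units fail independently with the same probability p, is an exact one-sided upper bound for p after zero failures (the large-sample version is often called the "rule of three", roughly 3/n at 95% confidence). The function names are illustrative, and this is only one of several ways to frame the question.

```python
def zero_failure_upper_bound(n_units: int, confidence: float = 0.95) -> float:
    """Largest per-unit failure probability p still consistent, at the given
    confidence, with 0 failures in n_units: solves (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_units)


def prob_at_least_one_failure(p: float, n_new: int) -> float:
    """P(at least one failure among n_new independent units)."""
    return 1.0 - (1.0 - p) ** n_new


p_upper = zero_failure_upper_bound(100_000)
print(p_upper, 3 / 100_000)  # exact bound next to the rule-of-three shortcut
print(prob_at_least_one_failure(p_upper, 10_000))
```

The second figure is an upper bound on the chance of at least one failure among the next 10,000 units, not a point estimate.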

melonfresh
54
votes
4 answers
Replicating Stata's "robust" option in R
I have been trying to replicate the results of the Stata option robust in R. I have used the rlm command from the MASS package and also the command lmrob from the package "robustbase". In both cases the results are quite different from the "robust"…
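A likely source of the mismatch: as far as I know, Stata's robust option leaves the OLS coefficients unchanged and reports heteroskedasticity-robust (sandwich) standard errors with an n/(n-k) small-sample scaling, often labelled HC1, whereas rlm and lmrob fit a different, outlier-resistant estimator. The sketch below computes both kinds of standard error for a one-predictor regression; the function name and toy data are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def ols_slope_with_hc1(x: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a + b*x by OLS; return (b, classical SE of b, HC1 SE of b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    k = 2  # estimated parameters: intercept and slope
    classical_var = sum(e * e for e in resid) / (n - k) / sxx
    hc0_var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    hc1_var = hc0_var * n / (n - k)  # small-sample scaling
    return b, math.sqrt(classical_var), math.sqrt(hc1_var)


# Toy data, made up for illustration.
print(ols_slope_with_hc1([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.6]))
```

The slope is the ordinary least-squares estimate in both cases; only the standard error differs.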

user56579
54
votes
3 answers
Do we have a problem of "pity upvotes"?
I know, this may sound like it is off-topic, but hear me out.
At Stack Overflow and here we get votes on posts; this is all stored in tabular form.
E.g.:
post id  voter id  vote type  datetime
-------  --------  ---------  …

Sam Saffron
53
votes
10 answers
Machine Learning using Python
I am considering using Python libraries for doing my Machine Learning experiments. Thus far, I had been relying on WEKA, but have been pretty dissatisfied on the whole. This is primarily because I have found WEKA to be not so well supported (very…

Andy
53
votes
5 answers
Correct spelling (capitalization, italicization, hyphenation) of "p-value"?
I realize this is pedantic and trite, but as a researcher in a field outside of statistics, with limited formal education in statistics, I always wonder if I'm writing "p-value" correctly. Specifically:
Is the "p" supposed to be capitalized?
Is the…

gotgenes
53
votes
2 answers
Why is a Bayesian not allowed to look at the residuals?
In the article "Discussion: Should Ecologists Become Bayesians?" Brian Dennis gives a surprisingly balanced and positive view of Bayesian statistics when his aim seems to be to warn people about it. However, in one paragraph, without any citations…

Mankka
53
votes
2 answers
Linear kernel and non-linear kernel for support vector machine?
When using a support vector machine, are there any guidelines on choosing a linear kernel vs. a non-linear kernel, like RBF? I once heard that a non-linear kernel tends not to perform well once the number of features is large. Are there any references on…
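For reference, the two kernels being compared can be written in a few lines. This is a minimal sketch of the standard definitions; the function names and the `gamma` value are illustrative assumptions.

```python
import math
from typing import Sequence


def linear_kernel(u: Sequence[float], v: Sequence[float]) -> float:
    """k(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """k(u, v) = exp(-gamma * ||u - v||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


print(linear_kernel([1.0, 2.0], [3.0, 4.0]))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

The linear kernel has no extra hyperparameter, while the RBF kernel adds `gamma`, which has to be tuned alongside the SVM's regularisation constant.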

user3269
53
votes
2 answers
Why are MA(q) time series models called "moving averages"?
When I read "moving average" in relation to a time series, I think something like $\frac{(x_{t-1} + x_{t-2} + x_{t-3})}3$, or perhaps a weighted average like $0.5x_{t-1} + 0.3x_{t-2} + 0.2x_{t-3}$.
(I realize these are actually AR(3) models, but…
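By contrast, the standard MA(q) definition averages current and past noise terms rather than past values of the series: $x_t = e_t + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q}$. The sketch below builds such a series from a given noise sequence; the function name `ma_series` and the coefficients are illustrative assumptions.

```python
import random
from typing import List, Sequence


def ma_series(noise: Sequence[float], thetas: Sequence[float]) -> List[float]:
    """x_t = e_t + theta_1*e_{t-1} + ... + theta_q*e_{t-q}.
    Shocks before the start of `noise` are treated as zero."""
    series = []
    for t, e_t in enumerate(noise):
        x_t = e_t
        for lag, theta in enumerate(thetas, start=1):
            if t - lag >= 0:
                x_t += theta * noise[t - lag]
        series.append(x_t)
    return series


# Simulate an MA(2) process with made-up coefficients.
rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(5)]
print(ma_series(shocks, thetas=[0.5, 0.25]))
```

Feeding in a single unit shock followed by zeros shows that the effect of a shock disappears entirely after q steps.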

Stats newb
53
votes
4 answers
Is a sample covariance matrix always symmetric and positive definite?
When computing the covariance matrix of a sample, is one then guaranteed to get a symmetric and positive-definite matrix?
Currently my problem has a sample of 4600 observation vectors and 24 dimensions.
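A sample covariance matrix is always symmetric and positive semi-definite, but it is only positive definite when no variable is an exact linear combination of the others (and there are enough observations). The sketch below shows a degenerate case where the second column is twice the first; the helper names and data are illustrative assumptions.

```python
from typing import List, Sequence


def sample_covariance(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def quadratic_form(m: Sequence[Sequence[float]], v: Sequence[float]) -> float:
    """Compute v' M v."""
    return sum(v[i] * m[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))


# Second column is exactly 2 * first column, so the matrix is singular.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cov = sample_covariance(data)
print(cov)
print(quadratic_form(cov, [2.0, -1.0]))  # zero up to rounding: not positive definite
```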

Morten
53
votes
4 answers
Multinomial logistic regression vs one-vs-rest binary logistic regression
Let's say we have a dependent variable $Y$ with a few categories and a set of independent variables.
What are the advantages of multinomial logistic regression over a set of binary logistic regressions (i.e. one-vs-rest scheme)? By set of binary logistic…
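One concrete difference between the two schemes: a multinomial (softmax) model yields class probabilities that sum to 1 by construction, while separately fitted one-vs-rest sigmoids need not. The sketch below illustrates this on made-up linear scores; the function names are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Multinomial-logit class probabilities from per-class linear scores."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_vs_rest_probs(scores: Sequence[float]) -> List[float]:
    """Independent sigmoid probability for each class-vs-rest model."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


scores = [0.0, 0.0, 0.0]  # hypothetical scores for three classes
print(sum(softmax(scores)))
print(sum(one_vs_rest_probs(scores)))
```

In practice one-vs-rest outputs are often renormalised before use, which is an extra step the multinomial model does not need.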

Tomek Tarczynski
53
votes
2 answers
Pandas / Statsmodels / Scikit-learn
Are Pandas, Statsmodels and Scikit-learn different implementations of machine learning/statistical operations, or are these complementary to one another?
Which of these has the most comprehensive functionality?
Which one is actively developed…

Nik
53
votes
2 answers
How to simulate artificial data for logistic regression?
I know I'm missing something in my understanding of logistic regression, and would really appreciate any help.
As far as I understand it, logistic regression assumes that the probability of a '1' outcome, given the inputs, is a linear combination…
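In the usual formulation it is the log-odds of a '1', not the probability itself, that is linear in the inputs. A minimal simulation sketch under that model follows; the function names, the standard-normal predictor, and the coefficient values are all illustrative assumptions.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_logistic(n: int, intercept: float, slope: float,
                      rng: random.Random) -> List[Tuple[float, int]]:
    """Draw (x, y) pairs with x ~ N(0, 1) and y ~ Bernoulli(sigmoid(a + b*x))."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = sigmoid(intercept + slope * x)
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


sample = simulate_logistic(1000, intercept=-1.0, slope=2.0, rng=random.Random(42))
print(sum(y for _, y in sample) / len(sample))  # observed share of 1s
```

Fitting a logistic regression to such data should recover coefficients near the ones used to generate it, which is a handy check on one's understanding of the model.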

zorbar
53
votes
4 answers
Fast linear regression robust to outliers
I am dealing with linear data with outliers, some of which are more than 5 standard deviations away from the estimated regression line. I'm looking for a linear regression technique that reduces the influence of these points.
So far what I did is…
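As one illustration of an outlier-resistant fit (not the asker's method, which is cut off above), the Theil-Sen estimator takes the median of all pairwise slopes. This naive sketch is O(n²) in the number of points, so it is not fast for large data; the function name and toy data are assumptions.

```python
from itertools import combinations
from statistics import median
from typing import Sequence, Tuple


def theil_sen(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept): the median pairwise slope and the
    median of y - slope*x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# y = 2x + 1 with one gross outlier at x = 3.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 100, 9, 11, 13]
print(theil_sen(xs, ys))
```

Because medians ignore the few extreme pairwise slopes produced by the outlier, the fitted line follows the bulk of the data.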

Matteo Fasiolo