Most Popular

1500 questions
48 votes, 2 answers

Logistic regression model does not converge

I've got some data about airline flights (in a data frame called flights) and I would like to see if the flight time has any effect on the probability of a significantly delayed arrival (meaning 10 or more minutes). I figured I'd use logistic…
Daniel Standage
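One common reason (though not the only one) that a logistic regression fails to converge is complete separation. This happens when a predictor splits the 0s and 1s perfectly, so the likelihood keeps improving as the coefficient grows without bound. The sketch below uses made-up flight times and delay flags, not the asker's `flights` data. It evaluates the log-likelihood at ever steeper slopes; the helper name `log_likelihood` is illustrative.

```python
import math

# Hypothetical toy data (not from the question): flight time in hours and whether
# the arrival was delayed by 10 or more minutes (1) or not (0). Every flight longer
# than 3 hours is delayed, so the two classes are perfectly separated.
flight_time = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
delayed = [0, 0, 0, 0, 1, 1, 1, 1]

def log_likelihood(beta0, beta1, x, y):
    """Bernoulli log-likelihood of a logistic model with intercept beta0 and slope beta1."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta0 + beta1 * xi
        # log(p) = -log(1 + e^-eta) and log(1 - p) = -log(1 + e^eta)
        if yi == 1:
            total -= math.log1p(math.exp(-eta))
        else:
            total -= math.log1p(math.exp(eta))
    return total

# Keep the decision boundary at 3 hours (beta0 = -3 * beta1) and steepen the slope.
slopes = [1, 2, 4, 8, 16]
lls = [log_likelihood(-3.0 * b, b, flight_time, delayed) for b in slopes]
for b, ll in zip(slopes, lls):
    print(f"slope={b:>2}: log-likelihood={ll:.6f}")
```

The log-likelihood rises toward 0 with every increase in slope, so no finite slope maximizes it. An iterative fitter such as R's `glm()` typically responds with warnings and very large coefficients. Whether this is what happens with the flights data cannot be determined from the excerpt.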
48 votes, 3 answers

Why is there a difference between manually calculating a logistic regression 95% confidence interval, and using the confint() function in R?

Dear everyone, I've noticed something strange that I can't explain; can you? In summary, the manual approach to calculating a confidence interval in a logistic regression model and the R function confint() give different results. I've been going…
Andrew
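For reference, the "manual" interval in questions like this is usually the Wald interval. The excerpt does not say which formula the asker used, so this is an assumption. The Wald interval is built from the coefficient estimate $\hat\beta_j$ and its standard error, and it is exponentiated if an odds-ratio interval is wanted:

$$
\hat\beta_j \pm z_{0.975}\,\widehat{\mathrm{SE}}(\hat\beta_j), \qquad z_{0.975} \approx 1.96,
\qquad
\left[\exp\!\big(\hat\beta_j - z_{0.975}\,\widehat{\mathrm{SE}}\big),\ \exp\!\big(\hat\beta_j + z_{0.975}\,\widehat{\mathrm{SE}}\big)\right]
$$

For a fitted glm, `confint()` computes profile-likelihood intervals instead (provided by MASS in older R versions). `confint.default()` returns the Wald interval, which is one plausible source of the discrepancy.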
48 votes, 13 answers

Can machine learning decode the SHA256 hashes?

I have a 64-character SHA256 hash. I'm hoping to train a model that can predict whether the plaintext used to generate the hash begins with a 1. Regardless of whether this is "possible", which algorithm would be the best approach? My initial…
John
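The 64-character length in the question follows from SHA-256 producing 256 bits, printed as hexadecimal digits of 4 bits each. The sketch below uses Python's standard `hashlib` module with two arbitrary example plaintexts that differ only in their first character. It shows the length and how unrelated the two digests look, which is what makes predicting a plaintext property from the hash so hard.

```python
import hashlib

# Arbitrary example plaintexts that differ only in their first character.
plaintexts = ["1hello", "2hello"]
digests = {t: hashlib.sha256(t.encode("utf-8")).hexdigest() for t in plaintexts}

for text, digest in digests.items():
    # 256 bits / 4 bits per hex digit = 64 characters
    print(text, len(digest), digest)
```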
48 votes, 1 answer

What is the difference between a loss function and an error function?

Is the term "loss" synonymous with "error"? Is there a difference in definition? Also, what is the origin of the term "loss"? NB: The error function mentioned here is not to be confused with normal error.
Dan Kowalczyk
48 votes, 5 answers

How do I test a nonlinear association?

For plot 1, I can test the association between x and y by doing a simple correlation. For plot 2, where the relationship is nonlinear yet there is a clear relation between x and y, how can I test the association and label its nature?
48 votes, 8 answers

Pitfalls in time series analysis

I am just starting out self-learning in time series analysis. I have noticed that there are a number of potential pitfalls that are not applicable to general statistics. So, building on What are common statistical sins?, I would like to ask: What…
naught101
48 votes, 14 answers

Why is median age a better statistic than mean age?

If you look at Wolfram Alpha or the Wikipedia page "List of countries by median age", the median clearly seems to be the statistic of choice when it comes to ages. I am not able to explain to myself why the arithmetic mean would be a worse statistic.…
Lazer
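One common argument is robustness. Age distributions are often skewed, and the mean follows the long tail while the median does not. The sketch below uses hypothetical ages, not census data, and changes a single value to show that effect with Python's `statistics` module.

```python
from statistics import mean, median

# Hypothetical ages for a small, young population (not real census data).
ages = [2, 5, 8, 12, 15, 19, 23, 28, 34, 41]
# Same population, except the oldest person is 100 instead of 41.
ages_long_tail = ages[:-1] + [100]

print("mean:", mean(ages), "->", mean(ages_long_tail))
print("median:", median(ages), "->", median(ages_long_tail))
```

Only one person changed, yet the mean moves by about 6 years while the median stays at 17. This illustrates one reason for the preference, not necessarily the whole answer.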
48 votes, 11 answers

Can simple linear regression be done without using plots and linear algebra?

I'm completely blind and come from a programming background. What I'm trying to do is to learn machine learning, and to do this, I first need to learn about linear regression. All the explanations on the Internet I am finding about this subject plot…
Parham Doustdar
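Simple linear regression can be written entirely as arithmetic on sums and averages, with no plots or matrices. The slope is how much x and y vary together divided by how much x varies on its own, and the intercept makes the line pass through the point of means. A minimal sketch of these standard least-squares formulas follows; the function name and data are illustrative.

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x using only sums and averages."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Co-variation of x and y, and variation of x on its own
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x
intercept, slope = simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # 1.0 2.0
```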
48 votes, 3 answers

Bootstrap vs. permutation hypothesis testing

There are several popular resampling techniques that are often used in practice, such as bootstrapping, permutation tests, the jackknife, etc. There are numerous articles and books that discuss these techniques, for example Philip I Good (2010) Permutation,…
Tu.2
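To make the contrast concrete, the two procedures resample differently. A permutation test for a difference in means reshuffles the group labels (sampling without replacement) to simulate a null of exchangeable groups. A bootstrap resamples observations with replacement to approximate the sampling distribution of a statistic. The sketch below is minimal and assumes a two-sided test on the mean difference and a percentile bootstrap interval. The function names, data, and fixed seeds are illustrative.

```python
import random

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel observations: sampling WITHOUT replacement
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero
    return (count + 1) / (n_perm + 1)

def bootstrap_mean_ci(x, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a mean: resampling WITH replacement."""
    rng = random.Random(seed)
    means = sorted(sum(rng.choices(x, k=len(x))) / len(x) for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

group_a = [1, 2, 3, 4, 5]
group_b = [11, 12, 13, 14, 15]
print(permutation_test(group_a, group_b))
print(bootstrap_mean_ci(group_a))
```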
48 votes, 3 answers

How to interpret Mean Decrease in Accuracy and Mean Decrease GINI in Random Forest models

I'm having some difficulty understanding how to interpret variable importance output from the Random Forest package. Mean decrease in accuracy is usually described as "the decrease in model accuracy from permuting the values in each feature". Is…
FlacoT
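The quoted description of mean decrease in accuracy is "the decrease in model accuracy from permuting the values in each feature". It can be sketched in a model-agnostic way: measure accuracy, shuffle one feature column, measure again, and average the drop. This is a simplification. The randomForest package computes the drop per tree on out-of-bag samples and averages over trees, which this sketch does not do. The toy model and helper names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column (simplified MDA)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the labels
        shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / n_repeats

# Hypothetical "model": predicts class 1 when feature 0 exceeds 0.5 and ignores feature 1.
def toy_predict(row):
    return int(row[0] > 0.5)

data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_predict(r) for r in rows]

imp0 = permutation_importance(toy_predict, rows, labels, feature=0)
imp1 = permutation_importance(toy_predict, rows, labels, feature=1)
print(imp0, imp1)
```

The feature the model relies on shows a large accuracy drop when shuffled. The ignored feature shows none, which is the intuition behind reading a larger mean decrease in accuracy as "more important".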
48 votes, 7 answers

Where to start with statistics for an experienced developer

During the first half of 2015 I took the Coursera Machine Learning course (by Andrew Ng, GREAT course) and learned the basics of machine learning (linear regression, logistic regression, SVMs, neural networks...). I have also been a developer for…
48 votes, 6 answers

How do I avoid overlapping labels in an R plot?

I'm trying to label a pretty simple scatterplot in R. This is what I use: plot(SI, TI); text(SI, TI, Name, pos=4, cex=0.7). The result is mediocre. I tried to compensate for this using the textxy function, but it's…
slhck
48 votes, 5 answers

What can we say about population mean from a sample size of 1?

I am wondering what we can say, if anything, about the population mean, $\mu$, when all I have is one measurement, $y_1$ (sample size of 1). Obviously, we'd love to have more measurements, but we can't get them. It seems to me that since the sample…
thedu
47 votes, 3 answers

Logistic regression vs. LDA as two-class classifiers

I am trying to wrap my head around the statistical difference between linear discriminant analysis and logistic regression. Is my understanding right that, for a two-class classification problem, LDA predicts two normal density functions (one for…
user1885116
47 votes, 3 answers

Why is polynomial regression considered a special case of multiple linear regression?

If polynomial regression models nonlinear relationships, how can it be considered a special case of multiple linear regression? Wikipedia notes that "Although polynomial regression fits a nonlinear model to the data, as a statistical estimation…
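The quoted Wikipedia point is that "linear" refers to the coefficients, not to $x$. The model $y = \beta_0 + \beta_1 x + \beta_2 x^2$ is linear in $\beta_0, \beta_1, \beta_2$, so $x$ and $x^2$ can be treated as two separate predictors in an ordinary multiple regression. The sketch below is minimal pure Python under that reading. The helper name and the noise-free data are illustrative, and it solves the normal equations for brevity rather than numerical robustness.

```python
def fit_linear(columns, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    `columns` is a list of predictor columns; an intercept column is added here.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented system [X^T X | X^T y]
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(c + 1, p):
            factor = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= factor * A[c][k]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

x = [-2, -1, 0, 1, 2, 3]
y = [1 + 0.5 * xi - 2 * xi ** 2 for xi in x]  # hypothetical, noise-free quadratic
# The "polynomial" model is an ordinary linear model with predictors x and x**2.
coef = fit_linear([x, [xi ** 2 for xi in x]], y)
print(coef)
```

The same `fit_linear` routine would fit any other pair of predictors. Nothing about it knows that the second column happens to be the square of the first.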