Most Popular
1500 questions
51
votes
5 answers
Prediction in Cox regression
I am doing a multivariate Cox regression, and I have my significant independent variables and beta values. The model fits my data very well.
Now, I would like to use my model and predict the survival of a new observation.
I am unclear how to do this…

Marja
51
votes
8 answers
What is a good resource on table design?
I've seen various theoretical treatments of graphics, such as the Grammar of Graphics. But I have seen nothing equivalent with regard to tables. Over time I have developed an informal model of good practice in table design.
However, I'd like…

Jeromy Anglim
51
votes
7 answers
When conducting a t-test why would one prefer to assume (or test for) equal variances rather than always use a Welch approximation of the df?
It seems that when the assumption of homogeneity of variance is met, the results from a Welch-adjusted t-test and a standard t-test are approximately the same. Why not simply always use the Welch-adjusted t?
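To make the comparison in the question concrete, here is a minimal sketch that computes both statistics side by side. It is an illustration only, not part of the original post: the two samples and the helper names `student_t` and `welch_t` are hypothetical, and the degrees of freedom for the Welch version use the standard Welch-Satterthwaite approximation.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance (equal variances assumed) t statistic and its df."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    sa, sb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df

# Hypothetical samples with similar spread and equal group sizes.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.4, 4.8, 5.0, 4.1, 4.7, 4.6]

t_s, df_s = student_t(group_a, group_b)
t_w, df_w = welch_t(group_a, group_b)
print(f"Student: t = {t_s:.3f}, df = {df_s}")
print(f"Welch:   t = {t_w:.3f}, df = {df_w:.2f}")
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; the Welch df can never exceed the pooled df.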

russellpierce
51
votes
4 answers
Cumming (2008) claims that distribution of p-values obtained in replications depends only on the original p-value. How can it be true?
I have been reading Geoff Cumming's 2008 paper Replication and $p$ Intervals: $p$ values predict the future only vaguely, but confidence intervals do much better [~200 citations in Google Scholar] -- and am confused by one of its central claims.…

amoeba
51
votes
6 answers
Understanding LSTM units vs. cells
I have been studying LSTMs for a while. I understand at a high level how everything works. However, when going to implement them using TensorFlow, I noticed that BasicLSTMCell requires a number of units (i.e. num_units) parameter.
From this very…
user124589
51
votes
2 answers
Choosing the right linkage method for hierarchical clustering
I am performing hierarchical clustering on data I've gathered and processed from the Reddit data dump on Google BigQuery.
My process is the following:
Get the latest 1000 posts in /r/politics
Gather all the comments
Process the data and compute an…

Kevin Eger
51
votes
3 answers
Different ways to write interaction terms in lm?
I have a question about which is the best way to specify an interaction in a regression model. Consider the following data:
d <- structure(list(r = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),…

Manuel Ramón
51
votes
3 answers
How does centering make a difference in PCA (for SVD and eigen decomposition)?
What difference does centering (or de-meaning) your data make for PCA? I've heard that it makes the maths easier or that it prevents the first PC from being dominated by the variables' means, but I feel like I haven't been able to firmly grasp the…

Zenit
51
votes
1 answer
How to determine whether or not the y-axis of a graph should start at zero?
One common way to "lie with data" is to use a y-axis scale that makes it seem as if changes are more significant than they really are.
When I review scientific publications, or students' lab reports, I am often frustrated by this "data visualization…

ff524
51
votes
4 answers
If the t-test and the ANOVA for two groups are equivalent, why aren't their assumptions equivalent?
I'm sure I've got this completely wrapped round my head, but I just can't figure it out.
The t-test compares two normal distributions using the Z distribution. That's why there's an assumption of normality in the DATA.
ANOVA is equivalent to linear…

Chris Beeley
51
votes
3 answers
How are we defining 'reproducible research'?
This has come up in a few questions now, and I've been wondering about something. Has the field as a whole moved toward "reproducibility" focusing on the availability of the original data, and the code in question?
I was always taught that the core…

Fomite
51
votes
2 answers
Why does frequentist hypothesis testing become biased towards rejecting the null hypothesis with sufficiently large samples?
I was just reading this article on the Bayes factor for a completely unrelated problem when I stumbled upon this passage
Hypothesis testing with Bayes factors is more robust than frequentist hypothesis testing, since the Bayesian form avoids model…

Louis Thibault
50
votes
5 answers
Probability distribution for different probabilities
If I wanted to get the probability of 9 successes in 16 trials, with each trial having a probability of 0.6, I could use a binomial distribution. What could I use if each of the 16 trials has a different probability of success?
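As a rough illustration of the setup in the question, the sketch below computes the binomial probability of 9 successes in 16 trials at p = 0.6, and then the same quantity when each trial has its own probability. The second case is commonly called the Poisson binomial distribution; that name, the dynamic-programming approach, the helper names, and the `varied` probabilities are assumptions added here, not something stated in the post.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_binomial_pmf(probs):
    """Return [P(K=0), ..., P(K=n)] for independent trials with the given
    per-trial success probabilities, built up one trial at a time."""
    pmf = [1.0]  # before any trial: zero successes with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        pmf = nxt
    return pmf

# Case from the question: 16 trials, all with p = 0.6.
binom_9_of_16 = binomial_pmf(9, 16, 0.6)
equal = poisson_binomial_pmf([0.6] * 16)

# Hypothetical per-trial probabilities, rising from 0.30 to 0.90.
varied = [0.3 + 0.04 * i for i in range(16)]
prob_9_varied = poisson_binomial_pmf(varied)[9]

print(f"Binomial P(K=9):            {binom_9_of_16:.4f}")
print(f"Same via the general method: {equal[9]:.4f}")
print(f"Varied probabilities P(K=9): {prob_9_varied:.4f}")
```

When all probabilities are equal, the general routine reproduces the binomial result, which is a quick sanity check on the recursion.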

Greg
50
votes
7 answers
Logistic Regression in R (Odds Ratio)
I'm trying to undertake a logistic regression analysis in R. I have attended courses covering this material using Stata. I am finding it very difficult to replicate functionality in R. Is it mature in this area? There seems to be little…

SabreWolfy
50
votes
7 answers
Why is "statistically significant" not enough?
I have completed my data analysis and got "statistically significant results" which are consistent with my hypothesis. However, a student in statistics told me this is a premature conclusion. Why? Is there anything else needed to be included in my…

Jim Von