Highest Voted Questions - Statistical Analysis Stack Exchange

51

votes

5 answers

Prediction in Cox regression

I am doing a multivariate Cox regression, and I have my significant independent variables and beta values. The model fits my data very well. Now I would like to use my model to predict the survival of a new observation. I am unclear how to do this…

regression survival prediction cox-model

asked Sep 10 '12 at 13:12

Marja

513
1
5
4

51

votes

8 answers

What is a good resource on table design?

I've seen various theoretical treatments of graphics, such as the Grammar of Graphics. But I have seen nothing equivalent with regard to tables. Over time I have developed an informal model of good practice in table design. However, I'd like…

tables

asked Oct 13 '10 at 01:57

Jeromy Anglim

42,044
23
146
250

51

votes

7 answers

When conducting a t-test why would one prefer to assume (or test for) equal variances rather than always use a Welch approximation of the df?

It seems that when the assumption of homogeneity of variance is met, the results from a Welch-adjusted t-test and a standard t-test are approximately the same. Why not simply always use the Welch-adjusted t?

variance t-test heteroscedasticity

asked Jul 20 '10 at 14:19

russellpierce

17,079
16
67
98
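
To make the comparison in the Welch question above concrete, here is a minimal sketch that computes both test statistics and their degrees of freedom using only the Python standard library. The function names `pooled_t` and `welch_t` are illustrative and do not come from the question; the formulas are the textbook pooled-variance and Welch-Satterthwaite ones.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic and df, assuming equal population variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Welch's t statistic with the Welch-Satterthwaite approximate df."""
    n1, n2 = len(a), len(b)
    q1, q2 = variance(a) / n1, variance(b) / n2  # per-group squared standard errors
    t = (mean(a) - mean(b)) / sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples with equal sizes and equal sample variances.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
print(pooled_t(a, b))
print(welch_t(a, b))
```

With equal group sizes the two t statistics coincide, and the Welch degrees of freedom only fall below n1 + n2 - 2 once the sample variances differ.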

51

votes

4 answers

Cumming (2008) claims that distribution of p-values obtained in replications depends only on the original p-value. How can it be true?

I have been reading Geoff Cumming's 2008 paper Replication and $p$ Intervals: $p$ values predict the future only vaguely, but confidence intervals do much better [~200 citations in Google Scholar] -- and am confused by one of its central claims.…

hypothesis-testing p-value statistical-power replicability

asked Dec 07 '16 at 21:06

amoeba

93,463
28
275
317

51

votes

6 answers

Understanding LSTM units vs. cells

I have been studying LSTMs for a while. I understand at a high level how everything works. However, when implementing them in TensorFlow, I noticed that BasicLSTMCell requires a number-of-units parameter (i.e. num_units). From this very…

neural-networks terminology lstm recurrent-neural-network tensorflow

asked Oct 23 '16 at 23:37

user124589

51

votes

2 answers

Choosing the right linkage method for hierarchical clustering

I am performing hierarchical clustering on data I've gathered and processed from the reddit data dump on Google BigQuery. My process is the following: (1) get the latest 1000 posts in /r/politics; (2) gather all the comments; (3) process the data and compute an…

clustering distance unsupervised-learning hierarchical-clustering

asked Feb 13 '16 at 22:09

Kevin Eger

611
1
6
4

51

votes

3 answers

Different ways to write interaction terms in lm?

I have a question about which is the best way to specify an interaction in a regression model. Consider the following data: d <- structure(list(r = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),…

r regression interaction

asked Dec 02 '11 at 20:23

Manuel Ramón

2,045
4
15
16

51

votes

3 answers

How does centering make a difference in PCA (for SVD and eigen decomposition)?

What difference does centering (or de-meaning) your data make for PCA? I've heard that it makes the maths easier or that it prevents the first PC from being dominated by the variables' means, but I feel like I haven't been able to firmly grasp the…

r pca svd eigenvalues centering

asked Jan 08 '16 at 10:57

Zenit

1,586
2
17
19

51

votes

1 answer

How to determine whether or not the y-axis of a graph should start at zero?

One common way to "lie with data" is to use a y-axis scale that makes it seem as if changes are more significant than they really are. When I review scientific publications, or students' lab reports, I am often frustrated by this "data visualization…

data-visualization

asked Dec 01 '15 at 21:12

ff524

727
1
5
9

51

votes

4 answers

If the t-test and the ANOVA for two groups are equivalent, why aren't their assumptions equivalent?

I'm sure I've got this completely wrapped round my head, but I just can't figure it out. The t-test compares two normal distributions using the Z distribution. That's why there's an assumption of normality in the DATA. ANOVA is equivalent to linear…

distributions regression normality-assumption t-test anova

asked Aug 13 '10 at 09:41

Chris Beeley

5,465
5
36
40

51

votes

3 answers

How are we defining 'reproducible research'?

This has come up in a few questions now, and I've been wondering about something. Has the field as a whole moved toward "reproducibility" focusing on the availability of the original data, and the code in question? I was always taught that the core…

reproducible-research philosophical

asked Aug 31 '11 at 03:39

Fomite

21,264
10
78
137

51

votes

2 answers

Why does frequentist hypothesis testing become biased towards rejecting the null hypothesis with sufficiently large samples?

I was just reading this article on the Bayes factor for a completely unrelated problem when I stumbled upon this passage: Hypothesis testing with Bayes factors is more robust than frequentist hypothesis testing, since the Bayesian form avoids model…

hypothesis-testing frequentist

asked Jul 22 '14 at 20:06

Louis Thibault

643
6
6

50

votes

5 answers

Probability distribution for different probabilities

If I wanted to get the probability of 9 successes in 16 trials, with each trial having a success probability of 0.6, I could use a binomial distribution. What could I use if each of the 16 trials has a different probability of success?

distributions probability binomial-distribution

asked Apr 13 '11 at 13:34

Greg

683
2
6
7
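
As a rough illustration of the question above about trials with different success probabilities: the equal-probability case is the ordinary binomial formula, and for independent trials with unequal probabilities the number of successes follows what is usually called the Poisson binomial distribution. The sketch below assumes independent trials; the function names are hypothetical and not taken from the question or its answers.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_binomial_pmf(probs):
    """Distribution of the success count when trial i succeeds with probs[i].

    Returns a list whose k-th entry is P(exactly k successes).
    Built by adding one trial at a time (dynamic programming).
    """
    dist = [1.0]  # zero trials: zero successes with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        dist = nxt
    return dist


# The case from the question: 9 successes in 16 trials with p = 0.6.
print(binomial_pmf(9, 16, 0.6))

# With identical probabilities the general routine reduces to the binomial.
print(poisson_binomial_pmf([0.6] * 16)[9])
```

Passing a list of 16 different probabilities to `poisson_binomial_pmf` and reading entry 9 gives the analogous probability for the unequal case.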

50

votes

7 answers

Logistic Regression in R (Odds Ratio)

I'm trying to undertake a logistic regression analysis in R. I have attended courses covering this material using STATA. I am finding it very difficult to replicate functionality in R. Is it mature in this area? There seems to be little…

r logistic odds-ratio

asked Mar 23 '11 at 09:59

SabreWolfy

1,101
2
15
25

50

votes

7 answers

Why is "statistically significant" not enough?

I have completed my data analysis and got "statistically significant results" which are consistent with my hypothesis. However, a student in statistics told me this is a premature conclusion. Why? Is there anything else that needs to be included in my…

hypothesis-testing statistical-significance spss p-value

asked Dec 11 '13 at 04:43

Jim Von

611
6
7
