Most Popular

1500 questions
36 votes, 1 answer

What concepts/objects are "wrongly" formed in probability and statistics?

Some background: There is a wonderful mathematical article which argues that mathematicians have been wrong to frame mathematical formulae in terms of the constant $\pi$, and that they should have framed these things in terms of $2 \pi$ (the…
Ben (91,027 reputation; badges 3, 150, 376)
36 votes, 1 answer

How exactly is the "effectiveness" in the Moderna and Pfizer vaccine trials estimated?

As in the title. Is this "a risk ratio"? How is it calculated? Could you provide an example with numbers for both trials, please? I am not a statistician, but I am familiar with the binomial distribution; I suppose it is used here to calculate…
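For reference, here is a minimal sketch of the usual point estimate: efficacy = 1 - (risk in the vaccine arm / risk in the placebo arm). The case counts are hypothetical and are not the Moderna or Pfizer data. The second function assumes equal-sized arms with equal follow-up. Under that assumption, the share of all cases that fall in the vaccine arm (a binomial proportion, given the total number of cases) determines the efficacy.

```python
def vaccine_efficacy(cases_vax, n_vax, cases_placebo, n_placebo):
    """Efficacy = 1 - risk ratio, where risk = cases / participants in each arm."""
    risk_vax = cases_vax / n_vax
    risk_placebo = cases_placebo / n_placebo
    return 1 - risk_vax / risk_placebo


def efficacy_from_case_share(theta):
    """Assumes equal-sized arms and follow-up; theta = vaccine-arm cases / all cases.

    The risk ratio then equals theta / (1 - theta), so efficacy = 1 - theta / (1 - theta).
    """
    return 1 - theta / (1 - theta)


if __name__ == "__main__":
    # Hypothetical counts (NOT trial data): 20,000 per arm, 10 vs 100 cases.
    print(round(vaccine_efficacy(10, 20_000, 100, 20_000), 3))
    print(round(efficacy_from_case_share(10 / 110), 3))
```

Because `efficacy_from_case_share` is monotone decreasing in theta, an interval for the binomial proportion theta can be mapped through it (with the endpoints swapped) to give an interval for the efficacy.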
36 votes, 3 answers

If I know the 95% confidence interval for ln(x), do I also know the 95% confidence interval for x?

Suppose the 95% confidence interval for $\ln(x)$ is $[l,u]$. Is it true that the 95% CI for $x$ is simply $[e^l, e^u]$? I have the intuition the answer is yes, because $\ln$ is a continuous function. Is there some theorem that supports/refutes my…
Tamay (485 reputation; badges 3, 8)
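The simulation below illustrates the premise under assumed settings. The log-values are normal with known $\sigma$, $[l, u]$ is the usual z-interval for $\mu = E[\ln x]$, and $\theta = e^{\mu}$ is the parameter of interest on the original scale. Because $\exp$ is strictly increasing, $l \le \mu \le u$ holds exactly when $e^l \le e^{\mu} \le e^u$, so both intervals cover in the same replications. Note that $e^{\mu}$ here is the median (geometric mean) of $x$, not its mean.

```python
import math
import random


def coverage(n_reps=2000, n=30, mu=1.0, sigma=0.5, seed=1):
    """Return (coverage of [l, u] for mu, coverage of [e^l, e^u] for e^mu)."""
    rng = random.Random(seed)
    theta = math.exp(mu)  # parameter on the original scale
    z = 1.959963984540054  # 97.5% standard normal quantile
    hits_log = hits_exp = 0
    for _ in range(n_reps):
        logs = [rng.gauss(mu, sigma) for _ in range(n)]
        center = sum(logs) / n
        half = z * sigma / math.sqrt(n)  # known-sigma z-interval on the log scale
        lower, upper = center - half, center + half
        hits_log += lower <= mu <= upper
        hits_exp += math.exp(lower) <= theta <= math.exp(upper)
    return hits_log / n_reps, hits_exp / n_reps


if __name__ == "__main__":
    print(coverage())  # the two coverages should agree, both near 0.95
```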
36 votes, 3 answers

What's the relation between hierarchical models, neural networks, graphical models and Bayesian networks?

They all seem to represent random variables as nodes and (in)dependence via the (possibly directed) edges. I'm especially interested in a Bayesian point of view.
36 votes, 2 answers

Understanding p-value

I know that there are lots of materials explaining the p-value. However, the concept is not easy to grasp firmly without further clarification. Here is the definition of the p-value from Wikipedia: The p-value is the probability of obtaining a test…
JDL (441 reputation; badges 5, 5)
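To make the quoted definition concrete, here is a minimal sketch for a hypothetical experiment: 60 heads in 100 tosses, testing the null hypothesis that the coin is fair. The p-value is the probability, computed as if the null were true, of an outcome at least as extreme as the one observed. The sketch takes "at least as extreme" to mean "no more probable under the null", which is one common convention for two-sided exact tests.

```python
from math import comb


def binomial_p_value(k, n, p0=0.5):
    """Two-sided exact binomial p-value for k successes in n trials under H0: p = p0."""
    # Probability of every possible outcome 0..n if the null hypothesis is true.
    probs = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum over outcomes no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in probs if p <= observed * (1 + 1e-7))


if __name__ == "__main__":
    # Hypothetical data: 60 heads in 100 tosses, testing a fair coin.
    print(binomial_p_value(60, 100))
```

A small p-value means the observed data would be unusual if the coin were fair. It is not the probability that the null hypothesis is true.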
36 votes, 4 answers

Determine different clusters of 1d data from database

I have a database table of data transfers between different nodes. This is a huge database (with nearly 40 million transfers). One of the attributes is the number of bytes transferred (nbytes), which ranges from 0 bytes to 2 terabytes. I would like to…
Shaun (361 reputation; badges 1, 3, 3)
36 votes, 2 answers

Calculating confidence intervals for a logistic regression

I'm using a binomial logistic regression to identify whether exposure to has_x or has_y impacts the likelihood that a user will click on something. My model is the following: fit = glm(formula = has_clicked ~ has_x + has_y, data=df, …
celenius (1,324 reputation; badges 4, 15, 26)
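One standard approach is the Wald interval. For a logistic-regression coefficient it is $\hat\beta \pm z_{0.975}\,\mathrm{SE}(\hat\beta)$ on the log-odds scale, and exponentiating the endpoints gives an interval for the odds ratio. The coefficient and standard error below are hypothetical placeholders for values read from the fitted model's summary. In R, `confint.default(fit)` returns Wald intervals of this form. By contrast, `confint(fit)` on a `glm` object profiles the likelihood, so the two results can differ somewhat.

```python
import math

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal


def wald_odds_ratio_ci(beta, se, z=Z_975):
    """Return (odds ratio, lower, upper) from a log-odds coefficient and its standard error."""
    lower, upper = beta - z * se, beta + z * se  # Wald interval on the log-odds scale
    return math.exp(beta), math.exp(lower), math.exp(upper)


if __name__ == "__main__":
    # Hypothetical coefficient for has_x and its standard error (not from real data).
    odds_ratio, lo, hi = wald_odds_ratio_ci(beta=0.4, se=0.1)
    print(f"OR = {odds_ratio:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```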
36 votes, 4 answers

X and Y are not correlated, but X is a significant predictor of Y in multiple regression. What does it mean?

X and Y are not correlated (-.01); however, when I place X in a multiple regression predicting Y, alongside three other (related) variables (A, B, C), X and two of the other variables (A, B) are significant predictors of Y. Note that the two other (A, B)…
Behacad (4,916 reputation; badges 8, 30, 48)
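One mechanism that can produce this pattern is suppression. The simulation below is a hypothetical construction, not the asker's data. A = X + U shares variance with X, and Y = X - A + noise = -U + noise. So Y is essentially uncorrelated with X on its own, yet once A is in the model the slope on X is close to 1 and the slope on A is close to -1.

```python
import random


def pearson(u, v):
    """Sample Pearson correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def ols_two_predictors(x1, x2, y):
    """Slopes (b1, b2) of y = b0 + b1*x1 + b2*x2, via the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12**2
    # Cramer's rule for [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y]
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=5000, seed=0):
    """Hypothetical data: A = X + U, Y = X - A + noise (so Y = -U + noise)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    a = [xi + rng.gauss(0, 1) for xi in x]
    y = [xi - ai + rng.gauss(0, 0.5) for xi, ai in zip(x, a)]
    return x, a, y


if __name__ == "__main__":
    x, a, y = simulate()
    print("corr(X, Y) =", round(pearson(x, y), 3))  # should be near 0
    print("slopes (X, A) =", ols_two_predictors(x, a, y))  # should be near (1, -1)
```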
36 votes, 4 answers

Data has two trends; how to extract independent trendlines?

I have a set of data that is not ordered in any particular way, but when plotted it clearly has two distinct trends. A simple linear regression would not really be adequate here because of the clear distinction between the two series. Is there a simple…
jonathanbsyd (463 reputation; badges 4, 6)
36 votes, 4 answers

What test can I use to compare slopes from two or more regression models?

I would like to test the difference in response of two variables to one predictor. Here is a minimal reproducible example. library(nlme) ## gls is used in the application; lm would suffice for this example m.set <- gls(Sepal.Length ~ Petal.Width,…
Abe (3,561 reputation; badges 7, 27, 45)
36 votes, 3 answers

Is a "hurdle model" really one model? Or just two separate, sequential models?

Consider a hurdle model predicting count data y from a normal predictor x: set.seed(1839) # simulate poisson with many zeros x <- rnorm(100) e <- rnorm(100) y <- rpois(100, exp(-1.5 + x + e)) # how many zeroes? table(y == 0) FALSE TRUE 31 …
Mark White (8,712 reputation; badges 4, 23, 61)
36 votes, 4 answers

Are inconsistent estimators ever preferable?

Consistency is obviously a natural and important property of estimators, but are there situations where it may be better to use an inconsistent estimator rather than a consistent one? More specifically, are there examples of an inconsistent…
MånsT (10,213 reputation; badges 1, 46, 65)
36 votes, 1 answer

Mathematical differences between GBM, XGBoost, LightGBM, CatBoost?

There exist several implementations of the GBDT family of models, such as GBM, XGBoost, LightGBM and CatBoost. What are the mathematical differences between these different implementations? CatBoost seems to outperform the other implementations even by…
Metariat (2,376 reputation; badges 4, 21, 41)
36 votes, 3 answers

Is it possible to calculate AIC and BIC for lasso regression models?

Is it possible to calculate AIC or BIC values for lasso regression models and other regularized models where parameters only partially enter the equation? How does one determine the degrees of freedom? I'm using R to fit lasso regression…
Jota (804 reputation; badges 1, 10, 21)
36 votes, 3 answers

Things to consider about master's programs in statistics

It is admission season for graduate schools. I (and many students like me) am now trying to decide which statistics program to pick. What are some things that those of you who work with statistics suggest we consider about master's programs in…