Questions tagged [regression-strategies]

Regression Modeling Strategies

The purpose of this category is to refer to questions and discussions about regression modeling strategies, especially when multiple methods are being combined. For example, how much data reduction should be done before using $Y$? What is best practice for model validation for specific model types? How does the choice of predictive accuracy measures impact model validation? How should parameters be assigned for various parts of a model, and how does the number of parameters assigned to one part of the model affect the number of parameters to assign to another part? What is the best way to detect that parameters in a model are hard to disentangle, and how could pre-modeling data reduction have helped? What is a good strategy for getting a complex model accepted by non-statisticians? When does one use traditional multivariable regression modeling vs. a black box?

290 questions

8 answers

What is the benefit of breaking up a continuous predictor variable?

I'm wondering what the value is in taking a continuous predictor variable and breaking it up (e.g., into quintiles), before using it in a model. It seems to me that by binning the variable we lose information. Is this just so we can model…

asked Aug 31 '13 at 05:32

Tom
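
As a rough illustration of the information loss this question asks about, here is a minimal sketch on simulated data (the data-generating model, sample size, and helper names are assumptions made for the example, not taken from the question). It fits the same outcome once with the predictor kept continuous and once with the predictor cut into quintiles, then compares in-sample $R^2$.

```python
import random
import statistics

def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    mean_y = statistics.fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def fit_linear(x, y):
    """Least-squares straight line in x; returns the fitted values."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [my + slope * (xi - mx) for xi in x]

def fit_quintile_means(x, y):
    """Cut x into quintiles and predict y by the mean of its bin."""
    cuts = statistics.quantiles(x, n=5)  # four cut points -> five bins
    bins = [sum(xi > c for c in cuts) for xi in x]
    means = [
        statistics.fmean([yi for yi, b in zip(y, bins) if b == k])
        for k in range(5)
    ]
    return [means[b] for b in bins]

# Assumed data-generating model: y is linear in x plus Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(2000)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

r2_linear = r_squared(y, fit_linear(x, y))
r2_binned = r_squared(y, fit_quintile_means(x, y))
print(f"continuous x: R^2 = {r2_linear:.3f}")
print(f"quintiles:    R^2 = {r2_binned:.3f}")
```

In this setup the binned fit spends more parameters (five bin means versus an intercept and a slope) yet explains less variance, because all variation of the predictor within a bin is discarded. With a nonlinear true relationship, a spline in the continuous predictor would be the natural comparison rather than a straight line.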

4 answers

Can a random forest be used for feature selection in multiple linear regression?

Since RF can handle non-linearity but can't provide coefficients, would it be wise to use random forest to gather the most important features and then plug those features into a multiple linear regression model in order to obtain their coefficients?…

regression machine-learning feature-selection random-forest regression-strategies

asked Jul 30 '15 at 21:52

Hidden Markov Model

5 answers

Overfitting a logistic regression model

Is it possible to overfit a logistic regression model? I saw a video saying that if my area under the ROC curve is higher than 95%, then it's very likely to be overfitted, but is it possible to overfit a logistic regression model?

logistic overfitting regression-strategies

asked Oct 04 '13 at 22:57

carlosedubarreto
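
A small simulation can make the question concrete. The sketch below is illustrative only: the sample sizes, the number of predictors, and the plain gradient-ascent fitter are assumptions chosen for the example. It fits an unpenalized logistic regression to an outcome that is unrelated to any of the 20 predictors, then compares the area under the ROC curve on the 60 training rows with the AUC on fresh data.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_score(w, b, row):
    return b + sum(wj * xj for wj, xj in zip(w, row))

def fit_logistic(X, y, steps=500, lr=0.5):
    """Unpenalized logistic regression by plain gradient ascent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, yi in zip(X, y):
            err = yi - sigmoid(linear_score(w, b, row))
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b += lr * grad_b / n
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

def auc(scores, labels):
    """Chance that a random positive outranks a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def simulate(rng, n, p):
    """Pure-noise predictors; labels alternate 0/1 independently of X."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y

rng = random.Random(1)
X_train, y_train = simulate(rng, n=60, p=20)
X_test, y_test = simulate(rng, n=1000, p=20)

w, b = fit_logistic(X_train, y_train)
train_auc = auc([linear_score(w, b, r) for r in X_train], y_train)
test_auc = auc([linear_score(w, b, r) for r in X_test], y_test)
print(f"apparent (training) AUC: {train_auc:.3f}")
print(f"AUC on new data:         {test_auc:.3f}")
```

Because the outcome here is noise by construction, any discrimination above 0.5 on the training rows is overfitting. A high AUC alone does not establish overfitting; the gap between apparent and validated performance (for example from bootstrap or cross-validation) is the more informative quantity.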

3 answers

Should final (production ready) model be trained on complete data or just on training set?

Suppose I trained several models on the training set, chose the best one using the cross-validation set, and measured performance on the test set. So now I have one final best model. Should I retrain it on all my available data or ship the solution trained only on…

machine-learning validation regression-strategies

asked Nov 29 '15 at 11:40

Yurii

1 answer

Appropriate residual degrees of freedom after dropping terms from a model

I am reflecting on the discussion around this question and particularly Frank Harrell's comment that the estimate for variance in a reduced model (i.e., one from which a number of explanatory variables have been tested and rejected) should use Ye's…

r regression model-selection regression-strategies

asked Feb 14 '12 at 00:05

Peter Ellis

3 answers

Evaluating logistic regression and interpretation of Hosmer-Lemeshow Goodness of Fit

As we all know, there are two methods to evaluate a logistic regression model, and they test very different things. Predictive power: get a statistic that measures how well you can predict the dependent variable based on the independent…

r logistic goodness-of-fit regression-strategies model-evaluation

asked Aug 31 '15 at 03:26

Samoth

5 answers

When is quantile regression worse than OLS?

Apart from some unique circumstances where we absolutely must understand the conditional mean relationship, what are the situations where a researcher should pick OLS over Quantile Regression? I don't want the answer to be "if there is no use in…

least-squares econometrics regression-strategies quantile-regression semiparametric

asked Oct 09 '12 at 12:41

user14281

2 answers

Bayesian thinking about overfitting

I've devoted much time to development of methods and software for validating predictive models in the traditional frequentist statistical domain. In putting more Bayesian ideas into practice and teaching I see some key differences to embrace. …

bayesian cross-validation predictive-models validation regression-strategies

asked Apr 29 '18 at 12:16

Frank Harrell

2 answers

Does LASSO suffer from the same problems stepwise regression does?

Stepwise algorithmic variable-selection methods tend to select for models which bias more or less every estimate in regression models ($\beta$s and their SEs, p-values, F statistics, etc.), and are about as likely to exclude true predictors as…

regression feature-selection lasso regression-strategies stepwise-regression

asked May 31 '19 at 18:31

Alexis

4 answers

How should I check the assumption of linearity to the logit for the continuous independent variables in logistic regression analysis?

I am confused about the assumption of linearity to the logit for continuous predictor variables in logistic regression analysis. Do we need to check for the linear relationship while screening for potential predictors using univariable logistic…

regression logistic assumptions splines regression-strategies

asked Aug 30 '15 at 05:01

Sze Lin Tan

1 answer

What does it mean to make the sample size a random variable?

Frank Harrell has started a blog (Statistical Thinking). In his premier post, he lists some key features of his statistical philosophy. Among other items, it includes: "Make the sample size a random variable when possible." What does it mean…

sample-size random-variable regression-strategies

asked Jan 17 '17 at 03:45

gung - Reinstate Monica

5 answers

Can I ignore coefficients for non-significant levels of factors in a linear model?

After seeking clarification about linear model coefficients over here, I have a follow-up question concerning non-significant (high p-value) coefficients of factor levels. Example: If my linear model includes a factor with 10 levels, and only 3 of…

statistical-significance linear-model model-selection regression-coefficients regression-strategies

asked Mar 08 '12 at 03:12

Trees4theForest

2 answers

Can we use categorical independent variable in discriminant analysis?

In discriminant analysis, the dependent variable is categorical, but can I use a categorical variable (e.g., residential status: rural, urban) along with some other continuous variable as independent variables in linear discriminant analysis?

logistic categorical-data discriminant-analysis regression-strategies

asked Jun 26 '15 at 10:51

kuwoli

3 answers

Model building and selection using Hosmer et al. 2013. Applied Logistic Regression in R

This is my first post on StackExchange, but I have been using it as a resource for quite a while. I will do my best to use the appropriate format and make the appropriate edits. Also, this is a multi-part question. I wasn't sure if I should split…

r logistic model-selection regression-strategies

asked Oct 07 '14 at 20:28

GNG

4 answers

Why does propensity score matching work for causal inference?

Propensity score matching is used to make causal inferences in observational studies (see the Rosenbaum / Rubin paper). What's the simple intuition behind why it works? In other words, why if we make sure the probability of participating in the…

causality regression-strategies propensity-scores confounding

asked Apr 11 '16 at 23:28

max
