
I have two questions I hope you could help me with.

I am doing a stepwise logistic regression.

  1. I have a variable that carries information already contained in other variables. For example, "price_missing" ($1$ means price missing) and "price" ($0$ means price). Would it be normal practice to drop such variables before running the regression? Doing so seems to make the model worse.
  2. Some of the data is skewed. Would it make sense to do a transformation of the variable before doing a stepwise logit regression?
gung - Reinstate Monica
Senf

1 Answer


I agree with @gung about stepwise LR. Here are my personal thoughts:

  1. This is difficult to answer from the information you provided. From a statistical point of view, you may want to check for collinearity. But when building a model, the statistical point of view should not be the only consideration: independent variables should also be chosen based on experience and on what each variable means in reality. Consequently, two people with different backgrounds who follow the same variable-selection strategy may end up with different models, and it is rarely obvious which one is better.

  2. Before transforming the variable, first look at the relationship between the independent variable you want to transform and the dependent variable, to see what kind of relationship exists between them. Based on that, you can better judge whether a transformation is necessary and, if so, which transforming function to choose.
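
Both checks can be sketched in a few lines of plain Python. This is only an illustration on synthetic data, not your data: I am assuming missing prices are coded as 0 in "price", that the true log-odds happen to be linear in log(price), and the helper names `pearson` and `empirical_logits` are made up for this example. The first part checks how redundant the indicator is; the second bins the predictor and compares the empirical log-odds of the outcome on the raw and on the log scale.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def empirical_logits(x, y, n_bins=5):
    """Sort by x, split into equal-sized bins, return (mean x, log-odds of y) per bin."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    result = []
    for i in range(n_bins):
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[i * size:end]
        events = sum(v for _, v in chunk)
        # Add 0.5 to both counts so a bin with only 0s or only 1s stays finite.
        log_odds = math.log((events + 0.5) / (len(chunk) - events + 0.5))
        result.append((mean(u for u, _ in chunk), log_odds))
    return result


# Synthetic, purely illustrative data (assumption: not the asker's data).
random.seed(0)
n = 2000
price = [random.lognormvariate(3, 1) for _ in range(n)]  # right-skewed predictor
y = [1 if random.random() < 1 / (1 + math.exp(-(math.log(p) - 3))) else 0
     for p in price]

# Point 1: is the indicator redundant with the coded price?
# Assumption: about 10% of prices are missing and stored as 0.
price_missing = [1 if random.random() < 0.1 else 0 for _ in range(n)]
price_coded = [0.0 if m else p for m, p in zip(price_missing, price)]
price_is_zero = [1 if p == 0 else 0 for p in price_coded]
r_redundant = pearson(price_missing, price_is_zero)
print("corr(price_missing, price == 0):", r_redundant)

# Point 2: does the outcome look more linear in price or in log(price)?
logits_raw = empirical_logits(price, y)
logits_log = empirical_logits([math.log(p) for p in price], y)
r_raw = pearson([u for u, _ in logits_raw], [v for _, v in logits_raw])
r_log = pearson([u for u, _ in logits_log], [v for _, v in logits_log])
print("linearity of log-odds, raw scale:", r_raw)
print("linearity of log-odds, log scale:", r_log)
```

If the first correlation is essentially 1, the indicator and the zero-coded price describe the same event, so the indicator mainly lets the model treat "missing" separately from "very cheap"; whether to keep it is the judgment call described in point 1. For the second check, plotting the per-bin log-odds against the bin means is usually more informative than a single correlation number.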

Metariat