1

In a question I'm given

Construct a linear model and see how well the fat content can be estimated. That is, estimate the generalization error with a linear model. Optimize the number of principal components, i.e. determine how many and which components you need to get good generalization performance. It makes sense to report both an MSE and an RMS error. Do a residual analysis (this comment goes for all items).

The dataset I'm using includes 2 class variables as 'democratic' and 'republican'.

What I need to know is.. Can I use Logistic regression for this analysis? Can it be considered as a linear model or do I need to use the linear regression model?

Nick Cox
  • 48,377
  • 8
  • 110
  • 156
Hiru
  • 155
  • 7
  • 2
    I feel like I'm missing something. For the fat content that you are predicting is it like a 0,1 variable? Like the democratic/republican variables don't seem to force a logistic regression since you aren't predicting that (or are you?). – Kitter Catter May 15 '19 at 03:16
  • Yes.. I'm going to use predict democratic and republic... Is Logistic regression suitable for this? I done have a proper idea on what the mean by fat content estimation. – Hiru May 15 '19 at 03:24
  • 2
    I mean if you are estimating 0,1 for fat content then I guess logistic regression makes the most sense. – Kitter Catter May 15 '19 at 03:28
  • Can you kindly explain what it means by fat content prediction? – Hiru May 15 '19 at 03:30
  • 1
    I meant estimating. sorry for the confusion. – Kitter Catter May 15 '19 at 03:33
  • Ok.. Thanks alot for the explanation :) – Hiru May 15 '19 at 03:38
  • 1
    This still seems very unclear to me. You are trying to predict fat content from Democratic and Republican? Please show us an example of your data. If the dataset is small, show all of it. (If this is `self-study` please use that tag.) – Nick Cox May 15 '19 at 07:54

1 Answers1

1

Go on with logistic regression (LR.) Logistic regression is a generalized linear model (glm) and is the (first, at least) way to go when your response variable is binary.

To read up on LR Why isn't Logistic Regression called Logistic Classification?, Intuition behind logistic regression.

EDIT

Answer to additional question in comment: Your binary response variable democrat/republican can be modeled as binomial, which has a nonconstant variance which is a function of the mean. So the constant variance assumption of usual linear regression is not fulfilled. Also, linear regression may fit probabilities larger than one or smaller than zero, which does not make sense. Still, this is sometimes used as an approximation under the name linear probability model. Don't do it without some very good reason.

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467