4

I've just run a linear regression on an entire data set, but now I need to run the regression using only the data for females.

Females are denoted under the female column of the data set by a 1. Males are denoted by a 0 under the same column. I don't know how to remove the male data so I can run the regression on female data only.

zakrapovic
user41218
  • 1
    This looks like routine bookwork. Is this for some subject? If you do need (for some non-obvious reason) to actually run a regression on a subset (see Andre's comment for why in general it's not necessary), the following approach is pretty typical. If `mydata` is a data frame containing `y,x1,x2` and `sex`, then instead of `lm(y~x1+x2,data=mydata)` you replace the `data=` argument with one that appropriately subsets the rows of `mydata` by the relevant value of `sex`. – Glen_b Mar 03 '14 at 00:55
  • I don't really need to remove the male data, I just need to run the regression on only females. I have a "female" column in which females are denoted 1, and males are denoted 0. My current code looks like this: `lm(y~x,data=mydata)`. I just don't know what to add to that to make it only run for rows that have `female==1`. – user41218 Mar 03 '14 at 01:07
  • I thought I'd just go ahead and answer, because this is quite easy. However, the others are right; simple code requests belong on Stack Overflow, and in this case, you could probably Google it very quickly (that's why I'm not flagging to migrate – it's a little *too* simple – though I don't mind if it gets migrated). In fact, [the first Google hit for "code for excluding conditions in r"](http://www.statmethods.net/management/subset.html) contains my answer, and several alternatives (hence I've edited it in)! – Nick Stauner Mar 03 '14 at 01:27
  • 5
    This question appears to be off-topic because it is about how to use R. – gung - Reinstate Monica Mar 03 '14 at 02:08
  • 1
    Your question is then actually how do I *subset a data frame in R*? which can be very helpful if you google it. Using `mydata[as.logical(mydata$female),]` as your data set in `lm` is one of half a dozen obvious ways (`mydata[mydata$female==1,]` is another; as is using `subset`) – Glen_b Mar 03 '14 at 07:37

1 Answer

6

Use `lm(y~x,data=subset(mydata,female==1))`. `subset()` lets you set conditions for which observations to keep from the data frame passed to it, using comparison operators such as `>`, `!=`, and `==`. With `==`, only the observations whose value exactly equals what follows are kept; `!=` does the opposite.
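To try this out, here is a minimal sketch on a made-up data frame. The column names `y`, `x`, and `female` follow the question, but the simulated values and the object names `fit_all` and `fit_female` are assumptions for illustration only.

```r
# Hypothetical toy data: 10 males (female == 0) followed by 10 females (female == 1).
set.seed(1)
mydata <- data.frame(
  x      = rnorm(20),
  female = rep(c(0, 1), each = 10)
)
mydata$y <- 2 + 3 * mydata$x + rnorm(20)

# Fit on every row, then only on the rows where female == 1.
fit_all    <- lm(y ~ x, data = mydata)
fit_female <- lm(y ~ x, data = subset(mydata, female == 1))

# Compare how many observations each fit actually used.
nobs(fit_all)
nobs(fit_female)
```

Checking `nobs()` on both fits is a quick way to confirm that the second model really used only the female rows.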

For a variety of other alternatives, see [Quick-R on subsetting data](http://www.statmethods.net/management/subset.html). Some of these may better serve different or more advanced coding goals.
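As a rough illustration of a few such alternatives, the sketch below fits the same model four ways: with `subset()`, with the two bracket-indexing forms mentioned in the comments, and with the `subset` argument of `lm()` itself, which is not discussed in this thread. The data are simulated and the object names are hypothetical.

```r
# Hypothetical toy data with alternating female (1) and male (0) rows.
set.seed(2)
mydata <- data.frame(x = rnorm(12), female = rep(c(1, 0), times = 6))
mydata$y <- 1 - 0.5 * mydata$x + rnorm(12)

# Four ways to restrict the fit to rows where female == 1.
fit_subset  <- lm(y ~ x, data = subset(mydata, female == 1))
fit_bracket <- lm(y ~ x, data = mydata[mydata$female == 1, ])
fit_logical <- lm(y ~ x, data = mydata[as.logical(mydata$female), ])
fit_arg     <- lm(y ~ x, data = mydata, subset = female == 1)

# Each comparison should report that the coefficients agree.
all.equal(coef(fit_subset), coef(fit_bracket))
all.equal(coef(fit_subset), coef(fit_logical))
all.equal(coef(fit_subset), coef(fit_arg))
```

With a clean 0/1 column like this one, all four select the same rows, so the choice is mostly a matter of readability.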

Nick Stauner