2

How do you solve the following problem?

A Simulation Study (Probit Regression).

Assume $y|x\sim {\rm Binary}(p)$, where $p= E(y|x)$, and $Φ^{-1}(\pi)=-1+5.1x_{1i}-0.3x_{2i}$ Generate data with $x_{1i}\sim{\rm Unif}(0,1)$, $x_{2i}=1$ for $i$ odd and $x_{2i}=0$ for $i$ even, and sample size $n=500$. Try generalized linear model (GLM) with logistic and probit links.

Here is what I did, I know there is a problem, but I don't know what:

n         <- 500
beta0     <- -1
beta1     <- 5.1
beta2     <- -0.3
x1        <- runif(n=n, min=0, max=1)
x2        <- (1:n)%%2
y         <- pnorm(beta0 + beta1*x1 + beta2*x2)
prob.glm  <- glm(y~x1+x2, family=binomial(link=probit))
logit.glm <- glm(y~x1+x2, family=binomial)

I know y is a probability here, but how do you simulate a binary variable from the probability?

HelenZ
  • 45
  • 1
  • 3
  • 4
    You might want to investigate the `rbinom` function very carefully. It may take some thought. – Glen_b Jan 31 '14 at 07:08
  • 3
    in exactly the same way as in the answer to your earlier question: http://stats.stackexchange.com/questions/83682/logit-link-notation – Maarten Buis Jan 31 '14 at 10:58

1 Answers1

5

Your title aside, it doesn't seem like you need to know how to simulate with a probit link. You've already done it (it's the pnorm() in your code). What you seem to need to know is how to get the manifest dichotomous response variable from your generated latent probabilities. The link function (e.g., logit or probit) has to do with how you get those probabilities. For more on this topic, it may help you to read my answer here: Difference between logit and probit models.

For what it's worth, I would not use y to store the probabilities, since you want to use y for your response variable in the fitted models on the next two lines. I would probably use p, which also seems consistent with the problem statement. Then I would convert my p's to the needed y's.


Since this is a homework problem, I'll simply give you some hints:

  • First, assume the probability were exactly $.5$, and you wanted to generate a set of outcomes from a random process with that probability, what could you do? You could flip a fair coin the requisite number of times. Flipping a coin is a physical simulation of a Bernoulli trial. Now what is the R analog of flipping a coin?*
  • If $p \ne .5$, how could you generate a set of successes and failures? There might be several options (e.g., if the probability is just right, perhaps flipping a coin a certain number of times and seeing if you got all heads, or getting a fair die with the right number of sides), but perhaps the most general would be to get a spinner and mark off the proportion of the total that corresponds to $p$. Then you would give it a whirl, and see which side of the line the pointer rests on. Now what is the R analog of spinning a needle through a continuous range from $(0,\ 1)$?*

*These are meant to be evocative of certain R commands that will do what you need, but in fact, both functions could be used to match either of these scenarios.

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
  • Thank you for the thorough answer. I am very new either in statistics or R.Then how to convert p's to the needed y's using R? – HelenZ Jan 31 '14 at 05:14
  • 4
    Our policy is not to just give people answers to homework problems. I would be upset if my students were simply getting their answers off the internet. These are hints that are intended to help you get past the point where you are stuck. Think about how you would do this physically & what that amounts to logically. Then you just need either of the two R functions to do that. They are not that hard to find; in fact, you are already using one of them. – gung - Reinstate Monica Jan 31 '14 at 05:19