regression with non-independent data

Question

I will be running a regression on subjects' total scores from the two-player games (prisoner's dilemma) that they will play. I am aware that including both players' scores from a game will cause problems due to non-independence. Is there a way to deal with this other than randomly picking one subject from each game for the analysis (and so losing half the data)? Could the dependence be built into the model instead, perhaps as a random effect?

score 2 · Answer 1 · answered Feb 15 '13 at 19:44

I assume what you have in mind is score as the response and some player attributes as the predictors, e.g. to find out whether blonds score higher.

Why not perform the regression with the game as your sampling unit? A game of N points must distribute those points between players A and B, so you can take player A's score in each game as a binomial response and include both players' attributes as predictors.

Corvus

4,573
1
27
58

Yes, that is right. I'm not quite sure what you mean by taking the score as a binomial response. If it is binomial, would it not have to be 0 or 1 rather than the score? – Jonathan Bone Apr 15 '13 at 16:24
No, the Bernoulli distribution is 0 or 1. The binomial distribution is the number of successes in N trials, so if the game is played for 50 points and the result is 30-20 to player A, this could be modelled as a binomial with 50 trials and 30 successes. – Corvus Apr 17 '13 at 08:43
I did think about this, but since in each round you can get +2, 1, 0 or -3, I did not think it would work. Each player plays 2 games, each against a different subject. Would it make sense to model it as an LMM (linear mixed model) with Subject and Pairing as random intercepts, such that each pairing contains 2 subjects and each subject is in 2 different pairings with different subjects? This would mean there are 80 pairings and 80 subjects. – Jonathan Bone Apr 19 '13 at 16:21
Prisoner's dilemma is not zero-sum, i.e., it does not distribute a fixed number of points. – Juho Kokkala Apr 24 '13 at 15:42
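
To make the point about the game not being zero-sum concrete, here is a minimal sketch. The thread only gives the per-round payoff values (+2, 1, 0, -3); which outcome each value belongs to is an assumption here, as are the names `PAYOFF` and `round_total`.

```python
# Hypothetical payoff table: (my_action, partner_action) -> my payoff.
# The assignment of +2, 1, 0, -3 to outcomes is assumed, not taken from the thread.
PAYOFF = {
    ("C", "C"): 1,   # both cooperate
    ("C", "D"): -3,  # I cooperate, partner defects
    ("D", "C"): 2,   # I defect, partner cooperates
    ("D", "D"): 0,   # both defect
}


def round_total(a, b):
    """Sum of the two players' payoffs for one round."""
    return PAYOFF[(a, b)] + PAYOFF[(b, a)]


# Joint payoff for every combination of actions.
totals = {(a, b): round_total(a, b) for a in "CD" for b in "CD"}
print(totals)
```

Under this assumed table the joint payoff differs across outcomes, so one player's total does not determine the other's, and a "successes out of N points" binomial setup does not apply directly.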

score 1 · Answer 2 · answered Feb 15 '13 at 21:27

You have a system of simultaneous equations to deal with, which should have been covered in your econometrics class (you are an economist, right?). You would estimate the system using 2SLS or 3SLS, provided that you have decent exogenous variables that affect only one of the outcomes, i.e., demographics such as the color of their hair, per Corone's suggestion. You would need to impose symmetry restrictions, so that both equations have the same coefficients.
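
As a rough sketch of what such a system might look like (the notation is mine, not from the question): let $y_{1g}$ and $y_{2g}$ be the two players' scores in game $g$, and $x_{1g}$, $x_{2g}$ their own exogenous attributes.

```latex
\begin{aligned}
y_{1g} &= \alpha + \gamma\, y_{2g} + \beta\, x_{1g} + \varepsilon_{1g}, \\
y_{2g} &= \alpha + \gamma\, y_{1g} + \beta\, x_{2g} + \varepsilon_{2g}.
\end{aligned}
```

The symmetry restriction is that $\alpha$, $\beta$ and $\gamma$ are shared by both equations, since the labels "player 1" and "player 2" are arbitrary. The exclusion assumption is that $x_{2g}$ enters the first equation only through $y_{2g}$ (and vice versa), which is what would let it serve as an instrument.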

You can also try approaching this as a dyadic data analysis problem, where the dyads are, of course, the pairs of interacting players. The existing literature on dyadic data tends to come from psychologists, who do not care about endogeneity the way economists do, so you may need to take their suggestions with a grain of salt. Modeling dyadic data in a multilevel way, i.e., with a random effect, is a popular approach. If you have, say, 15 people, and each person played with, say, 6 other people, then you have additional problems with lack of independence across your data set, and a multilevel/random effects model seems even more appropriate.
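
One possible way to set up such a multilevel model is sketched below in Python with `statsmodels`, using crossed random intercepts for subject and pairing. Everything here is an assumption for illustration: the data are simulated, the column names (`score`, `trait`, `subject`, `pairing`, `one_group`) are made up, and the ring-shaped design (80 subjects, each in 2 pairings) is just one layout that matches the numbers mentioned in the comments above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 80  # subjects; the assumed ring design also yields 80 pairings

# Simulated (hypothetical) subject-level and pairing-level effects.
subject_effect = rng.normal(0.0, 1.0, n)
pairing_effect = rng.normal(0.0, 1.0, n)
trait = rng.normal(size=n)  # made-up subject covariate

rows = []
for p in range(n):
    # Pairing p is played by subjects p and p + 1 (wrapping around).
    for s in (p, (p + 1) % n):
        score = 0.5 * trait[s] + subject_effect[s] + pairing_effect[p] + rng.normal()
        rows.append({"subject": s, "pairing": p, "trait": trait[s], "score": score})
df = pd.DataFrame(rows)

# Crossed random effects in statsmodels: treat the whole data set as one
# group and declare subject and pairing as variance components.
df["one_group"] = 1
vc = {"subject": "0 + C(subject)", "pairing": "0 + C(pairing)"}
model = smf.mixedlm("score ~ trait", data=df, groups="one_group", vc_formula=vc)
fit = model.fit()
print(fit.summary())
```

The pairing variance component is what absorbs the dependence between the two scores from the same game; the subject component handles the same person appearing in two games. Whether a shared intercept is an adequate description of that dependence is a separate question.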

Sorry I have taken a long time to get back to you on this. I am already using a random effects model with subject as the random effect. What I am concerned about is that, since in the prisoner's dilemma one's payoff depends on the other's payoff, is there not some non-independence issue there? Apologies if this is what you are explaining. – Jonathan Bone Apr 15 '13 at 16:31

score 1 · Answer 3 · answered Apr 24 '13 at 15:41

Model the actions of the players instead of the payoffs. That is, predict the probability that a player chooses to cooperate in a particular round as a function of previous rounds (if the game is repeated in your setting) and your covariates. I think this makes more causal sense, as the players actually choose the actions, influenced by whatever factors are at play, and the payoffs are just a deterministic function of the actions. Furthermore, this makes the outcome variable binary, which simplifies the analysis, as you do not have to think about the potentially difficult dependence between total payoffs.

I guess it is also probably fine to treat the actions selected by each player as conditionally independent given the covariates and history, which reduces the analysis to simple prediction of a binary variable. On the other hand, one could argue that unobserved variables might lead to dependence.
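
A minimal sketch of this action-level approach, in Python with `statsmodels`, might look as follows. The data are simulated and the behavioural rule, the column names (`coop`, `prev_own`, `prev_partner`, `trait`) and the helper `simulate_game` are all assumptions made for the example, not part of the original setting.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)


def simulate_game(game_id, n_rounds=20):
    """Simulate one repeated game; one row per player per round, from round 2 on."""
    trait = rng.normal(size=2)         # hypothetical player-level covariate
    prev = rng.integers(0, 2, size=2)  # round-1 actions (1 = cooperate)
    rows = []
    for t in range(2, n_rounds + 1):
        # Assumed behaviour: cooperation is more likely after the partner cooperated.
        logit = -0.5 + 1.5 * prev[::-1] + 0.5 * prev + 0.3 * trait
        act = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
        for i in (0, 1):
            rows.append({
                "game": game_id, "player": 2 * game_id + i, "round": t,
                "coop": int(act[i]),
                "prev_own": int(prev[i]), "prev_partner": int(prev[1 - i]),
                "trait": trait[i],
            })
        prev = act
    return rows


df = pd.DataFrame([row for g in range(40) for row in simulate_game(g)])

# Logistic regression of the current action on last round's actions and the covariate.
fit = smf.logit("coop ~ prev_own + prev_partner + trait", data=df).fit(disp=False)
print(fit.params)
```

This plain logit relies on the conditional independence assumption described above. If unobserved player or game characteristics are a concern, standard errors clustered by game, or a random effect per player, would be natural extensions.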

Angel Sánchez has applied logistic regression to modeling the probability of cooperating in the Prisoner's dilemma. Their setting is probably somewhat different, as it involves multiple players in a network, but you should still take a look to see whether their approach can be adapted to your setting.
