I am trying to fit a Poisson regression to soccer match results. I want to predict first-league matches for a new season, which means there will be some new teams that were promoted from last season's second league. So I build a data frame with all of last season's matches from league 1 and league 2 and fit a Poisson regression on it. My problem is that there is one team whose coefficient cannot be estimated (NA). How can I overcome this?
My code:
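prepare_data <- function(dataframe) {
  # Keep only the two teams and the full-time home/away goals
  dataframe <- dataframe[c("HomeTeam", "AwayTeam", "FTHG", "FTAG")]
  # Away-goal rows: swap the team columns so "HomeTeam" is always the scoring side
  dataframe.temp <- dataframe[, c("AwayTeam", "HomeTeam", "FTAG")]
  names(dataframe.temp) <- c("HomeTeam", "AwayTeam", "Goals")
  # Home-goal rows
  dataframe <- dataframe[c("HomeTeam", "AwayTeam", "FTHG")]
  names(dataframe) <- c("HomeTeam", "AwayTeam", "Goals")
  # Stack the home-goal rows on top of the away-goal rows
  dataframe <- rbind(dataframe, dataframe.temp)
  # Home = 1 for the first half (home goals), 0 for the second half (away goals)
  dataframe$Home <- rep(c(1, 0), each = nrow(dataframe) / 2)
  dataframe
}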
# try to train model using two leagues
mydata3a <- read.csv("https://www.football-data.co.uk/mmz4281/1718/E0.csv", header = TRUE, stringsAsFactors = TRUE)
mydata3b <- read.csv("https://www.football-data.co.uk/mmz4281/1718/E1.csv", header = TRUE, stringsAsFactors = TRUE)
mydata3a <- prepare_data(mydata3a)
mydata3b <- prepare_data(mydata3b)
mydata3 <- rbind(mydata3a, mydata3b)
model <- glm(Goals ~ Home + HomeTeam + AwayTeam, family = poisson, data = mydata3)
In the fitted model, one coefficient comes back as NA:
AwayTeamWolves
NA
As a result, I cannot predict matches involving Wolves.
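In case it helps, here is a minimal sketch on made-up data that seems to reproduce the same NA without downloading anything. The team names (A1 to B4), the `make_league` helper, and the Poisson rates are assumptions I made up for illustration, not the real data. As in my real setup, the two toy leagues never play each other, and the rows are laid out the same way `prepare_data` produces them.

```r
# Toy reproduction (hypothetical data): two leagues whose teams never meet.
set.seed(1)

# Hypothetical helper: builds a double round robin for one league in the
# same layout as prepare_data() above (scoring side in "HomeTeam").
make_league <- function(teams) {
  # Every ordered (home, away) pair with two different teams
  fixtures <- expand.grid(HomeTeam = teams, AwayTeam = teams,
                          stringsAsFactors = FALSE)
  fixtures <- fixtures[fixtures$HomeTeam != fixtures$AwayTeam, ]
  n <- nrow(fixtures)
  # Goals scored by the home side (Home = 1); rate 1.5 is an arbitrary choice
  home_rows <- data.frame(HomeTeam = fixtures$HomeTeam,
                          AwayTeam = fixtures$AwayTeam,
                          Goals = rpois(n, 1.5), Home = 1)
  # Goals scored by the away side, team columns swapped (Home = 0)
  away_rows <- data.frame(HomeTeam = fixtures$AwayTeam,
                          AwayTeam = fixtures$HomeTeam,
                          Goals = rpois(n, 1.2), Home = 0)
  rbind(home_rows, away_rows)
}

toy <- rbind(make_league(c("A1", "A2", "A3", "A4")),
             make_league(c("B1", "B2", "B3", "B4")))

# Same formula as in my real model
toy_model <- glm(Goals ~ Home + HomeTeam + AwayTeam, family = poisson, data = toy)

# Names of the coefficients that could not be estimated
na_coefs <- names(coef(toy_model))[is.na(coef(toy_model))]
print(na_coefs)
```

On this toy data an AwayTeam coefficient from the second league also comes back as NA, so the issue does not seem to be specific to Wolves. My guess is that it has to do with the two leagues sharing no matches, but I am not sure how to work around it.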