I am trying to model pricing data where the price depends on two parameters: the profession and the city of the user.
The model is very simple: Price = $avgPrice_{profession}\cdot\beta_{city}$. Each profession has an average price, which is corrected by a coefficient for each city.
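For reference, and assuming all prices are strictly positive, taking logarithms of both sides turns this product into a sum of one profession term and one city term:

$$\log(Price) = \log(avgPrice_{profession}) + \log(\beta_{city})$$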
In R, I used lm in the following way: lm(Price ~ factor(Profession):factor(City), data). But R converts the factors into dummy variables and creates a coefficient for every interaction combination.
Example: say we have 4 cities (NYC, Boston, Chicago, Miami) and 3 professions (Doctor, Lawyer, Driver). R tries to estimate every interaction: factor(city)NYC:factor(profession)Doctor, factor(city)NYC:factor(profession)Lawyer, factor(city)NYC:factor(profession)Driver, factor(city)Boston:factor(profession)Doctor, factor(city)Boston:factor(profession)Lawyer, etc.
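To illustrate the expansion described above, here is a minimal sketch using model.matrix on a hypothetical grid of the 3 professions and 4 cities. The names grid, X_inter and X_add are made up for this illustration, and the grid is not the data from this post. It compares the design columns produced by the interaction-only formula with those of a main-effects formula written with +.

```r
# Hypothetical grid: every profession/city pair exactly once
# (illustration only, not the data from this post).
grid <- expand.grid(
  Profession = c("Doctor", "Lawyer", "Driver"),
  City = c("NYC", "Boston", "Chicago", "Miami"),
  stringsAsFactors = TRUE
)

# Interaction-only formula: one dummy column per profession/city combination.
X_inter <- model.matrix(~ factor(Profession):factor(City), data = grid)

# Main-effects formula: one set of dummies per factor,
# each measured relative to a baseline level.
X_add <- model.matrix(~ factor(Profession) + factor(City), data = grid)

ncol(X_inter)      # intercept + 3 * 4 interaction columns
ncol(X_add)        # intercept + (3 - 1) + (4 - 1) columns
colnames(X_add)    # inspect which levels were kept
```

Note that the main-effects design still absorbs one baseline level per factor into the intercept, so it does not produce one coefficient for every city and every profession.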
Instead, I would like R to estimate the following coefficients: factor(city)NYC, factor(city)Boston, factor(city)Chicago, factor(city)Miami and factor(profession)Doctor, factor(profession)Lawyer, factor(profession)Driver.
Is this possible and, if so, how should I set up my formula and lm parameters?
Training data:
train_data = structure(list(Profession = c("Doctor", "Lawyer", "Driver",
"Doctor", "Doctor", "Doctor"), City = c("Miami ", "Miami ", "Miami ", "Boston",
"Chicago", "NYC"), Tarif = c(25.48, 29.99, 33.23, 25.49, 24.24,
28.08)), .Names = c("Profession", "City", "Tarif"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
Test data:
test_data = structure(list(Profession = c("Doctor", "Lawyer", "Driver", "Doctor",
"Lawyer", "Driver", "Doctor", "Lawyer", "Driver", "Doctor", "Lawyer",
"Driver"), City = c("Miami ", "Miami ", "Miami ", "Boston", "Boston",
"Boston", "Chicago", "Chicago", "Chicago", "NYC", "NYC", "NYC"
), Tarif = c(25.48, 29.99, 33.23, 25.49, 30, 33.23, 24.24, 28.53,
31.61, 28.08, 33.13, 36.77)), .Names = c("Profession", "City",
"Tarif"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-12L))