All,
I have a dataset with more than 45k rows and 3 columns. Below is the dput() output
for the first 10 and the first 50 rows. Note that the category
column is a factor with 34 levels. I am using a logistic regression model to predict category
from the other two columns (quantity and sales_value); my model is shown below. However, fitting the model gives a warning. I Googled the warning and found that it might be caused by a linear relationship between my DV (dependent variable) and IVs (independent variables). I am not sure how to deal with this warning. Some posts suggested a log transformation, but I am not sure how to apply one in my model. I am new to R, so an explanation of how to deal with the warning would be great.
> dput(droplevels(head(new_df1, 10)))
structure(list(category = structure(c(1L, 5L, 7L, 8L, 9L, 10L,
2L, 3L, 4L, 6L), .Label = c("", "baking", "canned", "crackers",
"DELI", "dessert", "MEAT", "NUTRITION", "PASTRY", "PRODUCE"), class = "factor"),
quantity = c(5L, 27L, 3L, 1L, 29L, 94L, 70L, 20L, 12L, 122L
), sales_value = c(11.6, 86.83, 13.46, 2, 52.4, 133.75, 160.15,
38.81, 29.91, 208.75)), row.names = c(NA, 10L), class = "data.frame")
> dput(droplevels(head(new_df1, 50)))
structure(list(category = structure(c(1L, 5L, 20L, 21L, 24L,
27L, 2L, 3L, 4L, 6L, 7L, 8L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 22L, 23L, 25L, 26L, 28L, 1L, 5L, 20L, 21L, 24L,
27L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L,
17L, 18L, 19L, 22L), .Label = c("", "baking", "canned", "crackers",
"DELI", "dessert", "drinks", "drug", "ethnic", "food", "food add-ons",
"frozen dessert", "frozen food", "frozen meat", "fruit", "health",
"household", "instant dinner", "meat", "MEAT", "NUTRITION", "other",
"packaged foods", "PASTRY", "personal care", "produce", "PRODUCE",
"seasonal"), class = "factor"), quantity = c(5L, 27L, 3L, 1L,
29L, 94L, 70L, 20L, 12L, 122L, 81L, 1L, 78L, 82L, 30L, 7L, 1L,
33L, 5L, 56L, 4L, 66L, 5L, 45L, 37L, 36L, 3L, 1L, 41L, 2L, 18L,
20L, 115L, 83L, 32L, 24L, 118L, 72L, 2L, 1L, 73L, 92L, 44L, 16L,
21L, 1L, 57L, 1L, 68L, 14L), sales_value = c(11.6, 86.83, 13.46,
2, 52.4, 133.75, 160.15, 38.81, 29.91, 208.75, 204.38, 3.99,
128.27, 193.84, 56.27, 11.75, 1.5, 41.59, 33.51, 140.42, 7, 170.11,
14.08, 84.93, 111.53, 33.62, 2.07, 2.99, 125.34, 4.45, 46.33,
42.91, 132.35, 181.04, 51.64, 59.91, 260.86, 189.15, 12.68, 1.09,
115.18, 210.44, 111.53, 31.4, 25.16, 2.29, 142.57, 2.5, 179.86,
59.28)), row.names = c(NA, 50L), class = "data.frame")
My model:
fit_glm <- glm(category ~ ., data = new_df1, family = "binomial")
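One thing I am unsure about: as far as I understand from ?family, when the response is a factor and family is "binomial", R treats the first level as "failure" and all other levels together as "success". If that is right, my 34 categories would be collapsed into two groups (in my data the first level is the empty string ""). A minimal sketch on made-up data that seems to confirm this; the names toy, grp, x and not_first are hypothetical and not part of new_df1:

```r
# Toy data (hypothetical): a three-level factor response and one numeric predictor.
set.seed(1)
toy <- data.frame(
  grp = factor(rep(c("a", "b", "c"), each = 20)),
  x   = rnorm(60)
)

# Fit 1: pass the multi-level factor directly, the same way my model does.
fit_factor <- glm(grp ~ x, data = toy, family = "binomial")

# Fit 2: explicit 0/1 response, 0 for the first level and 1 for every other level.
toy$not_first <- as.integer(toy$grp != levels(toy$grp)[1])
fit_binary <- glm(not_first ~ x, data = toy, family = "binomial")

# Compare the estimated coefficients of the two fits.
same_fit <- isTRUE(all.equal(coef(fit_factor), coef(fit_binary)))
print(same_fit)
```

If the two fits agree, the binomial model is only separating the first level from everything else, which may not be what I want for a 34-level outcome.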
Warning:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
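From what I have read, this warning often appears when a predictor separates the two outcome groups perfectly or almost perfectly. Below is a small sketch that reproduces the same message on made-up data, assuming separation is the cause; toy, x, y and msgs are hypothetical names and the values are not taken from new_df1:

```r
# Toy data (hypothetical): every x <= 5 belongs to "a", every x >= 6 to "b",
# so x separates the two groups perfectly.
toy <- data.frame(
  x = 1:10,
  y = factor(rep(c("a", "b"), each = 5))
)

# Collect the warnings raised during fitting instead of only printing them.
msgs <- character(0)
fit_toy <- withCallingHandlers(
  glm(y ~ x, data = toy, family = "binomial"),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Show the collected warning messages and the (very large) slope estimate.
print(msgs)
print(coef(fit_toy))
```

I do not know yet whether the same thing is happening in my real data, but the message matches.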