I have a dataset of auto thefts that records the date, day of the week, and time at which each theft occurred. My independent variables would be day of the week, month, hour of the day, etc.

I want to see whether auto thefts depend on day of the week, month, and time of day. I am not sure I am framing the question correctly, but what should my dependent variable be? Can I do regression without a dependent variable?

gung - Reinstate Monica
user203504
  • Whether or not an auto theft occurs is your dependent variable. Probably best to use logistic regression. – Michael R. Chernick Apr 08 '18 at 19:17
  • But all my rows of data are for the event that a theft has occurred, so my dependent variable will always be 1. – user203504 Apr 08 '18 at 19:24
  • Then you have no information on how your independent variables affect auto thefts if you don't know their values when an auto theft doesn't occur! – Michael R. Chernick Apr 08 '18 at 19:33
  • You could model the seasonality in the number of thefts per unit time. – mkt Apr 08 '18 at 19:52
  • Logistic regression would suit you. The dependent variable is 1 if a theft happened, 0 if none. – Stenga Apr 08 '18 at 19:20
  • Or you could try regressing on the number of thefts, which could be 0, 1, or some other integer obtained by summing the thefts by hour of the day, week, month, etc., if that data is available. To my knowledge fractional car thefts do not happen, so it is probably best to use integer sums. – James Phillips Apr 08 '18 at 19:34
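
The count-based suggestion in the comments above can be sketched roughly as follows. This is a minimal illustration, not the commenters' own code: it assumes the dataset is a complete list of theft timestamps for a known observation window, and the function name `daily_counts` and the sample timestamps are hypothetical. The key point is that days with no thefts get an explicit count of 0.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(theft_times, start, end):
    """Return one row per calendar day in [start, end], zero-theft days included."""
    per_day = Counter(t.date() for t in theft_times)
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date": day,
            "weekday": day.weekday(),  # 0 = Monday ... 6 = Sunday
            "month": day.month,
            "thefts": per_day.get(day, 0),  # 0 when no theft was recorded
        })
        day += timedelta(days=1)
    return rows


# Hypothetical theft timestamps, for illustration only.
thefts = [
    datetime(2018, 4, 2, 23, 15),
    datetime(2018, 4, 2, 1, 40),
    datetime(2018, 4, 5, 22, 5),
]
rows = daily_counts(thefts, date(2018, 4, 1), date(2018, 4, 7))
for row in rows:
    print(row["date"], row["weekday"], row["thefts"])
```

The `thefts` column is then a candidate dependent variable for a count model, with `weekday` and `month` as predictors.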

2 Answers

Your dependent variable is theft. How exactly it gets measured depends on what data you have, but the model could be logistic regression, Poisson regression, or some other count model.

Peter Flom
  • Thank you. When I do logistic regression I get the warning "glm.fit: algorithm did not converge", and it doesn't show any effect of the independent variables on theft, because my dataset only has rows where Theft=1. – user203504 Apr 08 '18 at 19:32
  • You need to add in the cases where it is 0. How exactly to do that depends on what data you have, what software you are using and so on. – Peter Flom Apr 08 '18 at 19:33
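
One possible way to "add in the cases where it is 0" is to lay out every hour in the observation window and mark whether a theft fell in it. The sketch below assumes the data record every theft in that window, which may not hold for your dataset; `hourly_indicator` and the sample timestamps are hypothetical names and values chosen for illustration.

```python
from datetime import datetime, timedelta


def hourly_indicator(theft_times, start, end):
    """One row per hour in [start, end): theft = 1 if any theft fell in that hour, else 0."""
    hours_with_theft = {
        t.replace(minute=0, second=0, microsecond=0) for t in theft_times
    }
    rows = []
    hour = start
    while hour < end:
        rows.append({
            "hour_of_day": hour.hour,
            "weekday": hour.weekday(),  # 0 = Monday ... 6 = Sunday
            "month": hour.month,
            "theft": 1 if hour in hours_with_theft else 0,
        })
        hour += timedelta(hours=1)
    return rows


# Hypothetical theft timestamps, for illustration only.
thefts = [datetime(2018, 4, 2, 23, 15), datetime(2018, 4, 3, 1, 40)]
rows = hourly_indicator(thefts, datetime(2018, 4, 2), datetime(2018, 4, 4))
print(sum(r["theft"] for r in rows), "of", len(rows), "hours had a theft")
```

With both 0 and 1 outcomes present, a logistic regression of `theft` on `hour_of_day`, `weekday`, and `month` at least has something to estimate.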

It sounds like you are trying to establish whether there is seasonality in your data and, if so, at what frequency. Regression is probably not the right choice for you. Other answers have suggested that your dependent variable is whether or not an auto theft occurred, but I don't think that applies in your case: 100% of your data are auto thefts, so you have no negative cases and logistic regression isn't going to help you. What you want to know is whether auto thefts are more likely to recur at certain intervals. Your data are time-series data, so you should use time-series techniques such as autocorrelation rather than plain old regression.
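
As a rough illustration of the autocorrelation idea, the sketch below computes the sample autocorrelation of a daily count series at a given lag. The series is made up (a hypothetical weekly pattern repeated for eight weeks), and `autocorrelation` is a hand-written helper, not a library function; a statistics package would normally provide this along with confidence bands.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return cov / var


# Hypothetical daily theft counts: the same weekly shape repeated for 8 weeks,
# with more thefts on the last two days of each week.
week = [2, 1, 2, 1, 2, 6, 7]
daily = week * 8

for lag in (1, 7):
    print(lag, round(autocorrelation(daily, lag), 3))
```

A markedly higher value at lag 7 than at neighbouring lags would point to a weekly cycle in the counts.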

I suggest you read this answer on detecting seasonality.

If you really want to use regression, you could take something like the time between thefts as the variable to estimate, or the frequency of thefts at some resolution (e.g. the number of thefts per day), and then perhaps fit an exponential-family GLM.
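
To make the "time between thefts" variable concrete, here is a minimal sketch that derives the gaps from raw timestamps. The helper `gaps_in_hours` and the timestamps are hypothetical. The final rate estimate assumes a constant-rate Poisson process, under which the gaps are exponentially distributed and the maximum-likelihood rate is one over the mean gap; a GLM would relax that assumption by letting the rate depend on predictors, which is not shown here.

```python
from datetime import datetime


def gaps_in_hours(theft_times):
    """Hours elapsed between consecutive thefts, after sorting by time."""
    ordered = sorted(theft_times)
    return [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]


# Hypothetical theft timestamps, for illustration only.
thefts = [
    datetime(2018, 4, 3, 5, 0),
    datetime(2018, 4, 2, 1, 0),
    datetime(2018, 4, 5, 5, 0),
    datetime(2018, 4, 2, 23, 0),
]
gaps = gaps_in_hours(thefts)
mean_gap = sum(gaps) / len(gaps)

# Under a constant-rate assumption, the estimated rate is 1 / mean gap.
rate_per_hour = 1 / mean_gap
print(gaps, mean_gap, rate_per_hour)
```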

Dan