
My project is focused on Crime Prevention Through Environmental Design, AKA CPTED. CPTED is a theory that certain characteristics of the built environment, such as level of maintenance, level of visibility, and personal touches, can help prevent crime.

I want to see if areas (neighborhood blocks) with more CPTED characteristics also have a lower crime count. My hypothesis is that the areas with higher CPTED rating will have a lower crime count, and areas with a lower CPTED rating will have a higher crime count.

The data collection has already been done. For my research, I observed the houses in 6 neighborhoods and rated each house on its CPTED characteristics. The rating scale runs from 0 (worst) to 4 (best). Higher CPTED ratings are meant to be “better,” meaning these houses have a lower risk of being targeted for property crime.

Once all of the houses in each of the 6 neighborhoods were rated and recorded in a spreadsheet, I assigned each house to a neighborhood block, and each block was given the average CPTED score of its houses.

I then obtained data on crime incidents from the local police department and assigned the incidents to the blocks, because most crime reports nowadays only have block-level accuracy instead of giving specific addresses. This is why the individual house lots had to be aggregated into larger neighborhood blocks.

To recap, I have:

A spreadsheet with a column for Average CPTED Rating (the X variable) and a column for Crimes Per 1,000 Houses (the Y variable). A screenshot of this spreadsheet is here:

[Image: CPTED Rating/Crime Count]

I already ran a regression analysis on this data, and the result shows that as the CPTED rating increases, the crime count decreases. However, this seems to be a somewhat weak inverse relationship, with an R-squared value of 0.18. I am now questioning whether a regression analysis is appropriate for this type of study.

Could I perform some sort of non-parametric test? I am still a bit of a novice when it comes to these types of statistics. Are there any non-parametric tests that I could use on this data to determine the relationship between CPTED rating and crime count? I have been encouraged to use a non-parametric test but I am not sure which one, if any, would be appropriate. I have had one statistics “expert” in a forum recommend using the Spearman’s Rho test, and that seemed like a good idea until another “expert” told him that was incorrect.

kjetil b halvorsen
Creggj86
    I can barely read your graph, but it seems evident that a linear regression can't match your outcome variable, which is necessarily zero or positive. I'd recommend Poisson regression, considering the kind of argument outlined e.g. in https://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ Which forum you're referring to, and indeed how far the people advising you were "experts", isn't clear, but I see no reason why Spearman correlation isn't a relevant descriptor here. However, it can't take you further towards a good analysis than the scatter plot can. – Nick Cox Jun 22 '18 at 18:03
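
For anyone who wants to see what Spearman correlation amounts to as a descriptor, here is a minimal sketch in plain Python: rank both variables (tied values share the average rank) and take the ordinary Pearson correlation of the ranks. The function names and the eight data points are hypothetical and made up purely for illustration; they are not the asker's data.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical block-level values, invented for illustration only.
cpted_rating = [1.4, 1.9, 2.2, 2.6, 3.1, 3.5, 2.8, 2.0]
crimes_per_1000 = [42.0, 55.0, 30.0, 38.0, 12.0, 20.0, 25.0, 47.0]
print("Spearman rho:", round(spearman_rho(cpted_rating, crimes_per_1000), 3))
```

A negative value says that blocks with higher ratings tend to have lower crime rates, which is the same information the scatter plot already gives.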

1 Answer


You have count data, and should probably use Poisson regression (possibly with a correction for overdispersion, a so-called quasi-Poisson model). This is discussed in many posts here. For a thorough comparison with the ordinary linear regression that you have used, see: Goodness of fit and which model to choose linear regression or Poisson

What you really want is Poisson rate regression (search this site). I see that you have a variable "crime per 1000 houses". You need the number of crimes (the Y variable) and the number of houses as separate variables; you would then use $\log(\text{number of houses})$ as an offset, so that the model describes the crime rate per house while still treating the outcome as a count. Discussion can be found here: Scaling vs Offsetting in Quasi-Poisson GLM
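
To make the offset idea concrete, here is a rough sketch in plain Python of a Poisson rate regression with a single predictor, fitted by Newton-Raphson. The model is $\log(\mu_i) = \log(\text{houses}_i) + b_0 + b_1 x_i$, so $\exp(b_1)$ is the multiplicative change in crime rate per one-point increase in CPTED rating. The function names and the block-level numbers are hypothetical and invented for illustration; in practice you would use the GLM routine of your statistics package rather than hand-rolled code.

```python
import math


def fit_poisson_rate(x, counts, exposure, n_iter=50, tol=1e-10):
    """Fit log(mu) = log(exposure) + b0 + b1 * x by Newton-Raphson."""
    b0 = math.log(sum(counts) / sum(exposure))  # start at the overall rate
    b1 = 0.0
    for _ in range(n_iter):
        mu = [e * math.exp(b0 + b1 * xi) for xi, e in zip(x, exposure)]
        # Score vector: X^T (y - mu)
        u0 = sum(c - m for c, m in zip(counts, mu))
        u1 = sum(xi * (c - m) for xi, c, m in zip(x, counts, mu))
        # Information matrix: X^T diag(mu) X
        i00 = sum(mu)
        i01 = sum(xi * m for xi, m in zip(x, mu))
        i11 = sum(xi * xi * m for xi, m in zip(x, mu))
        det = i00 * i11 - i01 ** 2
        # Newton step: solve the 2x2 system directly
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def pearson_dispersion(x, counts, exposure, b0, b1):
    """Pearson chi-square / (n - 2); values well above 1 suggest overdispersion."""
    mu = [e * math.exp(b0 + b1 * xi) for xi, e in zip(x, exposure)]
    chi2 = sum((c - m) ** 2 / m for c, m in zip(counts, mu))
    return chi2 / (len(counts) - 2)


# Hypothetical blocks, invented for illustration only.
avg_cpted = [1.4, 1.9, 2.2, 2.6, 3.1, 3.5, 2.8, 2.0]
crimes = [9, 14, 6, 8, 2, 3, 5, 11]               # crime counts per block
houses = [210, 260, 190, 220, 170, 150, 200, 240]  # houses per block

b0, b1 = fit_poisson_rate(avg_cpted, crimes, houses)
print("rate ratio per CPTED point:", round(math.exp(b1), 3))
print("dispersion:", round(pearson_dispersion(avg_cpted, crimes, houses, b0, b1), 3))
```

If the dispersion estimate is clearly above 1, the quasi-Poisson approach keeps the same coefficients but multiplies the standard errors by the square root of that estimate.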

Nick Cox
kjetil b halvorsen