Crime Incident Reports Data
The data contained over 720k incident reports dating back to 2015. Although the dataset was large, it was not yet clean enough to use.
Many of the incident reports had ambiguous descriptions; the most common offense description was “investigate person”. To clean the data, I manually went through the offense codes listed on the website and made a tier list of the crimes I would use for further analysis. The tier list I created is as follows:
Tier 1 – Murder, Manslaughter, Rape
Tier 2 – Arson, Aggravated Assault and Battery
Tier 3 – Non-violent crimes like Larceny, Burglary, Robbery and Breaking & Entering
Tier 4 – Vandalism and Vehicular accidents
Alongside these, I created a separate tier for drug-related crimes.
After this step, I had a dataset of 216k incidents, each belonging to one of these five tiers.
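The filtering step described above could be sketched roughly as follows. This is only an illustration and not the exact procedure used: the actual cleaning was done by hand-picking offense codes, whereas this sketch matches keywords in the description text. The field name `offense_description`, the keyword lists, and the sample records are all assumptions made for the example.

```python
# Hypothetical keyword-to-tier mapping (assumption: the real analysis used
# hand-picked offense codes, which are not listed in this write-up).
TIER_KEYWORDS = {
    "Tier 1": ["MURDER", "MANSLAUGHTER", "RAPE"],
    "Tier 2": ["ARSON", "AGGRAVATED ASSAULT", "BATTERY"],
    "Tier 3": ["LARCENY", "BURGLARY", "ROBBERY", "BREAKING & ENTERING"],
    "Tier 4": ["VANDALISM", "VEHICULAR ACCIDENT"],
    "Drugs": ["DRUG"],
}


def assign_tier(description):
    """Return the first tier whose keyword appears in the description, else None."""
    text = description.upper()
    for tier, keywords in TIER_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return tier
    return None


def filter_incidents(incidents, field="offense_description"):
    """Keep only incidents that fall into a tier, tagging each with its tier."""
    kept = []
    for incident in incidents:
        tier = assign_tier(incident.get(field, ""))
        if tier is not None:
            kept.append({**incident, "tier": tier})
    return kept


# Made-up sample records, for illustration only.
sample = [
    {"offense_description": "INVESTIGATE PERSON"},
    {"offense_description": "LARCENY SHOPLIFTING"},
    {"offense_description": "VANDALISM"},
    {"offense_description": "DRUGS - POSSESSION"},
]
tiered = filter_incidents(sample)
```

Ambiguous entries such as “investigate person” match no tier and are dropped. Plain substring matching is crude and can misclassify descriptions, so a mapping keyed on offense codes would likely be more reliable.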