Latitude and longitude missing data:
The latitude and longitude data contain 840 missing values. These cannot simply be imputed with the average of all latitudes and longitudes, because that would place every affected record at a single point somewhere in the central US, unrelated to where the shooting actually happened.
To deal with these missing values, I took the following steps. First, I combined the city and state columns into a single column in the format “city, state”. This avoids ambiguity, since some cities in different states share the same name.
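As a rough illustration, the combining step could be sketched in pandas as follows. The use of pandas, the toy rows, and the column names `city`, `state`, and `city_state` are assumptions made for this example, not details taken from the actual dataset.

```python
import pandas as pd

# Toy data for illustration only; column names are assumed.
df = pd.DataFrame({
    "city": ["Springfield", "Springfield", "Portland"],
    "state": ["IL", "MO", "OR"],
})

# Build one "city, state" key so that same-named cities
# in different states remain distinct groups.
df["city_state"] = df["city"].str.strip() + ", " + df["state"].str.strip()

print(df["city_state"].tolist())
```

Here the two Springfields end up with different keys, so they would not be averaged together in a later grouping step.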
Then, I grouped the data by the combined city and state column. Any missing values in each group (meaning each individual city) were imputed with the mean latitude and longitude of that group. This is effectively an average location of the recorded shootings within that city. It is a much better approximation than a nationwide average, since the city information is used to fill in the location. However, some cities with just one shooting had a missing location, which leaves no other values in the group to average, so these could not be imputed this way. After this process, the total number of missing values dropped from 840 to 300.
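A minimal sketch of this group-wise imputation is shown below, assuming pandas and hypothetical column names (`city_state`, `latitude`, `longitude`). The data are made up, and counting missing entries by row is a choice made for this example; it is not necessarily how the 840 and 300 figures were counted.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; column names are assumed.
df = pd.DataFrame({
    "city_state": ["Austin, TX", "Austin, TX", "Austin, TX", "Salem, OR"],
    "latitude": [30.25, np.nan, 30.35, np.nan],
    "longitude": [-97.75, -97.65, np.nan, np.nan],
})

coords = ["latitude", "longitude"]

# Rows with at least one missing coordinate before imputation.
missing_before = int(df[coords].isna().any(axis=1).sum())

for col in coords:
    # Per-city mean, aligned to the original rows; NaNs are skipped
    # when averaging, and an all-NaN group yields NaN.
    group_mean = df.groupby("city_state")[col].transform("mean")
    df[col] = df[col].fillna(group_mean)

# Rows still missing a coordinate after imputation.
missing_after = int(df[coords].isna().any(axis=1).sum())

print(missing_before, missing_after)
```

In this toy example the Salem row stays missing, because it is the only record for its city and has no coordinates to average. This mirrors the single-shooting cities that remained unfilled.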