I did some exploratory analysis on the Washington Post fatal police shootings data. Several seemingly important features, including “armed”, “age”, and “race”, have many missing values. These gaps need to be dealt with, either by finding the missing data elsewhere on the internet or by imputing values from the other features and from each feature’s statistics.
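As a rough illustration of the imputation option, here is a minimal sketch using pandas. The rows are toy values made up for this example, not taken from the real dataset, and the column names simply mirror the features mentioned above. Filling “age” with the median and labeling missing categories as "unknown" is one simple choice among many, not necessarily the best one for this data.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset (values are made up for illustration).
df = pd.DataFrame({
    "age": [23, np.nan, 41, 35, np.nan, 29],
    "race": ["B", "W", None, "W", "H", "B"],
    "armed": ["gun", None, "knife", "gun", "unarmed", None],
})

def impute_missing(frame):
    """Fill numeric gaps with the median and categorical gaps with an explicit label."""
    out = frame.copy()
    # Numeric feature: the median is less sensitive to outliers than the mean.
    out["age"] = out["age"].fillna(out["age"].median())
    # Categorical features: keep "missing" visible as its own category.
    for col in ["race", "armed"]:
        out[col] = out[col].fillna("unknown")
    return out

# Count missing values per column before imputing, then build the cleaned frame.
missing_before = df.isna().sum()
clean = impute_missing(df)
```

If “race” ends up being the prediction target, it may be safer to drop rows where it is missing rather than impute it, since imputed labels would feed guesses back into the model.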
My initial plan for this project is to build a classification model (for example, logistic regression) that predicts the race of the shooting victim from the remaining features. The age distributions of the victims vary quite a bit by race. The age distribution of black American shooting victims peaks at a much younger age than that of other races. This suggests that black victims tend to be younger than victims of other races.
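To make the modeling plan more concrete, here is a minimal sketch of a logistic regression classifier built with scikit-learn. The rows are again made-up toy values, and using only “age” and “armed” as predictors is an assumption for illustration; the real feature set would come out of the exploratory analysis.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows standing in for the cleaned dataset (values are made up for illustration).
df = pd.DataFrame({
    "age": [23, 31, 41, 35, 52, 29, 19, 60],
    "armed": ["gun", "knife", "knife", "gun", "unarmed", "gun", "unarmed", "knife"],
    "race": ["B", "W", "W", "W", "H", "B", "B", "W"],
})

features = ["age", "armed"]  # assumed predictors; "race" is the target

# Scale the numeric feature and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["armed"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Fit on the toy rows, then get a predicted label and class probabilities per row.
model.fit(df[features], df["race"])
predictions = model.predict(df[features])
probabilities = model.predict_proba(df[features])
```

On the real data, the model should be evaluated on a held-out split rather than on the rows it was trained on, and class imbalance between races would need to be checked before trusting accuracy alone.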
Some additional data, such as each area’s demographics, might be useful for this analysis.