In August 2020, following the repeal of 50-a (a New York law that kept police disciplinary records, including civilian complaints, confidential), the NYPD released a dataset containing records of over 300,000 misconduct allegations going back to 1984.
Since its release there’s been some reporting on it, but I haven’t seen many deep dives into the data, so I was curious to look for myself. Before I get into the exploration, though, I’d like to note that this dataset is a record of civilians’ complaints against NYPD officers, many of which were not substantiated for a variety of reasons. Much of the press surrounding the data’s release has focused on the numbers: specifically, the huge number of complaints and the small number of corrective actions. While the low rate of action relative to the number of complaints does seem significant, I’m hoping the data has more of a story to tell.
I didn’t set out to explore this data with any political agenda, though I did start with a few questions:
- What are the most common types of complaints?
- Of the different complaints, are some more likely to be substantiated or lead to corrective actions?
- How are the penalties distributed over different ranks within the force?
- Is there any relationship between the number of penalties in a given time period and the number of complaints?
- Which jurisdictions have the most and least complaints?
After a first look at the data, I noticed some very vague classifications and dozens of different complaint subcategories, many of them nearly identical (some differ only in capitalization, such as ‘Gun fired’ and ‘Gun Fired’). I started by binning the subcategories into new groups. Some are a bit difficult to interpret, but I did my best to group similar things together.
```python
# Assumes df already holds the allegations data, with the raw
# subcategory text in the 'Allegation' column.
# Each new group lists the raw subcategories binned into it.
groups = {
    'Force': [
        'Physical force', 'Push/Shove', 'Punch/Kick', 'Dragged/Pulled', 'Chokehold',
        'Nightstick as club (incl asp & baton)', 'Beat', 'Other - Force',
        'Nightstick/Billy/Club', 'Pepper spray', 'Hit against inanimate object', 'Mace',
        'Radio As Club', 'Slap', 'Other blunt instrument as a club', 'Gun fired',
        'Gun as club', 'Flashlight as club', 'Restricted Breathing', 'Gun Fired',
        'Gun As Club', 'Flashlight As Club', 'Radio as club',
    ],
    'Verbal': [
        'Word', 'Curse', 'Nasty Words', 'Demeanor/tone', 'Discourtesy', 'Rude Gesture',
        'Gesture', 'Other- Discourtesy', 'Profane Gesture',
    ],
    'Racism': [
        'Black', 'Race', 'Ethnicity', 'Other - Ethnic Slur', 'Hispanic', 'Oriental',
        'Other Asian', 'Ethnic Slur', 'Questioned immigration status',
    ],
    'Illegal Search': [
        'Premises entered and/or searched', 'Vehicle search', 'Strip-searched',
        'Person Searched', 'Search (of person)', 'Vehicle stop', 'Frisk and/or search',
        'Stop', 'Frisk', 'Premise Searched', 'Entry of Premises', 'Vehicle Searched',
        'Search of Premises', 'Refusal to show search warrant', 'Body Cavity Searches',
    ],
    'Threat': [
        'Threat of force', 'Threat of summons', 'Threat of force (verbal or physical)',
        'Threat of arrest', 'Threat of Arrest', 'Threat to damage/seize property',
        'Gun Drawn', 'Gun Pointed', 'Threat of Summons', 'Threat to Property',
        'Threat re: removal to hospital', 'Gun pointed/gun drawn',
        'Threat re: immigration status', 'Gun pointed', 'Threat to notify ACS',
    ],
    'Sexism/Homophobia': [
        'Gender', 'Gay/Lesbian Slur', 'Sexist Remark', 'Sexual orientation', 'Gender Identity',
    ],
    'Religious Slur': ['Jewish', 'Religion'],
    'Medical Treatment/Info (Refused or Unwanted)': [
        'Refusal to obtain medical treatment', 'Forcible Removal to Hospital',
        'Improper dissemination of medical info',
    ],
    'Refused to identify self or show warrant': [
        'Police shield', 'Refusal to provide name/shield number', 'Refusal to provide name',
        'Refusal to provide shield number', 'Failure to provide RTKA card',
        'Refusal to show arrest warrant',
    ],
    'Sexual Harassment/Abuse': [
        'Sexual Misconduct (Sexual Humiliation)', 'Sex Miscon (Sexual/Romantic Proposition)',
        'Sex Miscon (Sexual Harassment, Verbal)', 'Sex Miscon (Sexually Motivated Frisk)',
        'Sex Miscon (Sexually Motiv Strip-Search)', 'Sex Miscon (Sexual Harassment, Gesture)',
    ],
    'Search, Seizure or Destruction of Recording Device': [
        'Electronic device information deletion', 'Photography/Videography',
        'Search of recording device', 'Interference with recording',
    ],
    'Improper Stop or Arrest': [
        'Question and/or stop', 'Retaliatory summons', 'Detention', 'Handcuffs too tight',
        'Arrest/Onlooker', 'Arrest/D. A. T.', 'Retaliatory arrest', 'Question',
        'Nonlethal restraining device',
    ],
    'Property Seizure/Destruction': [
        'Property Seized', 'Seizure of property', 'Property damage', 'Property damaged',
        'Property Damaged',
    ],
    'Other': [
        'Failed to Obtain Language Interpretation', 'Animal', 'Physical disability',
        'Vehicle', 'Sh Refuse Cmp', 'Other - Abuse', 'Action', 'White',
    ],
}

# Invert {group: [subcategories]} into {subcategory: group} and replace in one pass.
# Values not listed in any group (e.g. 'Abuse of Authority') pass through unchanged.
mapping = {sub: group for group, subs in groups.items() for sub in subs}
df['alleg'] = df['Allegation'].replace(mapping)
```
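Because the lists above match exact strings, variants that differ only in capitalization each need their own entry, and any subcategory missed by every list passes through unchanged (presumably how ‘Abuse of Authority’ and ‘Refusal to process civilian complaint’ end up as groups of their own below). Here is a minimal sketch of a more forgiving variant, assuming lower-casing is safe for this column: normalize the text first, map it, then list whatever did not match. The sample values and the abbreviated `groups` dictionary are hypothetical stand-ins for the full lists.

```python
import pandas as pd

# Hypothetical sample values; the real column would be df['Allegation'].
raw = pd.Series(['Gun fired', 'Gun Fired', 'Curse', 'Abuse of Authority', 'Word'])

# Abbreviated, hypothetical mapping keyed on lower-cased subcategories,
# so 'Gun fired' and 'Gun Fired' need only one entry.
groups = {
    'Force': ['gun fired', 'gun as club'],
    'Verbal': ['word', 'curse'],
}
lookup = {sub: group for group, subs in groups.items() for sub in subs}

normalized = raw.str.strip().str.lower()
alleg = normalized.map(lookup)

# .map() leaves unmatched values as NaN, so list the raw text for review.
unmapped = raw[alleg.isna()].value_counts()
print(unmapped)

# Keep the original text for anything not binned, like replace() does.
alleg = alleg.fillna(raw)
```

Printing `unmapped` after each pass makes it easy to see which subcategories still need a home before the counts are tallied.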
After binning everything, I was able to better see the distribution:
```
Illegal Search                                      50336
Improper Stop or Arrest                             11604
Refused to identify self or show warrant            10040
Abuse of Authority                                   5598
Property Seizure/Destruction                         5290
Medical Treatment/Info (Refused or Unwanted)         2609
Refusal to process civilian complaint                 991
Search, Seizure or Destruction of Recording Device    464
Religious Slur                                        338
Sexual Harassment/Abuse                               124
```
Next, I made a dummy column from the PenaltyDesc column. Most of that column was null, and after removing the ‘No penalty’ instances, a little fewer than 9,000 penalties remained.
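One way to build such a column with pandas is sketched here on a few hypothetical rows. The penalty strings (‘Vacation’, ‘Instructions’) are made-up placeholders rather than actual PenaltyDesc values, and the column name `penalty_dummies` is borrowed from the table header that follows. The flag is 1 when a penalty is recorded and 0 when the field is null or ‘No penalty’; `pd.crosstab` then counts allegations per group for each flag value.

```python
import pandas as pd

# Hypothetical rows; the PenaltyDesc strings are placeholders, not real values.
df = pd.DataFrame({
    'alleg': ['Force', 'Force', 'Verbal', 'Illegal Search', 'Illegal Search'],
    'PenaltyDesc': [None, 'Vacation', 'No penalty', 'Instructions', None],
})

# 1 if any penalty is recorded; 0 if PenaltyDesc is null or 'No penalty'.
has_penalty = df['PenaltyDesc'].notna() & (df['PenaltyDesc'] != 'No penalty')
df['penalty_dummies'] = has_penalty.astype(int)

# Count allegations per group, split into no-penalty (0) and penalty (1) columns.
table = pd.crosstab(df['alleg'], df['penalty_dummies'])
print(table)
```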
In the block below, the 0 column counts allegations with a null value or ‘No penalty’, and the 1 column counts allegations that led to any penalty. The vast majority of the penalties themselves were lost vacation days or instruction. There were 32 terminations as a result of complaints between 1984 and 2020, a handful more resignations and retirements, and a good number of suspensions or combinations of suspension and lost vacation days.
```
penalty_dummies                                         0     1
Abuse of Authority                                   5598     0
Force                                               95733   727
Illegal Search                                      46585  3751
Improper Stop or Arrest                             10711   893
Medical Treatment/Info (Refused or Unwanted)         2503   106
Other                                               10011   454
Property Seizure/Destruction                         5166   124
Racism                                               8672    82
Refusal to process civilian complaint                 854   137
Refused to identify self or show warrant             9370   670
Religious Slur                                        330     8
Search, Seizure or Destruction of Recording Device    428    36
Sexism/Homophobia                                    1504    47
Sexual Harassment/Abuse                               104    20
Threat                                              33270   570
Verbal                                              50690  1074
```
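Because the groups differ so much in size, raw counts are hard to compare; turning the two columns into a penalty rate (the share of a group’s allegations that ended in any penalty) puts them on the same scale. A short sketch using six rows copied from the table above; the choice of rows is arbitrary.

```python
import pandas as pd

# Six rows copied from the table above: 0 = no penalty, 1 = any penalty.
counts = pd.DataFrame(
    {0: [95733, 46585, 854, 8672, 104, 50690],
     1: [727, 3751, 137, 82, 20, 1074]},
    index=['Force', 'Illegal Search', 'Refusal to process civilian complaint',
           'Racism', 'Sexual Harassment/Abuse', 'Verbal'],
)

# Percentage of each group's allegations that ended in any penalty.
counts['penalty_rate'] = 100 * counts[1] / (counts[0] + counts[1])
print(counts.sort_values('penalty_rate', ascending=False).round(2))
```

On these six rows, sexual harassment/abuse (about 16 percent) and refusal to process a civilian complaint (about 14 percent) have the highest penalty rates, while force and racism allegations both fall below 1 percent.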
I broke down penalties by rank next. I dropped the ranks with the fewest complaints and no penalties at all; within the remaining ranks, most penalty rates are 2 or 3 percent. The few that are higher, such as CCC (12 percent) and CD (10 percent), have such small sample sizes that they may reflect ranks that only existed when penalties were more likely, or they could simply be outliers.
```
           0     1  penalty_percentage
AC       358    11                   3
CCC       22     3                  12
CD        37     4                  10
CPT     4351   139                   3
DC       943    13                   1
DI      2232    48                   2
DT1     7441   133                   2
DT2     1573   288                   2
DT3    50670  1671                   3
DTS     9548   297                   3
INS     1629    29                   2
LCD     3001    84                   3
LSA     1559    27                   2
LT     21298   598                   3
POF    17693   382                   2
POM   119810  3062                   2
PSA      249     6                   2
SDS     6215   206                   3
SGT    49040  1592                   3
SSA     3132   106                   3
```
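The penalty_percentage column is consistent with a rank’s penalties divided by its total allegations, rounded to a whole percent (for CCC, 3 / (22 + 3) = 12 percent). To put a number on the small-sample caveat, the sketch below adds a Wilson score confidence interval for each rate. That interval is a standard technique I’m swapping in for illustration, not something used in the analysis above, and only a handful of rows are copied from the table.

```python
import math

import pandas as pd

# A few rows copied from the rank table above: 0 = no penalty, 1 = any penalty.
ranks = pd.DataFrame(
    {0: [358, 22, 37, 4351, 119810], 1: [11, 3, 4, 139, 3062]},
    index=['AC', 'CCC', 'CD', 'CPT', 'POM'],
)
totals = ranks[0] + ranks[1]

# Penalties as a whole-number percentage of each rank's allegations.
ranks['penalty_percentage'] = (100 * ranks[1] / totals).round().astype(int)


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion k / n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


for rank, k, n in zip(ranks.index, ranks[1], totals):
    low, high = wilson_interval(k, n)
    print(f'{rank}: {100 * k / n:.1f}% (95% CI {100 * low:.1f}% to {100 * high:.1f}%)')
```

For CCC the interval runs from roughly 4 to 30 percent, while for POM it is narrower than a fifth of a percentage point, which is consistent with treating the higher rates in tiny ranks as noisy estimates.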
The largest penalty counts come from male Police Officers (POM), Detectives 3rd Grade (DT3) and Sergeants (SGT), but again, there isn’t a lot of variation in the rates, even in the higher ranks.
I’d like to continue exploring this dataset and eventually look at some time series modeling and see if there are any trends in terms of board dispositions, penalties and overall number of complaints. This dataset has a lot of interesting features and I think it’s well worth exploring, plus I still have lots of unanswered questions.
Over the past several months, policing has been a major discussion point across ideologies. Its premise and purpose have been disputed, and while I think the data speaks best for itself here, I would like to pose one question: if the purpose of policing is to deter crime, then why would minimal penalization deter misconduct?