protecting joke twitter
flagging abusive tweets using natural language processing
[grocery produce aisle]
ME: Hi, are these genetically modified carrots?
CLERK: No, why do you ask?
CARROT: Yeah, why do you ask?
- Todd 'Papi' Carlos (@TheToddWilliams), March 19, 2016
Joke Twitter, a community of silly people who share silly jokes, is near and dear to my heart. But Joke Twitter has a big problem: it is on Twitter. Twitter can be a wonderful place, but it is also filled with many hateful people who will attack and harass you if they don't like what you have to say. These trolls can be especially cruel to women.
Bess Kalb is a writer and funny person I love following on Twitter. She is attacked by Trump Trolls when she makes fun of Trump, by Bernie Bros when she makes fun of Bernie, and by men in general when she makes any joke that implies women are also human beings.
If men got pregnant they'd sell birth control pills in tubs at Costco.
- Bess Kalb (@bessbell), October 6, 2017
Bess will periodically suspend or lock her Twitter account to seek refuge from these trolls. It's always upsetting when comedians I follow are driven off Twitter. They shouldn't have to endure such harassment. Plus, I don't get to read their jokes anymore.
In order to train a classifier to detect abusive tweets, I need a labelled collection of terrible tweets. Researchers at Cornell University have conducted similar work on this issue and published the paper "Automated Hate Speech Detection and the Problem of Offensive Language." The authors were kind enough to share their data on GitHub.
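A supervised classifier can only learn what "abusive" means from examples that already carry that label, which is why a labelled collection has to come first. As a minimal illustration (a sketch, not necessarily the method used later in this post), here is a from-scratch multinomial Naive Bayes classifier over word counts. The helper names `tokenize`, `train`, and `predict` are hypothetical, and the six training examples are made-up placeholders standing in for a real labelled corpus such as the Cornell data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()            # label -> number of documents
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (Laplace smoothing, alpha=1)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical placeholder examples; real training data would come from a labelled corpus.
examples = [
    ("you are a worthless idiot", "abusive"),
    ("shut up you stupid idiot", "abusive"),
    ("nobody wants you here idiot", "abusive"),
    ("this joke made my whole day", "ok"),
    ("love this bit so much", "ok"),
    ("great joke about carrots", "ok"),
]

model = train(examples)
print(predict(model, "what a stupid idiot"))  # abusive
print(predict(model, "great carrot joke"))    # ok
```

Naive Bayes is used here only because it fits in a few lines of standard-library Python. With thousands of real labelled tweets, a library implementation and a proper tokenizer would be the more practical choice.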