Facebook recently released statistics about abusive behaviour on its social media platform, revealing that it had deleted more than 22 million posts for violating its rules against pornography and hate speech.
Many of these posts were detected by automated systems monitoring users' activity, in line with CEO Mark Zuckerberg's statement to the US Congress that his company would use artificial intelligence to identify social media posts that might violate its policies.
The task of detecting abusive posts and comments on social media is not entirely technological. Even Facebook's human moderators have trouble defining what counts as hate speech, inconsistently applying the company's guidelines and even reversing their decisions (especially when those decisions made headlines).
There are further complications. For instance, what if attackers try to turn the machine learning system against itself, tainting the data the algorithms learn from in order to influence the results? There is already a phenomenon called "Google bombing", in which people create websites and construct sequences of web links in an effort to skew the results of Google's search algorithms. A similar "data poisoning" attack could undermine Facebook's efforts to identify hate speech.
How to trick these machines into learning?
Machine learning, a form of artificial intelligence, has proven very useful in detecting many kinds of fraud and abuse, including email spam, phishing, credit card fraud and fake product reviews.
It works best when there are large amounts of data from which to identify patterns that reliably separate normal, benign behaviour from malicious activity. For example, if people use their email systems to report as spam large numbers of messages containing the words "urgent", "investment" and "payment", then a machine learning algorithm will be more likely to label future messages containing those words as spam.
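The idea can be illustrated with a toy word-counting classifier. This is a minimal sketch, not how Facebook's or any email provider's systems actually work: the training messages, function names and scoring rule are all invented for illustration, and real systems use far richer features and models.

```python
from collections import Counter

def train(messages):
    """Count how often each word appears in spam vs. non-spam messages."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in messages:
        (spam_counts if is_spam else ham_counts).update(text.lower().split())
    return spam_counts, ham_counts

def spam_score(text, spam_counts, ham_counts):
    """Positive score: the message's words were seen more often in spam."""
    return sum(spam_counts[w] - ham_counts[w] for w in text.lower().split())

# Toy training data: messages users have reported (or not) as spam.
training = [
    ("urgent investment payment required", True),
    ("urgent payment for your investment", True),
    ("lunch meeting moved to noon", False),
    ("notes from the team meeting", False),
]
spam_counts, ham_counts = train(training)
```

With this data, a new message containing "urgent payment" scores as spam-like, while "team lunch" does not, simply because of which words appeared in which reported messages.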
Detecting abusive posts and comments on social media is a similar problem: an algorithm would look for text patterns that correlate with abusive or non-abusive behaviour. This is much faster than reading each comment, more flexible than simply performing keyword searches for slurs, and more proactive than waiting for complaints.
In addition to the text itself, there are often clues from context, including the user who posted the content and their other actions on the platforms. A verified Twitter account with a million followers would likely be treated differently than a newly created account with no followers.
But as those algorithms are developed, abusers adapt, changing their patterns of behaviour to avoid detection. Since email spammers first began substituting characters for letters to dodge filters, every new medium has spawned its own version of the trick: people buy Twitter followers, favourable Amazon reviews and Facebook Likes, all to fool algorithms and other humans into thinking they're more reputable.
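One defence against character substitution is to normalise text before matching it, mapping substituted characters back to the letters they imitate. The sketch below assumes a small, illustrative substitution table; a real platform's list would be far larger and language-aware.

```python
# Illustrative map of common "l33t speak" substitutions back to letters.
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

def normalise(text):
    """Lowercase the text and undo common character substitutions."""
    return text.lower().translate(SUBSTITUTIONS)

def contains_blocked_word(text, blocked_words):
    """Check a normalised message against a (hypothetical) blocklist."""
    normalised = normalise(text)
    return any(word in normalised for word in blocked_words)
```

After normalisation, "sp4m" becomes "spam", so a keyword filter that would have missed the disguised word now catches it. Of course, determined abusers respond with new encodings, which is exactly the arms race the article describes.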
As a result, a big piece of detecting abuse is maintaining a stable definition of what counts as abuse, even as the actual text expressing it changes. This presents an opportunity for artificial intelligence to effectively enter an arms race against itself: if an AI system can predict what a potential attacker might do, it could be adapted to simulate that behaviour.
Another AI system could analyse those actions, learning to detect abusers' efforts to sneak hate speech past the automated filters. Once both the attackers and defenders can be simulated, game theory can identify their best strategies in this competition.
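A stripped-down version of that game-theoretic view can be written down directly. The strategies and payoff numbers below are entirely made up for illustration; the point is only that once each side's options and outcomes are enumerated, a simple rule like maximin ("pick the move with the best worst case") identifies a sensible strategy for each player.

```python
# Zero-sum toy game: each entry is the defender's payoff for a
# (defender strategy, attacker strategy) pair; the attacker's payoff
# is the negative. All numbers are invented for illustration.
payoffs = {
    ("keyword_filter", "plain_text"):  3,
    ("keyword_filter", "obfuscated"): -2,
    ("ml_classifier",  "plain_text"):  2,
    ("ml_classifier",  "obfuscated"):  1,
}
defender_moves = ["keyword_filter", "ml_classifier"]
attacker_moves = ["plain_text", "obfuscated"]

def maximin(moves, opponent_moves, payoff):
    """Choose the move whose worst-case payoff is highest."""
    return max(moves, key=lambda m: min(payoff(m, o) for o in opponent_moves))

best_defence = maximin(defender_moves, attacker_moves,
                       lambda d, a: payoffs[(d, a)])
best_attack = maximin(attacker_moves, defender_moves,
                      lambda a, d: -payoffs[(d, a)])
```

In this invented game, the defender's safest choice is the ML classifier (its worst case, 1, beats the keyword filter's worst case, -2), and the attacker's safest choice is obfuscation; real platforms face vastly larger strategy spaces, but the logic is the same.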
Abusers don't just change their own behaviour, substituting different characters for letters or using words and symbols in coded ways; they can also change the machine learning system itself.
Because the algorithms are trained on data generated by humans, if enough people change their behaviour in particular ways, the system will learn a different lesson than its creators intended. In 2016, for instance, Microsoft unveiled "Tay", a Twitter bot that was supposed to engage in meaningful conversation with other Twitter users. Instead, trolls bombarded the bot with hateful and abusive messages. As the bot analysed that text, it began to reply in kind and was quickly shut down.
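The same poisoning dynamic can be shown on a self-contained toy word-count classifier (invented here for illustration, not any platform's actual system): if attackers mass-report an innocuous phrase as abusive, the retrained system starts flagging it.

```python
from collections import Counter

def train(messages):
    """Tally word counts separately for abusive and benign examples."""
    abusive, benign = Counter(), Counter()
    for text, is_abusive in messages:
        (abusive if is_abusive else benign).update(text.lower().split())
    return abusive, benign

def looks_abusive(text, abusive, benign):
    """Flag a message if its words were seen more often in abusive examples."""
    words = text.lower().split()
    return sum(abusive[w] for w in words) > sum(benign[w] for w in words)

clean = [("you are terrible", True), ("have a nice day", False)]
# Poisoning: attackers repeatedly mislabel an innocuous phrase as abusive.
poisoned = clean + [("have a nice day", True)] * 5
```

Trained on the clean data, the classifier leaves "have a nice day" alone; trained on the poisoned data, it flags the same phrase, because the system has no notion of abuse beyond the labels humans fed it.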
However, no machine learning system will ever be perfect. Like human moderators, automated systems should be used as part of a larger effort to fight abuse. Even email spam filtering, a major success for machine learning, relies on more than just a good algorithm.