Wikipedia AI Software Looks to Banish Trolls


Wikipedia’s new AI software, available as a web service, uses machine learning in an attempt to let quality edits pass but keep trolls and vandals out.

Wikipedia is edited 500,000 times per day by an army of volunteers, but that openness also gives vandals, spammers, and trolls ample opportunity to derail articles.

Enter the Objective Revision Evaluation Service (ORES), new AI software built to assist with editorial quality control. ORES, which is open source and available through application programming interfaces (APIs), is a web service whose machine learning models are trained on article-quality assessments made by Wikipedians. The system also uses predictive analytics to generate automated scores for every single edit and article.

According to a Nov. 30 blog post from Wikipedia, ORES can retrieve a score in as little as 50 milliseconds. Users can select one of three edit quality models: damaging, goodfaith, or reverted (whether an edit will be reverted to a previous state). There is also one article quality model, WP10, which makes predictions about article quality.
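To make the model choice concrete, here is a minimal Python sketch of how a client might request and read a score for one revision. The base URL, the path layout, the nested shape of the response, and the helper names (`build_score_url`, `probability_true`, `needs_review`) are all assumptions made for illustration and are not taken from the blog post, so check the ORES documentation before relying on any of them. The sample payload is invented, not real ORES output.

```python
from urllib.parse import quote

# ASSUMPTION: base URL and path layout; verify against the ORES documentation.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"

# The three edit quality models named in the article.
EDIT_QUALITY_MODELS = ("damaging", "goodfaith", "reverted")


def build_score_url(wiki, rev_id, model):
    """Build a score-request URL for one revision and one edit quality model."""
    if model not in EDIT_QUALITY_MODELS:
        raise ValueError(f"unknown edit quality model: {model}")
    return f"{ORES_BASE}/{quote(wiki)}/{int(rev_id)}/{model}"


def probability_true(payload, wiki, rev_id, model):
    """Read P(true) for one model from an ASSUMED nested response shape."""
    score = payload[wiki]["scores"][str(rev_id)][model]["score"]
    return score["probability"]["true"]


def needs_review(payload, wiki, rev_id, threshold=0.5):
    """Flag a revision when the 'damaging' probability reaches the threshold."""
    return probability_true(payload, wiki, rev_id, "damaging") >= threshold


# Invented sample payload in the assumed shape (not real ORES output).
sample = {
    "enwiki": {
        "scores": {
            "12345": {
                "damaging": {
                    "score": {
                        "prediction": False,
                        "probability": {"false": 0.92, "true": 0.08},
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(build_score_url("enwiki", 12345, "damaging"))
    print(needs_review(sample, "enwiki", 12345))
```

The threshold is a policy choice rather than part of the service: a lower value sends more edits to human reviewers, while a higher value lets more edits through unreviewed.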

However, “due to limitations in the field of natural language processing, sarcasm and other types of cleverness in vandalism are likely to fool the model,” Wikipedia states.

Next, Wikipedia is working on supporting more wikis, categorizing edits by the type of work performed, and detecting bias, since subjective predictions from ORES can perpetuate bias, Wikipedia stated.

Revision scores are released under Creative Commons Zero, and Wikipedia has made all facets of the tool (source code, data, performance statistics, and project documentation) available under open source licenses.

Chris Raphael

About Chris Raphael

Chris Raphael (full bio) covers fast data technologies and business use cases for real-time analytics. Follow him on Twitter at raphaelc44.
