Using Social Media and Machine Learning to Spot Child Sexual Exploitation
Category: Machine Learning
Saturday, April 29, 2023, 15:25 UTC

Social media researchers are creating a machine learning program that can detect unwanted sexual advances on Instagram. The program was trained on data from more than 5 million direct messages, annotated by a group of young users who experienced conversations that felt unsafe or uncomfortable. The data is helping to increase protection for young users by identifying and cutting off risky interactions sooner.
In a first-of-its-kind effort, social media researchers from Drexel University, Vanderbilt University, Georgia Institute of Technology and Boston University are turning to young social media users to help build a machine learning program that can spot unwanted sexual advances on Instagram. Trained on data from more than 5 million direct messages—annotated and contributed by 150 adolescents who had experienced conversations that made them feel sexually uncomfortable or unsafe—the technology can quickly and accurately flag risky DMs.
The project, which was recently published by the Association for Computing Machinery in its Proceedings of the ACM on Human-Computer Interaction, is intended to address concerns that an increase in teens' use of social media, particularly during the pandemic, is contributing to a rise in child sexual exploitation.
"In the year 2020 alone, the National Center for Missing and Exploited Children received more than 21.7 million reports of child sexual exploitation—which was a 97% increase over the year prior. This is a very real and terrifying problem," said Afsaneh Razi, Ph.D., an assistant professor in Drexel's College of Computing & Informatics, who was a leader of the research.
Social media companies are rolling out new technology that can flag and remove sexually exploitative images and help users report these illegal posts more quickly. But advocates are calling for greater protections for young users that could identify and curtail these risky interactions sooner.
The group's efforts are part of a growing field of research looking at how machine learning and artificial intelligence can be integrated into platforms to help keep young people safe on social media, while also ensuring their privacy. Its most recent project stands apart for its collection of a trove of private direct messages from young users, which the team used to train a machine learning-based program that is 89% accurate at detecting sexually unsafe conversations among teens on Instagram.
"Most of the research in this area uses public datasets, which are not representative of real-world interactions that happen in private," Razi said. "Research has shown that machine learning models based on the perspectives of those who experienced the risks, such as cyberbullying, provide higher performance in terms of recall. So, it is important to include the experiences of victims when trying to detect the risks."
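The paper does not publish its model or code, so the sketch below is only a hedged illustration of the general approach described here: training a binary "safe"/"unsafe" text classifier on labeled messages and evaluating recall, the metric Razi highlights. The tiny message list, the TF-IDF features, and the logistic regression model are all illustrative assumptions, not the study's actual data or architecture.

```python
# Illustrative sketch only: a generic binary message classifier evaluated on
# recall. The real study used 5M+ annotated Instagram DMs and its own model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Toy stand-in messages (hypothetical, not from the dataset).
messages = [
    "hey what homework is due tomorrow",
    "want to play that game later",
    "send me a photo of yourself",
    "don't tell your parents about our chats",
    "see you at practice",
    "you can trust me, this is our secret",
]
labels = [0, 0, 1, 1, 0, 1]  # 1 = unsafe, 0 = safe

# Turn raw text into word/bigram TF-IDF features.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(messages)

# Fit a simple linear classifier and predict on the same toy data.
clf = LogisticRegression().fit(X, labels)
preds = clf.predict(X)

# Recall on the unsafe class: the fraction of truly unsafe messages caught,
# which matters more here than overall accuracy.
print("recall:", recall_score(labels, preds))
```

In a safety setting like this, recall is prioritized over precision because a missed unsafe conversation (a false negative) is costlier than a flagged safe one.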
Each of the 150 participants, who ranged in age from 13 to 21, had used Instagram for at least three months between the ages of 13 and 17, exchanged direct messages with at least 15 people during that time, and had at least two direct messages that made them or someone else feel uncomfortable or unsafe.
They contributed their Instagram data—more than 15,000 private conversations—through a secure online portal designed by the team, and were then asked to review their messages and label each conversation as "safe" or "unsafe" according to how it made them feel.
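The study's actual annotation schema is not published; as a hedged sketch, the conversation-level labels described above might be represented with a simple record per conversation, making it easy to summarize how many a participant rated unsafe. The field names and values here are assumptions for illustration.

```python
# Hypothetical record for one participant-labeled conversation.
from dataclasses import dataclass

@dataclass
class ConversationLabel:
    conversation_id: str   # assumed identifier, not the study's real scheme
    label: str             # "safe" or "unsafe", as rated by the participant

# Toy annotations standing in for a participant's review of their DMs.
annotations = [
    ConversationLabel("c001", "safe"),
    ConversationLabel("c002", "unsafe"),
    ConversationLabel("c003", "safe"),
]

# Count how many conversations were labeled unsafe.
unsafe_count = sum(1 for a in annotations if a.label == "unsafe")
print(f"{unsafe_count} of {len(annotations)} conversations labeled unsafe")
# prints "1 of 3 conversations labeled unsafe"
```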
"Collecting this dataset was very challenging due to the sensitivity of the topic and because the data is being contributed by minors in some cases," Razi said. "Because of this, we drastically increased the precautions we took to preserve the confidentiality and privacy of the participants and to ensure that their data was securely collected, stored and used."