Thousands of companies use Amazon's Rekognition machine vision service to search for obscene images and videos uploaded by users
Amazon's controversial Rekognition technology is already being used to remove penis images from food websites. That, at least, is one example of its use. At some point, the London-based food delivery service Deliveroo ran into content moderation problems. When something is wrong with an order, Deliveroo customers send in a photograph of the food along with their complaint. And they often photobomb the picture with their genitals, or arrange the food into obscene images. Yes, seriously.
It turns out that Deliveroo employees do not always want to deal with such content. So the company uses Rekognition to recognize obscene photographs and blur or remove them before a person sees them.
Deliveroo's problem is a somewhat odd facet of an increasingly complex one. One way or another, many online companies rely on user-generated content, and in recent years the dark side of human nature has been seeping into that content more and more. Content moderation has become a priority as sites increasingly contend with unpleasant material such as fake news, violence, deepfakes, bullying, aggressive rhetoric, and other toxic user-created content. If you are Facebook, you can develop your own AI to solve this problem, hire an army of moderators, or do both. Smaller companies with fewer resources often do not have that option. This is where Amazon’s content moderation service comes to their aid.
This service is part of the Rekognition suite of computer vision services provided by Amazon Web Services. Rekognition has been heavily criticized in the press because the company agreed to provide face recognition services to the U.S. immigration service. The Rekognition website lists other surveillance-style uses of the service, such as recognizing license plates filmed from different angles on video, or tracking a person’s path through camera footage.
Perhaps in search of a more positive image for its computer vision service, Amazon has for the first time talked about Rekognition being used to police user content for violence and indecency. The service can recognize unsafe or objectionable content in images and videos uploaded to a site.
And this business is growing. “The role of user-generated content is growing explosively from year to year. Today we already share two or three pictures every day on social networks with our friends and relatives,” Swami Shivasubramanyan, an Amazon vice president, tells me. Shivasubramanyan says Amazon began offering content moderation services in response to customer requests back in 2017.
Companies can pay for Rekognition instead of hiring people to review uploaded images. Like other AWS services, it works on a pay-per-use model, and its cost depends on the number of images the neural network processes.
Not surprisingly, dating services were among the first users of content moderation: they need to quickly process the selfies uploaded to user profiles. Amazon says the dating sites Coffee Meets Bagel and Shaadi use the service for exactly this purpose, as does the Portuguese site Soul, which helps people create dating sites.
The AI looks for more than nudity. The neural network has been trained to recognize any questionable content, including images of weapons or violence, or generally disturbing images. Here is the classification menu from the Rekognition site:
Explicit Nudity:
- nudity;
- graphic male nudity;
- graphic female nudity;
- sexual activity;
- illustrated nudity or sexual activity;
- adult toys.
Suggestive Content:
- female swimwear or underwear;
- male swimwear or underwear;
- partial nudity;
- revealing clothes.
Violent Content:
- graphic violence or gore;
- physical violence;
- weapon violence;
- weapons;
- self-injury.
Visually Disturbing Content:
- emaciated bodies;
- corpses;
- hanging.
How it works
Like everything on AWS, Rekognition runs in the cloud. A company tells the service which kinds of images it needs to find, then feeds it the photos and videos received from users, which in many cases are stored on AWS servers anyway.
The neural network processes the images, looks for that content, and flags anything potentially objectionable. It produces metadata describing the contents of each image, along with a confidence percentage for every label it assigns.
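For illustration, the metadata for a single image might be requested and read as sketched below. The call and the response shape follow the `DetectModerationLabels` operation as exposed by the boto3 `rekognition` client; the helper names `fetch_labels` and `summarize`, the 80% threshold, and every value in `SAMPLE_RESPONSE` are assumptions invented for this example, not output from a real request.

```python
def fetch_labels(bucket, key, min_confidence=50.0):
    """Ask Rekognition to label one image stored in S3 (needs AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs offline

    client = boto3.client("rekognition")
    return client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )


# Hand-written sample in the shape of a response; the values are invented.
SAMPLE_RESPONSE = {
    "ModerationLabels": [
        {"Name": "Suggestive", "ParentName": "", "Confidence": 97.1},
        {"Name": "Revealing Clothes", "ParentName": "Suggestive", "Confidence": 97.1},
        {"Name": "Weapons", "ParentName": "Violence", "Confidence": 41.3},
    ],
}


def summarize(response, threshold=80.0):
    """Return (name, confidence) pairs at or above the threshold, highest first."""
    labels = response.get("ModerationLabels", [])
    kept = [
        (label["Name"], label["Confidence"])
        for label in labels
        if label["Confidence"] >= threshold
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


print(summarize(SAMPLE_RESPONSE))
```

Each label carries its own confidence, so the caller chooses how strict to be; the low-confidence "Weapons" entry is dropped here only because of the assumed 80% cutoff.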
This data is then handled by the customer's own software, which decides, based on programmed business rules, what to do with the processed image: automatically delete it, let it through, blur part of it, or send it to a moderator for review.
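A minimal sketch of such business rules follows, assuming labels arrive as dictionaries with `Name`, `ParentName`, and `Confidence` keys. The categories chosen, the thresholds, and the names `Action`, `RULES`, `REVIEW_FLOOR`, and `decide` are hypothetical; a real policy would depend on the site.

```python
from enum import Enum


class Action(Enum):
    PUBLISH = "publish"
    BLUR = "blur"
    REVIEW = "send to moderator"
    DELETE = "delete"


# Hypothetical policy, checked in order: (top-level category, threshold, action).
RULES = [
    ("Explicit Nudity", 90.0, Action.DELETE),
    ("Violence", 80.0, Action.REVIEW),
    ("Suggestive", 80.0, Action.BLUR),
]

# Assumed floor: anything flagged above it but matching no rule goes to a human.
REVIEW_FLOOR = 50.0


def decide(labels):
    """Pick one action for an image from its moderation labels."""
    # Track the highest confidence seen for each top-level category.
    best = {}
    for label in labels:
        category = label.get("ParentName") or label["Name"]
        best[category] = max(best.get(category, 0.0), label["Confidence"])

    for category, threshold, action in RULES:
        if best.get(category, 0.0) >= threshold:
            return action

    if any(confidence >= REVIEW_FLOOR for confidence in best.values()):
        return Action.REVIEW
    return Action.PUBLISH


example = [{"Name": "Revealing Clothes", "ParentName": "Suggestive", "Confidence": 85.0}]
print(decide(example))
```

Putting the strictest rule first means an image that is both explicit and suggestive is deleted instead of merely blurred.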
Deep neural networks for image processing have many layers. Each layer evaluates data representing various aspects of the image, performs its calculations, and passes the result to the next layer. First, the network processes low-level information such as basic shapes or the presence of a person in the image.
“Then it keeps refining the data more and more, the next layers become more specific, and so on,” Shivasubramanyan explains. Gradually, layer by layer, the neural network determines the content of images with ever-increasing certainty.
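The layer-by-layer idea can be shown with a toy network. This is only a sketch of the general technique, with made-up weights and two inputs standing in for pixels; it is not Rekognition's architecture, and production image models typically use convolutional layers with millions of parameters. The names `dense`, `sigmoid`, and `forward` are assumptions for this example.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sums followed by a ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def sigmoid(z):
    """Squash a raw score into a confidence between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(pixels, layers):
    """Pass the input through every hidden layer, then score it with the last one."""
    activations = pixels
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases)
    weights, biases = layers[-1]
    score = sum(w * a for w, a in zip(weights[0], activations)) + biases[0]
    return sigmoid(score)


# Made-up weights: one hidden layer with two units, then a single output unit.
TOY_LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.5]),
]

print(forward([1.0, 0.0], TOY_LAYERS))
```

Each layer sees only the output of the one before it, which is what lets later layers respond to progressively more specific patterns.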
AWS Vice President of AI Matt Wood says his team trains its computer vision models on millions of proprietary and publicly available images from multiple datasets. He says that Amazon does not use images received from customers for this purpose.
Frame by frame
Some of the largest Rekognition customers do not use the service to moderate user-generated content at all. Amazon says major media companies with huge digital video libraries want to know what is in every frame of those videos. The Rekognition neural network can process every second of a video, describe it with metadata, and flag potentially unsafe images.
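For video, the boto3 `rekognition` client exposes `start_content_moderation` and `get_content_moderation`, and the results attach a `Timestamp` in milliseconds to each label. The sketch below works offline on a hand-written sample in that shape and merges nearby flagged timestamps into segments; the sample values, the function name `flagged_segments`, and the one-second merge gap are assumptions for illustration.

```python
# Invented sample in the shape of the "ModerationLabels" list for a video.
SAMPLE_LABELS = [
    {"Timestamp": 0, "ModerationLabel": {"Name": "Violence", "ParentName": "", "Confidence": 91.0}},
    {"Timestamp": 500, "ModerationLabel": {"Name": "Violence", "ParentName": "", "Confidence": 88.0}},
    {"Timestamp": 1000, "ModerationLabel": {"Name": "Violence", "ParentName": "", "Confidence": 85.0}},
    {"Timestamp": 9000, "ModerationLabel": {"Name": "Weapons", "ParentName": "Violence", "Confidence": 93.0}},
    {"Timestamp": 9500, "ModerationLabel": {"Name": "Weapons", "ParentName": "Violence", "Confidence": 40.0}},
]


def flagged_segments(labels, min_confidence=80.0, max_gap_ms=1000):
    """Merge flagged timestamps into (start_ms, end_ms) segments."""
    times = sorted(
        {
            item["Timestamp"]
            for item in labels
            if item["ModerationLabel"]["Confidence"] >= min_confidence
        }
    )
    segments = []
    for t in times:
        if segments and t - segments[-1][1] <= max_gap_ms:
            segments[-1][1] = t  # close enough: extend the current segment
        else:
            segments.append([t, t])  # too far apart: start a new segment
    return [(start, end) for start, end in segments]


print(flagged_segments(SAMPLE_LABELS))
```

A media library could store these segments as metadata so that an editor jumps straight to the flagged parts of a long video.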
“One of the things machine learning does well is go inside videos or images and provide additional context,” Wood tells me. “It can say, 'In this video, a woman is walking along the shore of a lake with a dog,' or 'A partially dressed man is depicted.'” In this mode, he says, the neural network is able to recognize dangerous, toxic, or indecent content in images with high accuracy.
And yet this area of computer vision has not reached maturity. Scientists are still discovering new ways to optimize neural network algorithms so that they can recognize images more accurately and in greater detail. “We have not yet reached the point of diminishing returns,” says Wood.
Shivasubramanyan told me that in the past month alone, the team working on computer vision reduced the number of false positives (images mistakenly flagged as unsafe) by 68% and the number of false negatives by 36%. “We have the opportunity to improve the accuracy of these APIs,” he says.
In addition to accuracy, customers are asking for more detailed image classification. According to the AWS website, the service returns only a main category and one subcategory for unsafe images. For example, the system may report that an image contains nudity as the main category and sexual activity as the subcategory. A third level could classify the type of sexual activity.
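The two-level scheme can be rebuilt from a flat list of labels, assuming each label names its parent in a `ParentName` field that is empty for a main category. The function name `group_by_category` and the sample labels are hypothetical.

```python
def group_by_category(labels):
    """Map each main category to the sorted subcategories reported under it."""
    tree = {}
    for label in labels:
        parent = label.get("ParentName") or ""
        if parent:
            # A subcategory: file it under its main category.
            tree.setdefault(parent, set()).add(label["Name"])
        else:
            # A main category: make sure it appears even with no subcategories.
            tree.setdefault(label["Name"], set())
    return {category: sorted(children) for category, children in tree.items()}


SAMPLE = [
    {"Name": "Explicit Nudity", "ParentName": "", "Confidence": 99.0},
    {"Name": "Sexual Activity", "ParentName": "Explicit Nudity", "Confidence": 98.0},
    {"Name": "Weapons", "ParentName": "Violence", "Confidence": 87.0},
]

print(group_by_category(SAMPLE))
```

With only two levels, every entry is either a main category or one of its direct children; a finer taxonomy would need a deeper tree than this one-parent lookup provides.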
“So far, the machine sticks to the facts and works literally: it will tell you, 'This is what is shown there,'” says Pietro Perona, a professor of computation and neural systems at Caltech and an adviser to AWS. “But scientists would like to go beyond that and report not only what is depicted, but also what these people are thinking and what is happening. Ultimately, that is the direction the field wants to move in, not just producing a list of what is shown in the picture.”
And such subtle differences can be important for content moderation. Whether the image contains potentially offensive content or not may depend on the intentions of the people depicted there.
Even the definitions of “unsafe” and “offensive” images are rather blurry. They may change over time and vary by geographic region. And context is everything, Perona explains. Images of violence are a good example.
“Violence can be unacceptable in one context, such as real violence in Syria,” Perona says, “but acceptable in another, like a football match or a scene from a Tarantino movie.”
As with other AWS services, Amazon does not just sell content moderation tools to others: it is its own customer. The company says it uses the service to screen the user-generated images and videos attached to reviews in its store.