Here’s How Facebook Uses Artificial Intelligence for Photo Description

January 22, 2021, by Rowena Cletus

Facebook users find all kinds of content in their News Feed, from articles to friends’ comments to invitations for events and, of course, photos. Whether it’s their new grandchild, a boat on a river, or a grainy picture of a band on stage, most people are able to instantly see what’s in these images.

People who are blind or visually impaired (BVI) can also experience these images, provided they are tagged with alternative text (alt text). A screen reader can then describe the contents of an image using a synthetic voice, enabling BVI users to understand the images in their feed.

In 2016, Facebook introduced a technology called automatic alternative text (AAT), which automatically generates descriptions for photos that have no alt text. This enables blind and visually impaired people to make fuller use of their social media feeds.

The number of concepts AAT can reliably detect and identify in a picture has increased by more than 10x, which means fewer photos are left without a description. Descriptions are also more detailed, identifying landmarks, activities, types of animals, and so forth.

AAT can now also extract positional information about the relative size and placement of elements in a photo. Rather than simply noting that a photo contains five people, the software can specify that two people are in the center and three are scattered toward the edges, suggesting that the two in the center are the focus.

Discover The World

The models were trained on concepts spanning gender, skin tone, and age, which makes them more accurate as well as culturally and demographically inclusive: for example, they identify weddings around the world, rather than only photos featuring white wedding dresses.

The new system repurposes existing machine learning models to train on new tasks, a process known as transfer learning. This allowed Facebook to build models that identify concepts such as national monuments, food types (like fried rice and French fries), and selfies; building these from scratch would have been impractical.
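
To make transfer learning concrete, here is a minimal sketch in PyTorch. It assumes a torchvision ResNet-50 backbone and three made-up concept labels; Facebook's actual AAT models, datasets, and training setup are not public.

```python
# Minimal transfer-learning sketch (illustrative only; Facebook's actual
# AAT models and training data are not public).
import torch
import torch.nn as nn
from torchvision import models

# Start from a backbone pretrained on a large generic dataset.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pretrained feature extractor so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for the new task,
# e.g. hypothetical concepts like "national monument", "fried rice", "selfie".
num_new_concepts = 3
model.fc = nn.Linear(model.fc.in_features, num_new_concepts)

# Only the new head's parameters are passed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
images = torch.randn(8, 3, 224, 224)  # batch of 8 RGB images
labels = torch.randint(0, num_new_concepts, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Because the pretrained backbone already encodes general visual features, only the small new head needs training, which is what makes adding new concept sets cheap.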

The improved AAT reliably recognizes over 1,200 concepts, a tenfold increase over the original version launched in 2016. Facebook consulted screen reader users about how AAT could best be improved, and their feedback was that accuracy is paramount.

The AAT model therefore includes only concepts that well-trained models can identify above a high precision threshold. Because a margin for error remains, every description is prefaced with "May be." Facebook wants to give blind and visually impaired users as much information as possible about a photo's contents, delivered accurately.
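
Here is a minimal sketch of how such a precision bar and the "May be" hedge could work in practice. The 0.9 threshold and the example predictions are illustrative assumptions, not Facebook's actual parameters.

```python
# Illustrative sketch of threshold filtering and the "May be" hedge.
# The 0.9 threshold and the example predictions are assumptions for
# demonstration; Facebook's actual thresholds are not public.

CONFIDENCE_THRESHOLD = 0.9  # only concepts above this bar are kept

def describe(predictions: dict[str, float]) -> str:
    """Build a hedged alt-text description from (concept, confidence) pairs."""
    kept = [c for c, conf in predictions.items() if conf >= CONFIDENCE_THRESHOLD]
    if not kept:
        return "No description available."
    # Every generated description is prefaced with "May be" because a
    # margin of error always remains.
    return "May be " + ", ".join(kept) + "."

# Example: only concepts the model is highly confident about survive.
print(describe({"2 people": 0.97, "outdoors": 0.93, "wedding": 0.62}))
# -> "May be 2 people, outdoors."
```

Dropping the low-confidence "wedding" guess trades completeness for accuracy, which matches the feedback that screen reader users value correctness above detail.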


Accurate Information

Facebook designed the new AAT to provide a succinct description for all photos by default, with an easy way to request more detailed descriptions of specific photos. Detailed descriptions also include basic positional information (top/middle/bottom and left/center/right) and the relative prominence of objects, labeled "primary," "secondary," or "minor."
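
As a rough illustration, here is how those position buckets and prominence labels might be derived from an object detector's bounding boxes. The grid boundaries and area cutoffs are assumptions for demonstration only.

```python
# Illustrative sketch: mapping a detected object's bounding box to the
# position buckets (top/middle/bottom, left/center/right) and prominence
# labels (primary/secondary/minor) described above. The cutoffs are
# assumptions, not Facebook's actual values.

def position(box, img_w, img_h):
    """Bucket a box's center into a 3x3 grid of the image."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h
    horiz = "left" if cx < 1/3 else "center" if cx < 2/3 else "right"
    vert = "top" if cy < 1/3 else "middle" if cy < 2/3 else "bottom"
    return f"{vert} {horiz}"

def prominence(box, img_w, img_h):
    """Label an object by the fraction of the image it covers."""
    x0, y0, x1, y1 = box
    frac = (x1 - x0) * (y1 - y0) / (img_w * img_h)
    if frac > 0.25:
        return "primary"
    if frac > 0.05:
        return "secondary"
    return "minor"

# Example: a large, centered person and a small object near an edge.
w, h = 1000, 800
big = (250, 150, 750, 700)
small = (20, 650, 120, 780)
print(position(big, w, h), prominence(big, w, h))      # -> middle center primary
print(position(small, w, h), prominence(small, w, h))  # -> bottom left minor
```

Simple geometric rules like these are enough to turn raw detector output into the kind of "two people in the center" phrasing described earlier.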

The default AAT description is deliberately simple, so users can read and understand it quickly. Simplicity also eases translation: alt text descriptions are available in 45 languages, ensuring that AAT is useful to people around the world.

Facebook For All
The number of pictures being taken keeps growing, thanks to modern smartphones with powerful cameras. Since it is close to impossible to caption every picture on Facebook or Instagram, AAT helps bridge the gap. Artificial intelligence promises to be a game changer for the disabled community.