Tech giants are competing for artificial intelligence (AI) training data, and Meta appears to have one significant advantage over its competitors: it can draw on Instagram and Facebook photographs.
Chris Cox, Meta's chief product officer, said at Tech Summit on Thursday that the company trained its text-to-image generation model, Emu, on publicly available photos and text from the platforms.
"We don't train on private stuff, we don't train on stuff that people share with their friends, we do train on things that are public," he explained.
Meta's text-to-image model can create "really amazing quality images" because Instagram holds so many photos of "art, fashion, culture, and also just images of people and us," according to Cox.
According to Meta AI's website, users can generate photos by typing a prompt that begins with the word "imagine," which returns four images.
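For a sense of how such a prompt interface maps onto code, here is a minimal sketch that uses the open-source Stable Diffusion model via Hugging Face's diffusers library as a stand-in; Emu itself is not publicly available, so the model name and the `imagine` helper below are illustrative assumptions rather than Meta's actual implementation.

```python
# Minimal sketch: an "imagine"-style prompt that returns four candidate images.
# Stable Diffusion (via the diffusers library) stands in for Meta's Emu model,
# which is not publicly available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def imagine(prompt: str):
    """Strip the leading 'imagine' keyword and generate four candidate images."""
    text = prompt.removeprefix("imagine").strip()
    return pipe(text, num_images_per_prompt=4).images

images = imagine("imagine a golden retriever surfing at sunset")
for i, img in enumerate(images):
    img.save(f"candidate_{i}.png")
```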
AI models must be fed and trained on data to function effectively. That has become a contentious issue, because there is no reliable way to keep copyrighted content from being scraped off the internet and used to train a large language model (LLM).
However, the US Copyright Office has been working to address the issue since early last year and is considering amending its regulations accordingly.
Companies are also attempting to secure data by partnering with other organisations. OpenAI, for example, has struck licensing deals with media outlets to train and develop its models.
Meta explored purchasing the publisher Simon & Schuster to obtain additional data for training its models. In addition to raw data sets, organisations train their models using "feedback loops": data gathered from previous interactions and outputs that is analysed to improve future performance. These loops include techniques for informing AI models of their errors so that they can learn from them.
Last month, Meta CEO Mark Zuckerberg said that feedback loops would be "more valuable" than any "upfront corpus."
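As an illustration of what a feedback loop might look like in practice, the sketch below logs user ratings of model outputs and folds the well-rated examples back into a fine-tuning data set. The class and method names (`FeedbackStore`, `build_finetune_set`) are hypothetical and do not describe any real Meta system.

```python
# Minimal sketch of a feedback loop: log how users rate model outputs, then
# fold the highly rated examples back into the next fine-tuning data set.
# All names here are illustrative, not any real Meta API.
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class Interaction:
    prompt: str
    output: str
    rating: int  # e.g. 1 (thumbs down) to 5 (thumbs up), supplied by the user

class FeedbackStore:
    def __init__(self, path: str = "feedback.jsonl"):
        self.path = Path(path)

    def log(self, interaction: Interaction) -> None:
        # Append each prompt/output/rating record as one JSON line.
        with self.path.open("a") as f:
            f.write(json.dumps(asdict(interaction)) + "\n")

    def build_finetune_set(self, min_rating: int = 4) -> list[dict]:
        """Keep only well-rated outputs as new training examples."""
        examples = []
        for line in self.path.read_text().splitlines():
            record = json.loads(line)
            if record["rating"] >= min_rating:
                examples.append({"prompt": record["prompt"], "target": record["output"]})
        return examples

store = FeedbackStore()
store.log(Interaction("imagine a cat in a spacesuit", "<generated output>", rating=5))
print(len(store.build_finetune_set()))
```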