Apple, Nvidia and Anthropic used thousands of swiped YouTube videos to train AI

Tech companies are turning to controversial tactics to power their data-hungry artificial intelligence models, hoovering up books, websites, photos and social media posts, often without the creators’ knowledge.

AI companies typically remain secretive about their training data sources, but an investigation by Proof News found that some of the world’s richest AI companies used material from thousands of YouTube videos to train AI. The companies did this despite YouTube’s rules prohibiting harvesting content on the platform without permission.

Our investigation found that the captions of 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple and Salesforce.

The dataset, called YouTube Subtitles, contains video transcripts from educational and online learning channels such as Khan Academy, MIT and Harvard. Videos from The Wall Street Journal, NPR and the BBC were also used to train AI, as were The Late Show with Stephen Colbert, Last Week Tonight with John Oliver and Jimmy Kimmel Live.

Proof News also found material from YouTube megastars including MrBeast (289 million subscribers, two videos taken for training), Marques Brownlee (19 million subscribers, seven videos taken), Jacksepticeye (nearly 31 million subscribers, 377 videos taken) and PewDiePie (111 million subscribers, 337 videos taken). Some of the material used to train AI also promoted conspiracies such as the “flat Earth theory.”

Proof News created a tool to search for creators in the YouTube AI training dataset.

“No one came to me and said, ‘We’d like to use this,’” said David Pakman, host of The David Pakman Show, a left-wing political channel with more than 2 million subscribers and more than 2 billion views. Nearly 160 of his videos ended up in the YouTube Subtitles training dataset.

Four people work full time at Pakman’s business, which posts several videos a day in addition to producing a podcast, TikTok videos and material for other platforms. If AI companies are getting paid, Pakman said, he should be compensated for the use of his data. He pointed out that some media companies have recently signed agreements to be paid for the use of their work to train AI.

“This is my livelihood, and I have dedicated time, resources, money and personnel to creating this content,” Pakman said. “There’s really no shortage of work.”
