20 Million Songs Exposed in AI Training Data

@giacomo.mov

Four days ago, something seismic happened and it barely made a ripple outside music-industry circles. The Atlantic published four searchable databases containing the songs used to train AI music models. Not a rumor. Not a leak. A searchable, public tool where any artist on Earth can type their name and see if their catalog was fed into the machine.

And the numbers are staggering.

One database holds 12 million tracks, another holds 9 million, and the final two contain about 100,000 songs each. Together, that’s more than 20 million tracks, and the largest dataset alone represents 91 years of music.
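As a quick sanity check on those figures, the sketch below adds up the four dataset sizes as reported and, assuming "91 years of music" refers to total playback time (the reporting quoted here does not say so explicitly), works out the average track length that would imply. The function and constant names are illustrative only.

```python
# Track counts as reported; the two smaller datasets are "about 100,000" each.
DATASET_TRACKS = [12_000_000, 9_000_000, 100_000, 100_000]

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year


def total_tracks(sizes):
    """Sum the per-dataset track counts."""
    return sum(sizes)


def implied_avg_minutes(tracks, years_of_audio):
    """Average track length implied if `years_of_audio` is total playback time.

    Treating "91 years" as playback time is an assumption, not a reported fact.
    """
    return years_of_audio * MINUTES_PER_YEAR / tracks


if __name__ == "__main__":
    print(f"Total tracks: {total_tracks(DATASET_TRACKS):,}")
    avg = implied_avg_minutes(12_000_000, 91)
    print(f"Implied average track length: {avg:.2f} minutes")
```

Under that assumption the four datasets come to about 21.2 million tracks, consistent with "more than 20 million," and the largest dataset implies an average track of roughly four minutes.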

Within 48 hours of publication, the tool went viral. Artists across every genre — from Grammy winners to bedroom producers — started searching their names. What they found wasn’t pretty.

The Artist Reactions Are Devastating

The post that opened the floodgates came from breakcore producer Sophiaaaahjkl;8901, who wrote on X: “The Atlantic just published a searchable database of the music used by Suno and Udio. They used one hundred and thirty eight of my songs across two of their datasets. This is almost my entire catalogue of music.”

She’s not a superstar. She has roughly 10,000 followers across social platforms. And yet her entire creative output from 2017 to 2024 was sitting inside AI training data.

Another X user searched Quadeca’s catalogue on AI Watchdog and found “295 grabs across 8 known data sets from various releases, snippets, videos, and corresponding lyrics from Genius.” The artist replied with a bleak note of sarcasm: “Yayyyyy!”

Backxwash, Titus Andronicus, Tre Mission, Lunice, DJ Sabrina the Teenage DJ and more are among the musicians who have expressed dismay at finding their music in the searchable database.

And record label owner Vince Valholla of Valholla Records posted a video saying: “Late last night I found out over 100+ songs from our catalogue were used to train AI models.” He continued, “To be honest, until the major labels go through their lawsuits, there’s no way for artists or labels to fight back. They literally scraped the best songs from our catalogue. I’m sick.”

What The Atlantic Actually Found

The investigation was led by Alex Reisner as part of The Atlantic’s “AI Watchdog” project, which previously documented more than “7.5 million books, 81 million research articles, 15 million YouTube videos, and writing from tens of thousands of movies and television shows” allegedly included in AI training datasets.

Now it’s music’s turn.

“Companies often claim to use only content that is freely available online, but the datasets reveal the quantity of downloadable music that developers can access even though it is not supposed to be free,” Reisner wrote.

The key insight: the story is less about any single track and more about the sheer volume of commercial, copyrighted music that sits inside redistributable research datasets. The Atlantic also reported that the datasets are searchable, and that one of them contained hundreds of entries tied to a single well-known recording artist.

The data names hit songs from Taylor Swift, Bad Bunny, and many other major artists. But the real revelation isn’t that Taylor Swift got scraped — it’s that everybody did. Every indie artist who ever uploaded a track to the internet is potentially in these databases.

The four datasets have been downloaded thousands of times, but the artificial intelligence industry’s secrecy about training data means we don’t know that much about which companies have used them.

Here’s where it gets really interesting. This investigation didn’t drop in a vacuum. It landed right in the middle of the most consequential legal showdown in AI music history.

AI music companies including Suno and Udio are now grappling with at least 12 lawsuits, according to The Atlantic. And the landscape has fractured into distinct camps:

The settlers: Universal Music Group settled with Udio in October 2025, announcing a “compensatory legal settlement” plus new recorded-music and publishing licenses for a jointly developed AI platform set to launch in 2026.

Warner Music settled with Suno in November 2025 and signed a licensing deal.

The fighters: Sony Music has settled with neither, and its fair-use cases against Suno and Udio are expected to produce a pivotal ruling in summer 2026 that could set legal precedent for every AI music company.

The independents getting crushed: At the level of an individual act, the instrumental duo The American Dollar alleged in a May 2026 lawsuit that Suno had cut its licensing revenue by nearly 80%, saying that revenue “has been nearly eliminated since the first version of Suno AI was made available to the public.”

The searchable databases change the game because they give artists something these cases have lacked: searchable documentation of which songs sit in known training datasets. That could strengthen the labels’ cases against Suno and Udio, and it raises the pressure on platforms that still will not disclose what their own tools were trained on.

The July Ruling That Could Change Everything

A critical deadline looms. A key summary-judgment hearing in the Massachusetts case against Suno is scheduled for July 2026 before Chief Judge F. Dennis Saylor IV.

The stakes couldn’t be higher. If Suno wins on fair use, it blows up every licensing deal in the AI music space. If it loses, the UMG-Udio template becomes the industry standard.

Meanwhile, in Germany, oral proceedings wrapped up in March 2026 before a packed courtroom, with the court initially signalling a ruling within three months. On May 26, 2026, the Munich Regional Court issued a press release moving the decision from June 12 to July 31, 2026.

Two continents. Two rulings. Both in July. The next six weeks could define whether AI music operates under a licensing regime or a free-for-all.

The Flood Is Already Here

While the courts deliberate, the output side of the equation is already overwhelming. Deezer is now receiving almost 75,000 AI-generated tracks per day, roughly 44% of its daily uploads. That adds up to more than 2 million AI-generated tracks per month.

The growth curve is terrifying if you’re a human musician. Deezer reported 10,000 AI-generated songs a day in January 2025, 20,000 in April 2025, 30,000 in September 2025, and 60,000 in January 2026. Now it’s 75,000.

There’s a small silver lining: consumption of AI-generated music on the platform is still very low, between 1% and 3% of total streams, and Deezer detects most of those streams (85%) as fraudulent and demonetizes them.
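The Deezer figures above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes a 30-day month, takes the 44% share at face value to back out total daily uploads, and uses illustrative function names that do not come from Deezer or The Atlantic.

```python
AI_TRACKS_PER_DAY = 75_000   # reported daily AI-generated uploads
AI_SHARE_OF_UPLOADS = 0.44   # reported share of all daily uploads
DAYS_PER_MONTH = 30          # assumption for the monthly estimate

# Reported daily AI-upload figures, in chronological order.
REPORTED = [
    ("Jan 2025", 10_000),
    ("Apr 2025", 20_000),
    ("Sep 2025", 30_000),
    ("Jan 2026", 60_000),
    ("now", 75_000),
]


def monthly_ai_tracks(per_day, days=DAYS_PER_MONTH):
    """Monthly AI uploads implied by a constant daily rate."""
    return per_day * days


def implied_total_daily_uploads(ai_per_day, ai_share):
    """Total daily uploads implied if AI tracks make up `ai_share` of them."""
    return ai_per_day / ai_share


def growth_multiple(series):
    """Ratio of the latest reported figure to the earliest one."""
    return series[-1][1] / series[0][1]


def non_fraud_stream_share(ai_stream_share, fraud_rate):
    """Share of all streams that are AI-generated and not flagged as fraud."""
    return ai_stream_share * (1 - fraud_rate)


if __name__ == "__main__":
    print(f"AI tracks per month: {monthly_ai_tracks(AI_TRACKS_PER_DAY):,}")
    total = implied_total_daily_uploads(AI_TRACKS_PER_DAY, AI_SHARE_OF_UPLOADS)
    print(f"Implied total daily uploads: {total:,.0f}")
    print(f"Growth since Jan 2025: {growth_multiple(REPORTED):.1f}x")
    for share in (0.01, 0.03):
        print(f"{share:.0%} of streams -> {non_fraud_stream_share(share, 0.85):.2%} non-fraudulent")
```

On these assumptions, 75,000 tracks a day works out to 2.25 million a month, the 44% share implies roughly 170,000 total uploads a day, and the daily figure has grown 7.5 times since January 2025. If 85% of AI streams are flagged as fraudulent, the remainder is only about 0.15% to 0.45% of all streams.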

But here’s the quiet devastation that gets far less attention: in a survey commissioned by Deezer, 97% of listeners couldn’t distinguish AI music from human-made tracks. The only thing standing between AI music and mass adoption is detection technology and platform policy, not listener preference.

What This Means for Music Video Creators

If you’re a musician making music videos — whether with AI tools or traditional methods — this moment matters for you. Here’s why.

Your music might already be in these databases

Go check. Seriously. The Atlantic’s AI Watchdog tool is public and searchable. If your tracks are in there, you now have documentation. Whether that translates to legal recourse depends on the July rulings, but having the evidence is step one.

The visual side is cleaner than the audio side

This is the critical distinction that makes AI music videos a fundamentally different proposition from AI-generated music. When you use an AI video tool to create visuals for your song, you’re the copyright holder of the music. The AI is generating visuals from text prompts, not scraping and regurgitating someone else’s footage.

That’s a huge difference from what Suno and Udio are doing. Your song is yours. The visuals are generated fresh. As long as the video tool is trained on licensed data, there’s no database of scraped music videos sitting behind the generation. If you want to learn how to make an AI music video the right way, the starting point is always your original music.

Differentiation just became your superpower

With 75,000 AI-generated tracks flooding Deezer every single day, standing out visually isn’t optional anymore — it’s survival. A compelling music video transforms a track from one of millions into something shareable, memorable, and algorithm-friendly. Whether you’re making hip-hop visuals, indie aesthetics, or lo-fi vibes, pairing your human-made music with striking AI-generated visuals is the smartest move in the current landscape.

The licensing landscape is splitting

June may continue to separate the AI music market into two lanes: companies fighting over past training data and companies trying to build licensed future creation systems. As a musician, tool choice is becoming part of rights strategy. Where you make the song may matter almost as much as what the song sounds like.

The same logic applies to video tools. Some AI video generators train on scraped content; others use licensed data. When you’re choosing tools for your EDM video or pop visual, knowing where the tool sits in the licensing spectrum matters.

The Bigger Picture

We’re watching the music industry go through something that’s never happened before. The raw materials of an entire art form, more than 20 million songs, with the largest dataset alone representing 91 years of music, have been documented as training fodder for AI models, and the artists who made those songs are just now finding out.

With Google and Stability AI linked to at least one dataset, and Sony’s fair-use cases against Suno and Udio heading toward a pivotal summer ruling, the music industry’s data provenance reckoning is now documented and undeniable.

The next few weeks will be decisive. The July rulings in both the U.S. and Germany won’t just determine who pays whom — they’ll determine what kind of music industry exists in 2027 and beyond. Will it be one where training on copyrighted music requires a license, or one where it’s considered fair use?

For independent artists, the answer matters enormously. But here’s the thing: regardless of which way the rulings go, making great music and pairing it with great visuals remains the single best strategy. The AI flood creates noise. Your job is to create signal.

Make Your Music Visible

While the legal battles play out, one thing is clear: music without visuals disappears. In a world where 75,000 AI-generated tracks hit Deezer alone every day, a music video is no longer a luxury. It’s your lifeline.

OneMoreShot.ai lets you turn your original music into stunning AI-powered music videos in minutes. Upload your track, guide the visual style, and get a video that’s uniquely yours — built from your music, not scraped from someone else’s catalog. In the current climate, that distinction matters more than ever.