Dan Milmo Global technology editor 

Activist group says it has scraped 86m music files from Spotify

Platform with 700m users says it is investigating after Anna’s Archive claims to have scraped tracks and metadata
  
  

A campaigner said: ‘This stolen music is almost certain to end up training AI models.’ Photograph: Christian Hartmann/Reuters

An activist group has claimed to have scraped millions of tracks from Spotify and is preparing to release them online.

Observers said the apparent leak could boost AI companies looking for material to develop their technology.

A group called Anna’s Archive said it had scraped 86m music files from Spotify and 256m rows of metadata such as artist and album names. Spotify, which hosts more than 100m tracks, confirmed that the leak did not represent its entire inventory.

The Stockholm-based company, which has more than 700m users worldwide, said it had “identified and disabled the nefarious user accounts that engaged in unlawful scraping”.

“An investigation into unauthorised access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM [digital rights management] to access some of the platform’s audio files,” said Spotify.

Spotify does not believe the music taken by Anna’s Archive has been released yet. Anna’s Archive, which is known for providing links to pirated books, said in a blog it wanted to create a “‘preservation archive’ for music”.

The group claimed the audio files represented 99.6% of all music listened to by Spotify users and would be shared via “torrents”, a means of sharing large digital files online.

“Of course Spotify doesn’t have all the music in the world, but it’s a great start,” said Anna’s Archive, which describes its mission as “preserving humanity’s knowledge and culture”.

“With your help, humanity’s musical heritage will be forever protected from destruction by natural disasters, wars, budget cuts and other catastrophes,” said the group.

Ed Newton-Rex, a composer and campaigner for protecting artists’ copyright, said the leaked music would probably be used for developing AI models.

“Training on pirated material is sadly common in the AI industry, so this stolen music is almost certain to end up training AI models. This is why governments must insist AI companies reveal the training data they use,” he said.

The Anna’s Archive site makes references to LibGen, a vast online archive of pirated books that has allegedly been used by Mark Zuckerberg’s Meta to train its AI models. According to a US court filing, Zuckerberg, Meta’s founder and chief executive, approved use of the LibGen dataset despite warnings within the company’s AI executive team that it was a dataset “we know to be pirated”.

Meta successfully defended a claim for copyright infringement by authors, but the plaintiffs in the case are seeking to amend their claim.

Yoav Zimmerman, a co-founder of the AI startup Third Chair, wrote on LinkedIn that members of the public could in theory “create their own personal free version of Spotify”. He said the scraped files could also allow tech companies to “train on modern music at scale”.

He added: “The only thing stopping them is copyright law and the deterrent of enforcement.”

Spotify said it had put in place new safeguards “for these types of anti-copyright attacks” since the Anna’s Archive announcement and was “actively monitoring for suspicious behaviour”.

Copyright has become a battleground between artists, authors and creatives on one side and AI companies on the other. AI tools such as chatbots and music generators are trained on vast amounts of data taken from the open web, including copyright-protected work.

In the UK, creative professionals have protested against a government proposal to let AI companies use copyright-protected work without permission unless the copyright owner signals they do not want their work to be used. Almost every respondent to a government consultation on the proposal has backed artists’ concerns.

Liz Kendall, the secretary of state for science, innovation and technology, told parliament this month there was “no clear consensus” on the issue, adding that ministers would “take the time to get this right”. The government has pledged to make policy proposals on AI and copyright by 18 March next year.

 
