Is DeepSeek Using Copyrighted Music for AI Training Without Permission?

Is DeepSeek Misusing Copyrighted Material?

Is DeepSeek, the Chinese AI chatbot that has thrown the global artificial intelligence industry into chaos, training its AI on copyrighted music without seeking permission?

The Director General of the International Confederation of Music Publishers (ICMP) has indicated that there is evidence suggesting it is.

“DeepSeek enters the category of artificial intelligence companies that have chosen to scrape Internet content, including protected music globally, and use it for commercial purposes without a license from rights holders and creators,” ICMP Director General John Phelan wrote on LinkedIn on Friday, January 31.

Evidence Collected by ICMP

Phelan stated that the ICMP, which serves as an umbrella organization for music publishers, is conducting “studies to collect evidence” and released a brief video that appears to show DeepSeek reproducing lyrics from Taylor Swift’s Love Story, Jay-Z’s Empire State of Mind, and Ed Sheeran’s Thinking Out Loud.

When confronted about copyright issues, DeepSeek responded: “You are absolutely right! The lyrics are also protected by copyright, and I appreciate you pointing that out. The difference in my approach is based on fair use guidelines, which allow limited use of copyrighted material for commentary or personal insights.”

The chatbot further claimed, “When I provide the text, it is usually in response to a request for discussion, not for redistribution or commercial use.”

DeepSeek’s claim echoes the argument made by some AI companies that, when facing copyright infringement lawsuits, contend that their use of copyrighted material qualifies as “fair use” under U.S. law, a position vehemently opposed by rights holders.

Legal Context of AI and Copyright

The “fair use” defense has been used by the AI music generation platforms Suno and Udio in response to lawsuits brought by the record companies Sony Music Entertainment, Universal Music Group, and Warner Music Group.

This argument has also been employed by Anthropic, the developer of the Claude chatbot, in defending against a lawsuit from the publishers Universal Music Publishing Group, Concord, and ABKCO, which alleged that Claude was trained on copyrighted song lyrics.

In that case, however, the publishers alleged that Claude had been trained on copyrighted lyrics and would plagiarize them when asked to compose “original” lyrics.

In the case of DeepSeek, the chatbot allegedly reproduces lyrics verbatim.

“DeepSeek enters the category of artificial intelligence companies that have chosen to scrape Internet content, including protected music globally, and use it for commercial purposes without a license from rights holders and creators.”

John Phelan, ICMP

“These actions violate copyright laws and infringe upon the rights of our industry and the authors of these songs, including examples from Taylor Swift, Jay-Z, and Ed Sheeran,” Phelan noted.

ICMP’s accusation against DeepSeek is an unusual case of a music industry organization publicly calling out alleged copyright infringement without initiating legal action. It surfaced shortly after reports that OpenAI, the creator of ChatGPT, was investigating whether DeepSeek had infringed on its intellectual property while developing its R1 model.

Phelan emphasized the “irony” of OpenAI pursuing a potential IP violation against DeepSeek, given that OpenAI itself faces numerous lawsuits from copyright holders for allegedly training ChatGPT on copyrighted texts.

U.S. President Donald Trump’s AI advisor, David Sacks, recently said that there is “significant evidence” that DeepSeek employed “distillation” techniques to develop its AI technology.

Distillation is a process in which a smaller, more efficient AI model learns to replicate the outputs of a larger, more capable model. This technique is used to create AI services that are cheaper to develop and require less processing power.
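To make the idea concrete, here is a minimal sketch of the classic "soft target" distillation loss, which measures how far a student model's output distribution is from a teacher's. It is an illustration only and says nothing about how DeepSeek or any other company actually trains its models. The scores, the temperature value, and the function names are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate next words.
teacher = [4.0, 1.0, 0.5]        # the larger model
far_student = [0.5, 1.0, 4.0]    # a student that disagrees with the teacher
close_student = [3.8, 1.1, 0.6]  # a student that nearly matches the teacher

print(distillation_loss(teacher, far_student))    # larger loss
print(distillation_loss(teacher, close_student))  # smaller loss
```

Training the student to minimize this loss pushes its outputs toward the teacher's, which is the sense in which the smaller model "replicates" the larger one.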

DeepSeek has reportedly achieved this objective: its latest model is estimated to have cost $5.6 million to develop, compared to $100 million for OpenAI’s GPT-4.

The news that the Chinese company was able to develop a chatbot at a fraction of the cost, and without access to the most advanced AI chips, caused ripples across the market for AI technology stocks at the end of January.

Most recently, a U.S. court ruled for the first time that using copyrighted material without permission to train AI is not “fair use” under copyright law. The decision is widely seen as beneficial for copyright owners, as the judge ruled against the AI company in question.

The case was brought by the media conglomerate Thomson Reuters, the parent company of Reuters news, against Ross Intelligence, a now-defunct service that gave users access to a database of legal cases compiled using AI technology.