A former researcher at OpenAI, Suchir Balaji, has publicly criticised the company's use of internet data, including copyrighted material, to train artificial intelligence systems like ChatGPT, according to a report in The New York Times.
After nearly four years at OpenAI, Balaji left the company in August 2023, expressing concerns about the ethical and legal implications of the AI technologies he helped develop. His departure marks one of the first instances of a former AI company employee speaking out against practices involving the use of copyrighted data for training models.
Balaji, who worked on projects like GPT-4, initially believed that OpenAI was free to use any internet data to build its systems. However, following the launch of ChatGPT in late 2022, he reconsidered the company's practices. Balaji came to believe that OpenAI's reliance on copyrighted data without proper authorisation violated copyright laws, and that the widespread use of such technology posed significant risks to the internet ecosystem. He argued that systems like ChatGPT directly compete with the original content creators whose work was used to train the models, undermining the sustainability of the internet as a whole.
OpenAI, along with its partner Microsoft, maintains that its practices are legal under the "fair use" doctrine, which allows for the use of copyrighted material under certain conditions. The companies argue that their AI models substantially transform the data they are trained on and do not serve as direct substitutes for the original works. In a statement, OpenAI said its approach to building AI models is consistent with long-established legal principles, stressing the importance of such innovation for US competitiveness.
Balaji, however, disagrees, contending that while the outputs of models like GPT-4 are not exact copies of the data they were trained on, they are also not entirely novel. He believes the AI-generated content still draws too closely from copyrighted material, violating legal standards. This concern has led to lawsuits from a range of individuals and organisations, including artists, news outlets, and record labels, who argue that AI companies have used their work without permission. Among the plaintiffs is The New York Times, which filed a lawsuit against OpenAI and Microsoft in December, accusing them of using millions of its articles to develop chatbots that now compete with the publication as sources of reliable information.
Balaji’s critique extends beyond legal concerns. He warns that AI systems like ChatGPT are reshaping the internet for the worse, replacing existing services with flawed AI-generated content, which can sometimes be inaccurate or entirely fabricated. He has called for increased regulation to prevent further harm to content creators and to ensure AI technologies are used responsibly.
As legal debates continue, experts have noted that intellectual property laws were not designed to address the complexities of modern AI. Some, including intellectual property lawyer Bradley J Hulbert, argue that new legislation is needed to clarify the boundaries of AI development and protect content creators. Balaji agrees, asserting that regulation is the only way to address the growing issues posed by AI systems.
(Inputs from The New York Times)