A study by the plagiarism detection company Copyleaks has found that around 60 per cent of GPT-3.5 outputs contain some degree of plagiarism. The finding has renewed debate about the impact of AI-generated material on copyright and originality.
The originality of each GPT-3.5 output was measured with Copyleaks' own scoring system, which flags identical text, minor changes and paraphrasing, among other categories. According to the report, 45.7 per cent of outputs contained text identical to a source, 27.4 per cent contained minor word changes, and 46.5 per cent contained paraphrased text; the categories can overlap, so the figures do not sum to 100 per cent.
For the study, GPT-3.5 was prompted to generate texts across more than 25 academic subjects, producing several hundred pages of output in total. Plagiarism rates varied sharply by subject: computer science (100 per cent), physics (92 per cent) and psychology (88 per cent) scored highest, while theatre (0.9 per cent), humanities (2.8 per cent) and English language (5.4 per cent) scored lowest.
OpenAI responded to the Copyleaks report by saying that its models are trained to learn concepts in order to solve new problems, and that it has measures in place to minimise unintentional memorisation. The company also noted that its terms of use prohibit using its services to plagiarise material.
The findings also connect to the ongoing copyright lawsuit filed by The New York Times, which alleges that OpenAI's systems committed copyright infringement through 'widespread copying'. OpenAI refers to this behaviour as 'regurgitation' and argues that it is rare, claiming that The New York Times manipulated its prompts to produce the cited examples.
Creators across the arts, however, contend that AI tools are trained on copyrighted material, threatening the originality and value of creative work. While earlier legal disputes tended to favour technology firms, the NYT case suggests that the judicial landscape around AI-generated content may be shifting.