Allegations have surfaced accusing tech giants OpenAI and Google of utilising YouTube videos without proper authorisation to train their artificial intelligence (AI) models, as reported by The New York Times.
NYT sources claimed that both entities transcribed large volumes of YouTube content to feed their AI systems, potentially infringing upon creators' copyrights.
OpenAI reportedly used its Whisper speech recognition tool to transcribe more than one million hours of YouTube videos, then used the transcripts to train its GPT-4 model. The report suggested that OpenAI President Greg Brockman was involved in the effort.
Google's policies explicitly prohibit "unauthorized scraping or downloading of YouTube content," according to Matt Bryant, a Google spokesperson. Despite these rules, the report implied that some within the company were aware of OpenAI's actions but chose not to intervene, possibly because of Google's own use of YouTube videos to train its AI models. Google has stated that it uses only videos from creators who have consented to such use.
The New York Times report alleged that Google modified its privacy policy in June 2023 to permit broader use of publicly available content, including material from Google Docs and Google Sheets, for AI training. Google maintained that the changes were made for clarity and asserted that such data use is contingent upon user consent to tests of experimental features. The updated policy added Bard as an example of potential data use.