‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

The developer OpenAI has said it would be impossible to create tools like its groundbreaking chatbot ChatGPT without access to copyrighted material, as pressure grows on artificial intelligence firms over the content used to train their products.

OpenAI Highlights Importance of Copyrighted Material in Training AI Tools

OpenAI, the developer behind the chatbot ChatGPT, has said that access to copyrighted material is essential to creating AI tools. Chatbots and other AI products are trained on vast amounts of data taken from the internet, much of which is protected by copyright.

OpenAI's statement comes in response to mounting pressure on AI firms regarding the content they use for training. Last month, the New York Times filed a lawsuit against OpenAI and Microsoft, accusing them of 'unlawful use' of its work for their products. OpenAI argues that training large language models like GPT-4, the technology powering ChatGPT, would be impossible without copyrighted materials.

The company further stated that restricting training data to out-of-copyright materials would result in inadequate AI systems. OpenAI believes that training AI models necessitates access to a wide range of human expression, including blogposts, photographs, forum posts, software code, and government documents.

Legal Standpoint and AI Safety Measures

OpenAI defends its use of copyrighted material by invoking 'fair use' – a legal doctrine that allows certain uses of copyrighted content without obtaining permission from the owner. In its submission to the UK House of Lords communications and digital select committee, OpenAI said it believes copyright law does not prohibit the training of AI models.

The company's statement follows legal complaints from various authors who sued OpenAI, alleging 'systematic theft on a mass scale'. In addition, Getty Images is suing Stability AI, the creator of Stable Diffusion, for alleged copyright breaches, while a group of music publishers is suing Anthropic, the company behind the Claude chatbot, for alleged misuse of copyrighted song lyrics.

On the issue of AI safety, OpenAI expressed its support for independent analysis of its security measures. The company endorses 'red-teaming' of AI systems, wherein third-party researchers simulate the behavior of malicious actors to test the safety of a product. OpenAI is also among the companies that have agreed to collaborate with governments in safety testing their most powerful models.

Conclusion

The use of copyrighted material plays a crucial role in the development of AI tools like ChatGPT. OpenAI maintains that training leading AI models would not be feasible without access to copyrighted work. However, this reliance on copyrighted content has sparked legal disputes and raised concerns about fair use and copyright infringement.

Furthermore, OpenAI's advocacy for independent analysis and safety testing demonstrates the company's commitment to ensuring the responsible deployment of AI technologies. As the AI field continues to evolve, discussions around copyright, fair use, and AI safety will undoubtedly remain at the forefront.