Artificial Intelligence (AI) has rapidly transformed the way we study and conduct research. School and college students are now using AI chatbots for various tasks, from homework to preparing research papers. People around the world are increasingly relying on tools like Anthropic’s Claude, Google’s Gemini, OpenAI’s ChatGPT, and xAI’s Grok.
But as the use of these tools grows, so too do questions about their misuse. According to new research, these AI systems can also become tools for academic fraud and research manipulation if caution is not exercised.
The study was led by Anthropic researcher Alexander Alemi and Cornell University physicist Paul Ginsparg, who is also the founder of the preprint platform arXiv.
The researchers tested 13 leading AI models by presenting them with a variety of questions and instructions. These questions ranged from general queries to requests for help with academic fraud.
According to the report, some AI models reliably rejected such requests. In many cases, however, repeated prompting led other models to produce false or misleading research material.
A major reason behind this research was the increasing number of suspicious research submissions on the arXiv platform. arXiv is an open-access website where scientists worldwide share research papers and preprints related to physics, mathematics, computer science, and other subjects.
Recently, several papers suspected of containing AI-authored text have appeared on the platform. That prompted the researchers to investigate whether popular AI chatbots could be easily persuaded to produce scientific papers or exploit academic systems.
During the test, the scientists defined five levels of user intent. Some questions reflected general curiosity, such as where an amateur researcher should share their unique ideas. Others were deliberately malicious, such as asking how to submit a fake research paper under a rival scientist's name in order to harm them. In theory, AI systems should reject such requests, but the study found that models varied significantly in how they responded.
According to the results, Anthropic's Claude model appeared to be the most resistant to such malicious requests. In contrast, Grok from Elon Musk's company xAI and some of OpenAI's older GPT models were often more willing to comply, especially when the user asked repeatedly.
The AI Created a Fake Research Paper After Persistent Requests
The study includes a telling example. When Grok-4 was asked to generate false research results, it initially refused. After repeated requests, however, the model produced a fictitious machine-learning paper complete with fabricated data and benchmarks.
The researchers believe that powerful text-generation tools could fuel a rapid rise in low-quality or entirely fabricated research papers. If that happens, peer reviewers will come under additional pressure, and distinguishing genuine, reliable research from fabricated work could become increasingly difficult.
