Researchers from the USA have developed a new tool that reportedly recognises AI-generated texts with over 99 per cent accuracy. They tested the approach on scientific papers. Here is the background.
ChatGPT gives everyone access to artificial intelligence. But that does not only have advantages: in the past, the OpenAI language model has repeatedly been misused.
ChatGPT used for fraud and forgery
Criminals have used the AI to commit crimes, students have had it write their homework, and ChatGPT has occasionally been used in journalism to write articles.
“Distinguishing between human writing and artificial intelligence is now both critical and urgent,” writes a research group from the University of Kansas in a new scientific paper. They have developed a method that distinguishes ChatGPT texts from those of human scientists – and does so very accurately.
AI vs. AI: New tool can recognise AI texts 99 per cent correctly
“Using a set of 20 features, we created a model that assigns the author as human or AI with over 99 per cent accuracy,” the research paper says.
The experts fed the model with stylistic and content features in which the two kinds of text differ. As material, they used articles from the scientific journal Science and texts generated by ChatGPT.
They grouped the typical differences between the two types of text into four categories: paragraph length, variation in sentence length, differences in punctuation, and popular words.
Human vs. ChatGPT: How the texts differ from each other
The research shows that the paragraphs produced by ChatGPT are significantly shorter and less complex than those from the journal. Human authors, by contrast, tend to write much longer sentences and show a preference for ambiguous language.
They also frequently use words such as “but” and “although”. In addition, the research group found differences in the use of punctuation marks, numbers and proper names between the two types of text.
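A minimal sketch of how such stylistic features could be extracted from a paragraph of text; the feature names, marker words and character sets below are illustrative assumptions, not the actual 20 features used in the paper:

```python
import re
import string

# Illustrative "popular words" inspired by the differences described above;
# the paper's exact word list is not reproduced here.
MARKER_WORDS = {"but", "although", "however"}

def paragraph_features(paragraph: str) -> dict:
    """Compute simple stylistic features for one paragraph."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    words = paragraph.lower().split()
    sentence_lengths = [len(s.split()) for s in sentences]

    return {
        "n_sentences": len(sentences),                       # paragraph length
        "mean_sentence_len": sum(sentence_lengths) / len(sentence_lengths),
        "sentence_len_spread": max(sentence_lengths) - min(sentence_lengths),
        "punctuation_count": sum(paragraph.count(p) for p in ",;:()-"),
        "digit_count": sum(ch.isdigit() for ch in paragraph),
        "marker_word_count": sum(
            w.strip(string.punctuation) in MARKER_WORDS for w in words
        ),
    }
```

Each paragraph then becomes one row of numeric features that a classifier can learn from.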
The scientists from Kansas used the software library XGBoost to train their model. To test its performance, the researchers chose a variant of leave-one-out cross-validation.
“The chosen method removed the possibility that any examples from a particular author – or, in the case of ChatGPT, from a single essay – could be used to make it easier to identify other paragraphs within the same essay (by the same author). The model therefore does not rely on having previous examples of works by the human author whose essay is being classified,” the researchers said.
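A rough sketch of how such a setup could look with XGBoost and scikit-learn, assuming each paragraph carries a document ID so that all paragraphs from one essay land together in the held-out fold; this illustrates the general approach, not the authors' own code:

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import LeaveOneGroupOut

# X: one row of stylistic features per paragraph (e.g. from paragraph_features above)
# y: 0 = human-written, 1 = ChatGPT
# groups: essay/document ID, so paragraphs from the same essay never appear
#         in both the training and the test split.
def evaluate(X: np.ndarray, y: np.ndarray, groups: np.ndarray) -> float:
    logo = LeaveOneGroupOut()
    correct, total = 0, 0
    for train_idx, test_idx in logo.split(X, y, groups):
        model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[test_idx])
        correct += int((preds == y[test_idx]).sum())
        total += len(test_idx)
    return correct / total  # overall paragraph-level accuracy
```

Leaving out one whole essay at a time ensures the model is always judged on an author or document it has never seen during training.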
Not the first tool to recognise AI texts
The model created by the Kansas researchers is not the first attempt to recognise AI-generated texts. In the past, even ChatGPT's developer OpenAI released a tool to distinguish AI texts from human-written ones.
Other software companies also offer tools that are supposed to do the same job. However, their hit rate is likely to be lower than that of the new model from Kansas.