Is there a definitive way to check whether a text is AI-generated? Let’s test it out!
There has been growing concern among academics about students using AI-generated content, such as text produced by ChatGPT, to cheat on assessments. AI-generated content might not count as “plagiarism” in the traditional sense, since the AI produces new text rather than copying an existing source. Even so, it isn’t honest for students to present an AI’s work as their own. Using AI to complete assignments also undermines the purpose of learning and deprives students of valuable educational opportunities.
Prompted by these concerns, 22-year-old Princeton student Edward Tian created GPTZero, a tool designed to detect AI-generated text. Tian’s tool is available for free to educators around the world, and more than 20,000 educators have already signed up to receive updates on its development. If you are wondering how GPTZero tells human and AI writing apart, here is a closer look at how the tool works.
How was GPTZero created?
GPTZero is an AI detection tool that Tian created during his college winter break while working with Princeton’s Natural Language Processing Lab. Tian, who is minoring in journalism, designed the tool to help both educators and journalists combat AI plagiarism.
Since its initial release on January 2, the tool has been continually expanded upon. On February 21, Tian tweeted that GPTZero is partnering with ed-tech organizations like K16 Solutions to train the tool on a larger data set, making it even more powerful.
How does GPTZero work?
According to Tian, GPTZero measures two properties to check for AI-generated text: perplexity and burstiness. Perplexity measures how unpredictable a text is to a language model. If the model finds a text surprising (high perplexity), GPTZero judges that it was likely written by a human being. Text written by AI, by contrast, tends to have low perplexity, because it closely resembles the kind of text the model has been trained on and therefore expects.
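To make the idea concrete, here is a minimal Python sketch of the standard perplexity formula: the exponential of the average negative log-probability a language model assigns to each token. This is not GPTZero’s code, and the article does not describe its model in detail, so the token probabilities below are made-up numbers standing in for what a real language model would output. The function name `perplexity` is our own.

```python
import math


def perplexity(token_probs):
    """Perplexity of a text, given the probability a language model
    assigned to each of its tokens: exp(mean negative log-probability)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0 or p > 1 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    # Average surprise per token, in nats.
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities, invented for illustration only.
predictable = [0.9, 0.8, 0.85, 0.9]  # the model expected almost every word
surprising = [0.2, 0.05, 0.3, 0.1]   # the model was often surprised

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

The predictable sequence scores close to 1, while the surprising one scores several times higher. Under the logic Tian describes, the second sequence would look more like human writing.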
Burstiness refers to how much sentence length varies across a text. When people write, they tend to mix short and long sentences, whereas AI-written text is largely uniform. Therefore, the higher the burstiness of a text, the more likely it is that a real person wrote it.
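As a rough illustration of burstiness as described here, the sketch below measures variation in sentence length as the population standard deviation of words per sentence. This is an assumed simplification rather than GPTZero’s published formula, and it will not reproduce the tool’s scores. The helper names and the regex-based sentence splitter are hypothetical.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on '.', '!' or '?' followed by whitespace (a naive splitter)
    and return the number of words in each sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Population standard deviation of sentence lengths, in words.
    Higher values mean a more varied, 'burstier' text."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(lengths)


uniform = "The cat sat down. The dog ran off. The bird flew up."
varied = "Stop. The storm rolled in over the hills before anyone noticed. We ran."

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), round(burstiness(varied), 2))
```

The uniform passage has three four-word sentences and scores 0.0, while the varied one mixes sentences of 1, 10 and 2 words and scores higher. Because the splitter treats every period followed by whitespace as a sentence boundary, abbreviations such as “Dr.” would distort the count. A real tool would use a proper sentence tokenizer.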
Besides checking texts copied wholesale from AI writing tools, GPTZero can also detect and highlight the parts of a text that were written by a large language model like ChatGPT. Teachers can upload multiple files at once to check their entire class’s work quickly and make sure that students are learning rather than copying from AI models.
GPTZero reportedly identifies text written by ChatGPT correctly 98% of the time. To test this for myself, I took the following text written by ChatGPT and entered it into the GPTZero website.
“As AI-generated text becomes more sophisticated, it can be challenging to detect whether a piece of text has been written by a human or an AI system. However, there are a few techniques that educators can use to identify AI-generated text—
Look for inconsistencies: While AI-generated text can be impressive, it may still contain inconsistencies that a human would not make. For example, the text may lack coherence or contain errors in grammar, punctuation, or spelling.”
This is what GPTZero had to say about it:
Right from the outset, GPTZero failed to recognize that the entire text had been written by an AI, giving it a perplexity score of 44 and a burstiness score of 28.554.
Assuming I had simply landed in the tool’s 2% error margin, I decided to try again. This time, I wanted to see what the tool would make of text I had actually written myself. Here is what I copied into the tool from my article “Is Using AI for Academic Writing Cheating?”:
“The first issue with AI writing is that it is largely reliant on information from the web, which can be inaccurate. So, if a student were to just take an essay written by an AI and send it to their professors as is, chances are that it would be riddled with mistakes and inaccuracies.
There is also a high likelihood that the essay would have racist and sexist undertones, given that most AI tools trained on web data tend to have that problem.”
I tried to keep the text length roughly the same to make the test as fair as possible, and this is what the tool told me:
This time, it didn’t highlight any sections as AI-written. As glad as I was to be recognized as a human, it was rather surprising to see that my perplexity and burstiness scores, at 38.667 and 20.404 respectively, were lower than ChatGPT’s, since by the tool’s own logic lower scores point toward AI authorship.
For a final test, I mixed my own writing with ChatGPT’s. Here is what I entered into GPTZero:
“Plagiarism detection tools such as Turnitin or Grammarly can identify whether a piece of text has been copied from other sources or generated by an AI system. However, these tools are not always accurate and may generate false positives.
Besides the technical solutions, some experts suggest that if educators are worried about students cheating on their assignments using AI, they can simply change the assessment. Replacing the written submissions with group presentations or oral reports would reduce the scope of cheating and would make sure that the student puts in the work.”
The first paragraph of this text was generated by ChatGPT, and the second was written by yours truly. Here is what GPTZero thought of it:
Again, it didn’t flag any section as “AI-generated,” but it assigned this text a perplexity score of 48.5, higher than either of the previous texts.
These tests make it fairly evident that the tool is still in its early stages of development and that educators should not rely on it alone to assess student work. GPTZero acknowledges this itself (as shown in the picture above), urging educators to use it as one of many tools for grading assignments.
Tian also acknowledges the tool’s shortcomings. “I don’t want anybody making definitive decisions. This is something I built out over holiday break,” he says. Nevertheless, the fact that such a tool exists and that Tian is actively working to improve it is encouraging. As technology makes it easier for students to cheat, GPTZero serves as a reminder that innovative solutions can also be developed to counter it.
Header image courtesy of Envato.