By combining big data and artificial intelligence, scientists at the University of Copenhagen can determine, with almost 90% accuracy, whether an assignment was written by the student who submitted it or by a ghostwriter.
A number of studies have shown that cheating on assignments is widespread and increasingly common among high school students. At the University of Copenhagen’s Department of Computer Science, researchers have spent a few years trying to detect such cheating by analyzing writing with artificial intelligence. Now, based on analyses of 130,000 written Danish assignments, they can tell with almost 90% accuracy whether an assignment was written by the student or composed by a ghostwriter.
Danish high schools currently use the Lectio platform to check whether a student has submitted plagiarized work containing passages copied directly from a previously submitted assignment. It is much harder for high schools to discover whether a student has had someone else write the assignment for them, something that happens in a fairly systematic way through online services. The problem is particularly pronounced for the SRP, a major written assignment in the final year of Danish high school. Because the assignment counts double, students have gone so far as to put their writing assignments out to tender on the Danish classified website, Den Blå Avis.
The problem today is that if someone is hired to write an assignment, Lectio won’t spot it. Our program identifies discrepancies in writing styles by comparing recently submitted writing against a student’s previously submitted work. Among other variables, the program looks at: word length, sentence structure and how words are used. For instance, whether ‘for example’ is written as ‘ex.’ or ‘e.g.’.
Stephan Lorenzen, PhD student, Department of Computer Science
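The variables Lorenzen mentions can be illustrated with a small sketch. This is not the Ghostwriter program, which the researchers describe as built on machine learning and neural networks; it is a simplified illustration that assumes three hand-picked features (average word length, average sentence length, and whether "e.g." or "ex." is used). The function names, the feature set and the naive sentence splitting are all hypothetical choices made for this example.

```python
import re

# Hypothetical abbreviation markers, taken from the "ex." vs "e.g." example.
MARKERS = ("e.g.", "ex.")


def style_features(text):
    """Compute a few crude style features for one text (illustrative only)."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, incl. æ, ø, å
    # Naive sentence split: end punctuation, whitespace, then a capital letter.
    sentences = [s for s in re.split(r"[.!?]+\s+(?=[A-Z])", text.strip()) if s]
    features = {
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }
    lowered = text.lower()
    for marker in MARKERS:
        # 1.0 if the abbreviation appears as its own token, otherwise 0.0.
        found = re.search(r"(?<!\w)" + re.escape(marker), lowered)
        features["uses_" + marker] = 1.0 if found else 0.0
    return features


def style_gap(previous_texts, new_text):
    """Absolute difference, per feature, between a new text and the
    average of the same student's earlier texts."""
    earlier = [style_features(t) for t in previous_texts]
    new = style_features(new_text)
    gap = {}
    for name, value in new.items():
        average = sum(f[name] for f in earlier) / len(earlier)
        gap[name] = abs(value - average)
    return gap


if __name__ == "__main__":
    earlier_work = [
        "I like fruit, e.g. apples. They are good.",
        "We read books, e.g. novels.",
    ]
    print(style_gap(earlier_work, "I like fruit, ex. apples."))
```

A large gap on several features would only be a hint that the style has changed. In line with Lorenzen's own caveat, such a number could at most support a suspicion and would not be evidence on its own.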
Together with the rest of the DIKU-DABAI research group, he recently presented the findings at a major European AI conference.
Prior to Setting the Trap, an Ethical Debate
The program, Ghostwriter, is built on machine learning and neural networks, branches of artificial intelligence that are particularly useful for recognizing patterns in images and texts. MaCom, the company that provides Lectio to Danish high schools, has made a dataset of 130,000 written assignments from 10,000 different high school students available to the Ghostwriter project scientists at the Department of Computer Science. For now, it is still a research project.
Stephan Lorenzen does not consider it unrealistic that the program will find its way into high schools in the near future, since schools need to keep up with technological advances in “authorship verification.”
“I think that it is realistic to expect that high schools will begin using it at some point. But before they do, there needs to be an ethical discussion of how the technology ought to be applied. Any result delivered by the program should never stand on its own, but serve to support and substantiate a suspicion of cheating,” says Lorenzen.
Police and Fake News
Ghostwriter’s technological foundation can also be applied elsewhere in society. For instance, the program could be used in police work to supplement the analysis of forged documents, work currently performed by forensic document examiners and others.
“It would be fun to collaborate with the police, who currently deploy forensic document examiners to look for qualitative similarities and differences between the texts they are comparing. We can look at large amounts of data and find patterns. I imagine that this combination would benefit police work,” says Lorenzen, who stresses that ethical discussions are required here too.
There is a broad variety of applications for the artificial intelligence that scientists at the Department of Computer Science use to detect cheating on assignments. It has already been used to analyze tweets on Twitter to find out whether they were written by actual users or composed by bots or paid imposters.