AI detectors have a bias against non-native English speakers

Bias in the system

To demonstrate the subpar accuracy of GPT detectors and how this is inadvertently penalizing individuals with limited linguistic proficiency, Zou and his colleagues had seven popular GPT detectors evaluate writing samples from both native and non-native speakers.

They fed the detectors 91 English essays written for the Test of English as a Foreign Language (TOEFL), a standard English proficiency test, which were collected from a Chinese forum, along with 88 US eighth-grade essays from the Hewlett Foundation’s ASAP dataset.

“A majority of essays written by non-native English speakers are falsely flagged by all the detectors as AI-generated,” said Zou. Over half of the non-native English writing samples were misclassified as AI-generated, with one detector flagging nearly 98% of the TOEFL essays, while accuracy on the native samples remained near perfect.

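According to the team, the bias stems from how the detectors rely on the “perplexity” of a given text. “Perplexity basically measures how surprising the word choices are in the text,” explained Zou. “Text with common or simple word choices tends to have lower perplexity. These detectors are more likely to flag text with low perplexity as AI-generated.” Writers with a more limited command of English tend to rely on simpler, more common wording, so their essays are more likely to fall into that low-perplexity range.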
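As a rough illustration of the idea, the Python sketch below computes perplexity with a toy word-frequency (unigram) model. This is an assumption made for clarity, not the method used by the detectors in the study, which are not described here and would typically rely on a large language model to estimate how likely each word is. The function name, the reference text, and the sample sentences are all hypothetical.

```python
import math
from collections import Counter


def unigram_perplexity(text, reference_corpus):
    """Perplexity of `text` under a toy unigram model fit on `reference_corpus`.

    Illustrative assumption: word probabilities come from simple word counts
    with add-one smoothing, not from a real language model.
    """
    ref_tokens = reference_corpus.lower().split()
    counts = Counter(ref_tokens)
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one word")

    # One extra vocabulary slot stands in for every word never seen before.
    vocab_size = len(counts) + 1
    total = len(ref_tokens)

    log_prob_sum = 0.0
    for tok in tokens:
        prob = (counts[tok] + 1) / (total + vocab_size)
        log_prob_sum += math.log(prob)

    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob_sum / len(tokens))


# Hypothetical reference text and samples, chosen only for the demonstration.
reference = "the cat sat on the mat the dog sat on the rug"
common_wording = "the cat sat on the mat"
unusual_wording = "quantum zebras juggle obsidian"

print("common wording: ", unigram_perplexity(common_wording, reference))
print("unusual wording:", unigram_perplexity(unusual_wording, reference))
```

The sentence built from familiar words scores lower than the one built from words the model has never seen. A detector that treats low perplexity as a sign of machine writing would therefore be more suspicious of the plainer sentence, regardless of who wrote it.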

“Moreover, they are very easy to fool,” he added. Prompting ChatGPT more carefully and asking it to write in more sophisticated language was enough to bypass the detectors, which classified these submissions as human-written because of their higher, engineered perplexity.

“This raises a pivotal question,” wrote the authors in their paper. “If AI-generated content can easily evade detection while human text is frequently misclassified, how effective are these detectors truly?”

This could lead to significant problems, as non-native speakers are more likely to be mistakenly accused of cheating by inaccurate detectors. But the issue doesn’t end there. Search engines such as Google, which drives a majority of web traffic, say AI-generated content goes against their guidelines and characterize it as spam. Misclassified human writing could meet the same fate, inadvertently making non-native English writers invisible online.

A recommended hold on detectors

As with any new technology, there are benefits and pitfalls that need to be carefully navigated to minimize any detrimental effects. The benefits of language models, like ChatGPT, are only beginning to reveal themselves, and rather than banning this technology, perhaps current systems can evolve with it.

For example, having ChatGPT help spruce up a resume could level the playing field, putting more emphasis on interviews and demonstration of skill, and making recruitment more equitable. Or perhaps our educational systems could incorporate language models into their learning programs. “We could teach students and researchers how to creatively use [language models] to improve their education and work, and also how to critically evaluate their outputs,” said Zou.

The issue is, of course, more nuanced and a solution will require a careful approach. But the reality is this technology is likely not going anywhere, and so society must learn to adapt and work with it lest vulnerable people be left behind.

What the current study highlights is the danger of applying inaccurate detectors to root out where the technology has been used. GPT detectors need to be trained and evaluated more rigorously on text from diverse types of users if they are to be used in the future, according to Zou.

“Our current recommendation is that we should be extremely careful about and try to avoid using these detectors as much as possible,” said Zou. “It can have significant consequences.”

Feature image credit: Ralph van Root on Unsplash

This article was updated on July 13, 2023, to correct the spelling of the study author’s name from Zhou to Zou.
