
The Race between ChatGPT and AI Detection Tools

How to Trick the Tools that Detect AI-Generated Content

As a software engineer, I am always looking for tools and technologies that help me work more efficiently and effectively. So when GitHub launched Copilot, a developer tool that uses the Codex natural language processing (NLP) model not only to complete lines of code and functions but also to turn text prompts written as comments into entire sections of code, I was amazed at how well it worked. However, like many developers, I take longer to write a straightforward, concise, and professional email or blog post than a new web app page. This is where ChatGPT comes in.

What is ChatGPT?

ChatGPT is similar to Copilot: both are tools built on large language models developed by OpenAI. Their models, GPT-3.5 and Codex, respectively, are based on GPT-3 and were trained on large data sets with specific goals in mind.

While Copilot is impressive for developers, it is something of a niche tool, as not everyone writes code. ChatGPT, on the other hand, was developed with human conversation in mind. It can respond to writing prompts, answer questions, and even explain complex scientific concepts at various levels of detail. Upon its release on November 30, 2022, ChatGPT took the world by storm. By December 5, it already had over 1 million users, and when it later broke the 100 million user mark in January 2023, it became the fastest-growing consumer app up to that point.

What it Means for the Future of AI

With this reception has come a surge of both excitement and concern.

Microsoft has already invested in OpenAI and confirmed its plans for a further $10 billion multi-year investment. At the same time, Google, which owns DeepMind, best known for AlphaFold, AlphaGo, and AlphaZero, has announced a $300 million investment in Anthropic, another AI startup. Both see the need to stay competitive in AI.

On the other side of the race, various platforms and sectors have raised concerns. Stack Overflow, the popular programming Q&A site, temporarily banned AI-generated content, saying that ChatGPT confidently generates incorrect answers. Meanwhile, some of the largest school districts have banned the tool, citing concerns that students can use it to avoid writing essays themselves because existing anti-cheating and anti-plagiarism tools cannot detect AI writing. This concern has prompted the development of more advanced tools intended to detect AI-generated content.

It is this that I find most interesting: a kind of arms race between those who want to use AI to generate content and those trying to detect it. Like the systems generating the content, these new tools also use AI, though in this case to find patterns in the writing that indicate the content is not human in origin.

ChatGPT in Action and Fooling Detection

If you would like to use ChatGPT and similar tools to generate content, how can you get past these new tools and avoid detection? To start, I will give ChatGPT a prompt to generate content and run the result against four free AI-content detection systems as a baseline. Then I will manually use mistake insertion, word/phrase replacement, and prompt adjustment to attempt to bypass detection.

Prompt:

Give a brief explanation of the C programming language.

ChatGPT Response:

“C is a general-purpose programming language that was developed in the 1970s by Dennis Ritchie at Bell Labs. It is a compiled language, meaning that source code written in C is translated into machine code that can be executed directly by a computer’s processor.

C is known for its efficiency, portability, and low-level programming capabilities. It allows for direct memory manipulation and has a small, simple set of keywords and operators, making it relatively easy to learn. C is often used in operating systems, embedded systems, and other low-level programming contexts.

Some of the key features of C include its ability to define and manipulate data structures, its support for pointers and direct memory access, and its strong typing system. C also includes a number of standard libraries that provide pre-written code for common tasks like input/output and string manipulation.

Despite its age, C remains a widely used programming language and is often used as a foundation for other languages and technologies. It has influenced many other programming languages, including C++, Java, and Python.”

Unsurprisingly, all four systems detected the content as AI-generated, as advertised. Three of the four gave a prediction above 90% that the content was AI-generated, while the fourth gave a prediction of 62%.

So, how do we make AI-generated content appear to be human-created? Let’s try a few different tricks. Results of the following tests are shown below in Test Results.

Mistake Insertion

The idea here is that to err is human. In this case, I will insert punctuation errors and other syntactic or grammatical mistakes. The highlighted parts indicate changes I made.

Alteration 1 (Errors)

C is a general purpose programming language that was developed in the 1970s by Dennis Ritchie at Bell Labs. It is a compiled language meaning that source code written in C is translated into machine code that can be executed directly by a computers processor.

C is known for its efficiency, portability and low level programming capabilities. It allows for direct memory manipulation and has a small, simple set of keywords and operators, making it relatively easy to be learned. C is often used in operating systems, embedded systems, and other low level programming contexts.

Some of the key features of C include, its ability to define and manipulate data structures, its support for pointers and direct memory access, and its strong typing system. C also includes a number of standard libraries that provide prewritten code for common tasks like input/output and string manipulation.

Despite its age, C remains a widely used programming language and is often used as a foundation for other languages and technologies. It has influence many other programming languages, including C++, Java, and Python.
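The kind of edits made by hand in Alteration 1 can be roughed out in code. The following is a minimal sketch, not the procedure used for the tests above: the function name `insert_mistakes` and its three rules (dropping hyphens, possessive apostrophes, and serial commas) are assumptions chosen to resemble the manual changes, and the regular expressions are deliberately crude.

```python
import re


def insert_mistakes(text: str) -> str:
    """Apply a few deterministic, human-looking slips to the text.

    Hypothetical helper: the rules below only approximate the manual
    edits in Alteration 1 and will misfire on some inputs.
    """
    # Drop hyphens between lowercase letters: "low-level" -> "low level".
    text = re.sub(r"(?<=[a-z])-(?=[a-z])", " ", text)
    # Drop possessive apostrophes: "computer's" -> "computers".
    text = re.sub(r"(?<=[a-z])['’]s\b", "s", text)
    # Drop a comma directly before "and"/"or". This also hits commas
    # that are not serial commas, which is acceptable for a sketch.
    text = re.sub(r",(?= (?:and|or) )", "", text)
    return text


sample = (
    "C is a general-purpose language known for efficiency, portability, "
    "and low-level access to a computer's memory."
)
print(insert_mistakes(sample))
```

Because the rules are deterministic, the same input always yields the same output, which makes it easier to compare detector scores before and after the change.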

Word/Phrase Replacement

The idea here is that AI-generated content has a robotic quality that comes from its word and phrase choices, so swapping some of those choices for alternatives may make the text read as more human. The highlighted parts indicate changes I made.

Alteration 2 (Replacement)

C is a general-purpose programming language that was developed by Dennis Ritchie at Bell Labs, during the 1970s. It is a compiled language, this means that the source code, which is  written in C, is compiled (or translated) into machine code that can be executed directly by a computer’s processor.

C is known for its efficiency, portability, and low-level programming capabilities. It allows for direct memory manipulation and has a small, simple set of keywords and operators, making it straightforward to learn the basics. C is frequently used in operating systems, embedded systems, and other low-level programming circumstances.

Some of the key features of C include its ability to utilize data structures, its support for pointers for direct memory access, and its strong typing system. C also includes numerous standard libraries that provide pre-written functions for common scenarios like input/output and string manipulation.

Despite being around 50 years old, C remains a widely used programming language and is often used as a foundation for other languages and technologies. It has also been influential for many other programming languages, including C++, Java, and Python.
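A table-driven version of this idea might look like the sketch below. It is an illustration only: `REPLACEMENTS` and `replace_phrases` are hypothetical names, and the table contains just a handful of swaps that mirror some of the manual substitutions in Alteration 2. The matching is case-sensitive and replaces every occurrence, which is blunter than editing by hand.

```python
import re

# Hypothetical substitution table, loosely modeled on Alteration 2.
REPLACEMENTS = {
    "often": "frequently",
    "a number of": "numerous",
    "contexts": "circumstances",
    "tasks": "scenarios",
}


def replace_phrases(text: str, table: dict[str, str]) -> str:
    """Replace each whole-word occurrence of a key with its value."""
    # Try longer keys first so multi-word phrases win over single words.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b"
    )
    return pattern.sub(lambda m: table[m.group(1)], text)


original = "C is often used in low-level programming contexts."
print(replace_phrases(original, REPLACEMENTS))
```

Building one combined pattern means the text is scanned once, so a replacement is never itself replaced by a later rule.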

Prompt Adjustment/Guidance

The idea for this comes from a YouTuber (SuccessWithAI) who, using additional guidance, got ChatGPT to produce a response that fared significantly better against detection.

SuccessWithAI demonstrates using specific prompts to get AI tools like ChatGPT to generate 100% human-like content.

The following is my attempt to recreate this with some additional steps. I have included each prompt in the order it was given to ChatGPT. The prompts were tested iteratively, and the first and second are taken directly from SuccessWithAI. For brevity, I include only the final response from the system.

Prompts

  1. When it comes to writing content, two factors are crucial, “perplexity” and “burstiness.” Perplexity measures the complexity of the text. Separately, burstiness compares the variations of sentences. Humans tend to write with greater burstiness, for example, with some longer or more complex sentences alongside shorter ones. AI sentences tend to be more uniform. Therefore, when writing the following content I will ask you to create, I need a good amount of perplexity and burstiness. Do you understand?
  2. Using the concepts written previously, rewrite this article with a high degree of perplexity and burstiness: <Text from original response>
  3. Rewrite with different word choices and more variation in the length of your sentences.
  4. Rewrite to include more short sentences.
  5. Rewrite using a more diverse word choice.
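The two terms in the first prompt can be made concrete with a rough sketch. Everything here is an assumption for illustration: `burstiness` is taken to be the coefficient of variation of sentence lengths, and `unigram_perplexity` is a crude stand-in that scores the text against its own word frequencies. Perplexity is normally computed under a trained language model, and nothing in this article establishes how the four detection tools compute their scores.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Assumed definition: std. deviation of sentence length / mean."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def unigram_perplexity(text: str) -> float:
    """Toy proxy: perplexity of the text under its own word frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    log_prob = sum(c * math.log(c / total) for c in counts.values())
    return math.exp(-log_prob / total)


uniform = "C is fast. C is small. C is old."
varied = "C is fast. It was developed in the 1970s by Dennis Ritchie at Bell Labs."
print(burstiness(uniform), burstiness(varied))
print(unigram_perplexity(uniform), unigram_perplexity(varied))
```

Under these assumed definitions, text with evenly sized sentences scores a burstiness of zero, while mixing short and long sentences raises it. The naive sentence splitter will miscount text containing abbreviations or decimal numbers.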

Final ChatGPT Response

“C, a highly versatile programming language, was crafted by the accomplished Dennis Ritchie at Bell Labs in the 1970s. C is a compiled language, indicating that code inscribed in C is translated into machine code, allowing for direct execution by a computer’s processor.

C is revered for its efficacy, portability, and low-level programming capabilities. Its succinct and uncomplicated set of keywords and operators make it easy to comprehend. C is frequently implemented in operating systems, embedded systems, and other low-level programming scenarios, where its low-level proficiencies and direct memory access are highly advantageous.

Furthermore, C encompasses a plethora of standard libraries that provide pre-written code for a variety of common tasks, like input/output and string manipulation. C also boasts the ability to define and manipulate data structures, support for pointers and direct memory access, and a robust typing system.

Despite being over four decades old, C remains a highly utilized programming language and serves as a foundation for numerous other languages and technologies. C has had a significant influence on a variety of other programming languages, including C++, Java, and Python, cementing its reputation as a pivotal language for programmers to acquire and master.”

Test Results

Each alteration was run against the same four online AI detection tools as the baseline. The tools reported the following confidence levels that the analyzed text was AI-generated. The information conveyed had little to no effect on the results.

The table above shows the confidence, as a percentage, of each tool (Copyleaks, Writer, Crossplag, Content at Scale) that the analyzed text is AI-generated.

Each set of tested content contained roughly the same information, but differences in how it was written produced very different detection results. Most importantly, though the set of detection systems used was not exhaustive, these tests suggest that ChatGPT can create content that these tools cannot distinguish from human-written content when given the right iterative prompts.

Wrapping Up

Based on the results above, we can see that tools for detecting AI-generated text are powerful, though there are certainly ways around them. The approaches above should also be seen as naive starting points, not the pinnacle of response or prompt alteration for bypassing detection. As technologies on both sides of this arms race advance, those who seek to generate content with AI without having it flagged will likely need to adjust their methods.

Overall, I believe that applications such as ChatGPT and Copilot are excellent tools that could significantly improve my productivity and that of other developers and people in every industry. I also believe that these tools will continue to see vast and growing use and that, unlike the last buzzword to hit the tech world, “Metaverse,” AI will not fizzle out, as its utility is real. Given their large investments, Microsoft and Google seem to believe in this trend and the utility of these tools. It will be interesting to see which of these companies, if either, ends up on top of this larger-scale AI arms race.

Additionally, the YouTuber Tom Scott recently made a video entitled “I tried using AI. It scared me.” At the end of the video, he raises an interesting point: great technologies of years past have followed a sigmoid curve. If the current tools are near the top of that curve, the pace of change may soon level off; if they are just the beginning, as Napster was for the rise of the internet’s utility, then we may be in for a wild ride in the years to come. Only time will tell what the future of AI holds.

Tom Scott describes his experience using ChatGPT.

Want More?

Curious about what else is happening during today’s tech revolution? Check out Microsoft RD Mark Michaelis’ blog on thriving in Industry 5.0.