How AI-generated text is poisoning the internet

This has been a wild year for AI. If you’ve spent much time online, you’ve probably bumped into images generated by AI systems like DALL-E 2 or Stable Diffusion, or jokes, essays, or other text written by ChatGPT, the latest incarnation of OpenAI’s large language model GPT-3.

Sometimes it’s obvious when a picture or a piece of text has been created by an AI. But increasingly, the output these models generate can easily fool us into thinking it was made by a human. And large language models in particular are confident bullshitters: they create text that sounds correct but in fact may be full of falsehoods. 

While that doesn’t matter if it’s just a bit of fun, it can have serious consequences if AI models are used to offer unfiltered health advice or provide other forms of important information. AI systems could also make it stupidly easy to produce reams of misinformation, abuse, and spam, distorting the information we consume and even our sense of reality. That flood of machine-made content could be particularly worrying around elections, for example.

The proliferation of these easily accessible large language models raises an important question: How will we know whether what we read online is written by a human or a machine? I’ve just published a story looking into the tools we currently have to spot AI-generated text. Spoiler alert: Today’s detection tool kit is woefully inadequate against ChatGPT. 

But there is a more serious long-term implication. We may be witnessing, in real time, the birth of a snowball of bullshit. 

Large language models are trained on data sets that are built by scraping the internet for text, including all the toxic, silly, false, malicious things humans have written online. The finished AI models regurgitate these falsehoods as fact, and their output is spread everywhere online. Tech companies scrape the internet again, scooping up AI-written text that they use to train bigger, more convincing models, which humans can use to generate even more nonsense before it is scraped again and again, ad nauseam.

This problem—AI feeding on itself and producing increasingly polluted output—extends to images. “The internet is now forever contaminated with images made by AI,” Mike Cook, an AI researcher at King’s College London, told my colleague Will Douglas Heaven in his new piece on the future of generative AI models. 

“The images that we made in 2022 will be a part of any model that is made from now on.”