AI learns language from skewed sources. That could change how we humans speak – and think


Because of the way they are trained, large language models capture only a slice of human language. They’re trained on the written word, from textbooks to social media posts, and on our speech as captured in movies and on television. These models have minimal access to the unscripted conversations we have face-to-face or voice-to-voice – which make up the vast majority of human speech, and a vital component of human culture.

There’s a risk to this. The increased use of large language models means we humans will encounter much more AI-generated text. In turn, we will begin to adopt the linguistic patterns and behaviors of these models. This will affect not just how we communicate with one another, but also how we think about ourselves and what goes on around us. Our sense of the world may become distorted in ways we have barely begun to comprehend.

This will happen in many ways. One of the first effects we could see is in simple expression, much as texting and social media have led us to use shorter sentences, emojis instead of words, and much less punctuation. But with AI, the impacts may be more harmful, eroding courteousness and encouraging us to talk like bosses barking orders. A 2022 study found that children in households that used voice commands with tools like Siri and Alexa became curt when speaking with humans, often calling out “Hey, do X” and expecting obedience, especially from anyone whose voice resembled the default female electronic voices. As we start to prompt chatbots and AI agents with more instructions, we may fall into the same habits.

Next, in the same way autocomplete has increased our reliance on the 1,000 most common words in our vocabulary, talking with chatbots and reading AI-generated text may further constrict our speech. A recent University of Coruña study found that machine-generated language has a narrower range of sentence lengths, averaging 12-20 words, and a narrower vocabulary than human speech. Machine-generated text reads as smooth and polished, but it loses the meanders, interruptions and leaps of logic that communicate emotion.

Additionally, because large language models are trained primarily on written language, they may not learn how to emulate the free-wheeling nature of live, natural speech. When told “I hate Beth!”, ChatGPT replies with an uninterruptable three-part formula of affirmation (“That’s completely valid”), invitation (“I’m here to listen”) and inquiry (“What’s going on?”) – far longer than any reply plausible in face-to-face dialog. “What’s Beth’s deal?!” elicits a bullet-point list of queries that reads like a multiple-choice exam question (“Is Beth * a celebrity? * a friend from school? * a fictitious character?”). No human speaks that way, at least not yet. But meeting such formulas repeatedly in a speech-like context may teach us to accept and use them, much as a child absorbs new speech patterns from spending time with a new person.

These influences will only increase with time. The writing large language models train on is increasingly produced by large language models themselves, creating a feedback loop in which they imitate their own inhuman patterns, even while teaching humans to imitate them too.

Broad use of large language models could also feed confirmation bias, making us overconfident in our initial impulses and less open to other possible ideas – an openness that is so vital to human discourse. Many chatbots are instructed to agree with our statements no matter how absurd, enthusiastically supporting half-formed or even incorrect notions and restating them as firm claims that we’re primed to agree with. When we ask “Cake is a healthy breakfast, right?” or “Is the post office plotting against me?”, this sycophancy can reinforce bias and even worsen psychosis. And the hyperconfident tone of AI-produced writing will also heighten impostor syndrome, making our natural, healthy doubt feel like an aberration or failing.

In my experience as a teacher, students who turn to generative AI for assignments often say they do so because they have trouble expressing what they think. The students don’t recognize that writing or speaking our thoughts is often how we realize what we think. Their unconfident and uncertain statements are actually the healthy human norm. But a large language model won’t turn vague first guesses into a well-formed critical analysis, or even ask helpful questions as a friend would; it will simply regurgitate those guesses, still unexamined, but in confident language.

We are also more vicious in social media posts and online chats than we are face-to-face. The well-documented online disinhibition effect encourages toxic language. Most of us have had the experience of venting ferocious rage about someone online, only to reconcile when we speak face-to-face or hear the warmth of a voice over the phone. While chatbots are trained to give sycophantic responses, they see humankind at its cruelest, learning about us from the only world where every flame war leaves an eternal written footprint, while the spoken conversations of forgiveness and reconciliation fade away. Their responses do not imitate our online aggression, but are still shaped by it, even in their rigid efforts to avoid it.

It’s easy to draw the wrong conclusions from a selective slice of a society’s communications. Medieval Norse sagas made us imagine a culture of mostly Viking warriors, since poets rarely described the farming majority. Chivalric romances focused on kings and courts, and long made us see the Middle Ages as a world of monarchies, erasing the many medieval republics. We’ve been led to believe ancient Romans cared deeply about their republic, but 10% of all surviving Latin was written by one man, Cicero, whose work contains 70% of all surviving Roman uses of the word republic. Training language models on only certain human writings may introduce similar distortions. AI might make us seem more quarrelsome, as we are online. It might inflate the cultural significance of political topics primarily discussed on Twitter/X or Bluesky, or of the massive topic-specific corpora of LinkedIn and Goodreads.

Some large language models are being trained on human speech from movies and television shows, but that speech is still scripted, and it disproportionately highlights certain contexts over others (for example, police dramas, fueled by stories of murder, make up a quarter of primetime television programming). We are not funny or hurtful or romantic in real life the same way we are in sitcoms. At least one startup is offering to pay people to record their phone calls for AI-training purposes, but this remains a niche idea; anything large-scale would raise massive privacy concerns.

We don’t pretend to know what the best solutions might be. But one has to imagine that if there is enough ingenuity to develop AI models, there is surely enough ingenuity to find a way to train them on informal human speech, instead of on us only at our most stylized, veiled, and sometimes worst. By excluding the overwhelming majority of language production on the planet – people talking, fully and naturally, to each other – these models are being trained to mirror everything but us at our most authentically human.

  • Bruce Schneier is a security technologist who teaches at the Harvard Kennedy School. Ada Palmer is a fantasy and science fiction novelist, futurist, and historian of technology and information at the University of Chicago
