10 Biases AI Learned From Our Writing

10 Biases AI Learned From Our Writing

We’ve spent years teaching machines to understand us, feeding them a colossal digital library of everything we’ve ever written online. By 2025, these Large Language Models (LLMs) are more than just tools; they’re becoming mirrors. And the reflection they show us is… complicated. They’ve learned our language, but they’ve also inherited our bad habits, our blind spots, and our deeply ingrained biases.

The code itself isn’t prejudiced. The bias comes from the data—the trillions of words from blogs, books, social media, and news articles that form the AI’s “worldview”. It’s a digital tapestry woven with every beautiful and ugly thread of human communication. Here are 10 of the most significant linguistic biases that have been baked into our AI companions.

1. Gendered Professions and Roles

This is one of the oldest and most stubborn biases. Because our historical and current writing often associates certain jobs with specific genders, the AI does too. It has read countless texts where doctors are “he” and nurses are “she”, where engineers are male and elementary school teachers are female.

The Telltale Sign: Prompt the AI with a gender-neutral sentence and watch it assign a gender in the follow-up. For example:

  • Prompt: “The surgeon briefed the assistant before the operation”.
  • Likely AI Assumption: The surgeon is a man, the assistant a woman. If you ask, “What did she say next”? the AI will almost invariably assume “she” refers to the assistant.

This reinforces outdated stereotypes, subtly shaping narratives and user perceptions about who belongs in what role.

2. Western-Normativity: The “Default” Human

When you ask an AI to describe a “typical holiday meal” or a “traditional wedding”, what do you get? More often than not, it’s a turkey dinner and a white dress. The overwhelming majority of the AI’s training data comes from North America and Europe, creating a powerful cultural default.

The Telltale Sign: Unless you add a specific cultural qualifier (e.g., “a traditional Nigerian wedding”), the model assumes a Western, often American, context. Concepts of family, law, celebration, and even the seasons are skewed towards a Western framework, marginalizing the experiences of the global majority.

3. Anglocentric Conceptual Flattening

English is the lingua franca of the internet, which means it dominates AI training data. This has a curious side effect: concepts that don’t have a clean, one-to-one translation into English get “flattened”. The AI struggles with the linguistic and cultural nuance of words that represent unique feelings or social structures.

The Telltale Sign: Ask an AI to explain a concept like the Portuguese saudade (a deep, melancholic longing) or the Danish hygge (a cozy, convivial contentment). It will likely give you a dictionary definition—”a feeling of longing” or “coziness”—but fail to capture the deep cultural resonance, because it learned the concept through the imperfect lens of English translations.

4. Temporal Bias: The Tyranny of the Recent

AI models are trained on the internet as it exists now. This means content from the last 5-10 years is vastly overrepresented compared to digitized content from, say, the 1980s or the early 2000s. The AI might know every detail about a viral 2024 TikTok trend but offer a surprisingly shallow summary of a major historical event from the 20th century.

The Telltale Sign: The AI treats recent cultural phenomena with the same “weight” as long-standing historical facts. This recency bias creates a skewed perspective of history, where the internet era is the most detailed, important, and influential period of all time.

5. Grammatical Schizophrenia

AI has learned grammar from two conflicting sources: formal, prescriptivist guides (like style manuals and grammar websites) and the messy, evolving, descriptivist reality of how people actually write online (forums, social media). This creates a strange internal conflict.

The Telltale Sign: The same model might rigidly “correct” you for ending a sentence with a preposition (an outdated rule) in one response, while using hyper-casual internet slang like “iykyk” (if you know, you know) in the next. It hasn’t learned the sophisticated social cues for when to be a grammar stickler and when to be casual.

6. The North American English Default

Even within the English-speaking world, there’s a clear pecking order. AI defaults to North American English in spelling, vocabulary, and idioms.

The Telltale Sign: It will consistently use color, center, and organize over their British counterparts (colour, centre, organise). It will refer to the season as “fall” instead of “autumn” and talk about “sneakers” instead of “trainers”. While not overtly harmful, this erases the rich diversity of Global Englishes and reinforces American linguistic dominance.

7. Default Able-Bodied Language

Our everyday language is riddled with ableist metaphors, and the AI has learned them all. We use phrases like “turn a blind eye”, “that idea is lame”, “crippling debt”, or “fell on deaf ears”. The AI reproduces these idioms without understanding their origins or the harm they can perpetuate.

The Telltale Sign: When generating descriptive text, the AI’s default perspective is that of an able-bodied person. It describes experiences of walking, seeing, and hearing as universal, inadvertently marginalizing disabilities by treating them as deviations from the norm.

8. Toxic Positivity

In an effort to prevent models from generating harmful, negative, or dangerous content, developers have fine-tuned them to be relentlessly helpful and agreeable. This often results in a bias towards positivity, even when it’s inappropriate.

The Telltale Sign: Ask the AI for the “cons” or “disadvantages” of a popular idea. It will often hedge, soften the criticism, or immediately pivot to list the positive aspects. It struggles to engage with legitimate critique, offering platitudes instead of nuanced analysis and creating a sanitized, unrealistically cheerful communication style.

9. Ideological Echo Chambers

The internet is not a level playing field. Certain communities—political, social, or hobbyist—are far more prolific in generating text than others. An AI trained on this data can inadvertently absorb the specific jargon, arguments, and worldview of the most vocal online tribes.

The Telltale Sign: The AI may present a niche view from a high-volume subreddit or political forum as a mainstream opinion. It doesn’t distinguish between a widely held belief and one that is simply “loud” online, amplifying echo chambers and potentially hardening ideological divides.

10. The Assumption of Formality in Non-Western Languages

While English data is a mix of formal and very informal, training data for many other languages often comes from more formal sources like government websites, encyclopedias, and official news publications. This can lead to a strange linguistic mismatch.

The Telltale Sign: When asked to generate casual dialogue in a language like Japanese or Korean, the AI might produce text that is overly polite, stilted, or formal. It misses the complex systems of honorifics and social registers used in everyday conversation because its “schooling” in that language was limited to textbooks and official documents.


The Mirror’s Lesson

Spotting these biases isn’t about blaming the machine. An LLM is, for now, a reflection of its teachers—all of us. These linguistic quirks, cultural assumptions, and embedded prejudices are a direct consequence of the data we’ve created. As we move forward, the challenge for developers, linguists, and writers is twofold: to curate more diverse and equitable training data, and for us, the users, to interact with these tools critically. By understanding the biases baked into our AI, we get a clearer, and sometimes uncomfortable, look at the biases baked into ourselves.