How Large Language Models Actually Work (A Plain-English Guide)
A clear, jargon-free explanation of how large language models like ChatGPT and Claude actually work — from tokens and training to why they predict the next word.
When you type a question into ChatGPT or Claude and get back a fluent, useful answer, it can feel like there’s a little mind on the other end. There isn’t. Underneath is a large language model — a piece of software that has learned, from an enormous amount of text, to predict what words tend to come next.
That one idea explains more about how large language models work than any amount of sci-fi framing. These systems are pattern machines of staggering scale, not databases and not conscious beings. Once you see them that way, their strengths and their famous failures both start to make sense.
This guide walks through the whole pipeline in plain English: how text becomes tokens, how training teaches the model patterns, why “predict the next word” produces coherent paragraphs, and where the limits come from. No math degree required.
The one-sentence version
A large language model is a program that takes the text you give it and predicts the most likely next chunk of text, over and over, until it has produced a complete response.
That’s genuinely the core loop. Everything else — the helpfulness, the coding ability, the occasional confident nonsense — comes out of doing that one thing extremely well, at a scale that’s hard to picture. The model isn’t looking up an answer. It’s generating one piece by piece, guided by patterns it absorbed during training.
The word “large” is doing real work in the name. These models are trained on a huge slice of human writing and contain billions of internal numbers (called parameters) that encode the patterns they found. The size is part of why they feel capable — and part of why they’re expensive to build and run.
It helps to compare this to how you might finish a familiar phrase. If someone says “the early bird catches the…”, your brain supplies “worm” without effort, because you’ve seen that pattern countless times. A language model does something similar, but across the whole breadth of written language and at a level of nuance that lets it continue not just clichés but arguments, code, and explanations. The difference is scale and subtlety, not a fundamentally different trick.
Step one: turning words into tokens
Models don’t read letters or whole words the way we do. They work with tokens — short chunks of text. A token might be a whole common word like “the,” a piece of a longer word like “ing,” a space, or a punctuation mark. A rough rule of thumb in English is that a token averages around three-quarters of a word, though it varies.
So a sentence like “Large language models are useful” gets broken into a handful of tokens before the model sees it. Each token is mapped to an ID number, and each ID is then turned into a list of numbers, called a vector, that captures how the token is used and how it relates to other tokens. This numerical form is what the model actually computes with.
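To make the text-to-tokens-to-numbers step concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the seven-piece vocabulary, the greedy longest-match splitting rule, and the random four-number vectors are all invented. Production tokenizers learn much larger vocabularies from data (byte-pair encoding is one common scheme), and real vectors are learned during training rather than drawn at random.

```python
import random

# Toy vocabulary, invented for this example. Note the leading spaces:
# many tokenizers attach the space to the word that follows it.
VOCAB = ["Large", " language", " model", "s", " are", " use", "ful"]
TOKEN_TO_ID = {piece: index for index, piece in enumerate(VOCAB)}

# One small made-up vector per token. In a real model these numbers are learned.
_rng = random.Random(0)
EMBEDDINGS = [[round(_rng.uniform(-1, 1), 2) for _ in range(4)] for _ in VOCAB]


def tokenize(text):
    """Split text into vocabulary pieces, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOKEN_TO_ID:
                tokens.append(piece)
                pos = end
                break
        else:
            # No vocabulary piece starts at this position.
            raise ValueError(f"no token covers {text[pos]!r}")
    return tokens


def encode(text):
    """Turn text into the list of token IDs the model would receive."""
    return [TOKEN_TO_ID[piece] for piece in tokenize(text)]


sentence = "Large language models are useful"
tokens = tokenize(sentence)
ids = encode(sentence)
vectors = [EMBEDDINGS[token_id] for token_id in ids]

print(tokens)   # ['Large', ' language', ' model', 's', ' are', ' use', 'ful']
print(ids)      # [0, 1, 2, 3, 4, 5, 6]
print(len(vectors), "vectors of length", len(vectors[0]))
```

Notice that “models” and “useful” each split into two pieces. The model receives IDs for those chunks, never the individual letters.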
Why does this matter to you as a user? Two practical reasons:
- Pricing and limits are measured in tokens, not words. When a tool talks about input and output costs or a maximum length, it’s counting tokens.
- How much the model can “hold at once” is a token limit. That’s the context window, and it shapes how much of a long document or conversation the model can consider. We dig into that in context windows explained.
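The two points above can be turned into quick back-of-the-envelope arithmetic. This sketch uses the three-quarters-of-a-word rule of thumb to estimate token counts. The price and context window size in the usage lines are hypothetical numbers, not figures from any real product, and a real tokenizer will give different counts.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough English average; real counts vary


def estimate_tokens(text):
    """Estimate the token count from the word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(text, context_window_tokens, reserved_for_reply=0):
    """Check whether the text plus room for a reply fits in the window."""
    return estimate_tokens(text) + reserved_for_reply <= context_window_tokens


def estimate_cost(text, price_per_million_tokens):
    """Estimate the cost of sending the text, given a per-million-token price."""
    return estimate_tokens(text) * price_per_million_tokens / 1_000_000


document = "word " * 750  # stand-in for a 750-word document

print(estimate_tokens(document))  # 1000
print(fits_in_context(document, context_window_tokens=1200, reserved_for_reply=300))  # False
print(estimate_cost(document, price_per_million_tokens=2.0))
```

A 750-word document comes out at roughly 1,000 tokens. With 300 tokens reserved for the reply, it would not fit in a hypothetical 1,200-token window.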
Tokenization sounds like a technical footnote, but it’s the doorway between human language and the math inside the model.
There’s a subtle consequence worth knowing. Because the model thinks in tokens rather than letters, it can stumble on tasks that seem trivial to us — like counting the letters in a word or spotting that two words rhyme. It never “sees” the individual characters the way you do; it sees token chunks. That’s why an otherwise brilliant model can occasionally fumble a spelling or letter-counting puzzle. It’s not stupidity, it’s a side effect of how the input is chopped up.
Different languages also tokenize differently. Text in some languages breaks into more tokens than the equivalent English, which affects both cost and how much fits in the model’s working memory. None of this changes the big picture, but it explains a few oddities you might otherwise find baffling.
Step two: training on a mountain of text
A fresh model knows nothing. Training is how it learns. The dominant method is beautifully simple to describe: show the model a passage of text with the next chunk hidden, ask it to guess what comes next, then nudge its internal numbers slightly toward the right answer. Repeat this across a vast amount of text, an astronomical number of times.
Each tiny correction is small. But run it across an enormous body of writing — books, articles, websites, code, conversations — and the model gradually internalizes the patterns of language. Grammar, facts, writing styles, reasoning steps, the structure of an argument, how code is formatted: all of it gets absorbed as statistical regularities, not as stored sentences.
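The guess-then-nudge cycle can be sketched in a few lines. This is a heavy simplification, assuming a "model" with just three adjustable numbers, one score per candidate word. A real model has billions of parameters and computes its scores from the preceding text. The candidate words and the learning rate are invented for the example.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def training_step(scores, target_index, learning_rate=0.5):
    """Nudge the scores so the correct word becomes slightly more likely.

    For this setup, the cross-entropy gradient for each score is its
    predicted probability minus 1 for the right answer, and minus 0 otherwise.
    """
    probs = softmax(scores)
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [s - learning_rate * g for s, g in zip(scores, grads)]


# Hidden next word for "the early bird catches the ..."
CANDIDATES = ["worm", "bus", "cat"]
target = CANDIDATES.index("worm")

scores = [0.0, 0.0, 0.0]          # a fresh model: every guess equally likely
before = softmax(scores)[target]

for _ in range(20):               # repeat the small correction
    scores = training_step(scores, target)

after = softmax(scores)[target]
print(f"probability of 'worm' before: {before:.2f}, after: {after:.2f}")
```

Each step moves the numbers only a little, but the probability of the right word climbs steadily from one in three. Pretraining applies the same idea across a vast amount of text.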
This first phase is called pretraining, and it’s where most of the raw capability comes from. The result is a model that’s good at continuing text but not yet shaped into a helpful assistant.
A natural question is: does the model memorize its training text? Mostly, no. There’s far too much text to store verbatim, and the model has a fixed (if large) number of parameters. Instead it compresses what it sees into general patterns. It learns that questions tend to be followed by answers, that recipes have ingredients and steps, that polite emails open and close in certain ways. It can reproduce very common phrases it saw repeatedly, but for the most part it’s generalizing, not retrieving. This is exactly why it can write a sentence no human has ever written before — it’s composing from patterns, not copying from a file.
This also explains the training cutoff. A model’s knowledge is essentially frozen at the point its training data ends. Events after that date simply aren’t in the patterns it learned, which is why a model with no live connection to the web can’t tell you about something that happened last week. Some products patch this gap by giving the model live web access, but the base model itself has a knowledge horizon.
From raw model to helpful assistant
A pretrained model will happily continue any text, including unhelpful or unsafe directions. So labs add more steps:
- Fine-tuning on examples of good question-and-answer behavior teaches the model to act like a helpful assistant rather than a generic text-continuer.
- Alignment techniques — often using human feedback on which responses are better — steer the model toward answers that are helpful, honest, and safe.
This is why ChatGPT and Claude feel like assistants instead of autocomplete on steroids. The assistant personality is a learned layer on top of the base prediction engine. If you want the full pipeline, the major model providers publish overviews — OpenAI’s developer documentation is one accessible starting point.

Step three: predicting the next token, on repeat
Here’s where the magic appears to happen. When you send a prompt, the model processes your tokens and produces a probability for every possible next token — essentially a ranked guess of what should come next. It picks one, adds it to the text, and then repeats the whole process with the new, slightly longer text.
Word by word (token by token, really), a complete answer takes shape. The model isn’t planning the whole paragraph in advance the way a person might outline an essay. It’s making a very informed next-step guess, thousands of times in a row, with each guess conditioned on everything that came before.
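The predict, pick, append, repeat loop described above can be sketched like this. The `next_token_probs` function is a hypothetical stand-in for a trained model: it is a tiny hand-written lookup table that only looks at the last two tokens, whereas a real model computes probabilities over its whole vocabulary from the entire preceding text. This sketch always picks the most likely token.

```python
def next_token_probs(context):
    """Hypothetical stand-in for a model: probabilities for the next token."""
    table = {
        ("<start>", "the"): {"early": 0.7, "late": 0.3},
        ("the", "early"): {"bird": 0.9, "train": 0.1},
        ("early", "bird"): {"catches": 0.8, "sings": 0.2},
        ("bird", "catches"): {"the": 0.95, "a": 0.05},
        ("catches", "the"): {"worm": 0.85, "bus": 0.15},
    }
    key = tuple(context[-2:])
    # Anything the toy table has never seen ends the response.
    return table.get(key, {"<end>": 1.0})


def generate(prompt_tokens, max_new_tokens=10):
    """Extend the prompt one token at a time until the model signals the end."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        choice = max(probs, key=probs.get)  # take the most likely token
        if choice == "<end>":
            break
        tokens.append(choice)               # the new token becomes context
    return tokens


print(generate(["<start>", "the"]))
# ['<start>', 'the', 'early', 'bird', 'catches', 'the', 'worm']
```

The second “the” is followed by “worm” rather than “early” because the guess depends on what came before it, not just on the latest token.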
A setting called temperature controls how adventurous those picks are. Low temperature makes the model favor the single most likely token, producing safe, consistent output. Higher temperature lets it choose less likely options, producing more varied and creative — sometimes weirder — results. That’s why the same prompt can give slightly different answers each time.
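Here is a minimal sketch of how temperature reshapes the model’s picks, assuming the common approach of dividing the raw scores by the temperature before converting them to probabilities. The three candidate tokens and their scores are invented, and real products may apply extra sampling rules on top.

```python
import math
import random


def apply_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0 in this sketch")
    scaled = [s / temperature for s in scores]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, scores, temperature, rng):
    """Pick one token at random, weighted by the adjusted probabilities."""
    probs = apply_temperature(scores, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


TOKENS = ["worm", "bus", "cat"]
SCORES = [2.0, 1.0, 0.5]  # made-up raw scores; higher means more likely

low = apply_temperature(SCORES, temperature=0.2)
high = apply_temperature(SCORES, temperature=2.0)

print([round(p, 3) for p in low])   # the top token gets over 99% of the probability
print([round(p, 3) for p in high])  # the top token gets a little under half

rng = random.Random()
print([sample_token(TOKENS, SCORES, 2.0, rng) for _ in range(5)])
```

At low temperature nearly every pick is “worm”. At high temperature the other options come up often, so repeated runs of the last line will usually differ.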
This loop is also why the models are so good at format. Ask for a bulleted list, a table, or a polite email, and the model has seen millions of those patterns. It “knows” what one looks like statistically, so it produces a convincing version.
It also explains why prompts matter so much. Your prompt is the starting text that the whole prediction process is conditioned on. A vague prompt gives the model little to anchor to, so it falls back on generic, average-sounding patterns. A specific prompt — with context, a clear task, and an example or two — steers the predictions toward exactly what you want. You’re not just asking a question; you’re setting up the conditions for the next-token guesses. That’s the entire reason prompt writing is a skill worth learning.
Does the model “think”?
It’s tempting to assume that because the output reads like reasoning, there’s reasoning happening inside. The honest answer is nuanced. The model does perform genuine computation that lets it combine ideas, follow multi-step logic, and solve problems it wasn’t explicitly taught. But it does this in service of predicting plausible text, not because it has goals or understanding in the human sense. When you ask it to “think step by step,” you’re nudging it to generate the intermediate steps as text — and producing those steps often genuinely improves the final answer, because each step becomes context for the next. Useful? Absolutely. The same as human thought? Not quite.
Why they sometimes confidently make things up
Because a model generates the most plausible-sounding continuation rather than retrieving a verified fact, it can produce text that reads perfectly but is wrong. This is called hallucination, and it’s not a bug in the usual sense — it’s a direct consequence of how the system works.
The model has no reliable built-in sense of “I don’t actually know this.” It will fill a gap with the most likely-sounding tokens, and likely-sounding is not the same as true. A made-up citation or a confidently wrong date comes out in the same smooth, authoritative voice as a correct answer.
Knowing this changes how you should use these tools:
- Treat factual claims as drafts to verify, especially names, numbers, quotes, and citations.
- Give the model the source material when accuracy matters, so it’s working from your text rather than its memory.
- Ask it to show its reasoning, which makes errors easier to spot.
We cover the causes and fixes in detail in why AI models hallucinate. The short version: fluent does not mean factual.
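The habits listed above, supplying the source material and asking for reasoning, can be built into the prompt itself. The sketch below is one possible way to assemble such a prompt. The function name and the exact wording are illustrative assumptions, not a recommended or official template, and no wording guarantees a correct answer.

```python
def build_grounded_prompt(source_text, question):
    """Assemble a prompt that asks the model to work from the supplied text."""
    return (
        "Answer the question using only the source below. "
        "If the source does not contain the answer, say so.\n\n"
        f"Source:\n{source_text}\n\n"
        f"Question: {question}\n"
        "Explain your reasoning, then give the answer."
    )


prompt = build_grounded_prompt(
    source_text="The library opens at 9 am on weekdays and is closed on Sundays.",
    question="Is the library open on Sunday?",
)
print(prompt)
```

Because the source travels with the question, the answer can be checked against text you already trust.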
It’s worth sitting with why this is so hard to fully eliminate. The model’s whole job is to produce the most plausible next tokens. Most of the time, plausible and true line up, because the training text was largely accurate and the patterns reflect that. But when the model is asked about something obscure, something past its training cutoff, or something that simply doesn’t exist, it doesn’t hit a wall and stop. It keeps generating plausible text, because that’s all it knows how to do. The result is an answer that has the shape of truth without the substance. Better training and clever techniques reduce how often this happens, but the underlying tendency is baked into the approach.
What these models are not
Clearing up a few common misconceptions makes everything else click:
- They are not a search engine. A base model isn’t looking things up live; it’s generating from learned patterns. (Some products bolt on real web search, but that’s an added feature, not the model itself.)
- They don’t have a perfect memory. Within one conversation they only “remember” what fits in the context window. Close the chat and that working memory is gone unless the product saves it separately.
- They are not conscious or self-aware. They model language, not experience. The first-person voice is a learned style, not an inner life.
- They are not always up to date. A model’s core knowledge is frozen at its training cutoff unless it’s connected to live tools.
Holding these in mind keeps your expectations calibrated. You get a powerful, fast, flexible text engine — not an oracle.
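The working-memory point in the list above can be sketched in code. This example assumes one simple strategy: keep the most recent messages that fit and drop the oldest. Real products may handle overflow differently, for example by summarizing older messages. Counting words is a crude stand-in for counting tokens, and the six-token limit is deliberately tiny.

```python
def count_words(message):
    """Crude stand-in for a real token counter."""
    return len(message.split())


def trim_to_context_window(messages, max_tokens, count_tokens=count_words):
    """Keep the newest messages whose combined size fits in the window."""
    kept = []
    used = 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break                        # this and all older messages are dropped
        kept.append(message)
        used += cost
    kept.reverse()                       # restore chronological order
    return kept


conversation = [
    "one two three",          # oldest
    "four five",
    "six seven eight nine",   # newest
]
print(trim_to_context_window(conversation, max_tokens=6))
# ['four five', 'six seven eight nine']
```

The oldest message falls out of the window, which is the “forgetting” you sometimes notice in long chats.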
Where they genuinely shine
It’s easy to read a list of limitations and conclude the technology is fragile. It isn’t. Once you understand what a language model actually is, you can point it at the tasks it’s exceptionally good at:
- Transforming text you provide. Summarizing a document, rewriting in a different tone, translating, reformatting, extracting structure: anything where the source material is right there and the model just has to reshape it. Errors are less common here because the model is working from the text in front of it rather than from memory, though a quick check against the source is still worthwhile.
- Drafting from a blank page. First drafts of emails, outlines, code, posts, and ideas. The model is a tireless starting point; you bring the judgment and the edits.
- Explaining and tutoring. Breaking down a concept at the level you ask for, answering follow-up questions, and approaching a topic from different angles until it clicks.
- Brainstorming. Generating lots of options quickly, which you then filter. Quantity and variety are strengths of a next-token machine.
The common thread is that these tasks play to generation and pattern-matching while keeping a human in charge of accuracy and final calls. Use the model for the heavy lifting of producing and reshaping text, and you’ll get enormous value with few nasty surprises.
A simple rule of thumb captures it: lean on the model when being plausible and well-formed is most of the job, and stay hands-on when being exactly right is the job. Drafting a paragraph, restructuring notes, suggesting ten headlines — plausibility is enough, and you’ll polish from there. Stating a medical dose, a legal deadline, or a financial figure — exactness is everything, and that’s your responsibility to confirm. The same tool serves both, as long as you know which mode you’re in.
Putting it together
So how do large language models work? They convert your words into tokens, draw on patterns learned from a vast amount of training text, and generate a response one token at a time by repeatedly predicting what’s most likely to come next. A layer of fine-tuning and alignment turns that raw prediction engine into the helpful assistant you actually chat with.
That single framing — sophisticated next-token prediction — explains both why they’re so capable and why they sometimes fail in characteristic ways. They’re brilliant at producing plausible, well-structured language, and they need a human’s judgment for anything that has to be true.
If you take one mental model away, make it this: a language model is a fluent, well-read assistant with no fact-checker and no memory beyond the current conversation. That picture predicts its behavior remarkably well. It explains the smooth prose, the occasional confident error, the forgetting in long chats, and the knowledge horizon. Keep that image in mind and the tool stops being mysterious.
Once the mechanics stop feeling like magic, you use these tools better: you lean on them for drafting, transforming, and explaining, and you stay in the loop for facts and final decisions.