AI Basics

What Is a Token, Really? The Unit That Decides AI Cost, Memory, and Mistakes

Tokens are the building blocks behind every AI interaction, and knowing what they are helps you avoid billing surprises and understand why AI sometimes cuts off mid-answer.

Nathan Nobert, with help from my agents, of course
6 min read

The Bill That Made No Sense

A client in Calgary forwarded me their first AI API usage report and thought the system was broken. They'd asked a short question. The AI had given a reply about the same length back. Yet the report showed 2,400 tokens billed.

The tool wasn't broken. What they hadn't realized was that each time they sent a new message, their chat interface automatically included the full conversation history above it. Every prior question and answer got re-sent along with the new one. A brief question at the end of a ten-message chat is actually carrying ten messages' worth of content.

That's the thing about tokens. Once you understand what they are, a lot of confusing AI behavior starts to make sense.

The pricing. The memory limits. Why a long conversation costs more than a one-off question. Why AI sometimes gets cut off mid-sentence. All of it traces back to tokens.

What a Token Actually Is

A token is not a word. It's not a character either. It's a chunk of text, roughly the size of a short word or a common syllable.

AI models don't read the way you do. Before processing anything, they split all incoming text into these chunks and work from there.

Common short words like "the," "cat," "and," and "at" are each usually one token. Longer or less common words get split into multiple tokens. The word "antidisestablishmentarianism" might become six or seven tokens. Emojis often cost two to four tokens each, even though they're just one small character on your screen.

As a rough guide: 1,000 tokens is approximately 750 English words. Or think of it as about a page and a half of standard text. That's close enough to be useful when you're trying to estimate cost or figure out why a long document isn't fitting into a conversation.
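If you want to put that rule of thumb in code, here's a minimal sketch. The conversion factor comes straight from the estimate above (1,000 tokens per 750 English words); a real system would use the model's actual tokenizer instead.

```python
# Rough token estimate using the rule of thumb above:
# 1,000 tokens ≈ 750 English words, i.e. about 1.33 tokens per word.
def estimate_tokens(text: str) -> int:
    """Ballpark token count for plain English prose (not code or emoji)."""
    words = len(text.split())
    return round(words * 1000 / 750)

page_and_a_half = "word " * 750           # ~750 words of filler text
print(estimate_tokens(page_and_a_half))   # → 1000
```

This is only good for plain English; as discussed later, code, symbols, and non-English text run noticeably denser.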

Where Tokens Show Up in Your Day-to-Day

For most people using ChatGPT, Claude, or Copilot through a monthly subscription, tokens happen in the background. You don't see them and you don't pay per use. But the moment you or someone on your team starts building a workflow, integrating AI into your software, or using the API directly, tokens become the billing unit. Four places in particular are worth knowing.

1. Pricing

API pricing is charged per million tokens, split into input (what you send) and output (what the AI sends back). The exact rate depends on which model you're using. A common mid-range model might cost around $3 per million input tokens and $15 per million output tokens.

For a small business using AI to summarize a dozen documents a week, those numbers stay very small. But automated systems that process hundreds of requests a day need to be designed with token efficiency in mind.

The input cost matters especially if your prompts are long. A detailed system instruction that runs to 800 words gets re-billed every single time someone sends a message, because it gets included with every request.
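To see why that adds up, here's a back-of-the-envelope sketch using the example rate mentioned above ($3 per million input tokens). The request volume and question length are assumed numbers for illustration, not anyone's real workload.

```python
# Sketch: why a long system prompt dominates input cost when it is
# re-sent with every request. $3/million input tokens is the example
# rate from above; real rates vary by model and provider.
INPUT_RATE = 3.00 / 1_000_000  # dollars per input token

system_prompt_tokens = 800 * 1000 // 750   # 800 words ≈ 1,066 tokens
question_tokens = 40                        # a short user question (assumed)

requests_per_day = 500                      # assumed volume
daily_input_tokens = requests_per_day * (system_prompt_tokens + question_tokens)
print(f"${daily_input_tokens * INPUT_RATE:.2f} per day in input tokens")  # ≈ $1.66
```

Small on its own, but the system prompt accounts for over 95% of that figure, which is exactly the kind of overhead worth trimming in a high-volume workflow.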

2. The Context Window

The context window is the AI's short-term memory. It's the total number of tokens the model can hold in mind at once: your instructions, the conversation so far, any documents you've pasted in, and the answer it's about to write. When the conversation exceeds that limit, the model has to drop something. Usually the oldest parts of the conversation disappear first.

Context windows have grown dramatically in recent years. Most modern models can handle 100,000 tokens or more, which is roughly a full novel. But if you're working with large documents, long conversations, or multiple files at once, it's still worth knowing the limit exists. When an AI suddenly seems to "forget" something you told it earlier in a long session, a full context window is often the reason.
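The "drop the oldest parts first" behavior can be sketched in a few lines. The token counts per message here are made-up stand-ins; a real implementation would count tokens with the model's tokenizer.

```python
# Sketch: keep a conversation under a token budget by dropping the
# oldest messages first, which is roughly what chat interfaces do
# when the context window fills up.
def trim_to_budget(messages, budget):
    """messages: list of (text, token_count) pairs, oldest first.
    Keeps the newest messages that fit within the budget."""
    kept, used = [], 0
    for text, tokens in reversed(messages):      # walk newest to oldest
        if used + tokens > budget:
            break                                # everything older is dropped
        kept.append((text, tokens))
        used += tokens
    return list(reversed(kept))                  # restore oldest-first order

history = [("msg1", 400), ("msg2", 600), ("msg3", 300), ("msg4", 200)]
print(trim_to_budget(history, 600))  # → [('msg3', 300), ('msg4', 200)]
```

Notice that "msg1" and "msg2" vanish entirely, which is why the model can no longer answer questions about them. That's the "forgetting" described above.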

3. Rate Limits

AI providers set caps on how many tokens you can use per minute or per day at each pricing tier. If a workflow sends a lot of requests in a short burst, for example a batch of customer emails all processed at once, you can hit those limits and see errors until the window resets. This is rarely an issue for casual use, but it matters when you're building something automated.
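A common way to handle those errors in an automated workflow is exponential backoff: wait, then retry with increasing delays. This is a minimal sketch; `RateLimitError` and `call_model` are hypothetical stand-ins for whatever your client library actually raises and exposes.

```python
# Sketch: retry with exponential backoff when the provider signals
# a rate limit. RateLimitError and call_model are stand-ins for
# your actual client library's exception and request function.
import time

class RateLimitError(Exception):
    pass

def call_with_backoff(call_model, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return call_model()
        except RateLimitError:
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise RuntimeError("still rate-limited after retries")
```

For a batch job like the customer-email example, spacing requests out this way usually beats firing them all at once and hoping the caps hold.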

4. Long Conversations Cost More Than Single Questions

This is the one that surprises people the most. In most chat interfaces and API integrations, every new message you send includes the full conversation history. Message one costs a few hundred tokens.

By message ten, each new message carries every exchange that came before it. The cost doesn't stay flat. It grows with every turn.

For a casual user on a subscription, this is invisible. For a business paying per token, it means that a well-designed workflow sends focused, short conversations rather than one long rambling thread. Starting a fresh conversation for each distinct task keeps costs predictable.
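The arithmetic behind that growth is easy to sketch. Assuming, for illustration, that each question-and-answer pair is about 300 tokens:

```python
# Sketch: why a ten-turn chat bills far more in input tokens than
# ten separate one-off questions. Each turn re-sends everything
# before it. 300 tokens per exchange is an assumed round number.
TURN_TOKENS = 300

def total_input_tokens(turns):
    # Turn n sends the n-1 earlier exchanges plus the new one: n blocks.
    return sum(n * TURN_TOKENS for n in range(1, turns + 1))

print(total_input_tokens(1))   # → 300
print(total_input_tokens(10))  # → 16500, not the 3000 you might expect
```

Ten separate fresh conversations would cost about 3,000 input tokens total; one continuous ten-turn thread costs more than five times that.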

A Check You Can Do Right Now

If you want to see token counts in practice, both ChatGPT and Claude show usage stats when you use the API or developer tools. But even without that, you can get a feel for it on the OpenAI tokenizer page (platform.openai.com/tokenizer). Paste in any text and it shows you exactly how many tokens it becomes and highlights how the text gets split.

Paste in the same sentence in English, then in French, then in Chinese. You'll notice that the English version is usually the most token-efficient. Non-English languages often need more tokens to express the same content, which affects cost if you're working with multilingual text at scale.

The Honest Limits of This Explanation

The 750 words per 1,000 tokens rule of thumb works well for plain English text. It breaks down quickly in other cases. Code is denser, so it costs more tokens per line than prose.

Technical documents with lots of numbers and symbols run higher. Some languages, especially those that use characters outside the standard Latin alphabet, cost significantly more tokens per word.

Pricing also changes as models update. The rates I mentioned earlier reflect mid-2026 pricing for common models, but providers adjust their pricing fairly often as competition increases. If you're building something where cost matters, always check the current rate card for the specific model you're using.

The Short Version

Here's what to carry forward from this:

  • Tokens are text chunks, not words. About 750 English words equals 1,000 tokens.
  • Pricing is per token, billed separately for what you send and what you receive back.
  • The context window is the AI's memory limit. Long documents and long conversations consume it fast.
  • Every message in a chat carries the full conversation history. Costs grow as conversations get longer.
  • Plain English is the most token-efficient text type. Code, emojis, and non-English languages cost more.

If you're using a flat subscription like ChatGPT Plus or Claude Pro, none of this affects your bill directly. But it does explain why the AI sometimes forgets things, why it can seem slower on long conversations, and why it occasionally cuts off before finishing.

If you're building or evaluating any AI workflow that bills per use, token efficiency is one of the first things to think about. A poorly designed prompt that repeats context unnecessarily can multiply costs by three or four times for no benefit.

If you want a second opinion on how a workflow is set up, or you're not sure whether a per-token API or a flat subscription makes more sense for what you're trying to do, that's a good starting point for a free discovery call. We can usually answer that question in the first fifteen minutes.

Nathan Nobert, Co-Founder & AI Consultant

Need help with AI?

Book a free AI audit and we’ll show you exactly where AI can save your business time and money.

Get Your Free AI Audit