How does tokenization work?
Before a model processes text, a tokenizer splits it into tokens using a learned vocabulary. Most modern models use subword tokenization (variants of byte-pair encoding), which means common words are usually a single token while rare words, long words, code, and non-English text get broken into several pieces.
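To make the idea concrete, here is a minimal sketch of byte-pair encoding on a toy corpus. It is an illustration under simplifying assumptions, not any provider's actual tokenizer: it starts from characters rather than bytes, ignores whitespace handling, and the names `learn_merges`, `merge_pair`, and `tokenize` are hypothetical helpers invented for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(corpus_words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE training)."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split `word` into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (made up for illustration): 'the' and 'time' are frequent, 'token' is rare.
corpus = ["the"] * 50 + ["time"] * 30 + ["token"] * 5
merges = learn_merges(corpus, num_merges=10)

print(tokenize("the", merges))           # frequent word collapses to one token
print(tokenize("tokenization", merges))  # unseen long word splits into several pieces
```

Because merges are learned from pair frequency, words that dominate the training text end up as single tokens, while anything the vocabulary has not seen often falls back to smaller fragments.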
A few practical consequences fall out of this. Common English words like 'the' or 'time' are one token each. A word like 'tokenization' may be two or three tokens. Whitespace and punctuation count too; a leading space is often folded into the token for the word that follows it. Emoji and many non-Latin scripts cost more tokens per visible character, so the same sentence in, say, Japanese or Arabic can use noticeably more tokens than its English equivalent.
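One rough way to see why some scripts cost more is to look at UTF-8 byte counts, assuming a byte-level tokenizer that builds tokens up from UTF-8 bytes (an assumption; not every tokenizer works this way). The sketch below does not count tokens. It only shows how many bytes each character contributes before any merging happens, and `utf8_bytes_per_char` is a hypothetical helper named for this example.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per Unicode code point in `text`.

    Note: len(text) counts code points, which can differ from the number of
    visible characters (for example, emoji built from several code points).
    """
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


# Sample strings chosen for illustration only.
samples = {
    "English": "time",
    "Japanese": "こんにちは",
    "Arabic": "مرحبا",
    "Emoji": "🙂",
}

for label, text in samples.items():
    print(f"{label}: {utf8_bytes_per_char(text):.1f} bytes per character")
```

ASCII letters take one byte each, while the Japanese, Arabic, and emoji samples take two to four bytes per character, which gives a tokenizer more raw material to cover unless its vocabulary has learned merges for that script.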
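The common rule of thumb of about 4 characters per token is only a planning heuristic, and it fits English prose better than code or other languages. For anything cost-sensitive, count real tokens with the provider's own tooling rather than estimating from character or word count.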
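For quick planning, the heuristic can be written as a one-line estimate. This is a sketch only: `estimate_tokens` is a hypothetical helper, the default of 4 characters per token comes from the rule of thumb above, and the result should be replaced by a real count from the provider's tokenizer whenever cost matters.

```python
import math

# Rule-of-thumb ratio; an assumption that roughly fits English prose only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count. Not a substitute for real counting."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


prompt = "Before a model processes text, a tokenizer splits it into tokens."
print(estimate_tokens(prompt))
```

Rounding up keeps the estimate on the cautious side for budgeting. For code, emoji, or non-English text, expect the true count to be higher than this estimate suggests.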