Perplexity

A measure of how predictable text is to a language model. Low perplexity is a common signal of AI-generated text.

Perplexity measures how surprised a language model is by a sequence of tokens. Formally, it is the exponential of the average negative log-probability the model assigns to each token that actually appears. If the model expects token X with high probability and X is what appears, that token adds little surprise and perplexity stays low. AI-generated text typically has low perplexity because it is produced by sampling from a model's own probability distribution, so most of its tokens are ones a similar model rates as likely.
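As a minimal sketch of the calculation, the function below computes perplexity from the probabilities a model assigned to each token that actually appeared. The function name and the example probability lists are illustrative assumptions; a real detector would obtain these probabilities from a language model.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model
    assigned to each observed token (each value in (0, 1])."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token (natural log).
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Exponentiate to get perplexity; 1.0 means the model was never surprised.
    return math.exp(avg_neg_log)

# Illustrative, made-up probabilities (not measured from any model).
predictable = [0.9, 0.8, 0.95, 0.85]   # model expected nearly every token
surprising = [0.2, 0.05, 0.4, 0.1]     # model was often caught off guard

print(perplexity(predictable) < perplexity(surprising))
```

A sequence whose tokens all received probability 0.5 has a perplexity of 2: on average the model was as uncertain as if it were choosing between two equally likely options.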

Human writing has higher and more variable perplexity. Humans choose unusual words, drop articles, use regional idioms, and make typos, all of which LLMs tend to avoid. AI text detectors such as GPTZero and Originality.ai rely on perplexity or closely related statistics.

SynthGuard's text humanizer raises perplexity through three mechanisms: phrase-replace (swaps high-probability LLM phrases for less common alternatives), contractions (introduces forms LLMs underuse), and optional Unicode tricks (replaces selected characters with visually near-identical look-alikes, which a detector's model sees as unexpected tokens).
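The three mechanisms can be sketched roughly as follows. This is not SynthGuard's implementation: the replacement tables, the `humanize` function, and its `use_homoglyphs` and `every` parameters are all hypothetical, chosen only to show the general shape of each step.

```python
import re

# Hypothetical tables for illustration; not SynthGuard's actual lists.
PHRASE_SWAPS = {
    "it is important to note that": "note that",
    "delve into": "dig into",
}
CONTRACTIONS = {
    "do not": "don't",
    "it is": "it's",
    "cannot": "can't",
}
# Latin letters mapped to Cyrillic look-alikes.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def _replace_all(text, table):
    """Replace whole-phrase matches, keeping a leading capital letter."""
    for src, dst in table.items():
        def repl(match, dst=dst):
            if match.group(0)[0].isupper():
                return dst[0].upper() + dst[1:]
            return dst
        pattern = r"\b" + re.escape(src) + r"\b"
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text

def humanize(text, use_homoglyphs=False, every=10):
    """Apply phrase swaps, then contractions, then optional homoglyphs."""
    text = _replace_all(text, PHRASE_SWAPS)
    text = _replace_all(text, CONTRACTIONS)
    if use_homoglyphs:
        chars = list(text)
        seen = 0
        for i, ch in enumerate(chars):
            if ch in HOMOGLYPHS:
                seen += 1
                # Swap only every Nth eligible character.
                if seen % every == 0:
                    chars[i] = HOMOGLYPHS[ch]
        text = "".join(chars)
    return text

print(humanize("We do not delve into details."))
```

Phrase swaps run before contractions so that a long phrase is matched whole before any of its words are contracted. The homoglyph step is off by default, mirroring the "optional" status described above.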

Burstiness, the variation in perplexity from sentence to sentence, is the other axis detectors use. Together the two give a strong AI signal when both are low and a strong human signal when both are high.
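To illustrate how the two axes might be combined, the sketch below takes burstiness to be the standard deviation of per-sentence perplexity, which is one common reading of the term and an assumption here. The function names and both thresholds are hypothetical placeholders, not values used by any particular detector.

```python
import statistics

def burstiness(sentence_perplexities):
    """Spread of perplexity across sentences (population standard deviation)."""
    if len(sentence_perplexities) < 2:
        return 0.0
    return statistics.pstdev(sentence_perplexities)

def looks_ai_like(sentence_perplexities, ppl_threshold=20.0, burst_threshold=5.0):
    """Flag text only when BOTH mean perplexity and burstiness are low.
    The thresholds are arbitrary illustrative values."""
    mean_ppl = statistics.fmean(sentence_perplexities)
    return (mean_ppl < ppl_threshold
            and burstiness(sentence_perplexities) < burst_threshold)

# Made-up per-sentence perplexities, for illustration only.
uniform_text = [10, 11, 9, 10]    # low and steady
varied_text = [12, 60, 8, 95]     # high on average and uneven

print(looks_ai_like(uniform_text), looks_ai_like(varied_text))
```

Requiring both conditions reflects the point above: a single low score is weaker evidence than low perplexity and low burstiness together.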