Unicode Character Scanners

These scanners detect special Unicode characters that AI models (ChatGPT, Copilot, etc.) frequently insert into text but that students almost never type manually. They run regardless of the selected language.

| Scanner ID | Character | Example | Confidence |
| --- | --- | --- | --- |
| em-dash | U+2014 — (em dash) | “Text — more text” instead of “Text - more text” | High |
| en-dash-word-join | U+2013 – between letters | “word–joiner” instead of “word-joiner” | High |
| smart-quotes | U+201D ” and U+2018 ‘ | “quoted” instead of "quoted" | Medium / Low |
| ellipsis | U+2026 … (horizontal ellipsis) | “and so on…” instead of “and so on...” | Medium |
| non-breaking-space | U+00A0 (non-breaking space) | Invisible; looks like a normal space | Medium |
| invisible-space | U+200B, U+200A, U+2009, U+202F, U+FEFF | Zero-width or very narrow spaces, invisible or nearly so | High |
| minus-sign | U+2212 − (minus sign) | “5 − 3” instead of “5 - 3” | Medium |
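
As a rough illustration of how the table above could translate into code, the following Python sketch maps each scanner ID to a regular expression and reports every match. The names `PATTERNS` and `scan` are hypothetical and not taken from the tool itself, the matching rules are an assumption based only on the table, and confidence levels are left out.

```python
import re

# Hypothetical patterns keyed by the scanner IDs from the table.
# The real scanners may match differently; this is only an illustration.
PATTERNS = {
    "em-dash": re.compile("\u2014"),
    # An en dash only counts when it sits directly between two letters.
    "en-dash-word-join": re.compile(r"(?<=[^\W\d_])\u2013(?=[^\W\d_])"),
    "smart-quotes": re.compile("[\u201d\u2018]"),
    "ellipsis": re.compile("\u2026"),
    "non-breaking-space": re.compile("\u00a0"),
    "invisible-space": re.compile("[\u200b\u200a\u2009\u202f\ufeff]"),
    "minus-sign": re.compile("\u2212"),
}


def scan(text):
    """Return (scanner_id, offset, code point) for every match, ordered by offset."""
    findings = []
    for scanner_id, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            code_point = f"U+{ord(match.group()):04X}"
            findings.append((scanner_id, match.start(), code_point))
    return sorted(findings, key=lambda finding: finding[1])


if __name__ == "__main__":
    for finding in scan("word\u2013joiner and 5 \u2212 3"):
        print(finding)
```

Because none of the patterns depend on the document language, a sketch like this would run unchanged on German or English text, which matches the statement that these scanners run regardless of the selected language.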

Why these matter

When students type text in a word processor, they use the standard keyboard characters: hyphens (-), straight quotes ("), three dots (...), and regular spaces. AI models, however, are trained on typographically polished text and tend to output Unicode variants of these characters.

A single em dash is not proof of AI usage. But a document full of em dashes, smart quotes, and non-breaking spaces, combined with other findings, is a strong signal.

Special cases

  • U+201C “ (left double quotation mark) is not flagged because it is the standard closing quotation mark in German typography.
  • U+2019 ’ (right single quotation mark) is not flagged because it appears in legitimate German contractions.
  • Ellipsis in tables of contents is filtered out to avoid false positives from dot leaders.
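
The special cases above might be handled roughly as in the following Python sketch. The helper names and the `TOC_LINE` heuristic are assumptions made for illustration: the source does not say how table-of-contents lines are recognized, so the sketch simply treats any line that ends in a dot leader followed by a page number as one.

```python
import re

# Flagged per the scanner table; U+201C and U+2019 are deliberately absent.
FLAGGED_QUOTES = {"\u201d", "\u2018"}

# Assumption: a table-of-contents line ends with a run of at least two dots
# or ellipsis characters (a dot leader) followed by a page number.
TOC_LINE = re.compile(r"(?:[.\u2026]\s?){2,}\d+\s*$")


def flagged_quotes(text):
    """Return (offset, character) for each quote that should be reported."""
    return [(i, ch) for i, ch in enumerate(text) if ch in FLAGGED_QUOTES]


def flagged_ellipses(text):
    """Return (line number, column) for each ellipsis outside dot-leader lines."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if TOC_LINE.search(line):
            continue  # skip lines that look like table-of-contents entries
        hits.extend((line_no, m.start()) for m in re.finditer("\u2026", line))
    return hits


if __name__ == "__main__":
    sample = "1 Einleitung \u2026\u2026\u2026 3\nund so weiter\u2026"
    print(flagged_ellipses(sample))
    print(flagged_quotes("\u201eHallo\u201c und \u201dhi\u2019"))
```

Filtering whole lines keeps the heuristic simple, at the cost of also ignoring a genuine ellipsis that happens to share a line with a dot leader.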