Architecture on Quadrant IntegrityLens

Text Extraction

Quadrant IntegrityLens chooses its extraction strategy per PDF to balance speed and accuracy: it uses the fast embedded-text path when the text layer is usable and falls back to OCR when it is not.

Embedded text (fast path)

Most PDFs exported from word processors have an embedded text layer. Extracting this text is fast (about 0.2 seconds) and produces high-quality results. This is the default path for most student submissions.

Broken text layer detection

Some PDFs, particularly those generated by LaTeX, have a text layer that contains garbled characters. Quadrant IntegrityLens detects this by checking for specific Unicode indicators (standalone diaeresis characters) that signal a broken text layer. When a broken text layer is detected, it falls back to OCR automatically; no manual intervention is needed.
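
As a rough illustration of the check described above, the sketch below flags a text layer as broken when it contains standalone diaeresis characters and then picks the extraction path. The function names, the choice of U+00A8 as the indicator, and the `min_hits` threshold are assumptions made for this example; they are not taken from the IntegrityLens source.

```python
from typing import Callable, Tuple

# Assumption: a "standalone diaeresis" is the spacing character U+00A8,
# as opposed to a precomposed letter such as "ö" (U+00F6).
STANDALONE_DIAERESIS = "\u00a8"


def has_broken_text_layer(text: str, min_hits: int = 1) -> bool:
    """Return True if the text contains at least `min_hits` standalone diaereses."""
    return text.count(STANDALONE_DIAERESIS) >= min_hits


def extract_text(embedded_text: str, run_ocr: Callable[[], str]) -> Tuple[str, str]:
    """Return (path, text): the embedded layer if it looks sane, OCR output otherwise.

    `run_ocr` is a hypothetical callback standing in for the OCR engine; it is
    only invoked when the embedded layer looks broken, so the fast path stays fast.
    """
    if has_broken_text_layer(embedded_text):
        return "ocr", run_ocr()
    return "embedded", embedded_text
```

Passing the OCR step as a callback keeps the expensive work lazy: a clean text layer never triggers it.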

After text extraction, Quadrant IntegrityLens parses the document structure and runs scanners concurrently.

Structure parsing

The extracted Markdown text is parsed to identify page boundaries (from page markers), headings (Markdown headings at any level), and sections (text between headings). This structure allows each finding to be annotated with a precise location: page number, heading, and surrounding section text.
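
The location lookup described above might be sketched as follows. The page-marker syntax is not shown in this document, so the `<!-- page N -->` form used here is purely an assumption, as are the `Location` and `locate` names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: pages are delimited by HTML-comment markers like "<!-- page 2 -->".
PAGE_MARKER = re.compile(r"<!--\s*page\s+(\d+)\s*-->")
# Markdown ATX headings at any level (one to six "#" characters).
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


@dataclass
class Location:
    page: int
    heading: Optional[str]
    section_text: str


def locate(text: str, offset: int) -> Location:
    """Describe where a character offset falls: page, heading, and section text."""
    # Page: the last marker that starts at or before the offset (default: page 1).
    page = 1
    for marker in PAGE_MARKER.finditer(text):
        if marker.start() > offset:
            break
        page = int(marker.group(1))

    # Section: text between the nearest preceding heading and the next heading.
    heading = None
    section_start, section_end = 0, len(text)
    for match in HEADING.finditer(text):
        if match.start() > offset:
            section_end = match.start()
            break
        heading = match.group(2)
        section_start = match.end()

    # Drop page markers so they do not leak into the reported section text.
    section = PAGE_MARKER.sub("", text[section_start:section_end]).strip()
    return Location(page=page, heading=heading, section_text=section)
```

A scanner that reports a match at character offset `i` could then call `locate(text, i)` to attach the page, heading, and section to its finding.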

Concurrent scanning

All enabled scanners run concurrently on the full text. Each scanner is independent and focuses on a specific type of AI indicator. Scanners declare which languages they support; when you set --language, only matching scanners run. Language-independent scanners (Unicode and structural) always run.
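
A minimal sketch of this selection-and-dispatch behaviour is shown below. The `Scanner` type, the function names, and the use of a thread pool are illustrative assumptions; the actual tool may model scanners and concurrency differently. The sketch also assumes that all scanners run when no language is given, which this page does not state.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Scanner:
    name: str
    scan: Callable[[str], List[str]]             # returns a list of findings
    languages: Optional[FrozenSet[str]] = None   # None means language-independent


def select_scanners(scanners: List[Scanner], language: Optional[str]) -> List[Scanner]:
    """Keep language-independent scanners plus those that support `language`."""
    return [
        s for s in scanners
        # Assumption: with no language set, nothing is filtered out.
        if s.languages is None or language is None or language in s.languages
    ]


def run_scanners(
    text: str, scanners: List[Scanner], language: Optional[str] = None
) -> Dict[str, List[str]]:
    """Run every selected scanner on the full text concurrently."""
    selected = select_scanners(scanners, language)
    if not selected:
        return {}
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        # Each scanner is independent, so they can be submitted in any order.
        futures = {s.name: pool.submit(s.scan, text) for s in selected}
        return {name: future.result() for name, future in futures.items()}
```

Because scanners share no state, the results can simply be collected by name once every future has completed.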