Architecture

Quadrant IntegrityLens processes documents through a pipeline of extraction, analysis, and reporting.

flowchart TD
    A[PDF Input] --> B{Text layer OK?}
    B -- Yes --> C[Embedded Text Extraction<br/>~0.2s]
    B -- No --> D[OCR with PaddleOCR<br/>~25s]
    C --> E[Markdown with Page Markers]
    D --> E
    E --> F[Parse Structure<br/>Pages + Headings]
    F --> G[Run Scanners Concurrently]
    G --> H[Annotate Findings<br/>Page, Heading, Section]
    H --> I[Sort by Position]
    I --> J[Terminal Display]
    I --> K[PDF Report]
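
The stages in the flowchart can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The page-marker syntax (`<!-- page N -->`), the `Finding` fields, the two toy scanners, and every function name here are hypothetical. The extraction backends are passed in as plain callables so that no real PDF or OCR library API is implied.

```python
import asyncio
import bisect
import re
from dataclasses import dataclass

# Assumed marker and heading syntax; the real formats are not specified on this page.
PAGE_MARKER = re.compile(r"^<!-- page (\d+) -->$")
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Finding:
    scanner: str
    offset: int  # character offset into the markdown text
    message: str
    page: int = 0
    heading: str = ""


def extract_markdown(pdf_path, extract_embedded, extract_ocr, text_ok):
    """Try the fast embedded-text path first and fall back to OCR (hypothetical helper)."""
    text = extract_embedded(pdf_path)
    return text if text_ok(text) else extract_ocr(pdf_path)


def parse_structure(markdown):
    """Return one (line_start_offset, page, heading) tuple per line of the markdown."""
    index, offset, page, heading = [], 0, 0, ""
    for line in markdown.splitlines(keepends=True):
        stripped = line.strip()
        if m := PAGE_MARKER.match(stripped):
            page = int(m.group(1))
        elif m := HEADING.match(stripped):
            heading = m.group(1)
        index.append((offset, page, heading))
        offset += len(line)
    return index


def annotate(finding, index, starts):
    """Attach the page and heading of the line that contains the finding."""
    if index:
        i = bisect.bisect_right(starts, finding.offset) - 1
        _, finding.page, finding.heading = index[i]
    return finding


# Two toy scanners, for illustration only.
async def scan_todo(markdown):
    return [Finding("todo", m.start(), "unresolved TODO")
            for m in re.finditer(r"TODO", markdown)]


async def scan_placeholder(markdown):
    return [Finding("placeholder", m.start(), "placeholder text")
            for m in re.finditer(r"lorem ipsum", markdown, re.IGNORECASE)]


async def analyse(markdown, scanners):
    """Run all scanners concurrently, annotate their findings, and sort by position."""
    index = parse_structure(markdown)
    starts = [start for start, _, _ in index]
    batches = await asyncio.gather(*(scan(markdown) for scan in scanners))
    findings = [annotate(f, index, starts) for batch in batches for f in batch]
    return sorted(findings, key=lambda f: f.offset)


if __name__ == "__main__":
    sample = "<!-- page 1 -->\n# Intro\nTODO first\n<!-- page 2 -->\n## Methods\nlorem ipsum\n"
    for f in asyncio.run(analyse(sample, [scan_placeholder, scan_todo])):
        print(f.page, f.heading, f.scanner, f.message)
```

Sorting on the character offset after gathering means the order of the findings does not depend on which scanner finishes first, so the terminal display and the PDF report can consume the same list.
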

Further detail on each stage:

  • Text Extraction: how PDFs are converted to analysable text
  • Analysis: how scanners process the text and produce findings