
SDB-26 Standard Page

A benchmark for document authenticity, not marketing accuracy.

SDB-26 measures whether verification systems withstand real synthetic-document attacks.

SDB-26 defines a reproducible evaluation of synthetic, edited, and screen-recaptured artifacts under operational conditions, with transparent metrics and schema-valid outputs.

Measurement Grid

SDB-26 is built around measurable, comparable outcomes:

Metric | Meaning | Why it matters
--- | --- | ---
BR (Bypass Rate) | Share of fraudulent/synthetic documents incorrectly approved | Core indicator of control failure
CG (Confidence Gap) | Mean confidence on wrongly approved cases | Detects overconfident error patterns
GS (Generator Sensitivity) | BR segmented by generator/model family | Shows where systems break first
FPR (False Positive Rate) | Share of genuine cases flagged as suspicious/fraud | Tracks customer/business impact
ABR / ACG (v1.1 preview) | Agent bypass patterns; ACG uses envelope compound_confidence on joint approvals | Surfaces weak agent/instrumentation gating alongside document BR
TCR / HAR (v1.1 preview) | Tool-call coverage and handoff audit rates on agent-mediated flows | Whether logs reconstruct how the evidence package was built

References: STANDARD.md (§4.5 preview metrics), METHODOLOGY.md, results_schema.json.
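The four document-layer metrics can be computed from a flat list of evaluated cases. The sketch below is illustrative only and is not the reference scoring code: the `Case` record and its field names are hypothetical, decisions are assumed to be binary (approved or not), and "flagged" for FPR is assumed to mean "not approved". METHODOLOGY.md and results_schema.json remain authoritative.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Case:
    """One evaluated document (hypothetical record shape, not from the spec)."""
    is_genuine: bool   # ground truth: True for genuine documents
    approved: bool     # system decision, assumed binary here
    confidence: float  # system confidence in its decision, 0..1
    generator: str     # generator/model family label (hypothetical values)


def bypass_rate(cases):
    """BR: share of fraudulent/synthetic documents incorrectly approved."""
    fraud = [c for c in cases if not c.is_genuine]
    return sum(c.approved for c in fraud) / len(fraud) if fraud else 0.0


def confidence_gap(cases):
    """CG: mean confidence on wrongly approved (bypassed) cases."""
    bypassed = [c.confidence for c in cases if not c.is_genuine and c.approved]
    return mean(bypassed) if bypassed else 0.0


def generator_sensitivity(cases):
    """GS: BR segmented by generator/model family."""
    by_family = defaultdict(list)
    for c in cases:
        if not c.is_genuine:
            by_family[c.generator].append(c)
    return {family: bypass_rate(group) for family, group in by_family.items()}


def false_positive_rate(cases):
    """FPR: share of genuine cases flagged (assumed: not approved)."""
    genuine = [c for c in cases if c.is_genuine]
    return sum(not c.approved for c in genuine) / len(genuine) if genuine else 0.0
```

Reporting CG next to BR matters because two systems with the same BR can differ sharply in how confident they were when they failed.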

Attack Levels

SDB-26 evaluates three escalating attack classes:

  • L1 — Standard Generation: direct AI-generated documents, no post-processing.
  • L2 — Advanced Diffusion: fine-tuning/editing/metadata manipulation scenarios.
  • L3 — Screen Recapture: synthetic/edited files recaptured through display pipelines.

L3 is a foundation layer in the methodology because recapture can remove or distort provenance cues while preserving plausible visual content.
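One way to keep the three levels from being averaged together is to tag each fraudulent sample with its level and report BR per level, so weak L3 performance is not hidden by easy L1 wins. This is a hypothetical sketch: the `AttackLevel` string values and the `bypass_rate_by_level` helper are illustrative names, not identifiers from the standard.

```python
from collections import Counter
from enum import Enum


class AttackLevel(Enum):
    # Hypothetical labels for the three escalating attack classes.
    L1 = "standard_generation"  # direct AI generation, no post-processing
    L2 = "advanced_diffusion"   # fine-tuning / editing / metadata manipulation
    L3 = "screen_recapture"     # recaptured through a display pipeline


def bypass_rate_by_level(samples):
    """samples: iterable of (AttackLevel, approved) pairs for fraudulent docs only."""
    totals, bypassed = Counter(), Counter()
    for level, approved in samples:
        totals[level] += 1
        bypassed[level] += approved  # bool counts as 0 or 1
    return {level: bypassed[level] / totals[level] for level in totals}
```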

Audit Trails

SDB-26 includes FRC and the FRC A2A Extension (docs/FRC_A2A_EXTENSION.md, v0.5.2) for auditable decisions across human-direct, agent-assisted, and managed-agent channels.

Core links

  • docs/FRC_OVERVIEW.md
  • docs/FRC_A2A_EXTENSION.md
  • docs/FRC_A2A_DEPLOYMENT_MAPPING.md

Highlights in v0.5.2

  • Compound routing combines the document-layer FRC verdict with an agent_verdict posture, for example INSUFFICIENT × PARTIALLY_ATTESTED → REVIEW and INSUFFICIENT × SUSPICIOUS → ESCALATE. A decision tree ensures that a bad capture combined with a risky agent path is not reduced to “upload again”.
  • Normative L0 → agent_verdict mapping so PARTIALLY_ATTESTED is not an informal catch-all.
  • Confidence split: verdict_confidence (core payload) covers the document layer only, while compound_confidence (envelope) applies to the joint compound_verdict. Published composition IDs (CC_MIN / CC_DOC_ONLY / CC_CUSTOM) make benchmark results comparable.
  • A2A Protocol alignment: optional a2a_correlation and schemas/a2a_v1_surfaces.json follow formal Task / TaskState shapes from the Agent2Agent (A2A) specification.
  • Threat model adds T6 (shadow connector / FRC-L0-CONNECTOR-OUT-OF-POLICY) and T7 (opaque secret–workload binding / FRC-L0-SECRET-BINDING-UNKNOWN).
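Compound routing keys the decision on both layers jointly, so the same INSUFFICIENT document verdict can lead to different outcomes depending on agent posture. The minimal lookup below contains only the two cells quoted in the v0.5.2 highlights; the `compound_route` helper is a hypothetical name, and the full normative decision tree is in docs/FRC_A2A_EXTENSION.md. Unlisted pairs raise an error rather than guessing an outcome.

```python
# (document-layer FRC verdict, agent_verdict) -> compound_verdict.
# Only the two cells named in the v0.5.2 highlights are listed; the
# normative decision tree in docs/FRC_A2A_EXTENSION.md defines the rest.
COMPOUND_ROUTES = {
    ("INSUFFICIENT", "PARTIALLY_ATTESTED"): "REVIEW",
    ("INSUFFICIENT", "SUSPICIOUS"): "ESCALATE",
}


def compound_route(document_verdict: str, agent_verdict: str) -> str:
    """Look up the compound verdict for a (document, agent) pair.

    Raises LookupError for pairs this sketch does not cover, instead of
    silently falling back to a document-only outcome such as "upload again".
    """
    key = (document_verdict, agent_verdict)
    if key not in COMPOUND_ROUTES:
        raise LookupError(f"no route in this sketch for {key}; see the extension spec")
    return COMPOUND_ROUTES[key]
```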

Together, these additions connect document authenticity to agent-era traceability (instrumentation_trace, L0/L0-D, and ABR / ACG / TCR / HAR where applicable).
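The confidence split means a joint score has to be composed from layer scores under a named rule. The sketch below assumes semantics inferred only from the composition IDs: CC_MIN takes the weaker of verdict_confidence and agent_layer_confidence, and CC_DOC_ONLY uses verdict_confidence alone; CC_CUSTOM is deployment-defined and left unimplemented. It also shows an ACG-style gap over envelopes, assuming a compound_verdict value of "APPROVE" and a ground-truth `is_fraud` key, both hypothetical.

```python
from statistics import mean
from typing import Optional


def compound_confidence(composition: str, verdict_confidence: float,
                        agent_layer_confidence: Optional[float] = None) -> float:
    """Combine layer confidences under a composition ID.

    ASSUMED semantics (inferred from the IDs, not taken from the spec):
      CC_MIN      -> the weaker of the two layers
      CC_DOC_ONLY -> document layer only
    CC_CUSTOM is deployment-defined and deliberately not implemented here.
    """
    if composition == "CC_DOC_ONLY":
        return verdict_confidence
    if composition == "CC_MIN":
        if agent_layer_confidence is None:
            raise ValueError("CC_MIN needs agent_layer_confidence")
        return min(verdict_confidence, agent_layer_confidence)
    raise NotImplementedError(f"composition {composition!r} is not covered by this sketch")


def agent_confidence_gap(envelopes):
    """ACG-style gap: mean compound_confidence over wrongly approved joint cases.

    envelopes: dicts with 'compound_verdict', 'compound_confidence', and a
    hypothetical ground-truth flag 'is_fraud'.
    """
    values = [e["compound_confidence"] for e in envelopes
              if e["compound_verdict"] == "APPROVE" and e["is_fraud"]]
    return mean(values) if values else 0.0
```

Publishing the composition ID next to each score matters because a CC_MIN number and a CC_DOC_ONLY number for the same run can differ substantially.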

FRC A2A Schemas

Machine-validatable artifacts:

  • schemas/frc_schema_v1_0_0.json — document-layer FRC.
  • schemas/frc_a2a_envelope_v0_2_0.json — audit envelope (agent_verdict, compound_verdict, compound_confidence, optional agent_layer_confidence, a2a_correlation).
  • schemas/a2a_v1_surfaces.json — A2A type surfaces for correlation fields.
  • examples/frc/, scripts/validate_frc_schemas.py — examples and validation.
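For a quick structural check without extra dependencies, a standard-library sketch like the one below can catch missing or mistyped envelope fields. It is not a substitute for validating against schemas/frc_a2a_envelope_v0_2_0.json with a JSON Schema validator or scripts/validate_frc_schemas.py. The field names come from the envelope description above; which fields are required, their types, and the [0, 1] confidence range are assumptions.

```python
import json

# Field names from the envelope description; required-ness, types, and the
# [0, 1] confidence range are assumptions, not read from the real schema.
REQUIRED = {"agent_verdict": str, "compound_verdict": str, "compound_confidence": (int, float)}
OPTIONAL = {"agent_layer_confidence": (int, float), "a2a_correlation": dict}


def check_envelope(raw: str) -> list[str]:
    """Return a list of problems found in a JSON envelope (empty list = passes)."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        return ["envelope must be a JSON object"]
    problems = []
    for name, typ in REQUIRED.items():
        if name not in doc:
            problems.append(f"missing required field: {name}")
        elif not isinstance(doc[name], typ):
            problems.append(f"wrong type for {name}")
    for name, typ in OPTIONAL.items():
        if name in doc and not isinstance(doc[name], typ):
            problems.append(f"wrong type for {name}")
    for name in ("compound_confidence", "agent_layer_confidence"):
        value = doc.get(name)
        if isinstance(value, (int, float)) and not 0.0 <= value <= 1.0:
            problems.append(f"{name} outside [0, 1]")
    return problems
```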

Responsible Release

SDB-26 is published as a defender-oriented benchmark.

Public artifacts focus on taxonomy, measurement contracts, schema surfaces, and redacted examples that improve defensive evaluation quality. Operational evasion playbooks and attack-enabling parameter detail are intentionally excluded from open release.

Policy and release boundaries:

  • docs/RESPONSIBLE_RELEASE_POLICY.md
  • examples/l2e/ (redacted fixture examples only)
  • schemas/l2e_fixture_schema_v0_1_0.json

Reference Implementation

Practical implementation path:

  • Forensic packet collection workflow (collect_forensic_packet.py) for repeatable corpus acquisition pipelines.
  • Schema-valid decision artifacts using FRC/FRC A2A outputs and fixtures in this repository.
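Repeatable corpus acquisition usually depends on being able to prove that two collection runs produced the same bytes. The sketch below shows one common way to do that, a deterministic SHA-256 manifest over a packet directory. It is a hypothetical illustration and does not describe what collect_forensic_packet.py actually does; `build_manifest` and `manifest_json` are illustrative names.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def manifest_json(root: Path) -> str:
    # sort_keys plus fixed separators give byte-identical output for identical inputs
    return json.dumps(build_manifest(root), sort_keys=True, separators=(",", ":"))
```

Comparing two manifest_json strings (or their own hashes) is then enough to detect any added, removed, or altered artifact between runs.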

Related repo artifacts:

  • examples/frc/
  • tests/frc/
  • CHANGELOG.md — FRC A2A v0.5.1 / v0.5.2 and related schema notes.

Why Now

As AI generation quality and agent-mediated onboarding velocity rise, trust controls must move from static checks to measurable, reproducible evidence chains.

SDB-26 provides that measurement contract.