SDB-26 Standard Page

A benchmark for document authenticity, not marketing accuracy.

SDB-26 measures whether verification systems withstand real synthetic-document attacks.

SDB-26 defines a reproducible evaluation of synthetic, edited, and screen-recaptured artifacts under operational conditions, with transparent metrics and schema-valid outputs.

Measurement Grid

SDB-26 is built around measurable, comparable outcomes:

Metric	Meaning	Why it matters
BR (Bypass Rate)	Share of fraudulent/synthetic documents incorrectly approved	Core indicator of control failure
CG (Confidence Gap)	Mean confidence on wrongly approved cases	Detects overconfident error patterns
GS (Generator Sensitivity)	BR segmented by generator/model family	Shows where systems break first
FPR (False Positive Rate)	Share of genuine cases flagged as suspicious/fraud	Tracks customer/business impact
ABR / ACG (v1.1 preview)	Agent bypass patterns; ACG uses envelope `compound_confidence` on joint approvals	Surfaces weak agent/instrumentation gating alongside document BR
TCR / HAR (v1.1 preview)	Tool-call coverage and handoff audit rates on agent-mediated flows	Whether logs reconstruct how the evidence package was built

Reference: STANDARD.md (§4.5 preview metrics), METHODOLOGY.md, results_schema.json.
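
As an illustration of the four core metrics in the grid, the following is a minimal sketch computed over a small hand-made list of labelled cases. The `Case` record, its field names, and the sample data are hypothetical and are not taken from results_schema.json; the normative definitions live in STANDARD.md and METHODOLOGY.md.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One evaluated document. All field names are hypothetical."""
    is_fraud: bool       # ground truth: fraudulent/synthetic (True) or genuine (False)
    approved: bool       # system decision: approved (True) or flagged (False)
    confidence: float    # system confidence in its decision, assumed to lie in [0, 1]
    generator: str = ""  # generator/model family label, used for fraud cases only


def bypass_rate(cases):
    """BR: share of fraudulent/synthetic cases that were approved."""
    fraud = [c for c in cases if c.is_fraud]
    if not fraud:
        return None  # undefined without any fraudulent cases
    return sum(c.approved for c in fraud) / len(fraud)


def confidence_gap(cases):
    """CG: mean confidence over wrongly approved cases."""
    wrong = [c.confidence for c in cases if c.is_fraud and c.approved]
    return sum(wrong) / len(wrong) if wrong else None


def false_positive_rate(cases):
    """FPR: share of genuine cases that were flagged."""
    genuine = [c for c in cases if not c.is_fraud]
    if not genuine:
        return None  # undefined without any genuine cases
    return sum(not c.approved for c in genuine) / len(genuine)


def generator_sensitivity(cases):
    """GS: BR computed separately for each generator/model family."""
    by_generator = defaultdict(list)
    for c in cases:
        if c.is_fraud:
            by_generator[c.generator].append(c)
    return {name: bypass_rate(group) for name, group in by_generator.items()}


# Hand-made sample data, for illustration only.
cases = [
    Case(True, True, 0.875, "gen_a"),
    Case(True, False, 0.5, "gen_a"),
    Case(True, True, 0.625, "gen_b"),
    Case(True, True, 0.75, "gen_b"),
    Case(False, True, 0.9),
    Case(False, False, 0.6),
    Case(False, True, 0.8),
    Case(False, True, 0.7),
]

print("BR :", bypass_rate(cases))
print("CG :", confidence_gap(cases))
print("FPR:", false_positive_rate(cases))
print("GS :", generator_sensitivity(cases))
```

Returning `None` for an empty denominator is a design choice of this sketch: it keeps "no fraudulent cases in this slice" distinct from a genuine 0% bypass rate.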

Attack Levels

SDB-26 evaluates three escalating attack classes:

L1 — Standard Generation: direct AI-generated documents, no post-processing.
L2 — Advanced Diffusion: fine-tuning/editing/metadata manipulation scenarios.
L3 — Screen Recapture: synthetic/edited files recaptured through display pipelines.

L3 is foundational to the methodology because recapture can remove or distort provenance cues while preserving plausible visual content.

Audit Trails

SDB-26 includes FRC and the FRC A2A Extension (docs/FRC_A2A_EXTENSION.md, v0.5.2) for auditable decisions across human-direct, agent-assisted, and managed-agent channels.

Core links

docs/FRC_OVERVIEW.md
docs/FRC_A2A_EXTENSION.md
docs/FRC_A2A_DEPLOYMENT_MAPPING.md

Highlights in v0.5.2

Compound routing combines the document FRC with an agent_verdict posture. Examples include INSUFFICIENT × PARTIALLY_ATTESTED → REVIEW and INSUFFICIENT × SUSPICIOUS → ESCALATE. A decision tree ensures that a bad capture combined with a risky agent path is not reduced to “upload again”.
Normative L0 → agent_verdict mapping so PARTIALLY_ATTESTED is not an informal catch-all.
Confidence split: verdict_confidence (core payload) covers the document layer only, while compound_confidence (envelope) covers the joint compound_verdict. Published composition IDs (CC_MIN / CC_DOC_ONLY / CC_CUSTOM) keep benchmarks comparable.
A2A Protocol alignment: optional a2a_correlation and schemas/a2a_v1_surfaces.json follow formal Task / TaskState shapes from the Agent2Agent (A2A) specification.
Threat model adds T6 (shadow connector / FRC-L0-CONNECTOR-OUT-OF-POLICY) and T7 (opaque secret–workload binding / FRC-L0-SECRET-BINDING-UNKNOWN).
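
The compound routing highlight can be pictured as a lookup keyed by the document-layer result and the agent_verdict. The sketch below encodes only the two cells named in the highlights, reading INSUFFICIENT as the document-layer value and the second term as the agent_verdict (an interpretation, not a quoted rule). The table name `COMPOUND_ROUTES` and the function `route_compound` are hypothetical; the full decision tree is specified in docs/FRC_A2A_EXTENSION.md and is not reproduced here.

```python
from typing import Optional

# Only the two pairs named in the v0.5.2 highlights are encoded.
# Every other (document result, agent_verdict) pair is deliberately left
# out of this sketch rather than guessed.
COMPOUND_ROUTES = {
    ("INSUFFICIENT", "PARTIALLY_ATTESTED"): "REVIEW",
    ("INSUFFICIENT", "SUSPICIOUS"): "ESCALATE",
}


def route_compound(document_result: str, agent_verdict: str) -> Optional[str]:
    """Return the route for a pair, or None when this sketch does not cover it."""
    return COMPOUND_ROUTES.get((document_result, agent_verdict))


print(route_compound("INSUFFICIENT", "PARTIALLY_ATTESTED"))
print(route_compound("INSUFFICIENT", "SUSPICIOUS"))
```

Returning `None` for uncovered pairs makes the gap explicit, so a caller cannot mistake a missing rule for an approval.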

Together, these changes connect document authenticity to agent-era traceability (instrumentation_trace, L0/L0-D, ABR / ACG / TCR / HAR where applicable).
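
To make the confidence split concrete, here is one possible reading of the three composition IDs. The semantics are assumptions inferred from the names alone: CC_DOC_ONLY passes the document-layer verdict_confidence through, CC_MIN takes the smaller of the document and agent-layer values, and CC_CUSTOM defers to a caller-supplied function. The function `compose_confidence` and its parameters are hypothetical; the published definitions in docs/FRC_A2A_EXTENSION.md take precedence.

```python
from typing import Callable, Optional


def compose_confidence(
    composition_id: str,
    verdict_confidence: float,
    agent_layer_confidence: Optional[float] = None,
    custom: Optional[Callable[[float, Optional[float]], float]] = None,
) -> float:
    """Derive a compound_confidence value under an ASSUMED reading of each ID."""
    if composition_id == "CC_DOC_ONLY":
        # Assumption: the document-layer confidence is used unchanged.
        return verdict_confidence
    if composition_id == "CC_MIN":
        # Assumption: the weaker of the two layers bounds the joint verdict.
        if agent_layer_confidence is None:
            raise ValueError("CC_MIN needs agent_layer_confidence")
        return min(verdict_confidence, agent_layer_confidence)
    if composition_id == "CC_CUSTOM":
        # Assumption: the deployment supplies its own composition function.
        if custom is None:
            raise ValueError("CC_CUSTOM needs a custom composition function")
        return custom(verdict_confidence, agent_layer_confidence)
    raise ValueError(f"unknown composition id: {composition_id}")


print(compose_confidence("CC_DOC_ONLY", 0.9))
print(compose_confidence("CC_MIN", 0.9, agent_layer_confidence=0.4))
```

Recording the composition ID next to the value is what lets two benchmark runs be compared: the same number can mean different things under different compositions.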

FRC A2A Schemas

Machine-validatable artifacts:

schemas/frc_schema_v1_0_0.json — document-layer FRC.
schemas/frc_a2a_envelope_v0_2_0.json — audit envelope (agent_verdict, compound_verdict, compound_confidence, optional agent_layer_confidence, a2a_correlation).
schemas/a2a_v1_surfaces.json — A2A type surfaces for correlation fields.
examples/frc/, scripts/validate_frc_schemas.py — examples and validation.
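
For readers who want a feel for the envelope surface before running scripts/validate_frc_schemas.py, the following stdlib-only sketch checks a dictionary for the envelope fields listed above. It is not a substitute for validation against schemas/frc_a2a_envelope_v0_2_0.json. The function `check_envelope`, the required/optional split as coded, and the [0, 1] range for confidence values are assumptions made for illustration.

```python
# Field names come from the envelope description above; treating the first
# three as required and confidences as numbers in [0, 1] is an assumption.
REQUIRED_FIELDS = ("agent_verdict", "compound_verdict", "compound_confidence")
CONFIDENCE_FIELDS = ("compound_confidence", "agent_layer_confidence")


def check_envelope(envelope: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in envelope
    ]
    for name in CONFIDENCE_FIELDS:
        if name not in envelope:
            continue  # absence is reported above for required fields only
        value = envelope[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be a number in [0, 1]")
    return problems


# Hand-made example values, for illustration only.
example = {
    "agent_verdict": "PARTIALLY_ATTESTED",
    "compound_verdict": "REVIEW",
    "compound_confidence": 0.4,
}
print(check_envelope(example))
print(check_envelope({"agent_verdict": "SUSPICIOUS"}))
```

Collecting all problems in a list, rather than raising on the first one, mirrors how schema validators usually report errors and makes the output easier to attach to an audit record.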

Responsible Release

SDB-26 is published as a defender-oriented benchmark.

Public artifacts focus on taxonomy, measurement contracts, schema surfaces, and redacted examples that improve defensive evaluation quality. Operational evasion playbooks and attack-enabling parameter details are intentionally excluded from the open release.

Policy and release boundaries:

docs/RESPONSIBLE_RELEASE_POLICY.md
examples/l2e/ (redacted fixture examples only)
schemas/l2e_fixture_schema_v0_1_0.json

Reference Implementation

Practical implementation path:

Forensic packet collection workflow (collect_forensic_packet.py) for repeatable corpus acquisition pipelines.
Schema-valid decision artifacts using FRC/FRC A2A outputs and fixtures in this repository.

Related repo artifacts:

examples/frc/
tests/frc/
CHANGELOG.md — FRC A2A v0.5.1 / v0.5.2 and related schema notes.

Why Now

As AI generation quality and agent-mediated onboarding velocity rise, trust controls must move from static checks to measurable, reproducible evidence chains.

SDB-26 provides that measurement contract.