SDB-26 Standard Page
A benchmark for document authenticity, not marketing accuracy.
SDB-26 defines reproducible evaluation for synthetic, edited, and screen-recaptured artifacts in operational conditions, with transparent metrics and schema-valid outputs.
Measurement Grid
SDB-26 is built around measurable, comparable outcomes:
| Metric | Meaning | Why it matters |
|---|---|---|
| BR (Bypass Rate) | Share of fraudulent/synthetic documents incorrectly approved | Core indicator of control failure |
| CG (Confidence Gap) | Mean confidence on wrongly approved cases | Detects overconfident error patterns |
| GS (Generator Sensitivity) | BR segmented by generator/model family | Shows where systems break first |
| FPR (False Positive Rate) | Share of genuine cases flagged as suspicious/fraud | Tracks customer/business impact |
| ABR / ACG (v1.1 preview) | Agent bypass patterns; ACG uses envelope compound_confidence on joint approvals | Surfaces weak agent/instrumentation gating alongside document BR |
| TCR / HAR (v1.1 preview) | Tool-call coverage and handoff audit rates on agent-mediated flows | Whether logs reconstruct how the evidence package was built |
Reference: STANDARD.md (§4.5 preview metrics), METHODOLOGY.md, results_schema.json.
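The four core metrics in the grid can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the `Case` record, its field names, and the binary approve/flag decision are assumptions made for the example, and METHODOLOGY.md remains the authority on the actual definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record layout, assumed for this sketch only.
    is_genuine: bool      # ground truth: True for a genuine document
    approved: bool        # system decision: True if approved, False if flagged
    confidence: float     # system confidence in its decision, 0.0 to 1.0
    generator: str = ""   # generator/model family; empty for genuine cases


def bypass_rate(cases):
    """BR: share of fraudulent/synthetic cases that were approved."""
    fraud = [c for c in cases if not c.is_genuine]
    if not fraud:
        return None
    return sum(c.approved for c in fraud) / len(fraud)


def confidence_gap(cases):
    """CG: mean confidence over wrongly approved fraudulent cases."""
    wrong = [c.confidence for c in cases if not c.is_genuine and c.approved]
    return sum(wrong) / len(wrong) if wrong else None


def generator_sensitivity(cases):
    """GS: BR computed separately for each generator family."""
    groups = defaultdict(list)
    for c in cases:
        if not c.is_genuine:
            groups[c.generator].append(c)
    return {name: bypass_rate(group) for name, group in groups.items()}


def false_positive_rate(cases):
    """FPR: share of genuine cases that were flagged (assumed: not approved)."""
    genuine = [c for c in cases if c.is_genuine]
    if not genuine:
        return None
    return sum(not c.approved for c in genuine) / len(genuine)


# Toy data: two generator families plus two genuine documents.
cases = [
    Case(False, True, 0.875, "gen_a"),
    Case(False, True, 0.625, "gen_a"),
    Case(False, False, 0.5, "gen_b"),
    Case(False, False, 0.75, "gen_b"),
    Case(True, True, 0.9),
    Case(True, False, 0.6),
]
```

On this toy data the overall BR hides the fact that every bypass comes from one generator family, which is the pattern GS is meant to expose.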
Attack Levels
SDB-26 evaluates three escalating attack classes:
- L1 — Standard Generation: direct AI-generated documents, no post-processing.
- L2 — Advanced Diffusion: fine-tuning/editing/metadata manipulation scenarios.
- L3 — Screen Recapture: synthetic/edited files recaptured through display pipelines.
L3 is a foundation layer in the methodology because recapture can remove or distort provenance cues while preserving plausible visual content.
Audit Trails
SDB-26 includes FRC and the FRC A2A Extension (docs/FRC_A2A_EXTENSION.md, v0.5.2)
for auditable decisions across human-direct, agent-assisted, and managed-agent channels.
Core links
- docs/FRC_OVERVIEW.md
- docs/FRC_A2A_EXTENSION.md
- docs/FRC_A2A_DEPLOYMENT_MAPPING.md
Highlights in v0.5.2
- Compound routing combines document FRC with an agent_verdict posture, including INSUFFICIENT × PARTIALLY_ATTESTED → REVIEW and INSUFFICIENT × SUSPICIOUS → ESCALATE, with a decision tree so a bad capture plus a risky agent path is not reduced to “upload again”.
- Normative L0 → agent_verdict mapping so PARTIALLY_ATTESTED is not an informal catch-all.
- Confidence split: verdict_confidence (core payload) = document layer only; compound_confidence (envelope) = joint compound_verdict; published composition IDs (CC_MIN / CC_DOC_ONLY / CC_CUSTOM) support comparable benchmarks.
- A2A Protocol alignment: optional a2a_correlation and schemas/a2a_v1_surfaces.json follow formal Task / TaskState shapes from the Agent2Agent (A2A) specification.
- Threat model adds T6 (shadow connector / FRC-L0-CONNECTOR-OUT-OF-POLICY) and T7 (opaque secret–workload binding / FRC-L0-SECRET-BINDING-UNKNOWN).
Together this bridges document authenticity to agent-era traceability
(instrumentation_trace, L0/L0-D, ABR / ACG / TCR / HAR where applicable).
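The compound routing rule can be pictured as a lookup keyed on the document FRC outcome and the agent_verdict. The sketch below encodes only the two combinations named in the highlights; the `COMPOUND_ROUTES` table and the `route` helper are hypothetical names for this example, and every other combination is deliberately left to the decision tree in docs/FRC_A2A_EXTENSION.md.

```python
from typing import Optional

# Only the two pairs stated in the v0.5.2 highlights are encoded here.
# The remaining cells of the routing matrix are NOT reproduced in this sketch.
COMPOUND_ROUTES = {
    ("INSUFFICIENT", "PARTIALLY_ATTESTED"): "REVIEW",
    ("INSUFFICIENT", "SUSPICIOUS"): "ESCALATE",
}


def route(document_frc: str, agent_verdict: str) -> Optional[str]:
    """Return the compound route for a (document FRC, agent_verdict) pair.

    Returns None for pairs this sketch does not cover, so the caller can
    fall back to the full decision tree instead of guessing.
    """
    return COMPOUND_ROUTES.get((document_frc, agent_verdict))


print(route("INSUFFICIENT", "SUSPICIOUS"))  # prints ESCALATE
```

Returning None for uncovered pairs keeps the example from implying a default such as "upload again", which is the outcome the compound routing is meant to avoid.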
FRC A2A Schemas
Machine-validatable artifacts:
- schemas/frc_schema_v1_0_0.json: document-layer FRC.
- schemas/frc_a2a_envelope_v0_2_0.json: audit envelope (agent_verdict, compound_verdict, compound_confidence, optional agent_layer_confidence, a2a_correlation).
- schemas/a2a_v1_surfaces.json: A2A type surfaces for correlation fields.
- examples/frc/, scripts/validate_frc_schemas.py: examples and validation.
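To make the envelope's confidence fields concrete, the sketch below shows one way a compound_confidence value might be derived from verdict_confidence and the optional agent_layer_confidence. The semantics are assumptions inferred from the composition ID names alone (CC_MIN as the minimum of the two layers, CC_DOC_ONLY as a pass-through of the document layer, CC_CUSTOM as a caller-supplied function); the function name and signature are hypothetical, and the normative definitions live in the extension document and the envelope schema.

```python
from typing import Callable, Optional


def compound_confidence(
    composition_id: str,
    verdict_confidence: float,
    agent_layer_confidence: Optional[float] = None,
    custom: Optional[Callable[[float, Optional[float]], float]] = None,
) -> float:
    """Compose a joint confidence value (assumed semantics, see lead-in)."""
    if composition_id == "CC_DOC_ONLY":
        # Assumed: the envelope simply repeats the document-layer value.
        return verdict_confidence
    if composition_id == "CC_MIN":
        # Assumed: the joint value is bounded by the weaker layer.
        if agent_layer_confidence is None:
            raise ValueError("CC_MIN needs agent_layer_confidence")
        return min(verdict_confidence, agent_layer_confidence)
    if composition_id == "CC_CUSTOM":
        # Assumed: the deployment supplies its own composition function.
        if custom is None:
            raise ValueError("CC_CUSTOM needs a caller-supplied function")
        return custom(verdict_confidence, agent_layer_confidence)
    raise ValueError(f"unknown composition ID: {composition_id}")


# Hypothetical values for illustration only.
doc_conf, agent_conf = 0.9, 0.4
joint = compound_confidence("CC_MIN", doc_conf, agent_conf)
```

Whatever the real composition rules are, recording the composition ID next to the value is what lets two benchmark runs be compared on the same footing.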
Responsible Release
SDB-26 is published as a defender-oriented benchmark.
Public artifacts focus on taxonomy, measurement contracts, schema surfaces, and redacted examples that improve defensive evaluation quality. Operational evasion playbooks and attack-enabling parameter detail are intentionally excluded from open release.
Policy and release boundaries:
- docs/RESPONSIBLE_RELEASE_POLICY.md
- examples/l2e/ (redacted fixture examples only)
- schemas/l2e_fixture_schema_v0_1_0.json
Reference Implementation
Practical implementation path:
- Forensic packet collection workflow (collect_forensic_packet.py) for repeatable corpus acquisition pipelines.
- Schema-valid decision artifacts using FRC/FRC A2A outputs and fixtures in this repository.
Related repo artifacts:
- examples/frc/
- tests/frc/
- CHANGELOG.md: FRC A2A v0.5.1 / v0.5.2 and related schema notes.
Why Now
As AI generation quality and agent-mediated onboarding velocity rise, trust controls must move from static checks to measurable, reproducible evidence chains.
SDB-26 provides that measurement contract.