Research Paper (Researchia:202605.04020)

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

Najmul Hasan

Submitted: May 4, 2026
Subjects: Biology; Biotechnology

Description / Details

DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: $k$-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies $\mathbb{E}[\mathrm{FNR}] \le \alpha$. Across ten leave-one-taxonomic-family-out folds at $\alpha = 0.05$ on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack $1/(n_{\mathrm{cal}}+1)$ caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade $\alpha = 10^{-3}$ requires an $18\times$ larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms. Code: https://github.com/najmulhasan-code/crc-screen
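
The calibration step and the finite-sample slack described above can be illustrated with a minimal sketch. It is not taken from the paper's repository. It assumes that a higher fused score means "more hazardous", that an order is flagged when its score is at least the threshold, and that the loss is a 0/1 miss indicator bounded by 1. The function names `kmer_jaccard`, `crc_threshold`, and `min_calibration_size`, and the default $k = 3$, are hypothetical choices made for illustration only.

```python
import math
from typing import Sequence


def kmer_jaccard(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Jaccard similarity between the k-mer sets of two sequences.

    k = 3 is an illustrative default, not a value stated in the abstract.
    """
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        return 0.0  # both sequences are shorter than k
    return len(kmers_a & kmers_b) / len(union)


def crc_threshold(hazard_scores: Sequence[float], alpha: float) -> float:
    """Largest threshold tau such that flagging `score >= tau` satisfies
    (misses + 1) / (n + 1) <= alpha on the calibration hazards.

    This is the Conformal Risk Control condition for a 0/1 miss loss
    (assumed here); `hazard_scores` are fused scores of known hazards.
    """
    n = len(hazard_scores)
    # Number of calibration hazards we may miss while staying certified.
    # The small epsilon guards against floating-point round-off.
    allowed_misses = math.floor(alpha * (n + 1) + 1e-12) - 1
    if allowed_misses < 0:
        # alpha < 1/(n+1): only the trivial flag-everything rule qualifies.
        return -math.inf
    ordered = sorted(hazard_scores)
    # Scores strictly below ordered[allowed_misses] number at most
    # `allowed_misses`, so the certified condition holds.
    return ordered[min(allowed_misses, n - 1)]


def min_calibration_size(alpha: float) -> int:
    """Smallest n with 1/(n + 1) <= alpha, i.e. n >= 1/alpha - 1."""
    return math.ceil(1.0 / alpha - 1e-9) - 1
```

Under these assumptions, `min_calibration_size(1e-3)` evaluates to 999, which is the arithmetic behind the statement that a target of $\alpha = 10^{-3}$ needs a much larger pool of calibration hazards than $\alpha = 0.05$ does. When `crc_threshold` returns negative infinity, every order is flagged, which mirrors the collapse to a 100% false-flag rate that the abstract attributes to a weak signal under the certified constraint.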


Source: arXiv:2605.00074v1 (http://arxiv.org/abs/2605.00074v1)
PDF: https://arxiv.org/pdf/2605.00074v1
