Research Paper (Researchia:202601.26005)

Intrinsic Limits of Read Trimming in Single-Stranded Bisulfite Sequencing

Yihan Fang

Abstract


Submitted: January 26, 2026
Subjects: Biology; Genomics


Single-stranded whole-genome bisulfite sequencing (ssWGBS) enables DNA methylation profiling in low-input and highly fragmented samples, including cell-free DNA, but introduces stochastic enzymatic artifacts that complicate preprocessing and downstream interpretation. In post-bisulfite library construction, Adaptase-mediated tailing blurs the boundary between biological sequence and synthetic additions, making read trimming a persistent source of variability across analytical pipelines. We show that this variability reflects an intrinsic limit of per-read boundary inference rather than an algorithmic shortcoming. Boundary localization is fundamentally asymmetric between paired-end reads: Read 2 exhibits kinetically structured artifacts that support constrained read-level inference, whereas apparent contamination in Read 1 arises conditionally from geometry-driven read-through events and is not well-defined at the single-read level. Even within Read 2, bisulfite-induced compositional degeneracy creates an indistinguishable regime in which genomic and synthetic origins are both supported by the same observable sequence evidence. This implies a strictly positive Bayes error, so every deterministic per-read decision rule incurs a nonzero expected error, which places a fundamental limit on per-read boundary fidelity. By explicitly characterizing these limits, we reframe read trimming in ssWGBS as a constrained inference problem and introduce a conservative framework that acts only where observable evidence (including short-range nucleotide texture) supports a decision, exposes interpretable trade-offs between genomic retention and residual artifact risk, and avoids forced resolution where boundaries are intrinsically unresolvable. Together, these results clarify why fixed trimming heuristics persist in practice and provide a principled foundation for uncertainty-aware preprocessing in ssWGBS.
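The Bayes-error argument and the retention/artifact trade-off can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the paper's framework: the five-bin feature, its class-conditional probabilities, the 1/2 prior, and the names `bayes_error` and `evaluate` are all hypothetical, chosen only so that genomic and synthetic origins overlap in the middle bins. The Bayes error is computed as the sum over bins of the smaller of the two joint probabilities, and a forced rule (single threshold 1/2) is compared with a conservative rule that abstains when the posterior falls strictly between two thresholds.

```python
from fractions import Fraction
from typing import NamedTuple, Sequence

# Hypothetical toy model, NOT taken from the paper: one per-read feature
# (e.g. a binned short-range nucleotide "texture" score at the start of
# Read 2) with five levels. The class-conditional probabilities overlap in
# the middle bins to mimic a regime where both origins share support.
P_SYNTH = [Fraction(1, 20), Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(7, 20)]
P_GENOMIC = [Fraction(7, 20), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10), Fraction(1, 20)]
PRIOR_SYNTH = Fraction(1, 2)  # hypothetical prior probability of synthetic origin


class Outcome(NamedTuple):
    genomic_lost: Fraction       # P(genomic AND trimmed): retention cost
    residual_artifact: Fraction  # P(synthetic AND kept): residual artifact risk
    abstained: Fraction          # P(no decision made)


def bayes_error(p_synth: Sequence[Fraction], p_genomic: Sequence[Fraction],
                prior: Fraction) -> Fraction:
    """Minimum error over all decision rules: sum over x of min(joint_S, joint_G)."""
    return sum((min(prior * s, (1 - prior) * g) for s, g in zip(p_synth, p_genomic)),
               Fraction(0))


def evaluate(lo: Fraction, hi: Fraction,
             p_synth: Sequence[Fraction] = P_SYNTH,
             p_genomic: Sequence[Fraction] = P_GENOMIC,
             prior: Fraction = PRIOR_SYNTH) -> Outcome:
    """Trim if P(synthetic | x) >= hi, keep if <= lo, otherwise abstain."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    lost = resid = abst = Fraction(0)
    for s, g in zip(p_synth, p_genomic):
        joint_s, joint_g = prior * s, (1 - prior) * g
        total = joint_s + joint_g
        if total == 0:
            continue  # bin never observed under either hypothesis
        post = joint_s / total
        if post >= hi:
            lost += joint_g   # trimmed: any genomic mass in this bin is lost
        elif post <= lo:
            resid += joint_s  # kept: any synthetic mass in this bin remains
        else:
            abst += total     # ambiguous: left unresolved
    return Outcome(lost, resid, abst)


if __name__ == "__main__":
    print("Bayes error:", bayes_error(P_SYNTH, P_GENOMIC, PRIOR_SYNTH))  # 1/4
    for name, lo, hi in [("forced", Fraction(1, 2), Fraction(1, 2)),
                         ("conservative", Fraction(1, 4), Fraction(3, 4))]:
        o = evaluate(lo, hi)
        print(name, "error:", o.genomic_lost + o.residual_artifact,
              "abstained:", o.abstained)
```

With these made-up numbers the Bayes error is 1/4, and the forced rule attains exactly that error. The conservative rule with thresholds 1/4 and 3/4 lowers committed error to 3/20 at the cost of leaving 1/5 of the probability mass undecided. Exact `Fraction` arithmetic is used so the trade-off can be read off without rounding.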


Source: arXiv:2601.19002v1 (http://arxiv.org/abs/2601.19002v1)
PDF: https://arxiv.org/pdf/2601.19002v1

