Research Paper · Researchia:202601.26005 · Genomics > Biology

Intrinsic Limits of Read Trimming in Single-Stranded Bisulfite Sequencing

Yihan Fang

Abstract

Single-stranded whole-genome bisulfite sequencing (ssWGBS) enables DNA methylation profiling in low-input and highly fragmented samples, including cell-free DNA, but introduces stochastic enzymatic artifacts that complicate preprocessing and downstream interpretation. In post-bisulfite library construction, Adaptase-mediated tailing blurs the boundary between biological sequence and synthetic additions, rendering read trimming a persistent source of variability across analytical pipelines. We show that this variability reflects an intrinsic limit of per-read boundary inference rather than an algorithmic shortcoming: boundary localization is fundamentally asymmetric between paired-end reads, with Read 2 exhibiting kinetically structured artifacts that support constrained read-level inference, while apparent contamination in Read 1 arises conditionally from geometry-driven read-through events and is not well-defined at the single-read level. Even within Read 2, bisulfite-induced compositional degeneracy creates an indistinguishable regime in which genomic and synthetic origins share support under the same observable sequence evidence, implying a strictly positive Bayes error under any deterministic per-read decision rule and placing a fundamental limit on per-read boundary fidelity. By explicitly characterizing these limits, we reframe read trimming in ssWGBS as a constrained inference problem and introduce a conservative framework that operates only where supported by observable evidence (including short-range nucleotide texture), exposes interpretable trade-offs between genomic retention and residual artifact risk, and avoids forced resolution where boundaries are intrinsically unresolvable. Together, these results clarify why fixed trimming heuristics persist in practice and provide a principled foundation for uncertainty-aware preprocessing in ssWGBS.
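The claim that compositional degeneracy forces a strictly positive Bayes error can be made concrete with a toy model. The sketch below is not the author's framework. It assumes a hypothetical two-class model in which a short window at the start of Read 2 is either genomic or synthetic (Adaptase tail). It uses invented i.i.d. per-base emission probabilities (the hypothetical `GENOMIC` and `SYNTHETIC` tables), chosen only so that both classes assign nonzero probability to every base. Under that assumption, the lowest error any deterministic per-window rule can achieve is the sum over observable k-mers x of min(π_g·p_g(x), π_s·p_s(x)). This stays strictly positive for every finite window length. The hypothetical `conservative_call` function illustrates the abstain-when-unsupported alternative: it returns "unresolved" unless the posterior clears a threshold tau.

```python
from itertools import product
from math import prod

# HYPOTHETICAL per-base emission probabilities (illustrative, not estimated
# from data). Both tables give every base nonzero probability, so the two
# classes share support: any observed k-mer could have come from either.
GENOMIC = {"A": 0.30, "C": 0.05, "G": 0.25, "T": 0.40}    # C-depleted, T-rich
SYNTHETIC = {"A": 0.05, "C": 0.25, "G": 0.05, "T": 0.65}  # low-complexity tail


def kmer_likelihood(kmer, emission):
    """Probability of a k-mer under an i.i.d. per-base model."""
    return prod(emission[b] for b in kmer)


def bayes_error(k, prior_synthetic):
    """Minimum error of ANY deterministic rule that sees one k-mer window:
    sum over x of min(pi_g * p_g(x), pi_s * p_s(x))."""
    pi_s, pi_g = prior_synthetic, 1.0 - prior_synthetic
    return sum(
        min(pi_g * kmer_likelihood(x, GENOMIC), pi_s * kmer_likelihood(x, SYNTHETIC))
        for x in product("ACGT", repeat=k)
    )


def conservative_call(kmer, prior_synthetic, tau):
    """Call 'synthetic' or 'genomic' only if the posterior reaches tau;
    otherwise abstain with 'unresolved' instead of forcing a decision."""
    if not 0.5 < tau <= 1.0:
        raise ValueError("tau must lie in (0.5, 1]")
    ms = prior_synthetic * kmer_likelihood(kmer, SYNTHETIC)
    mg = (1.0 - prior_synthetic) * kmer_likelihood(kmer, GENOMIC)
    post_s = ms / (ms + mg)
    if post_s >= tau:
        return "synthetic"
    if post_s <= 1.0 - tau:
        return "genomic"
    return "unresolved"


def abstention_tradeoff(k, prior_synthetic, tau):
    """Return (probability mass decided, error rate among decided windows)."""
    pi_s, pi_g = prior_synthetic, 1.0 - prior_synthetic
    decided = wrong = 0.0
    for x in product("ACGT", repeat=k):
        ms = pi_s * kmer_likelihood(x, SYNTHETIC)
        mg = pi_g * kmer_likelihood(x, GENOMIC)
        call = conservative_call(x, prior_synthetic, tau)
        if call == "synthetic":
            decided, wrong = decided + ms + mg, wrong + mg
        elif call == "genomic":
            decided, wrong = decided + ms + mg, wrong + ms
    return decided, (wrong / decided if decided else 0.0)


if __name__ == "__main__":
    for k in (1, 3, 6):
        print(f"k={k}: Bayes error = {bayes_error(k, 0.5):.4f}")
    print(conservative_call("CCC", 0.5, 0.9))  # synthetic
    print(conservative_call("TTT", 0.5, 0.9))  # unresolved
```

With equal priors and tau = 0.9, a C-run such as "CCC" is called synthetic, while a T-run such as "TTT" (synthetic posterior of about 0.81 under these made-up tables) is left unresolved rather than trimmed or retained by fiat. Raising tau shrinks the decided mass reported by `abstention_tradeoff` while bounding the error among decided windows by 1 - tau. This is the retention-versus-residual-risk trade-off the abstract describes, shown here only under the stated toy assumptions. The i.i.d. model deliberately ignores the positional, kinetically structured shape of real tails.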


Source: arXiv:2601.19002v1, http://arxiv.org/abs/2601.19002v1
PDF: https://arxiv.org/pdf/2601.19002v1

Submission: 1/26/2026
Subjects: Biology; Genomics
arXiv: This paper is hosted on arXiv, an open-access repository.