Research Paper | Researchia:202601.29228 [Biotechnology > Biology]

Beyond Conditional Computation: Retrieval-Augmented Genomic Foundation Models with Gengram

Huinan Xu

Abstract

Current genomic foundation models (GFMs) rely on extensive neural computation to implicitly approximate conserved biological motifs from single-nucleotide inputs. We propose Gengram, a conditional memory module that introduces an explicit and highly efficient lookup primitive for multi-base motifs via a genomic-specific hashing scheme, establishing genomic "syntax". Integrated into the backbone of state-of-the-art GFMs, Gengram achieves substantial gains (up to 14%) across several functional genomics tasks. The module demonstrates robust architectural generalization, while further inspection of Gengram's latent space reveals the emergence of meaningful representations that align closely with fundamental biological knowledge. By establishing structured motif memory as a modeling primitive, Gengram simultaneously boosts empirical performance and mechanistic interpretability, providing a scalable and biology-aligned pathway for the next generation of GFMs. The code is available at https://github.com/zhejianglab/Genos, and the model checkpoint is available at https://huggingface.co/ZhejiangLab/Gengram.
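To make the idea of an explicit lookup primitive for multi-base motifs concrete, here is a minimal sketch of a hashed k-mer memory table. It is not the Gengram implementation: the motif length `K`, the table size, the embedding width, the 2-bit packing, and the multiplicative hash are all illustrative assumptions, and the table is filled with random values where a real model would use learned parameters.

```python
import random

# Hypothetical settings; none of these values come from the paper.
K = 6                 # motif length in bases
TABLE_SIZE = 1 << 12  # number of memory slots
DIM = 8               # width of each stored vector
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_ids(seq, k=K):
    """Pack each k-mer into an integer (2 bits per base).

    Windows containing a base outside A/C/G/T (for example N) yield None.
    """
    ids = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k].upper()
        if any(base not in BASE_CODE for base in window):
            ids.append(None)
            continue
        value = 0
        for base in window:
            value = (value << 2) | BASE_CODE[base]
        ids.append(value)
    return ids


def slot(kmer_id, table_size=TABLE_SIZE):
    """Map a k-mer ID to a table slot with a multiplicative hash.

    The hash is an assumed stand-in, not the paper's genomic-specific scheme.
    """
    return ((kmer_id * 2654435761) % (1 << 32)) % table_size


def build_table(table_size=TABLE_SIZE, dim=DIM, seed=0):
    """Create a randomly initialised memory table (learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(table_size)]


def lookup(seq, table, k=K):
    """Return one memory vector per k-mer window; invalid windows get zeros."""
    zero = [0.0] * len(table[0])
    return [
        zero if kid is None else table[slot(kid, len(table))]
        for kid in kmer_ids(seq, k)
    ]


table = build_table()
vectors = lookup("ACGTACGTAC", table)
```

In this sketch, repeated occurrences of the same motif retrieve the same stored vector in constant time, which is the property that distinguishes a lookup from recomputing the motif representation through network layers. How the retrieved vectors are combined with a backbone's hidden states is left out, since the abstract does not specify it.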


Source: arXiv:2601.22203v1 - http://arxiv.org/abs/2601.22203v1
PDF: https://arxiv.org/pdf/2601.22203v1

Submission: 1/29/2026
Subjects: Biology; Biotechnology