Research Paper
Researchia:202605.13028

Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction

Younhun Kim

Submitted: May 13, 2026
Subjects: Biology; Biotechnology

Description / Details

Microbiome functions are encoded within the genes of the community-wide metagenome. A natural question is whether properties of a microbial community can be predicted solely from the raw DNA sequences of its members. In this work, we employ set-aggregated genome embeddings (SAGE) to predict community-level abundance profiles, exploiting the few-shot learning capabilities of genomic language models (GLMs). Benchmarks show that this approach generalizes better to novel genomes than classical bioinformatics approaches. A model ablation shows that community-level latent representations directly contribute to the improved performance. Lastly, we demonstrate the benefits of intermediate transformations between latent representations and compare the effects of different GLM embedding choices.
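
The abstract does not spell out the SAGE architecture, so the following is only a minimal sketch of the general idea of set aggregation, not the author's implementation. It assumes a Deep Sets-style design: each genome embedding passes through a per-genome transformation, the transformed vectors are mean-pooled into a community-level latent, and each genome is scored from its own vector plus that latent, with a softmax producing relative abundances. The names `predict_abundances`, `w_phi`, and `w_score` are hypothetical, the weights are random rather than trained, and random vectors stand in for real GLM embeddings.

```python
import math
import random


def linear(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mean_pool(vectors):
    """Permutation-invariant aggregation: element-wise mean over a set."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax; the outputs sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_abundances(genome_embeddings, w_phi, w_score):
    """Hypothetical set-aggregated predictor (an assumption, not the paper's model)."""
    # 1. Per-genome intermediate transformation of each embedding.
    hidden = [[math.tanh(h) for h in linear(w_phi, e)] for e in genome_embeddings]
    # 2. Community-level latent: mean over the set of transformed genomes.
    community = mean_pool(hidden)
    # 3. Score each genome from its own vector concatenated with the community latent.
    scores = [sum(w * x for w, x in zip(w_score, h + community)) for h in hidden]
    # 4. Normalize the scores into a relative-abundance profile.
    return softmax(scores)


# Stand-in data: random vectors in place of GLM embeddings, untrained weights.
rng = random.Random(0)
embed_dim, hidden_dim, n_genomes = 8, 4, 3
embeddings = [[rng.gauss(0, 1) for _ in range(embed_dim)] for _ in range(n_genomes)]
w_phi = [[rng.gauss(0, 0.5) for _ in range(embed_dim)] for _ in range(hidden_dim)]
w_score = [rng.gauss(0, 0.5) for _ in range(2 * hidden_dim)]

abundances = predict_abundances(embeddings, w_phi, w_score)
print(abundances)
```

Because the community latent is a mean over the set, reordering the input genomes reorders the predicted abundances in the same way without changing their values, and the model accepts communities of any size. In a trained version, `w_phi` and `w_score` would be fitted to observed abundance profiles.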


Source: arXiv:2605.12286v1 - http://arxiv.org/abs/2605.12286v1
PDF: https://arxiv.org/pdf/2605.12286v1
