Research Paper
Researchia:202605.30016

PROTOCOL: Late Interaction Retrieval for Protein Homolog Search

Gabrielle Cohn

Submitted: May 30, 2026
Subjects: Biochemistry; Pharmaceutical Research

Description / Details

Protein homology search underlies function annotation, structure prediction, and evolutionary analysis, but remains challenging in the "twilight zone," where global sequence similarity is weak and classical alignment methods lose sensitivity. Protein language models provide context-aware representations that could improve alignment sensitivity in this regime. However, prior protein embedding-based retrieval pipelines often pool these representations into a single vector, potentially obscuring local motifs, domains, or conserved residues that reveal remote homology. We introduce ProtoCol, a model that represents proteins as sets of residue embeddings and uses ColBERT-style late interaction to test whether residue-level comparison improves homolog retrieval. ProtoCol encodes proteins independently, keeps candidate representations pre-computable, and scores candidates with MaxSim over residue embeddings. On SCOPe superfamily and Pfam clan benchmarks, ProtoCol outperforms sequence-composition, alignment-based, pooled PLM, and trained single-vector baselines, supporting late interaction as an effective retrieval layer for remote homology search.
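
To make the MaxSim scoring step concrete, the following is a minimal sketch of ColBERT-style late interaction over per-residue embeddings. It is not the authors' implementation. It assumes cosine similarity between L2-normalized residue vectors and an unweighted sum over query residues, and the abstract does not confirm either choice. The function names (`l2_normalize`, `maxsim_score`, `rank_candidates`) and the toy 2-D vectors are hypothetical, chosen only for illustration. In practice the embeddings would come from a protein language model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim_score(query: Sequence[Vector], candidate: Sequence[Vector]) -> float:
    """Late-interaction score: for each query residue, take its best cosine
    match among the candidate residues, then sum those maxima.
    Assumption: cosine similarity and an unweighted sum, as in ColBERT."""
    if not query or not candidate:
        return 0.0
    q = [l2_normalize(v) for v in query]
    c = [l2_normalize(v) for v in candidate]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, cv)) for cv in c)
    return total


def rank_candidates(
    query: Sequence[Vector], index: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Score every pre-computed candidate and sort by descending MaxSim."""
    scored = [(name, maxsim_score(query, emb)) for name, emb in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-D "residue embeddings" (hypothetical data, not real PLM output).
query = [[1.0, 0.0], [0.0, 1.0]]
index = {
    "cand_a": [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],  # matches both query residues
    "cand_b": [[1.0, 1.0]],                           # one averaged-looking residue
}
ranking = rank_candidates(query, index)
```

In this toy case `cand_a` ranks first because each query residue finds an exact directional match, whereas `cand_b` offers only a single intermediate vector. This mirrors the abstract's argument that pooling into one vector can obscure local signal. Because candidates are encoded independently of the query, the `index` dictionary can be built once and reused across searches.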


Source: arXiv:2605.29158v1 (http://arxiv.org/abs/2605.29158v1)
PDF: https://arxiv.org/pdf/2605.29158v1
