Research Paper | Researchia:202605.16052

Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

Siddhant Dutta

Submitted: May 16, 2026
Subjects: Biochemistry; Pharmaceutical Research

Description / Details

Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret because structural and evolutionary signals are encoded in dense latent spaces. We propose a plug-and-play framework that projects ESM-2 representations onto protein contact graphs and applies SoftBlobGIN, a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling, to perform structure-aware message passing and learn coarse functional substructures for downstream prediction tasks. On enzyme classification, SoftBlobGIN achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis of protein language models alone, our method produces directly auditable structural explanations: GNNExplainer recovers biologically meaningful active-site residues, spatially localized functional clusters, and catalytic contact patterns. On binding-site detection, SoftBlobGIN improves residue AUROC from 0.885 using an ESM-2 linear probe to 0.983, indicating that these structural explanations are not recoverable from language-model features alone. Learned blob partitions provide an additional layer of interpretability by automatically grouping residues into functional substructures, with blobs containing annotated active-site residues showing 1.85× higher importance than other blobs (ρ = 0.339, p = 0.009), without any active-site supervision. Our framework requires no retraining of the language model, adds only ~1.1M parameters, and generalises across ProteinShake tasks, achieving F_max of 0.733 on Gene Ontology prediction and AUROC of 0.969 on binding-site detection. We position this as an interpretable structural companion to protein language models that makes their predictions more transparent and auditable.
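
The abstract names three ingredients (a contact graph, GIN-style message passing, and Gumbel-softmax substructure pooling) without giving implementation details. The following is a minimal, standard-library-only sketch of how such pieces can fit together, not the authors' implementation. Everything specific here is an assumption made for illustration: the 8 Å distance cutoff, the toy coordinates and two-dimensional features standing in for ESM-2 embeddings, the fixed assignment logits (a real model would learn them), the omission of the MLP that a full GIN layer applies after aggregation, and all function names.

```python
import math
import random


def contact_graph(coords, cutoff=8.0):
    """Adjacency lists: residues i and j are linked if closer than `cutoff` (assumed 8 A)."""
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def gin_aggregate(feats, adj, eps=0.0):
    """GIN-style aggregation, (1 + eps) * h_i + sum of neighbour features; the MLP is omitted."""
    out = []
    for i, h in enumerate(feats):
        agg = [(1.0 + eps) * x for x in h]
        for j in adj[i]:
            agg = [a + b for a, b in zip(agg, feats[j])]
        out.append(agg)
    return out


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)          # avoid log(0)
        gumbel = -math.log(-math.log(u))      # standard Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)                            # subtract max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_pool(feats, assign):
    """Blob features as assignment-weighted sums of residue features (S^T H)."""
    n_blobs, dim = len(assign[0]), len(feats[0])
    blobs = [[0.0] * dim for _ in range(n_blobs)]
    for h, weights in zip(feats, assign):
        for b, w in enumerate(weights):
            for d in range(dim):
                blobs[b][d] += w * h[d]
    return blobs


# Toy protein: two spatially separated residue pairs (hypothetical coordinates, in angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (30.0, 0.0, 0.0), (33.8, 0.0, 0.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]   # stand-ins for ESM-2 embeddings
logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]  # fixed here; learned in a real model

rng = random.Random(0)
adj = contact_graph(coords)
hidden = gin_aggregate(feats, adj)
assign = [gumbel_softmax(row, tau=0.5, rng=rng) for row in logits]
blobs = soft_pool(hidden, assign)
```

Because each row of `assign` sums to one, the blob features partition the total residue signal rather than duplicating it, and a lower `tau` pushes each row closer to a hard one-hot assignment while keeping the operation differentiable in the logits.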


Source: arXiv:2605.10985v1 (http://arxiv.org/abs/2605.10985v1)
PDF: https://arxiv.org/pdf/2605.10985v1
