Research Paper | Researchia:202602.18084 [Pharmaceutical Research > Biochemistry]

Hermes: Large DEL Datasets Train Generalizable Protein-Ligand Binding Prediction Models

Maxwell Kleinsasser

Abstract

The quality and consistency of training data remain critical bottlenecks for protein-ligand binding prediction. Public affinity datasets, aggregated from thousands of labs and assay formats, introduce biases that limit model generalization and complicate evaluation. DNA-encoded chemical libraries (DELs) offer a potential solution: unified experimental protocols generating massive binding datasets across diverse chemical and protein target space. We present Hermes, a lightweight transformer trained exclusively on DEL data from screens against hundreds of protein targets, representing one of the largest and most protein-diverse DEL training sets applied to protein-ligand interaction (PLI) modeling to date. Despite never seeing traditional affinity measurements during training, Hermes generalizes to held-out targets, novel chemical scaffolds, and external benchmarks derived from public binding data and high-throughput screens. Our results demonstrate that DEL data alone captures transferable protein-ligand interaction representations, while Hermes' minimal architecture enables inference speeds suitable for large-scale virtual screening.


Source: arXiv:2602.13503v1 (http://arxiv.org/abs/2602.13503v1)
PDF: https://arxiv.org/pdf/2602.13503v1

Submission: 2/18/2026
Subjects: Biochemistry; Pharmaceutical Research