Research Paper (Researchia:202605.30018)

Ligand-Conditioned Discrete Diffusion for Protein Sequence-Structure Co-Design

Chen Wei


Submitted: May 30, 2026
Subjects: Biochemistry; Pharmaceutical Research

Description / Details

Proteins perform their biological functions through three-dimensional structures encoded by amino acid sequences, and ligand-binding protein co-design requires models that generate sequence-structure compatible proteins under explicit ligand constraints. Although continuous diffusion and flow-based models support ligand-aware design in coordinate or latent spaces, existing discrete diffusion protein language models mainly operate over sequence or structure tokens without direct small-molecule conditioning. We introduce ProtLiD², a Protein Ligand-conditioned Discrete Diffusion model for protein sequence-structure co-design. ProtLiD² jointly generates amino-acid sequence tokens and discrete structure tokens while incorporating ligand chemical and geometric information through geometry-aware cross-attention. Trained on over one million ligand-protein complexes, ProtLiD² extends masked discrete diffusion to ligand-aware functional protein design. We further propose maximum confidence-margin guided ReMask decoding, an inference-time self-correction strategy that retains confident predictions and remasks uncertain tokens. In whole-protein design, ProtLiD² improves global fold confidence over Complexa, increasing TM-score from 0.672 to 0.802 and pLDDT from 64.55 to 73.00. In pocket co-design, ProtLiD² reduces active-site backbone RMSD (BB-RMSD) from 3.46 Å (FAIR) and 3.40 Å (PocketGen) to 1.97 Å, and raises ligand-aware pass rates over PocketGen from 14.86% to 59.73%, and from 6.08% to 23.49% under stricter docking thresholds. These results support ligand-conditioned discrete diffusion as an effective token-space framework for functional protein co-design. Code will be available at https://github.com/auroua/ProtLiD.
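
The abstract describes ReMask decoding only at a high level: keep confident predictions, remask uncertain ones. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes the confidence margin at a position is the gap between the top-1 and top-2 token probabilities, that every position is re-predicted at each step, and that the number of remasked positions follows a linear schedule. The names `MASK`, `confidence_margin`, `remask_step`, `decode`, and the `denoise` callable are hypothetical and chosen only for illustration.

```python
MASK = -1  # hypothetical id for the mask token


def confidence_margin(probs):
    """Assumed margin: top-1 probability minus top-2 probability."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def remask_step(prob_rows, n_remask):
    """Take the argmax token at every position, then remask the
    n_remask positions with the smallest confidence margin."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    margins = [confidence_margin(p) for p in prob_rows]
    # Positions sorted from least to most confident.
    order = sorted(range(len(preds)), key=margins.__getitem__)
    remask = set(order[:n_remask])
    return [MASK if i in remask else preds[i] for i in range(len(preds))]


def decode(denoise, length, steps):
    """Iterative decoding from a fully masked sequence.

    `denoise` is a stand-in for the model: it maps the current token
    list to one probability row per position.
    """
    tokens = [MASK] * length
    for step in range(steps):
        prob_rows = denoise(tokens)
        # Assumed linear schedule: fewer remasked positions each step,
        # and none at the final step.
        n_remask = length * (steps - 1 - step) // steps
        tokens = remask_step(prob_rows, n_remask)
    return tokens


# Toy usage with a fixed "model" over a 3-token vocabulary.
rows = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.8, 0.1]]
print(remask_step(rows, 1))            # [0, -1, 1]
print(decode(lambda t: rows, 3, 4))    # [0, 0, 1]
```

In the first call, the middle position has the smallest margin (roughly 0.05), so it is the one returned to the mask state. A real model would condition `denoise` on the ligand and on the partially unmasked sequence and structure tokens, which this toy example omits.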


Source: arXiv:2605.27413v1 (http://arxiv.org/abs/2605.27413v1)
PDF: https://arxiv.org/pdf/2605.27413v1
