PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design
Publication Date: 7/13/2025
Event: Forty-Second International Conference on Machine Learning (ICML 2025)
Reference: pp. 1-18, 2025
Authors: Zhenqiao Song, NEC Laboratories America, Inc., Carnegie Mellon University; Tianxiao Li, NEC Laboratories America, Inc.; Lei Li, Carnegie Mellon University; Martin Renqiang Min, NEC Laboratories America, Inc.
Abstract: Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advances targeting specific proteins, creating high-affinity binders for arbitrary protein targets on demand, without extensive rounds of wet-lab testing, remains a significant challenge. Here, we introduce PPDiff, a diffusion model that jointly designs the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. PPDiff builds on our newly developed Sequence Structure Interleaving Network with Causal attention layers (SSINC), which integrates interleaved self-attention layers to capture global amino acid correlations, k-nearest neighbor (kNN) equivariant graph layers to model local interactions in three-dimensional (3D) space, and causal attention layers to simplify the intricate interdependencies within the protein sequence. To assess PPDiff, we curate PPBench, a general protein complex dataset comprising 706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on PPBench and finetuned on two real-world applications: target-protein mini-binder complex design and antigen-antibody complex design. PPDiff consistently surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and 16.89% for the pretraining task and the two downstream applications, respectively.
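The abstract mentions kNN graph layers that model local interactions in 3D space. The following is a minimal sketch of only the neighbor-selection step such a layer might operate over. It assumes that each residue is represented by a single 3D point (for example a C-alpha atom, which the abstract does not specify) and that neighbors are ranked by plain Euclidean distance. `knn_graph` is a hypothetical helper written for illustration; it is not PPDiff's code, and the equivariant message passing itself is not shown.

```python
import math


def knn_graph(coords, k):
    """Return, for each residue i, the indices of its k nearest other residues.

    Assumption (illustrative only): coords is a list of (x, y, z) tuples,
    one point per residue. Neighbors are ranked by Euclidean distance, with
    ties broken by the lower index. Because only pairwise distances are used,
    the neighbor sets do not change under a rotation or translation of the
    whole complex.
    """
    neighbors = []
    for i, ci in enumerate(coords):
        # Distances from residue i to every other residue, paired with indices.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


coords = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (0, 2, 0)]
print(knn_graph(coords, 2))  # [[1, 3], [0, 2], [1, 0], [0, 1]]
```

Building the graph from distances alone keeps the edge set invariant to rigid motions, which an equivariant graph layer relies on. The layer's features and coordinate updates are a separate concern that this sketch leaves out.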