Tianxiao Li NEC Labs America

Tianxiao Li is a postdoctoral scientist in the Machine Learning Department at NEC Laboratories America. He received his undergraduate degree in Biological Sciences from Tsinghua University and earned his Ph.D. from Yale University. His expertise spans deep learning, predictive analytics, generative modeling and computational biology, with a focus on building machine learning models that support real-world decision-making in healthcare and pharmaceuticals.

Since joining NEC in 2024, Tianxiao has been a core contributor to the Physics-Informed Machine Learning Project, advancing new methods for compositional generation and reasoning that have potential applications in areas like drug discovery, diagnostics and personalized medicine. He collaborates closely with NEC’s ML team to integrate physics-informed learning, generative modeling, and explainable AI into systems that are both innovative and reliable. Tianxiao’s work also addresses key challenges in the biomedical domain, namely diverse modalities, high dimensionality, and noisy or insufficient data. To meet them, he develops more data- and parameter-efficient approaches that leverage prior physical and biological knowledge to reduce research costs, shorten development cycles, ensure safety and reliability, and open new opportunities for the industry.

Posts

Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation

We consider the conditional generation of 3D drug-like molecules with explicit control over molecular properties, such as drug-likeness measures (e.g., Quantitative Estimate of Druglikeness or Synthetic Accessibility score) and effective binding to specific protein sites. To tackle this problem, we propose an E(3)-equivariant Wasserstein autoencoder and factorize the latent space of our generative model into two disentangled aspects: molecular properties and the remaining structural context of 3D molecules. Our model ensures explicit control over these molecular attributes while maintaining equivariance of coordinate representation and invariance of data likelihood. Furthermore, we introduce a novel alignment-based coordinate loss to adapt equivariant networks for auto-regressive de novo 3D molecule generation from scratch. Extensive experiments validate our model’s effectiveness on property-guided and context-guided molecule generation, both for de novo 3D molecule design and structure-based drug discovery against protein targets.
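To make the terms equivariance and invariance concrete, here is a toy sketch in plain Python. It is not the model from the post; the coordinates, the rotation angle, and all function names are made up for illustration. It applies a rigid motion to a small set of 3D points (a rotation about the z-axis followed by a translation, covering only part of E(3), which also includes reflections and arbitrary rotation axes) and compares two quantities: pairwise distances, which should stay the same (invariance), and the centroid, which should move together with the points (equivariance).

```python
import math

def rotate_z(points, angle):
    """Rotate 3D points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate(points, shift):
    """Shift every 3D point by the same vector."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in points]

def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def centroid(points):
    """Mean position of the points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Hypothetical atom coordinates, chosen only for illustration.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
angle, shift = 0.7, (2.0, -1.0, 0.5)

# Apply the rigid motion: rotation followed by translation.
moved = translate(rotate_z(coords, angle), shift)

# Invariant quantity: distances between atoms are unchanged by the motion.
dist_before = pairwise_distances(coords)
dist_after = pairwise_distances(moved)

# Equivariant quantity: the centroid of the moved points equals
# the original centroid with the same motion applied to it.
centroid_after = centroid(moved)
centroid_moved = translate(rotate_z([centroid(coords)], angle), shift)[0]
```

Comparing `dist_before` with `dist_after`, and `centroid_after` with `centroid_moved`, up to floating-point tolerance shows the two behaviors side by side. The abstract asks for the analogous properties of a learned model: generated coordinates should transform along with the input frame, while the data likelihood should not depend on the frame.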

Introducing the Trustworthy Generative AI Project: Pioneering the Future of Compositional Generation and Reasoning

We are thrilled to announce the launch of our latest research initiative, the Trustworthy Generative AI Project. This ambitious project is set to revolutionize how we interact with multimodal content by developing cutting-edge generative models capable of compositional generation and reasoning across text, images, reports, and even 3D videos.

Disentangled Wasserstein Autoencoder for T-Cell Receptor Engineering

In protein biophysics, the separation between the functionally important residues (forming the active site or binding surface) and those that create the overall structure (the fold) is a well-established and fundamental concept. Identifying and modifying those functional sites is critical for protein engineering but computationally nontrivial, and requires significant domain knowledge. To automate this process from a data-driven perspective, we propose a disentangled Wasserstein autoencoder with an auxiliary classifier, which isolates the function-related patterns from the rest with theoretical guarantees. This enables one-pass protein sequence editing and improves the understanding of the resulting sequences and editing actions involved. To demonstrate its effectiveness, we apply it to T-cell receptors (TCRs), a well-studied structure-function case. We show that our method can be used to alter the function of TCRs without changing the structural backbone, outperforming several competing methods in generation quality and efficiency, and requiring only 10% of the running time needed by baseline models. To our knowledge, this is the first approach that utilizes disentangled representations for TCR engineering.