Gaoyuan Wang is a Postdoctoral Research Scholar at

Posts

A Variational Graph Partitioning Approach to Modeling Protein Liquid-liquid Phase Separation

Graph neural networks (GNNs) have emerged as powerful tools for representation learning. Their efficacy, however, depends on an optimal underlying graph. In many cases, the most relevant information comes from specific subgraphs. In this work, we introduce a GNN-based framework, the graph-partitioned GNN (GP-GNN), which partitions the input graph to focus on the most relevant subgraphs. Our approach jointly learns task-dependent graph partitions and node representations, making it particularly effective when critical features reside within initially unidentified subgraphs. Protein liquid-liquid phase separation (LLPS) is a problem especially well-suited to GP-GNNs because intrinsically disordered regions (IDRs), which are known to function as protein subdomains, play a key role in the phase separation process. In this study, we demonstrate how GP-GNN accurately predicts LLPS by partitioning protein graphs into task-relevant subgraphs consistent with known IDRs. Our model achieves state-of-the-art accuracy in predicting LLPS and offers biological insights valuable for downstream investigation.

Predicting Spatially Resolved Gene Expression via Tissue Morphology using Adaptive Spatial GNNs (ECCB)

Spatial transcriptomics technologies, which generate a spatial map of gene activity, can deepen the understanding of tissue architecture and its molecular underpinnings in health and disease. However, the high cost makes these technologies difficult to use in practice. Histological images co-registered with targeted tissues are more affordable and routinely generated in many research and clinical studies. Hence, predicting spatial gene expression from the morphological clues embedded in tissue histological images provides a scalable alternative approach to decoding tissue complexity.

zeta-QVAE: A Quantum Variational Autoencoder utilizing Regularized Mixed-state Latent Representations

A major challenge in near-term quantum computing is its application to large real-world datasets due to scarce quantum hardware resources. One approach to enabling tractable quantum models for such datasets involves compressing the original data to manageable dimensions while still representing essential information for downstream analysis. In classical machine learning, variational autoencoders (VAEs) facilitate efficient data compression, representation learning for subsequent tasks, and novel data generation. However, no model has been proposed that exactly captures all of these features for direct application to quantum data on quantum computers. Some existing quantum models for data compression lack regularization of latent representations, thus preventing direct use for generation and control of generalization. Others are hybrid models with only some internal quantum components, impeding direct training on quantum data. To bridge this gap, we present a fully quantum framework, ζ-QVAE, which encompasses all the capabilities of classical VAEs and can be directly applied for both classical and quantum data compression. Our model utilizes regularized mixed states to attain optimal latent representations. It accommodates various divergences for reconstruction and regularization. Furthermore, by accommodating mixed states at every stage, it can utilize the full-data density matrix and allow for a “global” training objective. Doing so, in turn, makes efficient optimization possible and has potential implications for private and federated learning. In addition to exploring the theoretical properties of ζ-QVAE, we demonstrate its performance on representative genomics and synthetic data. Our results consistently indicate that ζ-QVAE exhibits similar or better performance compared to matched classical models.

Predicting Spatially Resolved Gene Expression via Tissue Morphology using Adaptive Spatial GNNs

Motivation: Spatial transcriptomics technologies, which generate a spatial map of gene activity, can deepen the understanding of tissue architecture and its molecular underpinnings in health and disease. However, the high cost makes these technologies difficult to use in practice. Histological images co-registered with targeted tissues are more affordable and routinely generated in many research and clinical studies. Hence, predicting spatial gene expression from the morphological clues embedded in tissue histological images provides a scalable alternative approach to decoding tissue complexity.