Martin Renqiang Min, NEC Labs America

Martin Renqiang Min is the Department Head of Machine Learning at NEC Laboratories America. He holds a Ph.D. in Computer Science from the University of Toronto and completed postdoctoral research at Yale University, where he also taught courses on deep learning. His research has been published in top venues, including Nature, NeurIPS, ICML, ICLR, CVPR, and ACL, and his innovations have been recognized internationally, with features in Science and MIT Technology Review.

At NEC, Dr. Min directs a multidisciplinary research team at the forefront of foundational and applied artificial intelligence. His portfolio spans deep learning, natural language understanding, multimodal learning, visual reasoning, and the application of machine learning to biomedical and healthcare data. He has contributed to the design of scalable learning systems that power real-world applications, bridging cutting-edge theory with industry-scale deployment. He also co-chaired the NeurIPS Workshop on Machine Learning in Computational Biology, advancing the dialogue between AI and life sciences.

Under his leadership, NEC’s machine learning group drives innovation across multiple domains, including AI for precision medicine, next-generation language modeling, and interpretable multimodal systems. He is known for fostering interdisciplinary collaboration, both within NEC and with academic and industry partners, encouraging research that connects scientific breakthroughs with societal impact. His team’s contributions extend to core technologies used across telecommunications, enterprise solutions, and healthcare, positioning NEC at the leading edge of applied AI.

Dr. Min is recognized for his ability to identify emerging trends in machine learning and translate them into long-term research roadmaps. His work continues to influence the global AI community, and his leadership ensures NEC remains a hub for transformative research that combines fundamental discovery with practical applications that improve people’s lives.

Posts

EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation

Radiology report generation requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. Although recent innovations, particularly multimodal large language models (MLLMs), have improved performance, their supervised fine-tuning (SFT) objective is not explicitly aligned with clinical efficacy. In this work, we introduce EditGRPO, a mixed-policy reinforcement learning (RL) algorithm designed to optimize report generation through clinically motivated rewards. EditGRPO integrates on-policy exploration with off-policy guidance by injecting detailed sentence-level corrections during training rollouts. This mixed-policy approach addresses the exploration dilemma and sampling-efficiency issues typically encountered in RL. Applied to a Qwen2.5-VL-3B MLLM initialized with SFT, EditGRPO outperforms both SFT and vanilla GRPO baselines, achieving an average improvement of 3.4% on the CheXbert, GREEN, RadGraph, and RATEScore metrics across four major chest X-ray report generation datasets. Notably, EditGRPO also demonstrates superior out-of-domain generalization, with an average performance gain of 5.9% on unseen datasets.
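
To make the mixed-policy idea concrete, here is a minimal sketch (not the authors’ implementation) of a GRPO-style update with post-rollout edits: sampled reports are first revised with sentence-level corrections derived from the reference, then scored and normalized within their group. The `edit_fn` and `reward_fn` callables are hypothetical stand-ins for the paper’s correction and clinical-reward components.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO-style advantages: z-score each rollout's reward within its group."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True).clamp_min(1e-6)
    return (rewards - mean) / std

def mixed_policy_group(rollouts, reference, edit_fn, reward_fn):
    """Post-rollout edits: revise clinically wrong sentences in each sampled
    report using the reference (off-policy guidance), then score the group."""
    edited = [edit_fn(r, reference) for r in rollouts]   # hypothetical editor
    rewards = torch.tensor([reward_fn(r, reference) for r in edited])
    return edited, group_relative_advantages(rewards)
```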

DiscussLLM: Teaching Large Language Models When to Speak

Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human-like text, yet they largely operate as reactive agents, responding only when directly prompted. This passivity creates an “awareness gap,” limiting their potential as truly collaborative partners in dynamic human discussions. We introduce DiscussLLM, a framework designed to bridge this gap by training models to proactively decide not just what to say but, critically, when to speak. Our primary contribution is a scalable two-stage data generation pipeline that synthesizes a large-scale dataset of realistic multi-turn human discussions. Each discussion is annotated with one of five intervention types (e.g., Factual Correction, Concept Definition) and contains an explicit conversational trigger where an AI intervention adds value. By training models to predict a special silent token when no intervention is needed, they learn to remain quiet until a helpful contribution can be made. We explore two architectural baselines: an integrated end-to-end model and a decoupled classifier-generator system optimized for low-latency inference. We evaluate these models on their ability to accurately time interventions and generate helpful responses, paving the way for more situationally aware and proactive conversational AI.
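
As an illustration of the decoupled classifier-generator baseline, the sketch below shows the inference-time control flow under stated assumptions: `classifier` and `generator` are hypothetical callables, and the silent-token convention follows the abstract.

```python
SILENT = "<silent>"  # special token meaning "no intervention is needed"

def respond_if_helpful(classifier, generator, conversation: str):
    """Decoupled baseline: a lightweight classifier decides WHEN to speak;
    the heavier generator runs only when an intervention is predicted."""
    label = classifier(conversation)  # SILENT or an intervention type,
                                      # e.g. "Factual Correction"
    if label == SILENT:
        return None                   # stay quiet this turn
    return generator(conversation, intervention_type=label)
```

Keeping the when-to-speak decision in a small classifier is what makes the low-latency variant attractive: the expensive generator is invoked only on the minority of turns that warrant an intervention.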

Identifying Combinatorial Regulatory Genes for Cell Fate Decision via Reparameterizable Subset Explanations

Cell fate decisions are highly coordinated processes governed by complex interactions among numerous regulatory genes, while disruptions in these mechanisms can lead to developmental abnormalities and disease. Traditional methods often fail to capture such combinatorial interactions, limiting their ability to fully model cell fate dynamics. Here, we introduce MetaVelo, a global feature explanation framework for identifying key regulatory gene sets influencing cell fate transitions. MetaVelo models these transitions as a black-box function and employs a differentiable neural ordinary differential equation (ODE) surrogate to enable efficient optimization. By reparameterizing the problem as a controllable data generation process, MetaVelo overcomes the challenges posed by the non-differentiable nature of cell fate dynamics. Benchmarking across diverse stand-alone and longitudinal single-cell RNA-seq datasets and three black-box cell fate models demonstrates its superiority over 12 baseline methods in predicting developmental trajectories and identifying combinatorial regulatory gene sets. MetaVelo further distinguishes independent from synergistic regulatory genes, offering novel insights into the gene interactions governing cell fate. With the growing availability of high-resolution single-cell data, MetaVelo provides a scalable and effective framework for studying cell fate decisions.
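
The abstract does not spell out the reparameterization, but a common way to make subset selection differentiable is a Gumbel-softmax relaxation; the sketch below is one such stand-in (our assumption, not necessarily MetaVelo’s exact scheme), with the gene-knockout gating and the neural ODE surrogate left abstract.

```python
import torch

def sample_soft_gene_subset(logits: torch.Tensor, k: int, tau: float = 0.5) -> torch.Tensor:
    """Relaxed choice of k genes out of n: draw k Gumbel-softmax samples over
    the gene logits and take their soft union as a differentiable mask."""
    draws = []
    for _ in range(k):
        gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-10) + 1e-10)
        draws.append(torch.softmax((logits + gumbel) / tau, dim=-1))
    return torch.stack(draws).amax(dim=0)  # (n,) soft subset membership

# Hypothetical usage: gate expression through the mask before the ODE surrogate,
# e.g. perturbed = expression * (1 - mask) to emulate knocking out the subset.
```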

Group Relative Augmentation for Data Efficient Action Detection

Adapting large Video-Language Models (VLMs) for action detection using only a few examples poses challenges like overfitting and the granularity mismatch between scene-level pre-training and required person-centric understanding. We propose an efficient adaptation strategy combining parameter-efficient tuning (LoRA) with a novel learnable internal feature augmentation. Applied within the frozen VLM backbone using FiLM, these augmentations generate diverse feature variations directly relevant to the task. Additionally, we introduce a group-weighted loss function that dynamically modulates the training contribution of each augmented sample based on its prediction divergence relative to the group average. This promotes robust learning by prioritizing informative yet reasonable augmentations. We demonstrate our method’s effectiveness on complex multi-label, multi-person action detection datasets (AVA, MOMA), achieving strong mAP performance and showcasing significant data efficiency for adapting VLMs from limited examples.
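
A minimal sketch of the two ingredients, assuming backbone features of shape (batch, dim) and a multi-label head: FiLM-style per-channel scale/shift parameters generate augmented feature views, and each view’s loss is weighted by its divergence from the group-average prediction. The weighting direction (down-weighting views far from the consensus) is our assumption, not necessarily the paper’s exact modulation.

```python
import torch
import torch.nn.functional as F

class FiLMAugment(torch.nn.Module):
    """Learnable internal augmentation: one per-channel (gamma, beta) pair per
    augmented view, applied to features from the frozen VLM backbone."""
    def __init__(self, num_views: int, dim: int):
        super().__init__()
        self.gamma = torch.nn.Parameter(torch.ones(num_views, dim))
        self.beta = torch.nn.Parameter(torch.zeros(num_views, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, dim) -> (batch, num_views, dim)
        return feats.unsqueeze(1) * self.gamma + self.beta

def group_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Weight each view's multi-label loss by its divergence from the group
    mean prediction; favoring consensus views is our assumed direction."""
    probs = logits.sigmoid()                       # (batch, views, classes)
    divergence = (probs - probs.mean(dim=1, keepdim=True)).abs().mean(dim=-1)
    weights = torch.softmax(-divergence, dim=1)    # (batch, views)
    per_view = F.binary_cross_entropy_with_logits(
        logits, targets.unsqueeze(1).expand_as(logits), reduction="none"
    ).mean(dim=-1)
    return (weights * per_view).sum(dim=1).mean()
```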

PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advancements targeting specific proteins, the ability to create high-affinity binders for arbitrary protein targets on demand, without extensive rounds of wet-lab testing, remains a significant challenge. Here, we introduce PPDiff, a diffusion model to jointly design the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. PPDiff builds upon our Sequence Structure Interleaving Network with Causal attention layers (SSINC), which integrates interleaved self-attention layers to capture global amino acid correlations, k-nearest-neighbor (kNN) equivariant graph layers to model local interactions in three-dimensional (3D) space, and causal attention layers to simplify the intricate interdependencies within the protein sequence. To assess PPDiff, we curate PPBench, a general protein complex dataset comprising 706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on PPBench and finetuned on two real-world applications: target-protein mini-binder complex design and antigen-antibody complex design. PPDiff consistently surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and 16.89% for the pretraining task and the two downstream applications, respectively.
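
As one concrete piece of the SSINC design, the kNN equivariant graph layers operate over local 3D neighborhoods; below is a minimal, generic sketch of building the kNN edge index from residue coordinates (the equivariant message passing itself is omitted, and the function name is ours).

```python
import torch

def knn_edges(coords: torch.Tensor, k: int) -> torch.Tensor:
    """Build a k-nearest-neighbor edge index from 3D coordinates of shape (n, 3)."""
    dist = torch.cdist(coords, coords)            # (n, n) pairwise distances
    dist.fill_diagonal_(float("inf"))             # exclude self-loops
    nbrs = dist.topk(k, largest=False).indices    # (n, k) nearest neighbors
    src = torch.arange(coords.size(0)).repeat_interleave(k)
    return torch.stack([src, nbrs.reshape(-1)])   # (2, n*k), source -> neighbor
```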

Solving Inverse Problems via a Score-Based Prior: An Approximation-Free Posterior Sampling Approach

Diffusion models (DMs) have proven effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods for common BIPs rely on heuristic approximations to the generative process. To exploit the generative capability of DMs while avoiding such approximations, we propose an ensemble-based algorithm that performs posterior sampling without heuristic approximations. Our algorithm is motivated by existing works that combine DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error between the true posterior distribution and our sampled approximation can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging, showing that our method gives more accurate reconstructions than existing DM-based methods.
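
The reweighting term is the part handled by weighted particles; as a generic illustration (not the paper’s exact scheme), the sketch below shows a standard multinomial resampling step that equalizes particle weights between simulation steps of the particle system.

```python
import numpy as np

def reweight_and_resample(particles: np.ndarray, log_weights: np.ndarray,
                          rng: np.random.Generator):
    """Multinomial resampling: normalize the log-weights accumulated from the
    reweighting term, then resample so high-posterior regions keep more mass."""
    w = np.exp(log_weights - log_weights.max())   # stabilize before normalizing
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], np.zeros(len(particles))  # weights reset afterward

# Example: rng = np.random.default_rng(0), called between integration steps.
```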

Attribute-Centric Compositional Text-to-Image Generation

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency.
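
The abstract does not give the loss formula; as a rough, hypothetical stand-in, an InfoNCE-style objective that pairs each image embedding with its attribute-composition embedding (other compositions in the batch serving as negatives) captures the flavor of an attribute-centric contrastive loss.

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_emb: torch.Tensor, attr_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE sketch: match each image to its own attribute composition,
    treating the other compositions in the batch as negatives."""
    img = F.normalize(img_emb, dim=-1)
    attr = F.normalize(attr_emb, dim=-1)
    logits = img @ attr.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(len(img), device=img.device)
    return F.cross_entropy(logits, targets)
```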

Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation

We consider the conditional generation of 3D drug-like molecules with explicit control over molecular properties such as drug-likeness (e.g., Quantitative Estimate of Druglikeness or Synthetic Accessibility score) and effective binding to specific protein sites. To tackle this problem, we propose an E(3)-equivariant Wasserstein autoencoder and factorize the latent space of our generative model into two disentangled aspects: molecular properties and the remaining structural context of 3D molecules. Our model ensures explicit control over these molecular attributes while maintaining equivariance of coordinate representation and invariance of data likelihood. Furthermore, we introduce a novel alignment-based coordinate loss to adapt equivariant networks for auto-regressive de novo 3D molecule generation from scratch. Extensive experiments validate our model’s effectiveness on property-guided and context-guided molecule generation, both for de novo 3D molecule design and structure-based drug discovery against protein targets.
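
An alignment-based coordinate loss typically superimposes predicted and reference coordinates before comparing them, so the loss ignores the arbitrary global orientation of the generated molecule; here is a minimal sketch using the standard Kabsch algorithm (our choice of alignment, not necessarily the paper’s exact formulation).

```python
import torch

def aligned_coordinate_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE between (n, 3) coordinate sets after optimal rotation (Kabsch),
    making the loss invariant to global rotation and translation."""
    p = pred - pred.mean(dim=0)        # center both point clouds
    t = target - target.mean(dim=0)
    u, _, vt = torch.linalg.svd(p.t() @ t)
    d = torch.sign(torch.det(u @ vt))  # correct for improper rotations
    s = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    rot = u @ s @ vt
    return ((p @ rot - t) ** 2).mean()
```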

NEC Labs America Attends the 39th Annual AAAI Conference on Artificial Intelligence #AAAI25

Our NEC Labs America team attended the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25) at the Pennsylvania Convention Center in Philadelphia, Pennsylvania, from February 25 to March 4, 2025. The AAAI conference series promotes research in Artificial Intelligence (AI) and fosters scientific exchange among researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. Our team presented technical papers, led special tracks, delivered talks on key topics, participated in workshops, conducted tutorials, and showcased research in poster sessions. We greeted visitors at Booth #208 Thursday through Saturday.

Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation

Multimodal Large Language Models (MLLMs) have shown impressive performance in vision and text tasks. However, hallucination remains a major challenge, especially in fields like healthcare where details are critical. In this work, we show how MLLMs may be enhanced to support Visual RAG (V-RAG), a retrieval-augmented generation framework that incorporates both text and visual data from retrieved images. On the MIMIC-CXR chest X-ray report generation and Multicare medical image caption generation datasets, we show that V-RAG improves the accuracy of entity probing, which asks whether a medical entity is grounded by an image. We show that the improvements extend to both frequent and rare entities, the latter of which may have less positive training data. Downstream, we apply V-RAG with entity probing to correct hallucinations and generate more clinically accurate X-ray reports, obtaining a higher RadGraph-F1 score.
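
Entity probing reduces to asking the MLLM a grounding question per candidate entity; the sketch below shows that loop under stated assumptions (`mllm.generate` is a hypothetical interface, and the yes/no parsing is deliberately simple).

```python
def probe_entities(mllm, image, entities):
    """Keep only the entities the MLLM confirms are supported by the image;
    ungrounded entities become candidates for hallucination correction."""
    grounded = []
    for entity in entities:
        answer = mllm.generate(  # hypothetical MLLM interface
            image=image,
            prompt=f"Is there evidence of {entity} in this image? Answer yes or no.",
        )
        if answer.strip().lower().startswith("yes"):
            grounded.append(entity)
    return grounded
```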