Xujiang Zhao NEC Labs America

Xujiang Zhao is a researcher in the Data Science & System Security department at NEC Laboratories America, based in Princeton, New Jersey. He holds a B.S. in Civil Engineering from Chongqing University, an M.S. in Computer Science from the University of Science and Technology of China, and a Ph.D. in Computer Science from the University of Texas at Dallas. This training gave him a strong foundation in both the theoretical and applied aspects of computing, which continues to shape his contributions at NEC.

At NEC Labs, Zhao’s research focuses on aligning large language models (LLMs) with human intent through techniques that enhance explainability, factual consistency, uncertainty estimation, and robustness. He develops methods that make LLMs more transparent and reliable, ensuring that they can be applied in sensitive, high-stakes environments. A key area of his work is building collaborative agent systems that integrate LLMs with domain-specific expertise and human feedback loops, enabling AI to work more effectively as a partner in decision-making.

Beyond language alignment, Zhao explores applications in image–text retrieval, synthetic media detection, and multi-agent reasoning, areas that are increasingly critical for enterprise knowledge management, misinformation defense, and the verification of AI-generated content. By combining fundamental advances in machine learning with applied research, his work pushes forward the responsible and practical use of foundation models across industries.

Posts

Multi-Agent Procedural Graph Extraction with Structural and Logical Refinement

Automatically extracting workflows as procedural graphs from natural language is promising yet underexplored, demanding both structural validity and logical alignment. While recent large language models (LLMs) show potential for procedural graph extraction, they often produce ill-formed structures or misinterpret logical flows. We present text2flow, a multi-agent framework that formulates procedural graph extraction as a multi-round reasoning process with dedicated structural and logical refinement. The framework iterates through three stages: (1) a graph extraction phase with the graph builder agent, (2) a structural feedback phase in which a simulation agent diagnoses and explains structural defects, and (3) a logical feedback phase in which a semantic agent aligns semantics between flow logic and linguistic cues in the source text. Important feedback is prioritized and expressed in natural language, which is injected into subsequent prompts, enabling interpretable and controllable refinement. This modular design allows agents to target distinct error types without supervision or parameter updates. Experiments demonstrate that text2flow achieves substantial improvements in both structural correctness and logical consistency over strong baselines.
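To make the structural feedback phase concrete, here is a minimal sketch of a checker that walks an extracted flow graph and reports defects as natural-language sentences that could be appended to the next prompt. It is an illustration only, not text2flow's simulation agent: the function name, the node and edge representation, and the two defect types checked (unreachable nodes and dead ends) are assumptions made for this example.

```python
from collections import deque

def structural_feedback(nodes, edges, start, end):
    """Return natural-language descriptions of structural defects.

    Hypothetical representation: `nodes` is a list of names that contains
    `start` and `end`, and `edges` is a list of (source, target) pairs.
    """
    feedback = []
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        if src not in successors or dst not in successors:
            feedback.append(f"Edge {src} -> {dst} refers to an undefined node.")
            continue
        successors[src].append(dst)

    # Breadth-first walk from the start node to find everything reachable.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in successors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    for n in nodes:
        if n not in seen:
            feedback.append(f"Node '{n}' is unreachable from '{start}'.")
        elif n != end and not successors[n]:
            feedback.append(f"Node '{n}' has no outgoing edge and is not the end node.")
    return feedback

nodes = ["start", "check", "ship", "refund", "end"]
edges = [("start", "check"), ("check", "ship"), ("ship", "end")]
for line in structural_feedback(nodes, edges, "start", "end"):
    print(line)
```

In a multi-round setup, the returned sentences would be injected into the builder's next prompt, and the loop would stop once the list comes back empty.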

DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router

Large Language Models (LLMs) excel at many reasoning tasks but struggle with knowledge-intensive queries due to their inability to dynamically access up-to-date or domain-specific information. Retrieval-Augmented Generation (RAG) has emerged as a promising solution, enabling LLMs to ground their responses in external sources. However, existing RAG methods lack fine-grained control over both the query and source sides, resulting in noisy retrieval, shallow reasoning, and limited adaptability to heterogeneous knowledge sources. In this work, we introduce DeepSieve, a novel RAG method that incorporates information sieving via LLM-as-a-knowledge-router. DeepSieve breaks down complex queries into structured sub-queries and recursively routes each to the most appropriate knowledge source, filtering out irrelevant information through a multi-stage information sieving process. This modular and transparent approach ensures that DeepSieve remains adaptable across diverse information needs. Experiments on three multi-hop QA benchmarks involving heterogeneous sources show that DeepSieve achieves greater reasoning depth, retrieval precision, and interpretability compared to conventional RAG approaches. Our code is available at https://github.com/MinghoKwok/DeepSieve.
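The decompose, route, and filter idea can be sketched with a toy pipeline. This is not DeepSieve's implementation (see the repository above for that): the LLM calls are replaced by simple string heuristics, the pass is single-level rather than recursive, and every name here (SOURCES, decompose, route, sieve, answer) is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge sources: source name -> text snippets.
SOURCES: Dict[str, List[str]] = {
    "people": ["Ada Lovelace was born in London.", "Alan Turing was born in London."],
    "places": ["London is the capital of the United Kingdom."],
}

def _words(text: str) -> set:
    """Lowercased tokens with trailing punctuation stripped."""
    return {w.strip("?.,").lower() for w in text.split()}

def decompose(query: str) -> List[str]:
    """Stand-in for an LLM call that splits a multi-hop query into sub-queries."""
    return [part.strip() for part in query.split(" and then ") if part.strip()]

def route(sub_query: str) -> str:
    """Stand-in for an LLM router that picks one source per sub-query."""
    return "people" if "born" in sub_query.lower() else "places"

def sieve(sub_query: str, snippets: List[str]) -> List[str]:
    """Keep snippets that share at least two longer words with the sub-query."""
    key = {w.strip("?.,").lower() for w in sub_query.split() if len(w) > 3}
    return [s for s in snippets if len(key & _words(s)) >= 2]

def answer(query: str) -> List[Tuple[str, str, List[str]]]:
    """Return (sub-query, chosen source, retained evidence) for each hop."""
    evidence = []
    for sub_query in decompose(query):
        source = route(sub_query)
        evidence.append((sub_query, source, sieve(sub_query, SOURCES[source])))
    return evidence

for hop in answer("Where was Ada Lovelace born and then what is London the capital of?"):
    print(hop)
```

Keeping the per-hop record of which source was chosen and which snippets survived is what makes this style of pipeline easy to inspect.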

Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation

Time series data is ubiquitous across various domains, including manufacturing, finance, and healthcare. High-quality annotations are essential for effectively understanding time series and facilitating downstream tasks. However, obtaining such annotations is challenging, particularly in mission-critical domains. In this paper, we propose TESSA, a multi-agent system designed to automatically generate both general and domain-specific annotations for time series data. TESSA introduces two agents: a general annotation agent and a domain-specific annotation agent. The general agent captures common patterns and knowledge across multiple source domains, leveraging both time-series-wise and text-wise features to generate general annotations. Meanwhile, the domain-specific agent utilizes limited annotations from the target domain to learn domain-specific terminology and generate targeted annotations. Extensive experiments on multiple synthetic and real-world datasets demonstrate that TESSA effectively generates high-quality annotations, outperforming existing methods.

MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery

Uncovering causal structures from observational data is crucial for understanding complex systems and making informed decisions. While reinforcement learning (RL) has shown promise in identifying these structures in the form of a directed acyclic graph (DAG), existing methods often lack efficiency, making them unsuitable for online applications. In this paper, we propose MARLIN, an efficient multi-agent RL-based approach for incremental DAG learning. MARLIN uses a DAG generation policy that maps a continuous real-valued space to the DAG space as an intra-batch strategy, then incorporates two RL agents (state-specific and state-invariant) to uncover causal relationships and integrates these agents into an incremental learning framework. Furthermore, the framework leverages a factored action space to enhance parallelization efficiency. Extensive experiments on synthetic and real datasets demonstrate that MARLIN outperforms state-of-the-art methods in terms of both efficiency and effectiveness.
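One generic way to map real-valued inputs to a guaranteed-acyclic graph is to give each node a scalar potential and keep only edges that point from a lower to a higher potential. The sketch below shows that construction; it is an assumption chosen for illustration, and the abstract does not say that MARLIN's DAG generation policy works this way. The function names, the weight matrix, and the threshold are all hypothetical.

```python
def vector_to_dag(potentials, weights, threshold=0.5):
    """Map real-valued inputs to a DAG adjacency matrix.

    Edge i -> j is kept only when potentials[i] < potentials[j] and
    weights[i][j] > threshold. Every edge goes strictly "uphill" in
    potential, so no directed cycle can form.
    """
    n = len(potentials)
    return [
        [1 if potentials[i] < potentials[j] and weights[i][j] > threshold else 0
         for j in range(n)]
        for i in range(n)
    ]

def is_acyclic(adj):
    """Check acyclicity with Kahn's algorithm (repeatedly remove zero in-degree nodes)."""
    n = len(adj)
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if indegree[j] == 0]
    removed = 0
    while ready:
        i = ready.pop()
        removed += 1
        for j in range(n):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    return removed == n

potentials = [0.2, -1.0, 0.7]
weights = [[0.9] * 3 for _ in range(3)]
dag = vector_to_dag(potentials, weights)
print(dag, is_acyclic(dag))
```

Because any real-valued vector yields a valid DAG under this mapping, a policy can search in a continuous space without an explicit acyclicity constraint.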

SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search

Large Language Models (LLMs) offer promising capabilities for tackling complex reasoning tasks, including optimization problems. However, existing methods either rely on prompt engineering, which leads to poor generalization across problem types, or require costly supervised training. We introduce SolverLLM, a training-free framework that leverages test-time scaling to solve diverse optimization problems. Rather than solving directly, SolverLLM generates mathematical formulations and translates them into solver-ready code, guided by a novel Monte Carlo Tree Search (MCTS) strategy. To enhance the search process, we modify classical MCTS with (1) dynamic expansion for adaptive formulation generation, (2) prompt backpropagation to guide exploration via outcome-driven feedback, and (3) uncertainty backpropagation to incorporate reward reliability into decision-making. Experiments on six standard benchmark datasets demonstrate that SolverLLM outperforms both prompt-based and learning-based baselines, achieving strong generalization without additional training.

Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection

The rapid advancement of large language models (LLMs) such as ChatGPT, DeepSeek, and Claude has significantly increased the presence of AI-generated text in digital communication. This trend has heightened the need for reliable detection methods to distinguish between human-authored and machine-generated content. Existing approaches, both zero-shot methods and supervised classifiers, largely conceptualize this task as a binary classification problem, often leading to poor generalization across domains and models. In this paper, we argue that such a binary formulation fundamentally mischaracterizes the detection task by assuming a coherent representation of human-written texts. In reality, human texts do not constitute a unified distribution, and their diversity cannot be effectively captured through limited sampling. This causes previous classifiers to memorize observed out-of-distribution (OOD) characteristics rather than learn the essence of ‘non-ID’ (not in-distribution) behavior, limiting generalization to unseen human-authored inputs. Based on this observation, we propose reframing the detection task as an OOD detection problem, treating human-written texts as distributional outliers while machine-generated texts are in-distribution (ID) samples. To this end, we develop a detection framework using one-class learning methods, including DeepSVDD and HRN, and score-based learning techniques such as energy-based methods, enabling robust and generalizable performance. Extensive experiments across multiple datasets validate the effectiveness of our OOD-based approach. Specifically, the OOD-based method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on the DeepFake dataset. Moreover, we test our detection framework in multilingual, attacked, unseen-model, and unseen-domain text settings, demonstrating its robustness and generalizability. Code, pretrained weights, and a demo will be released openly at https://github.com/cong-zeng/ood-llm-detect.
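For readers unfamiliar with energy-based OOD scoring and the FPR95 metric, the following sketch shows the standard definitions in plain Python. It is not the paper's code. It assumes a hypothetical model that emits a vector of logits per text, and it assumes the common convention that FPR95 is the fraction of OOD samples (here, human texts) wrongly accepted as ID at the threshold that accepts 95% of ID samples (here, machine-generated texts); the paper's exact convention may differ.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy -T * logsumexp(logits / T); lower means more in-distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    return -temperature * (peak + math.log(sum(math.exp(z - peak) for z in scaled)))

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when 95% of ID samples are accepted.

    Scores are energies, so a sample is accepted as ID when its score is at
    or below the threshold.
    """
    ranked = sorted(id_scores)
    threshold = ranked[math.ceil(0.95 * len(ranked)) - 1]
    accepted = sum(1 for s in ood_scores if s <= threshold)
    return accepted / len(ood_scores)

# Confident logits give lower energy than near-uniform logits.
print(energy_score([5.0, 0.0]), energy_score([0.1, 0.0]))
```

A lower FPR95 means fewer human-written texts slip through as machine-generated at a fixed 95% detection rate for machine text.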

NeurIPS 2025 in San Diego from November 30th to December 5th, 2025

NEC Laboratories America is heading to San Diego for NeurIPS 2025, where our researchers will present cutting-edge work spanning optimization, AI systems, language modeling, and trustworthy machine learning. Topics include multi-agent coordination, scalable training, efficient inference, and techniques for detecting LLM-generated text.

Correlation-aware Online Change Point Detection

Change point detection aims to identify abrupt shifts occurring at multiple points within a data sequence. This task becomes particularly challenging in the online setting, where different types of change can occur, including shifts in both the marginal and joint distributions of the data. In this paper, we address these challenges by tracking the Riemannian geometry of correlation matrices, allowing Riemannian metrics to compute the geodesic distance as an accurate measure of correlation dynamics. We introduce Rio-CPD, a correlation-aware online change point detection framework that integrates the Riemannian geometry of the manifold of symmetric positive definite matrices with the cumulative sum (CUSUM) statistic for detecting change points. Rio-CPD employs a novel CUSUM design by computing the geodesic distance between current observations and the Fréchet mean of prior observations. With appropriate choices of Riemannian metrics, Rio-CPD offers a simple yet effective and computationally efficient algorithm. We also provide a theoretical analysis on standard metrics for change point detection within Rio-CPD. Experimental results on both synthetic and real-world datasets demonstrate that Rio-CPD outperforms existing methods on detection accuracy, average detection delay, and efficiency.
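The combination of a geodesic distance to a Fréchet mean with a CUSUM statistic can be sketched in a few lines of NumPy. This is a simplified illustration, not the Rio-CPD algorithm: it assumes the Log-Euclidean metric (one standard Riemannian metric on symmetric positive definite matrices, under which the Fréchet mean is the matrix exponential of the averaged matrix logarithms), a basic one-sided CUSUM, and hypothetical drift and threshold values.

```python
import numpy as np

def spd_log(mat):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.log(vals)) @ vecs.T

def detect_change(matrices, drift=0.5, threshold=1.0):
    """Return the index of the first CUSUM alarm, or None if there is none.

    At step t the Log-Euclidean geodesic distance between matrices[t] and
    the Fréchet mean of matrices[:t] is ||log(X_t) - mean(log(X_s))||_F.
    """
    log_sum = spd_log(matrices[0])  # running sum of matrix logarithms
    cusum = 0.0
    for t in range(1, len(matrices)):
        log_mean = log_sum / t  # log of the Fréchet mean of the first t matrices
        current_log = spd_log(matrices[t])
        distance = np.linalg.norm(current_log - log_mean, ord="fro")
        cusum = max(0.0, cusum + distance - drift)  # one-sided CUSUM update
        if cusum > threshold:
            return t
        log_sum = log_sum + current_log
    return None

def corr(r):
    """2x2 correlation matrix with off-diagonal correlation r."""
    return np.array([[1.0, r], [r, 1.0]])

# Ten weakly correlated observations followed by ten strongly correlated ones.
sequence = [corr(0.1)] * 10 + [corr(0.9)] * 10
print(detect_change(sequence))
```

In this toy sequence the distance is essentially zero until the correlation jumps, so the statistic stays at zero and then crosses the threshold at the first post-change observation.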

Uncertainty Quantification and Reasoning for Reliable AI Seminar at Brigham Young University

Our researcher Xujiang Zhao will present “Uncertainty Quantification and Reasoning for Reliable AI” at Brigham Young University on Thursday, Sept. 25 at 11 a.m. in TMCB 1170. The seminar explores how statistical modeling and reasoning frameworks can strengthen trustworthy AI, making systems more robust and transparent in high-stakes applications like healthcare and autonomous systems. Attendees will gain insights into how uncertainty quantification is shaping the next generation of responsible AI.

Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). Domain specialization techniques are key to making large language models disruptive in many applications. Specifically, to overcome these hurdles, research and practice on the domain specialization of LLMs have increased notably in recent years. This emerging field of study, with its substantial potential for impact, necessitates a comprehensive and systematic review to better summarize and guide ongoing work in this area. In this article, we present a comprehensive survey of domain specialization techniques for large language models, an emerging direction critical for large language model applications. First, we propose a systematic taxonomy that categorizes the LLM domain-specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations to and differences from each other. Second, we present an extensive taxonomy of critical application domains that can benefit dramatically from specialized LLMs, discussing their practical significance and open challenges. Last, we offer our insights into the current research status and future trends in this area.