400-Gb/s mode division multiplexing-based bidirectional free space optical communication in real-time with commercial transponders

In this work, for the first time, we experimentally demonstrate mode division multiplexing-based bidirectional free space optical communication in real time using commercial transponders. As a proof of concept, via bidirectional pairs of Hermite-Gaussian modes (HG00, HG10, and HG01) and a Telecom Infra Project Phoenix-compliant commercial 400G transponder, 400-Gb/s data signals (56-Gbaud, DP-16QAM) are transmitted bidirectionally error-free, i.e., with pre-FEC BERs below 1e-2, over approximately 1 m of free space.

EdgeSync: Efficient Edge-Assisted Video Analytics via Network Contention-Aware Scheduling

With the advancement of 5G, edge-assisted video analytics has become increasingly popular, driven by the technology's ability to support low-latency, high-bandwidth applications. However, in scenarios where multiple clients compete for network resources, network contention poses a significant challenge. In this paper, we propose a novel scheduling algorithm that intelligently batches and aligns the offloading of multiple video analytics clients to optimize both network and edge server resource utilization while meeting the Service Level Objective (SLO). Experiments on a cellular network testbed show that our approach successfully processes 93% or more of inference requests from 7 different clients to the edge server while meeting the SLOs, whereas other approaches achieve lower success rates, ranging from 65% to 85%, under the same conditions.
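
A minimal sketch of the kind of SLO-aware batching described above, assuming a per-client request record, an estimate of the shared uplink bandwidth, and a fixed per-item inference time. The names (Request, schedule_batch, link_bw_bps) are hypothetical and this is not the paper's actual scheduler.

```python
# Illustrative sketch: a contention-aware batcher that admits client uploads
# only while the estimated transfer + inference time still meets every
# admitted request's SLO deadline.
from dataclasses import dataclass

@dataclass
class Request:
    client_id: int
    frame_bytes: int
    slo_deadline: float   # absolute deadline in seconds

def schedule_batch(pending, now, link_bw_bps, infer_time_per_item):
    """Greedily admit requests (earliest deadline first) while the whole
    batch still finishes before each admitted request's SLO deadline."""
    batch, transfer_time = [], 0.0
    for req in sorted(pending, key=lambda r: r.slo_deadline):
        new_transfer = transfer_time + req.frame_bytes * 8 / link_bw_bps
        finish = now + new_transfer + infer_time_per_item * (len(batch) + 1)
        if all(finish <= r.slo_deadline for r in batch + [req]):
            batch.append(req)
            transfer_time = new_transfer
    return batch
```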

Attribute-Centric Compositional Text-to-Image Generation

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that ACTIG achieves outstanding compositional generalization and outperforms previous works in terms of image quality and text-image consistency.
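
For intuition, here is a minimal sketch of a generic InfoNCE-style contrastive objective over image and attribute-text embeddings; it is not ACTIG's exact loss, only an illustration of how attribute compositions could anchor a contrastive term.

```python
# Generic symmetric contrastive loss: image i should align with the text
# embedding of its own attribute composition, and vice versa.
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_emb, attr_emb, temperature=0.07):
    """img_emb, attr_emb: (N, D) tensors; row i of attr_emb encodes the
    attribute composition of image i."""
    img_emb = F.normalize(img_emb, dim=-1)
    attr_emb = F.normalize(attr_emb, dim=-1)
    logits = img_emb @ attr_emb.t() / temperature      # (N, N) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```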

G-Litter Marine Litter Dataset Augmentation with Diffusion Models and Large Language Models on GPU Acceleration

Marine litter detection is crucial for environmental monitoring, yet the imbalance in existing datasets limits model performance in identifying various types of waste accurately. This paper presents an efficient data augmentation pipeline that combines generative diffusion models (e.g., Stable Diffusion) and Large Language Models (LLMs) to expand the G-Litter dataset, a marine litter dataset designed for autonomous detection in heterogeneous environments. Leveraging scalable diffusion models for image generation and Alpaca LLMs for diverse prompt generation, our approach augments underrepresented classes by generating over 200 additional images per class, significantly improving the dataset's balance. Training YOLOv8 object detectors on the augmented G-Litter dataset improved detection performance, increasing recall by 7.82% and mAP50 by 3.87% compared with the baseline. This study emphasizes the potential for combining generative AI with HPC resources to automate data augmentation on large-scale, unstructured datasets, particularly in edge computing contexts for real-time marine monitoring. The models were tested on real videos captured during simulated missions, demonstrating a superior ability to detect submerged objects in dynamic scenarios. These results highlight the potential of generative AI techniques to improve dataset quality and detection model performance, laying the foundation for further expansion in real-time marine monitoring.
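
A hedged sketch of the augmentation idea: LLM-style prompt variation feeding a Stable Diffusion pipeline to synthesize extra images for underrepresented classes. The model ID, class names, and the prompt-generation stand-in are assumptions, not the exact G-Litter pipeline.

```python
# Synthesize additional images per underrepresented class with diffusers.
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate_prompts(class_name, n):
    # Stand-in for the Alpaca LLM step: an instruction-tuned LLM would
    # produce diverse scene descriptions for each class.
    contexts = ["floating near the shoreline", "submerged in murky water",
                "partially buried in sand", "drifting on the open sea"]
    return [f"a photo of {class_name} {contexts[i % len(contexts)]}, "
            f"underwater drone footage" for i in range(n)]

os.makedirs("augmented", exist_ok=True)
for class_name in ["plastic bottle", "fishing net"]:   # example classes
    for i, prompt in enumerate(generate_prompts(class_name, 200)):
        image = pipe(prompt).images[0]
        image.save(f"augmented/{class_name.replace(' ', '_')}_{i:03d}.png")
```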

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

Action detection aims to detect (recognize and localize) human actions spatially and temporally in videos. Existing approaches focus on the closed-set setting, where an action detector is trained and tested on videos from a fixed set of action categories. However, this constrained setting is not viable in an open world, where test videos inevitably contain actions beyond the trained categories. In this paper, we address the practical yet challenging Open-Vocabulary Action Detection (OVAD) problem, which aims to detect any action in test videos while training a model on a fixed set of action categories. To achieve such an open-vocabulary capability, we propose a novel method, OpenMixer, that exploits the inherent semantics and localizability of large vision-language models (VLMs) within the family of query-based detection transformers (DETR). Specifically, OpenMixer is composed of spatial and temporal OpenMixer blocks (S-OMB and T-OMB) and a dynamically fused alignment (DFA) module. The three components collectively enjoy the merits of strong generalization from pre-trained VLMs and end-to-end learning from the DETR design. Moreover, we establish OVAD benchmarks under various settings, and the experimental results show that OpenMixer performs the best over baselines for detecting both seen and unseen actions.
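
A minimal sketch of the VLM-alignment idea underlying open-vocabulary action recognition: score a detected person region against free-form action prompts with CLIP. This is not the OpenMixer architecture (no DETR blocks or DFA module); the prompts and the dummy image are placeholders.

```python
# Zero-shot scoring of a region crop against arbitrary action descriptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

region = Image.new("RGB", (224, 224))   # stand-in for a cropped person box
action_prompts = [f"a person {a}" for a in
                  ["riding a bicycle", "playing basketball", "watering plants"]]

inputs = processor(text=action_prompts, images=region,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
# The highest-probability prompt is the predicted (possibly unseen) action.
print(dict(zip(action_prompts, probs[0].tolist())))
```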

Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation

We consider the conditional generation of 3D drug-like molecules with explicit control over molecular properties such as drug-likeness (e.g., Quantitative Estimate of Druglikeness or Synthetic Accessibility score) and effective binding to specific protein sites. To tackle this problem, we propose an E(3)-equivariant Wasserstein autoencoder and factorize the latent space of our generative model into two disentangled aspects: molecular properties and the remaining structural context of 3D molecules. Our model ensures explicit control over these molecular attributes while maintaining equivariance of coordinate representation and invariance of data likelihood. Furthermore, we introduce a novel alignment-based coordinate loss to adapt equivariant networks for auto-regressive de novo 3D molecule generation from scratch. Extensive experiments validate our model's effectiveness on property-guided and context-guided molecule generation, both for de novo 3D molecule design and structure-based drug discovery against protein targets.
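
A minimal sketch of the latent factorization idea (assumptions only, not the paper's equivariant architecture): the latent code is split into a property part and a structural-context part, so a property can be set explicitly at generation time. The class and variable names are hypothetical.

```python
# Factorized encoder: one latent slot for properties (e.g., QED/SA targets),
# one for the remaining structural context.
import torch
import torch.nn as nn

class PropertyFactorizedEncoder(nn.Module):
    def __init__(self, in_dim, prop_dim=4, ctx_dim=60):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.SiLU())
        self.to_prop = nn.Linear(128, prop_dim)   # property slot
        self.to_ctx = nn.Linear(128, ctx_dim)     # structural-context slot

    def forward(self, x):
        h = self.backbone(x)
        return self.to_prop(h), self.to_ctx(h)

# At generation time the property slot can be overwritten with a desired
# value while the context slot is sampled or taken from a reference molecule.
enc = PropertyFactorizedEncoder(in_dim=32)
z_prop, z_ctx = enc(torch.randn(1, 32))
z_prop = torch.tensor([[0.9, 0.0, 0.0, 0.0]])   # e.g., request high QED
z = torch.cat([z_prop, z_ctx], dim=-1)          # passed to the decoder
```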

TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents

Time series data is essential in various applications, including climate modeling, healthcare monitoring, and financial analytics. Understanding the contextual information associated with real-world time series data is often essential for accurate and reliable event predictions. In this paper, we introduce TimeCAP, a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data, extending their typical usage as predictors. TimeCAP incorporates two independent LLM agents: one generates a textual summary capturing the context of the time series, while the other uses this enriched summary to make more informed predictions. In addition, TimeCAP employs a multi-modal encoder that synergizes with the LLM agents, enhancing predictive performance through mutual augmentation of inputs with in-context examples. Experimental results on real-world datasets demonstrate that TimeCAP outperforms state-of-the-art methods for time series event prediction, including those utilizing LLMs as predictors, achieving an average improvement of 28.75% in F1 score.
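
An illustrative sketch of the two-agent pattern described above: one LLM call contextualizes the raw series, and a second call predicts the event from that summary. The `call_llm` parameter stands in for any chat-completion client; the prompts are assumptions, not TimeCAP's actual prompts.

```python
# Two-stage LLM pipeline: contextualize, then predict.
def predict_event(series, feature_names, call_llm):
    rows = "\n".join(
        f"t={t}: " + ", ".join(f"{n}={v:.2f}" for n, v in zip(feature_names, x))
        for t, x in enumerate(series)
    )
    # Agent 1: contextualizer -- turn raw numbers into a textual summary.
    summary = call_llm(
        "Summarize the notable trends and context in this time series:\n" + rows
    )
    # Agent 2: predictor -- reason over the enriched summary, not raw numbers.
    answer = call_llm(
        "Given this summary, will the event of interest occur at the next "
        "step? Answer yes or no with a brief rationale.\n" + summary
    )
    return summary, answer
```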

Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization

Unarguably, deep learning models capable of generalizing to unseen domain data while leveraging only a few labels are of great practical significance due to low development costs. In pursuit of this goal, we study the challenging problem of semi-supervised domain generalization (SSDG), where the aim is to learn a domain-generalizable model using only a small fraction of labeled data and a relatively large fraction of unlabeled data. Domain generalization (DG) methods show subpar performance under the SSDG setting, whereas semi-supervised learning (SSL) methods demonstrate relatively better performance; however, they still fall considerably short of fully-supervised DG methods. Towards handling this new but challenging SSDG problem, we propose a novel method that facilitates the generation of accurate pseudo-labels under various domain shifts. This is accomplished by retaining, during training, the domain-level specialism in the classifier corresponding to each source domain. Specifically, we first create domain-level information vectors on the fly, which are then utilized to learn a domain-aware mask for modulating the classifier's weights. We provide a mathematical interpretation of the effect of this modulation procedure on both pseudo-labeling and model training. Our method is plug-and-play and can be readily applied to different SSL baselines for SSDG. Extensive experiments on six challenging datasets in two different SSDG settings show that our method provides visible gains over various strong SSL-based SSDG baselines. Our code is available at github.com/DGWM.
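
A hedged sketch of the weight-modulation idea as described above: a per-domain information vector produces a sigmoid mask that modulates the classifier's weight matrix. Layer sizes and the class name are assumptions for illustration, not the paper's exact design.

```python
# Domain-aware mask modulating a shared classifier's weights.
import torch
import torch.nn as nn

class DomainModulatedClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, num_domains):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        # One learnable information vector per source domain.
        self.domain_info = nn.Parameter(torch.zeros(num_domains, feat_dim))
        self.mask_gen = nn.Linear(feat_dim, num_classes * feat_dim)

    def forward(self, features, domain_idx):
        # Domain-aware mask in (0, 1), reshaped to the classifier weight shape.
        mask = torch.sigmoid(
            self.mask_gen(self.domain_info[domain_idx])
        ).view(self.weight.shape)
        modulated_weight = self.weight * mask   # retain domain-level specialism
        return features @ modulated_weight.t()  # logits for this domain's samples
```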

Incident Diagnosing and Reporting System based on Retrieval Augmented Large Language Model

The Internet-of-Things (IoT) is widely used in many applications such as smart cities, transportation, healthcare, and environmental monitoring. A key task of IoT maintenance is to analyze abnormal sensor records and generate incident reports. Traditionally, domain experts engage in such labor-intensive tasks. Recent advances in Large Language Models (LLMs) have sparked interest in developing AI-based systems to automate these labor-intensive processes. However, two critical problems hinder the effective application of LLMs in IoT: (1) LLMs lack background knowledge of the deployed IoT systems; and (2) incidents are complex events involving many sensors and components, so the LLM needs to understand the sensor relationships for accurate diagnosis. In this study, we propose a Retrieval Augmented language model based Incident Diagnosing and Reporting system (RAIDR) for IoT applications. RAIDR retrieves related system documents based on the incident features and leverages an LLM to analyze anomalies, identify root causes, and automatically generate incident reports. The automated incident reporting process streamlines end users' decision making for system maintenance and troubleshooting.
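
An illustrative retrieval-augmented sketch of the workflow described above: embed system documents, retrieve those most similar to the incident description, and prompt an LLM to draft the report. The embedding model choice, `call_llm`, and the prompt wording are assumptions, not RAIDR's actual components.

```python
# Retrieve the most relevant system documents and prompt an LLM for a report.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def draft_incident_report(incident_text, documents, call_llm, top_k=3):
    doc_vecs = embedder.encode(documents, normalize_embeddings=True)
    query_vec = embedder.encode([incident_text], normalize_embeddings=True)[0]
    # Cosine similarity reduces to a dot product on normalized vectors.
    scores = doc_vecs @ query_vec
    context = "\n---\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])
    prompt = (
        "You are an IoT maintenance assistant. Using the system documents "
        "below, analyze the incident, identify likely root causes, and draft "
        f"an incident report.\n\nDocuments:\n{context}\n\nIncident:\n{incident_text}"
    )
    return call_llm(prompt)
```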

NEC Labs America Attends the 39th Annual AAAI Conference on Artificial Intelligence #AAAI25

Our NEC Labs America team attended the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25) in Philadelphia, Pennsylvania, at the Pennsylvania Convention Center from February 25 to March 4, 2025. The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange among researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. Our team presented technical papers, led special tracks, delivered talks on key topics, participated in workshops, conducted tutorials, and showcased research in poster sessions. The team greeted visitors at Booth #208 from Thursday through Saturday.