Entries by NEC Labs America

Kunal Rao presents SlideCraft: Context-Aware Slides Generation Agent at PICom 2025 on October 21st

Kunal Rao (presenting virtually) will present “SlideCraft: Context-Aware Slides Generation Agent” at the IEEE International Conference on Pervasive Intelligence and Computing (#PICom2025) on Tuesday, Oct 21 (10:30am–12pm JST) | Monday, Oct 20 (9:30–11pm ET) in Hokkaido, Japan. SlideCraft uses AI to automatically generate presentation slides from research content, making technical communication faster and context-aware for scientists and professionals.

Sparsh Garg Presents Mapillary Vistas Validation for Fine-Grained Traffic Signs at DataCV 2025

Sparsh Garg, a Senior Associate Researcher in our Media Analytics Department, will present “Mapillary Vistas Validation for Fine-Grained Traffic Signs: A Benchmark Revealing Vision-Language Model Limitations” at the Data Computer Vision (DataCV) 2025 workshop, part of ICCV 2025 in Honolulu, Hawai’i, on Sunday, October 19th, from 11:15–11:25 am.

Emerging Integrated Photonic Technologies Leveraging Multimaterial Integration for AI and Datacenter Applications

Since the inception of integrated photonics, multimaterial integration has served as a primary avenue for new technology innovations. Now, with an ever-increasing demand for integrated photonics as a platform for both high-performance links from/within datacenters and AI acceleration, multimaterial integration has begun to play an even more critical role in pushing capabilities beyond their current limits. In this work, we review photonics for AI and datacenter applications, the current landscape of multimaterial integration in photonics, and the ways in which multimaterial integration techniques have been recently utilized to push the performance of modulators on silicon and chip-scale optical frequency combs.

THAT: Token-wise High-frequency Augmentation Transformer for Hyperspectral Pansharpening

Transformer-based methods have demonstrated strong potential in hyperspectral pansharpening by modeling long-range dependencies. However, their effectiveness is often limited by redundant token representations and a lack of multiscale feature modeling. Hyperspectral images exhibit intrinsic spectral priors (e.g., abundance sparsity) and spatial priors (e.g., non-local similarity), which are critical for accurate reconstruction. From a spectral–spatial perspective, Vision Transformers (ViTs) face two major limitations: they struggle to preserve high-frequency components, such as material edges and texture transitions, and they suffer from attention dispersion across redundant tokens. These issues stem from the global self-attention mechanism, which tends to dilute high-frequency signals and overlook localized details. To address these challenges, we propose the Token-wise High-frequency Augmentation Transformer (THAT), a novel framework designed to enhance hyperspectral pansharpening through improved high-frequency feature representation and token selection. Specifically, THAT introduces: (1) Pivotal Token Selective Attention (PTSA) to prioritize informative tokens and suppress redundancy; (2) a Multi-level Variance-aware Feed-forward Network (MVFN) to enhance high-frequency detail learning. Experiments on standard benchmarks show that THAT achieves state-of-the-art performance with improved reconstruction quality and efficiency.
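To make the token-selection idea concrete, here is a minimal NumPy sketch of selective attention in the spirit of PTSA. The scoring rule (per-token feature variance), the top-k selection, and the shared Q/K/V are assumptions for illustration; the paper's actual PTSA design is not specified in this abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ptsa_sketch(tokens, k):
    """tokens: (N, d) array. Score each token by its feature variance,
    keep the k highest-scoring ("pivotal") tokens, run self-attention
    among only those, and pass the remaining tokens through unchanged."""
    scores = tokens.var(axis=1)            # per-token informativeness proxy
    pivotal = np.argsort(scores)[-k:]      # indices of the k pivotal tokens
    sel = tokens[pivotal]                  # (k, d): shared Q, K, V for brevity
    attn = softmax(sel @ sel.T / np.sqrt(tokens.shape[1]))
    out = tokens.copy()
    out[pivotal] = attn @ sel              # only pivotal tokens are updated
    return out
```

Restricting attention to a small pivotal set is one way to counter the attention dispersion across redundant tokens that the abstract describes.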

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning

Grounding large language models (LLMs) in domain-specific tasks like post-hoc dash-cam driving video analysis is challenging due to their general-purpose training and lack of structured inductive biases. As vision is often the sole modality available for such analysis (i.e., no LiDAR, GPS, etc.), existing video-based vision-language models (V-VLMs) struggle with spatial reasoning, causal inference, and explainability of events in the input video. To this end, we introduce iFinder, a structured semantic grounding framework that decouples perception from reasoning by translating dash-cam videos into a hierarchical, interpretable data structure for LLMs. iFinder operates as a modular, training-free pipeline that employs pretrained vision models to extract critical cues (object pose, lane positions, and object trajectories), which are hierarchically organized into frame- and video-level structures. Combined with a three-block prompting strategy, this enables step-wise, grounded reasoning in which the LLM refines a peer V-VLM’s outputs. Evaluations on four public zero-shot dash-cam driving benchmarks show that iFinder’s grounding with domain-specific cues, especially object orientation and global context, significantly outperforms end-to-end V-VLMs, with up to 39% gains in accident reasoning accuracy. By grounding LLMs with driving domain-specific representations, iFinder offers a zero-shot, interpretable, and reliable alternative to end-to-end V-VLMs for post-hoc driving video understanding.
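The hierarchical frame- and video-level organization described above can be pictured with a small Python sketch. All class and field names here are hypothetical illustrations of the kind of interpretable structure iFinder might hand to an LLM; the abstract does not specify its actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectCue:
    """Per-object cues extracted by pretrained vision models."""
    track_id: int
    category: str       # e.g. "car", "pedestrian"
    pose_deg: float     # object orientation estimate
    lane: str           # e.g. "ego-lane", "left-lane"

@dataclass
class FrameRecord:
    """Frame-level structure: all cues observed at one timestamp."""
    timestamp_s: float
    objects: list = field(default_factory=list)

@dataclass
class VideoRecord:
    """Video-level structure: ordered frames plus derived trajectories."""
    frames: list = field(default_factory=list)

    def trajectory(self, track_id):
        """A video-level cue: the (time, pose, lane) path of one object."""
        return [(f.timestamp_s, o.pose_deg, o.lane)
                for f in self.frames for o in f.objects
                if o.track_id == track_id]
```

Serializing such a structure into a prompt is one plausible way the decoupling works: perception fills the records, and the LLM reasons only over the resulting text.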

Leveraging Digital Twins for All-Photonics Networks-as-a-Service: Enabling Innovation and Efficiency

This tutorial presents an architecture and methods for all-photonics networks-as-a-service in distributed AI data center infrastructures. We discuss server-based coherent transceiver architectures, remote transponder control, rapid end-to-end lightpath provisioning, digital longitudinal monitoring, and line-system calibration, demonstrating their feasibility through field validations.

Digital Twins Beyond C-band Using GNPy

GNPy advancements enable accurate and efficient modeling of multiband optical networks for digital twin applications. The developed solvers for Kerr nonlinearity and SRS have been validated both in simulation and experimentally in C+L-band transmission, supporting real-world network planning, design, and performance optimization across disaggregated optical infrastructures.

Energy-based Generative Models for Distributed Acoustic Sensing Event Classification in Telecom Networks

Distributed fiber-optic sensing combined with machine learning enables continuous monitoring of telecom infrastructure. We employ generative modeling for event classification, supporting semi-supervised learning, uncertainty calibration, and noise resilience. Our approach offers a scalable, data-efficient solution for real-world deployment in complex environments.
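The core idea of energy-based classification can be sketched briefly: each class learns an energy function, a sample is assigned to the class with the lowest energy, and a high minimum energy can flag noise or out-of-distribution events. The diagonal-Gaussian energy below is an assumed stand-in for illustration; the actual model in this work is not described at that level of detail.

```python
import numpy as np

class GaussianEnergyClassifier:
    """Toy energy-based classifier: per-class diagonal-Gaussian energy
    E_c(x) = sum_d [ (x_d - mu_d)^2 / (2 var_d) + 0.5 * log var_d ]."""

    def fit(self, X, y):
        self.stats = {}
        for c in np.unique(y):
            Xc = X[y == c]
            self.stats[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)
        return self

    def energy(self, X):
        # One energy value per sample for each class.
        return {c: (((X - m) ** 2) / (2 * v) + 0.5 * np.log(v)).sum(axis=1)
                for c, (m, v) in self.stats.items()}

    def predict(self, X):
        E = self.energy(X)
        classes = list(E)
        stacked = np.stack([E[c] for c in classes])
        return np.array(classes)[stacked.argmin(axis=0)]  # lowest energy wins
```

Because energies are computed per class independently, unlabeled data and rejection thresholds fit naturally into this framing, which is what makes energy-based models attractive for semi-supervised, noise-resilient event classification.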