Murugan Sankaradas Archives | NEC Labs America

Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery

May 15, 2026/in Publications/by NEC Labs America

In satellite applications, user queries often take the form of open-ended natural language, extending beyond a fixed set of predefined categories. This open-vocabulary nature poses significant challenges for retrieving relevant image tiles, as the retrieval system must generalize to a wide range of unseen objects and concepts. While vision-language models (VLMs) such as CLIP are widely used for text-image retrieval, even fine-tuned variants often struggle to accurately align such queries with satellite imagery. To address this, we propose Open-SAT, a training-free query embedding refinement algorithm that operates at inference time to improve alignment between user queries and satellite image content. Open-SAT uses VLMs to compute embeddings for image tiles, which are stored in a vector database for efficient retrieval. At query time, it leverages Large Language Models (LLMs) to refine the text embeddings by incorporating contextual information about objects of interest and their surroundings. A threshold-free retrieval mechanism further enhances accuracy and efficiency. Experimental results on three public benchmarks show that Open-SAT improves the F1 score by up to 16.04% while retrieving a comparable number of image tiles. These results demonstrate the effectiveness of Open-SAT in open-vocabulary satellite image retrieval, leveraging LLM guidance without the need for additional training or supervision.

TalentScout: Multimodal AI-Driven Expert Finding in Organizations

October 21, 2025/in Publications/by NEC Labs America

Identifying subject-matter experts within organizations remains a challenging task due to the scale, heterogeneity, and unstructured nature of enterprise knowledge assets. We present TalentScout, an AI-driven expert identification system that constructs a unified, skill-centric knowledge graph by ingesting and analyzing diverse media, including research papers, reports, presentations, transcripts, and supervisor recommendations. TalentScout’s modular architecture integrates document parsing, audio/video transcription, metadata extraction, large language model-based skill extraction, multi-factor author disambiguation, and evidence-weighted skill attribution. At query time, TalentScout decomposes natural language queries into canonical skill requirements, traverses the constructed knowledge graph, and ranks experts based on aggregated skill weights, document quality, and endorsement signals, providing document-level justifications for each recommendation. We evaluate TalentScout on multiple public and internal enterprise datasets, including DBLP, TREC Enterprise, Tilburg, and ManConCorpus. Using standard information retrieval metrics such as Precision@5, Recall@5, nDCG@5, and Mean Reciprocal Rank (MRR), TalentScout consistently outperforms leading baselines, achieving up to 24% higher Precision@5 in early expert retrieval. The results highlight TalentScout’s scalability, transparency, and accuracy, establishing it as a practical solution for evidence-based expert discovery and organizational talent management.
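For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of their standard binary-relevance definitions. It is not TalentScout's evaluation code; the function names, the expert names, and the relevance set are all hypothetical and chosen only for illustration.

```python
import math

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k=5):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked, relevant) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical ranking of candidate experts for a single query.
ranked = ["ana", "bo", "cy", "dee", "eli", "fay"]
relevant = {"bo", "dee", "fay"}

p5 = precision_at_k(ranked, relevant)    # 2 of the top 5 are relevant
r5 = recall_at_k(ranked, relevant)       # 2 of the 3 relevant experts found
rr = reciprocal_rank(ranked, relevant)   # first relevant expert is at rank 2
ndcg5 = ndcg_at_k(ranked, relevant)
```

Precision@5 rewards putting relevant experts in the first five positions regardless of order, while nDCG@5 and MRR also reward placing them earlier within those positions.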

SlideCraft: Context-aware Slides Generation Agent

October 21, 2025/in Publications/by NEC Labs America

Creating effective slide presentations requires adapting both content and structure to match the communication context, e.g., whether the presentation is for summarizing to executives or reporting progress to research supervisors. In research and enterprise environments, this need for context-sensitive presentations often leads to repeated, manual reformatting of the same material to suit different audiences. Existing generative systems support slide creation but typically rely on structured inputs, assume a fixed format, and offer limited ability to iteratively refine outputs through natural language feedback. Moreover, they rarely accommodate organizational constraints such as formatting guidelines, domain-specific terminology, or branding requirements. We present SlideCraft, a context-aware generative agent that autonomously creates and edits slide presentations based on natural language instructions. SlideCraft infers the intended presentation context, such as an executive-facing summary or a project review for technical oversight, and selects the appropriate slide template. It then synthesizes content from input documents, enriches it with external knowledge and internal assets, assembles it into a structured intermediate representation, and generates a validated slide deck. SlideCraft supports both first-time slide creation and iterative updates, operating through familiar natural language interfaces like email or messaging tools. Our experiments demonstrate that SlideCraft consistently produces high-quality, context-aware presentations tailored to diverse communication settings, with minimal human input and reliable adherence to enterprise constraints.

Murugan Sankaradas presents TalentScout: Multimodal AI-Driven Expert Finding in Organizations at PICom2025 on October 21st

October 17, 2025/in Events/by NEC Labs America

Murugan Sankaradas (presenting virtually) will present “TalentScout: Multimodal AI-Driven Expert Finding in Organizations” at the IEEE International Conference on Pervasive Intelligence and Computing (PICom2025) on Tuesday, October 21 (10:30am–12pm JST) | Monday, October 20 (9:30–11pm ET) in Hokkaido, Japan.

Kunal Rao presents SlideCraft: Context-Aware Slides Generation Agent at PICom 2025 on October 21st

October 15, 2025/in Events/by NEC Labs America

Kunal Rao (presenting virtually) will present “SlideCraft: Context-Aware Slides Generation Agent” at the IEEE International Conference on Pervasive Intelligence and Computing (PICom2025) on Tuesday, Oct 21 (10:30am–12pm JST) | Monday, Oct 20 (9:30–11pm ET) in Hokkaido, Japan. SlideCraft uses AI to automatically generate presentation slides from research content, making technical communication faster and context-aware for scientists and professionals.

Roadside Multi-LiDAR Data Fusion for Enhanced Traffic Safety

August 3, 2025/in Publications/by NEC Labs America

Roadside LiDAR (Light Detection and Ranging) sensors promise safer and faster traffic management and vehicular operations. However, occlusion and small view angles are significant challenges to widespread use of roadside LiDARs. We consider fusing data from multiple LiDARs at a traffic intersection to estimate traffic parameters better than is possible with a single LiDAR. The key challenge is to calibrate multiple LiDARs in both time and space. The problem is more complex when heterogeneous sensors differ in resolution and are positioned arbitrarily at a traffic intersection. We propose a calibration technique to fuse multiple LiDARs. We show that our technique works at various data granularities and enables real-time analytics for roadside traffic monitoring. We evaluate on a large number of simulated traffic scenarios and show that fusion improves the accuracy of vehicle counting and near-collision detection. We apply our algorithm to real traffic data and demonstrate its utility in classifying vehicles and detecting occluded traffic participants.

EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs

July 27, 2025/in Publications/by NEC Labs America

Enterprises are increasingly adopting Generative AI applications to extract insights from large volumes of multimodal documents in domains such as finance, law, healthcare, and industry. These documents contain structured and unstructured data (images, charts, handwritten texts, etc.) requiring robust AI systems for effective retrieval and comprehension. Recent advancements in Retrieval-Augmented Generation (RAG) frameworks and Vision-Language Models (VLMs) have improved retrieval performance on multimodal documents by processing pages as images. However, large-scale deployment remains challenging due to the high cost of LLM API usage and the slower inference speed of image-based processing of pages compared to text-based processing. To address these challenges, we propose EcoDoc, a cost-effective multimodal document processing system that dynamically selects the processing modality for each page, image or text, based on page characteristics and query intent. Our experimental evaluation on the TAT-DQA and DocVQA benchmarks shows that EcoDoc reduces average query processing latency by up to 2.29× and cost by up to 10×, without compromising accuracy.

SimCache: Similarity Caching for Efficient VLM-based Scene Understanding

June 11, 2025/in Publications/by NEC Labs America

Scene understanding systems analyze visual contexts by detecting objects, their attributes, and the interactions among them to provide a holistic interpretation. Understanding a scene requires analyzing multiple salient regions within a single video frame. Recently, Vision-Language Models (VLMs) have emerged as powerful tools for scene understanding, leveraging learned world knowledge to enable deployment without specialized training or fine-tuning. However, deploying VLMs in real-time applications is challenging due to their high computational and memory requirements, which limit processing throughput. We propose SimCache, a novel software-based caching mechanism that optimizes VLM-based scene understanding systems by reducing redundant computations. SimCache stores the embedding representation of a salient region and its detected activity, enabling reuse of VLM computations for similar regions in future frames. Specifically, SimCache exploits two types of redundancy: (1) temporal locality, reusing computations for similar regions across adjacent frames, and (2) semantic locality, reusing computations for visually distinct regions that represent the same activity at different times. SimCache includes a multi-tier cache architecture with specialized cache search and refinement policies to exploit redundancy efficiently and accurately. Experiments on action recognition datasets demonstrate that SimCache improves system throughput by up to 9.4× and reduces VLM computations by up to 24.4× with minimal accuracy loss.
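The caching idea described above can be illustrated with a minimal single-tier sketch: store each region embedding with its detected activity, and reuse the stored activity when a new region is similar enough. This is not SimCache's implementation. The multi-tier architecture and the search and refinement policies are not reproduced, and the class name, the cosine-similarity test, and the 0.95 threshold are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SimilarityCache:
    """Hypothetical single-tier cache keyed by embedding similarity."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed value, not from the paper
        self.entries = []           # list of (embedding, activity label)
        self.hits = 0
        self.misses = 0

    def lookup(self, embedding):
        """Return the label of the most similar entry, or None on a miss."""
        best_label, best_sim = None, -1.0
        for cached, label in self.entries:
            sim = cosine(embedding, cached)
            if sim > best_sim:
                best_label, best_sim = label, sim
        return best_label if best_sim >= self.threshold else None

    def get_or_compute(self, embedding, compute):
        """Reuse a cached label when possible; otherwise call `compute`."""
        label = self.lookup(embedding)
        if label is not None:
            self.hits += 1
            return label
        self.misses += 1
        label = compute(embedding)  # stands in for the expensive VLM call
        self.entries.append((list(embedding), label))
        return label

# Usage with a stand-in for the VLM that records how often it is called.
vlm_calls = []

def fake_vlm(embedding):
    vlm_calls.append(embedding)
    return "walking" if embedding[0] > embedding[1] else "sitting"

cache = SimilarityCache()
first = cache.get_or_compute([1.0, 0.0], fake_vlm)     # miss, calls the VLM
second = cache.get_or_compute([0.99, 0.05], fake_vlm)  # near-duplicate, hit
third = cache.get_or_compute([0.0, 1.0], fake_vlm)     # dissimilar, miss
```

A linear scan keeps the sketch short; a real system would more plausibly use an approximate nearest-neighbor index once the cache grows.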

Real-Time Network-Aware Roadside LiDAR Data Compression

April 2, 2025/in Publications/by NEC Labs America

LiDAR technology has emerged as a pivotal tool in Intelligent Transportation Systems (ITS), providing unique capabilities that have significantly transformed roadside traffic applications. However, this transformation comes with a distinct challenge: the immense volume of data generated by LiDAR sensors. These sensors produce vast amounts of data every second, which can overwhelm both private and public 5G networks that are used to connect intersections. This data volume makes it challenging to stream raw sensor data across multiple intersections effectively. This paper proposes an efficient real-time compression method for roadside LiDAR data. Our approach exploits a special characteristic of roadside LiDAR data: the background points are consistent across all frames. We detect these background points and send them to edge servers only once. For each subsequent frame, we filter out the background points and compress only the remaining data. This process achieves significant temporal compression by eliminating redundant background data and substantial spatial compression by focusing only on the filtered points. Our method is sensor-agnostic, exceptionally fast, memory-efficient, and adaptable to varying network conditions. It offers a 2.5x increase in compression rates and improves application-level accuracy by 40% compared to current state-of-the-art methods.
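The background-filtering step can be sketched as follows, under stated assumptions: points are bucketed into fixed-size voxels, a voxel counts as background if it is occupied in nearly all calibration frames, and later frames keep only points outside those voxels. The abstract does not say how background points are detected, so the voxel approach, the function names, the 0.5 m voxel size, and the 0.9 occupancy fraction are all hypothetical.

```python
import math

def voxel_key(point, voxel_size):
    """Map an (x, y, z) point to the integer index of its voxel."""
    return tuple(math.floor(c / voxel_size) for c in point)

def learn_background(frames, voxel_size=0.5, min_fraction=0.9):
    """Voxels occupied in at least `min_fraction` of the calibration frames."""
    counts = {}
    for frame in frames:
        # Count each voxel at most once per frame.
        for key in {voxel_key(p, voxel_size) for p in frame}:
            counts[key] = counts.get(key, 0) + 1
    needed = min_fraction * len(frames)
    return {key for key, count in counts.items() if count >= needed}

def foreground_points(frame, background, voxel_size=0.5):
    """Points that do not fall inside a background voxel."""
    return [p for p in frame if voxel_key(p, voxel_size) not in background]

# A static wall point appears in every calibration frame; a car moves.
wall = (10.1, 2.0, 1.0)
calibration = [
    [wall, (1.0, 0.0, 0.0)],
    [wall, (2.0, 0.0, 0.0)],
    [wall, (3.0, 0.0, 0.0)],
]
background = learn_background(calibration)

# In a later frame only the non-background points would be compressed and sent.
new_frame = [(10.2, 2.1, 1.1), (4.0, 0.0, 0.0)]
to_send = foreground_points(new_frame, background)
```

In this sketch the background set would be transmitted once, and each later frame would carry only `to_send`, which is where the reduction in data volume comes from.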

CAMTUNER: Adaptive Video Analytics Pipelines via Real-time Automated Camera Parameter Tuning

March 31, 2025/in Publications/by NEC Labs America

In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition operating on remote servers rely heavily on surveillance cameras to capture high-quality video streams to achieve high accuracy. Modern network cameras offer an array of parameters that directly influence video quality. While a few such parameters, e.g., exposure, focus, and white balance, are automatically adjusted by the camera internally, the others are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this work, we first show that in a typical surveillance camera deployment, environmental condition changes can have a significant adverse effect on the accuracy of insights from the AUs, but such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. Second, since most end-users lack the skill or understanding to appropriately configure these parameters and typically use a fixed parameter setting, we present CAMTUNER, to our knowledge the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CAMTUNER is based on SARSA reinforcement learning and incorporates two novel components: a lightweight analytics quality estimator and a virtual camera that drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that, compared to a VAP using the default camera setting, CAMTUNER enhances VAP accuracy by detecting 15.9% additional persons and 2.6%-4.2% additional cars (without any false positives) in a large enterprise parking lot. CAMTUNER opens up new avenues for elevating video analytics accuracy, transcending mere incremental enhancements achieved through refining deep-learning models.
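For context on the learning rule the abstract names, here is a minimal sketch of the standard tabular SARSA update, Q(s, a) ← Q(s, a) + α [r + γ Q(s′, a′) − Q(s, a)]. It is not CAMTUNER's training code. The state and action labels, the reward values, and the hyperparameters are hypothetical; in the paper's setting the reward would presumably come from the analytics quality estimator.

```python
from collections import defaultdict

def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one on-policy SARSA update to the table `q` and return the new value."""
    # The target uses the action actually chosen in the next state,
    # which is what distinguishes SARSA from Q-learning.
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical camera-tuning episode: states describe scene lighting,
# actions adjust a single camera parameter.
q = defaultdict(float)

# First update: all values start at zero, so only the reward contributes.
v1 = sarsa_update(q, "dim", "raise_brightness", 1.0, "ok", "hold")

# Suppose the follow-up state-action pair has since been valued at 0.5.
q[("ok", "hold")] = 0.5
v2 = sarsa_update(q, "dim", "raise_brightness", 1.0, "ok", "hold")
```

Because the update needs the next action as well as the next state, the agent must pick its next action (for example with an epsilon-greedy policy) before updating the current entry.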
