
Srimat T. Chakradhar

Department Head

Integrated Systems

Posts

TalentScout: Multimodal AI-Driven Expert Finding in Organizations

Identifying subject-matter experts within organizations remains a challenging task due to the scale, heterogeneity, and unstructured nature of enterprise knowledge assets. We present TalentScout, an AI-driven expert identification system that constructs a unified, skill-centric knowledge graph by ingesting and analyzing diverse media, including research papers, reports, presentations, transcripts, and supervisor recommendations. TalentScout’s modular architecture integrates document parsing, audio/video transcription, metadata extraction, large language model-based skill extraction, multi-factor author disambiguation, and evidence-weighted skill attribution. At query time, TalentScout decomposes natural language queries into canonical skill requirements, traverses the constructed knowledge graph, and ranks experts based on aggregated skill weights, document quality, and endorsement signals, providing document-level justifications for each recommendation. We evaluate TalentScout on multiple public and internal enterprise datasets, including DBLP, TREC Enterprise, Tilburg, and ManConCorpus. Using standard information retrieval metrics such as Precision@5, Recall@5, nDCG@5, and Mean Reciprocal Rank (MRR), TalentScout consistently outperforms leading baselines, achieving up to 24% higher Precision@5 in early expert retrieval. The results highlight TalentScout’s scalability, transparency, and accuracy, establishing it as a practical solution for evidence-based expert discovery and organizational talent management.
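
The query-time ranking step can be pictured as a weighted aggregation over skill edges in the knowledge graph. Below is a minimal sketch of that idea; the names (SkillEdge, rank_experts) and the weighting scheme are illustrative assumptions, not TalentScout's actual API or scoring formula.

```python
# Minimal sketch of evidence-weighted expert ranking over a skill-centric
# knowledge graph. The endorsement boost (1.5x) is an assumed value.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SkillEdge:
    expert: str
    skill: str
    weight: float        # evidence-weighted skill attribution
    doc_quality: float   # quality score of the supporting document
    endorsed: bool       # e.g., backed by a supervisor recommendation

def rank_experts(edges, required_skills, top_k=5):
    """Aggregate per-skill evidence into one score per expert, with justifications."""
    scores = defaultdict(float)
    justifications = defaultdict(list)
    for e in edges:
        if e.skill in required_skills:
            score = e.weight * e.doc_quality * (1.5 if e.endorsed else 1.0)
            scores[e.expert] += score
            justifications[e.expert].append((e.skill, score))
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [(name, scores[name], justifications[name]) for name in ranked]

edges = [
    SkillEdge("alice", "graph neural networks", 0.9, 0.8, True),
    SkillEdge("bob", "graph neural networks", 0.6, 0.9, False),
    SkillEdge("alice", "knowledge graphs", 0.7, 0.7, False),
]
print(rank_experts(edges, {"graph neural networks", "knowledge graphs"}))
```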

SlideCraft: Context-aware Slides Generation Agent

Creating effective slide presentations requires adapting both content and structure to match the communication context, e.g., whether the presentation summarizes results for executives or reports progress to research supervisors. In research and enterprise environments, this need for context-sensitive presentations often leads to repeated, manual reformatting of the same material to suit different audiences. Existing generative systems support slide creation but typically rely on structured inputs, assume a fixed format, and offer limited ability to iteratively refine outputs through natural language feedback. Moreover, they rarely accommodate organizational constraints such as formatting guidelines, domain-specific terminology, or branding requirements. We present SlideCraft, a context-aware generative agent that autonomously creates and edits slide presentations based on natural language instructions. SlideCraft infers the intended presentation context, such as an executive-facing summary or a project review for technical oversight, and selects the appropriate slide template. It then synthesizes content from input documents, enriches it with external knowledge and internal assets, assembles it into a structured intermediate representation, and generates a validated slide deck. SlideCraft supports both first-time slide creation and iterative updates, operating through familiar natural language interfaces like email or messaging tools. Our experiments demonstrate that SlideCraft consistently produces high-quality, context-aware presentations tailored to diverse communication settings, with minimal human input and reliable adherence to enterprise constraints.
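
The structured intermediate representation and its validation against per-context template constraints can be sketched as follows; the schema and constraint values are illustrative assumptions, not SlideCraft's actual format.

```python
# Minimal sketch of a deck intermediate representation (IR) validated against
# template rules before rendering. Contexts and limits are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Slide:
    title: str
    bullets: list

@dataclass
class DeckIR:
    context: str              # e.g., "executive_summary" or "project_review"
    slides: list = field(default_factory=list)

TEMPLATE_RULES = {
    # assumed per-context constraints: max slides, max bullets per slide
    "executive_summary": {"max_slides": 5, "max_bullets": 4},
    "project_review":    {"max_slides": 15, "max_bullets": 6},
}

def validate(deck: DeckIR) -> list:
    """Return a list of constraint violations; an empty list means valid."""
    rules = TEMPLATE_RULES[deck.context]
    errors = []
    if len(deck.slides) > rules["max_slides"]:
        errors.append(f"too many slides: {len(deck.slides)} > {rules['max_slides']}")
    for i, s in enumerate(deck.slides):
        if len(s.bullets) > rules["max_bullets"]:
            errors.append(f"slide {i} ('{s.title}') has too many bullets")
    return errors

deck = DeckIR("executive_summary", [Slide("Key Results", ["+24% P@5", "3 datasets"])])
print(validate(deck) or "deck satisfies template constraints")
```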

Murugan Sankaradas presents TalentScout: Multimodal AI-Driven Expert Finding in Organizations at PICom2025 on October 21st

Murugan Sankaradas (presenting virtually) will present “TalentScout: Multimodal AI-Driven Expert Finding in Organizations” at the IEEE International Conference on Pervasive Intelligence and Computing (PICom2025) on Tuesday, October 21 (10:30am–12pm JST) | Monday, October 20 (9:30–11pm ET) in Hokkaido, Japan.

Kunal Rao presents SlideCraft: Context-Aware Slides Generation Agent at PICom 2025 on October 21st

Kunal Rao (presenting virtually) will present “SlideCraft: Context-Aware Slides Generation Agent” at the IEEE International Conference on Pervasive Intelligence and Computing (PICom2025) on Tuesday, October 21 (10:30am–12pm JST) | Monday, October 20 (9:30–11pm ET) in Hokkaido, Japan. SlideCraft uses AI to automatically generate presentation slides from research content, making technical communication faster and context-aware for scientists and professionals.

Bifröst: Peer-to-peer Load-balancing for Function Execution in Agentic AI Systems

Agentic AI systems rely on Large Language Models (LLMs) to execute complex tasks by invoking external functions. The efficiency of these systems depends on how well function execution is managed, especially under heterogeneous and high-variance workloads, where function execution times can range from milliseconds to several seconds. Traditional load-balancing techniques, such as round-robin, least-loaded, and Peak-EWMA (used in Linkerd), struggle in such settings: round-robin ignores load imbalance, least-loaded reacts slowly to rapid workload shifts, and Peak-EWMA relies on latency tracking, which is ineffective for workloads with high execution time variability. In this paper, we introduce Bifröst, a peer-to-peer load-balancing mechanism that distributes function requests based on real-time active request count rather than latency estimates. Instead of relying on centralized load-balancers or client-side decisions, Bifröst enables function-serving pods to dynamically distribute load by comparing queue lengths and offloading requests accordingly. This avoids unnecessary overhead while ensuring better responsiveness under high-variance workloads. Our evaluation on open-vocabulary object detection, multi-modal understanding, and code generation workloads shows that Bifröst improves function completion time by up to 20% when processing 13,700 requests from 137 AI agents on a 32-node Kubernetes cluster, outperforming both OpenFaaS and OpenFaaS with Linkerd. In an AI-driven insurance claims processing workflow, Bifröst achieves up to 25% faster execution.
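
The offloading decision can be pictured as a queue-length comparison between a pod and a small sample of its peers. The sketch below uses power-of-two-choices-style probing with a fixed margin; both are illustrative assumptions, not Bifröst's exact protocol.

```python
# Minimal sketch of peer-to-peer offloading by active request count: a pod
# serves a request locally unless a sampled peer's queue is shorter by at
# least `margin`. Probe count and margin values are assumed for illustration.
import random

class Pod:
    def __init__(self, name):
        self.name = name
        self.active = 0            # current in-flight requests

    def route(self, peers, probe=2, margin=2):
        """Pick a target pod for one incoming request."""
        candidates = random.sample(peers, min(probe, len(peers)))
        best = min(candidates, key=lambda p: p.active)
        target = best if best.active + margin <= self.active else self
        target.active += 1         # request admitted to the target's queue
        return target.name

pods = [Pod(f"pod-{i}") for i in range(4)]
pods[0].active = 10                # pod-0 is overloaded; requests should offload
for _ in range(5):
    print(pods[0].route(pods[1:]))
```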

Roadside Multi-LiDAR Data Fusion for Enhanced Traffic Safety

Roadside LiDAR (Light Detection and Ranging) sensors promise safer and faster traffic management and vehicular operations. However, occlusion and small view angles are significant challenges to widespread use of roadside LiDARs. We consider fusing data from multiple LiDARs at a traffic intersection to better estimate traffic parameters than one can estimate from a single LiDAR. The key challenge is to calibrate multiple LiDARs both in time and space. The problem is more complex when heterogeneous sensors differ in resolution and are positioned arbitrarily on a traffic intersection. We propose a calibration technique to fuse multiple LiDARs. We show that our technique works at various data granularities and enables real-time analytics for roadside traffic monitoring. We evaluate our technique on a large number of simulated traffic scenarios and show that fusion improves the accuracy of vehicle counting and near-collision detection. We apply our algorithm to real traffic data and demonstrate its utility in classifying vehicles and detecting occluded traffic participants.
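
The spatial part of such a calibration can be illustrated with the standard Kabsch algorithm, which estimates the rigid transform aligning two point sets with known correspondences. This is a simplified stand-in: the paper's technique also handles time synchronization and heterogeneous sensor resolutions, which are not shown here.

```python
# Minimal sketch of spatial calibration between two LiDARs: estimate the
# rotation R and translation t aligning corresponding points (Kabsch algorithm).
import numpy as np

def rigid_transform(src, dst):
    """Find R, t minimizing ||R @ src_i + t - dst_i|| over point pairs."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Synthetic check: rotate and shift a point cloud, then recover the transform.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([2.0, -1.0, 0.5])
R, t = rigid_transform(src, dst)
print(np.allclose(R, R_true), np.round(t, 3))
```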

EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs

Enterprises are increasingly adopting Generative AI applications to extract insights from large volumes of multimodal documents in domains such as finance, law, healthcare, and industry. These documents contain structured and unstructured data (images, charts, handwritten text, etc.), requiring robust AI systems for effective retrieval and comprehension. Recent advancements in Retrieval-Augmented Generation (RAG) frameworks and Vision-Language Models (VLMs) have improved retrieval performance on multimodal documents by processing pages as images. However, large-scale deployment remains challenging due to the high cost of LLM API usage and the slower inference speed of image-based processing of pages compared to text-based processing. To address these challenges, we propose EcoDoc, a cost-effective multimodal document processing system that dynamically selects the processing modality for each page, image or text, based on page characteristics and query intent. Our experimental evaluation on TAT-DQA and DocVQA benchmarks shows that EcoDoc reduces average query processing latency by up to 2.29× and cost by up to 10×, without compromising accuracy.
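
The per-page routing decision can be sketched as a simple policy over page features and query intent; the features and threshold below are illustrative assumptions, not EcoDoc's actual selection logic.

```python
# Minimal sketch of per-page modality routing: visually rich pages go to the
# image path (VLM), text-dominant pages to a cheaper text path. Thresholds
# are placeholders.
from dataclasses import dataclass

@dataclass
class Page:
    text_chars: int
    num_images: int
    num_charts: int
    has_handwriting: bool

def choose_modality(page: Page, query_is_visual: bool) -> str:
    visual_signals = page.num_images + page.num_charts + int(page.has_handwriting)
    if query_is_visual or (visual_signals > 0 and page.text_chars < 500):
        return "image"   # route to the VLM: costlier but handles visual content
    return "text"        # route to the text LLM: faster and cheaper

pages = [Page(2000, 0, 0, False), Page(120, 1, 2, False)]
for p in pages:
    print(choose_modality(p, query_is_visual=False))
```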

XPF: Agentic AI System for Business Workflow Automation

In this paper, we propose a novel agentic AI system called XPF, which enables users to create “agents” using just natural language, where each agent is capable of executing complex, real-world business workflows in an accurate and reliable manner. XPF provides an interface to develop and iterate over the agent creation process and then deploy the agent in production when satisfactory results are produced consistently. The key components of XPF include: (a) a planner, which leverages an LLM to generate a step-by-step plan that can further be edited by a human; (b) a compiler, which leverages an LLM to compile the plan into a flow graph; (c) an executor, which handles distributed execution of the flow graph (using LLMs, tools, RAG, etc.) on an underlying cluster; and (d) a verifier, which helps verify the output (through human-generated tests or tests auto-generated using an LLM). We develop five different agents using XPF and conduct experiments to evaluate one particular aspect: the difference in accuracy and reliability of the five agents with “human-generated” vs. “auto-generated” plans. Our experiments show that we get much more accurate and reliable responses for a business workflow when step-by-step instructions (in natural language) are given by a human familiar with the workflow, rather than letting the LLM figure out the execution plan steps. In particular, we observe that “human-generated” plans almost always give 100% accuracy, whereas “auto-generated” plans almost never do. In terms of reliability, we observe through ROUGE-L, BLEU, and METEOR scores that the output from a “human-generated” plan is much more reliable than that from an “auto-generated” plan.
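
The compiler/executor pair can be pictured as building a dependency graph of plan steps and running them in dependency order. The sketch below shows that shape on a toy workflow; the step functions are placeholders, not XPF's implementation, and a real executor would dispatch steps across a cluster.

```python
# Minimal sketch of executing a compiled flow graph: nodes are plan steps
# (stand-ins for LLM calls, tool calls, RAG lookups), edges are dependencies.
from graphlib import TopologicalSorter

def fetch_claims(ctx):
    ctx["claims"] = ["claim-1", "claim-2"]      # placeholder tool call

def summarize(ctx):
    ctx["summary"] = f"{len(ctx['claims'])} claims reviewed"  # placeholder LLM call

def draft_email(ctx):
    ctx["email"] = f"Status: {ctx['summary']}"  # placeholder LLM call

flow_graph = {
    # step -> set of steps it depends on
    "fetch_claims": set(),
    "summarize":    {"fetch_claims"},
    "draft_email":  {"summarize"},
}
steps = {"fetch_claims": fetch_claims, "summarize": summarize, "draft_email": draft_email}

ctx = {}
for step in TopologicalSorter(flow_graph).static_order():
    steps[step](ctx)        # sequential here; XPF executes this distributed
print(ctx["email"])
```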

Re-ranking the Context for Multimodal Retrieval Augmented Generation

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context with improved accuracy and reduced hallucinations. However, multi-modal RAG systems face unique challenges: (i) the retrieval process may select entries irrelevant to the user query (e.g., images, documents), and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate RAG output. In this paper, we aim to address the first challenge, i.e., improving the selection of relevant context from the knowledge base in the retrieval phase of multi-modal RAG. Specifically, we leverage the relevancy score (RS) measure, designed in our previous work for evaluating RAG performance, to select more relevant entries during retrieval. Retrieval based on embeddings (say, CLIP-based embeddings) and cosine similarity usually performs poorly, particularly for multi-modal data. We show that by using a more advanced relevancy measure, one can enhance the retrieval process by selecting more relevant pieces from the knowledge base and eliminating irrelevant pieces from the context, by adaptively selecting up to k entries instead of a fixed number of entries. Our evaluation using the COCO dataset demonstrates significant enhancement in selecting relevant context and in the accuracy of the generated response.
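
The adaptive selection can be sketched as re-ranking retrieved entries by a relevancy score and keeping at most k entries that clear a threshold, rather than a fixed top-k. The toy scoring function below is a placeholder for the RS measure from the authors' prior work.

```python
# Minimal sketch of adaptive context selection for multi-modal RAG: keep up
# to k entries whose relevancy score exceeds a threshold. Threshold and k
# values here are illustrative.
def select_context(entries, relevancy_fn, query, k=5, threshold=0.5):
    scored = [(relevancy_fn(query, e), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for rs, e in scored[:k] if rs >= threshold]

# Toy relevancy: word overlap between query and entry caption (placeholder
# for the RS measure; a real system would score image-text relevance).
def toy_rs(query, entry):
    q = set(query.lower().split())
    c = set(entry["caption"].lower().split())
    return len(q & c) / max(len(q), 1)

entries = [{"caption": "a dog catching a frisbee"},
           {"caption": "city skyline at night"},
           {"caption": "a dog running in a park"}]
print(select_context(entries, toy_rs, "dog playing frisbee", k=2, threshold=0.3))
```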

SimCache: Similarity Caching for Efficient VLM-based Scene Understanding

Scene understanding systems analyze visual contexts by detecting objects, their attributes, and the interactions among them to provide a holistic interpretation. Understanding a scene requires analyzing multiple salient regions within a single video frame. Recently, Vision-Language Models (VLMs) have emerged as powerful tools for scene understanding, leveraging learned world knowledge to enable deployment without specialized training or fine-tuning. However, deploying VLMs in real-time applications is challenging due to their high computational and memory requirements, which limit processing throughput. We propose SimCache, a novel software-based caching mechanism that optimizes VLM-based scene understanding systems by reducing redundant computations. SimCache stores the embedding representation of a salient region and its detected activity, enabling reuse of VLM computations for similar regions in future frames. Specifically, SimCache exploits two types of redundancy: (1) temporal locality, reusing computations for similar regions across adjacent frames, and (2) semantic locality, reusing computations for visually distinct regions that represent the same activity at different times. SimCache includes a multi-tier cache architecture with specialized cache search and refinement policies to exploit redundancy efficiently and accurately. Experiments on action recognition datasets demonstrate that SimCache improves system throughput by up to 9.4× and reduces VLM computations by up to 24.4× with minimal accuracy loss.
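
A single-tier version of the caching idea can be sketched as follows; SimCache's multi-tier architecture and its search and refinement policies are not reproduced here, and the similarity threshold is an illustrative assumption.

```python
# Minimal sketch of similarity caching for VLM outputs: reuse a prior result
# when a new region embedding is close (cosine similarity) to a cached one.
import numpy as np

class SimCache:
    def __init__(self, threshold=0.9):
        self.keys, self.values = [], []     # region embeddings and activities
        self.threshold = threshold

    def lookup(self, emb):
        for key, val in zip(self.keys, self.values):
            sim = key @ emb / (np.linalg.norm(key) * np.linalg.norm(emb))
            if sim >= self.threshold:
                return val                  # cache hit: skip the VLM call
        return None

    def insert(self, emb, activity):
        self.keys.append(emb)
        self.values.append(activity)

cache = SimCache()

def classify_region(emb, run_vlm):
    cached = cache.lookup(emb)
    if cached is not None:
        return cached
    result = run_vlm(emb)                   # expensive VLM inference
    cache.insert(emb, result)
    return result

e1 = np.array([1.0, 0.0])
e2 = np.array([0.98, 0.05])                 # near-duplicate region
print(classify_region(e1, lambda e: "walking"))
print(classify_region(e2, lambda e: "should-not-run"))  # hit: reuses "walking"
```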