
Kunal Rao

Researcher

Integrated Systems

Posts

TalentScout: Multimodal AI-Driven Expert Finding in Organizations

Identifying subject-matter experts within organizations remains a challenging task due to the scale, heterogeneity, and unstructured nature of enterprise knowledge assets. We present TalentScout, an AI-driven expert identification system that constructs a unified, skill-centric knowledge graph by ingesting and analyzing diverse media, including research papers, reports, presentations, transcripts, and supervisor recommendations. TalentScout’s modular architecture integrates document parsing, audio/video transcription, metadata extraction, large language model-based skill extraction, multi-factor author disambiguation, and evidence-weighted skill attribution. At query time, TalentScout decomposes natural language queries into canonical skill requirements, traverses the constructed knowledge graph, and ranks experts based on aggregated skill weights, document quality, and endorsement signals, providing document-level justifications for each recommendation. We evaluate TalentScout on multiple public and internal enterprise datasets, including DBLP, TREC Enterprise, Tilburg, and ManConCorpus. Using standard information retrieval metrics such as Precision@5, Recall@5, nDCG@5, and Mean Reciprocal Rank (MRR), TalentScout consistently outperforms leading baselines, achieving up to 24% higher Precision@5 in early expert retrieval. The results highlight TalentScout’s scalability, transparency, and accuracy, establishing it as a practical solution for evidence-based expert discovery and organizational talent management.
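
To make the query-time ranking concrete, below is a minimal, illustrative Python sketch of how aggregated skill weights, document quality, and endorsement signals could be combined into an expert score with document-level justifications. The data structures and names (SkillEdge, rank_experts, endorsement_bonus) are assumptions for exposition, not TalentScout's actual API.

    # Minimal sketch of TalentScout-style query-time ranking (illustrative only).
    # Assumes skills have already been extracted and attributed to disambiguated
    # authors with evidence weights; names below are hypothetical.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class SkillEdge:
        expert: str         # disambiguated author identifier
        skill: str          # canonical skill name
        weight: float       # evidence-weighted skill attribution
        doc_id: str         # supporting document
        doc_quality: float  # e.g. venue/recency score in [0, 1]
        endorsed: bool      # supervisor endorsement signal

    def rank_experts(edges, required_skills, top_k=5, endorsement_bonus=0.2):
        """Aggregate per-skill evidence into an expert score plus justifications."""
        scores = defaultdict(float)
        justification = defaultdict(list)
        for e in edges:
            if e.skill not in required_skills:
                continue
            contribution = e.weight * e.doc_quality
            if e.endorsed:
                contribution *= (1.0 + endorsement_bonus)
            scores[e.expert] += contribution
            justification[e.expert].append((e.skill, e.doc_id, round(contribution, 3)))
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        return [(expert, score, justification[expert]) for expert, score in ranked]

    edges = [
        SkillEdge("alice", "knowledge graphs", 0.9, "paper-12", 0.8, True),
        SkillEdge("bob", "knowledge graphs", 0.6, "report-3", 0.5, False),
        SkillEdge("alice", "speech transcription", 0.4, "talk-7", 0.7, False),
    ]
    print(rank_experts(edges, required_skills={"knowledge graphs"}))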

SlideCraft: Context-aware Slides Generation Agent

Creating effective slide presentations requires adapting both content and structure to match the communication context, e.g., whether the presentation summarizes results for executives or reports progress to research supervisors. In research and enterprise environments, this need for context-sensitive presentations often leads to repeated, manual reformatting of the same material to suit different audiences. Existing generative systems support slide creation but typically rely on structured inputs, assume a fixed format, and offer limited ability to iteratively refine outputs through natural language feedback. Moreover, they rarely accommodate organizational constraints such as formatting guidelines, domain-specific terminology, or branding requirements. We present SlideCraft, a context-aware generative agent that autonomously creates and edits slide presentations based on natural language instructions. SlideCraft infers the intended presentation context, such as an executive-facing summary or a project review for technical oversight, and selects the appropriate slide template. It then synthesizes content from input documents, enriches it with external knowledge and internal assets, assembles it into a structured intermediate representation, and generates a validated slide deck. SlideCraft supports both first-time slide creation and iterative updates, operating through familiar natural language interfaces like email or messaging tools. Our experiments demonstrate that SlideCraft consistently produces high-quality, context-aware presentations tailored to diverse communication settings, with minimal human input and reliable adherence to enterprise constraints.
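
As a rough illustration of the structured-intermediate-representation step, the sketch below renders a toy deck specification into slides with python-pptx. The IR schema and field names are assumptions, and python-pptx merely stands in for whatever renderer SlideCraft actually uses.

    # Illustrative sketch: a context-tagged intermediate representation rendered
    # into a deck. The IR schema here is an assumption, not SlideCraft's format.
    from pptx import Presentation

    deck_ir = {
        "context": "executive_summary",           # inferred presentation context
        "template": "default",                    # selected per context/branding rules
        "slides": [
            {"title": "Project Overview",
             "bullets": ["Problem statement", "Key results", "Next steps"]},
            {"title": "Results",
             "bullets": ["Metric A improved 12%", "Deployment completed in Q3"]},
        ],
    }

    def render(ir, path="deck.pptx"):
        prs = Presentation()                      # a real system would load ir["template"]
        layout = prs.slide_layouts[1]             # built-in "title and content" layout
        for spec in ir["slides"]:
            slide = prs.slides.add_slide(layout)
            slide.shapes.title.text = spec["title"]
            body = slide.placeholders[1].text_frame
            body.text = spec["bullets"][0]
            for bullet in spec["bullets"][1:]:
                body.add_paragraph().text = bullet
        prs.save(path)

    render(deck_ir)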

Murugan Sankaradas presents TalentScout: Multimodal AI-Driven Expert Finding in Organizations at PICom2025 on October 21st

Murugan Sankaradas (presenting virtually) will present “TalentScout: Multimodal AI-Driven Expert Finding in Organizations” at the IEEE International Conference on Pervasive Intelligence and Computing (PICom2025) on Tuesday, October 21 (10:30am–12pm JST) | Monday, October 20 (9:30–11pm ET) in Hokkaido, Japan.

Kunal Rao presents SlideCraft: Context-Aware Slides Generation Agent at PICom 2025 on October 21st

Kunal Rao (presenting virtually) will present “SlideCraft: Context-Aware Slides Generation Agent” at the IEEE International Conference on Pervasive Intelligence and Computing (PICom 2025) on Tuesday, Oct 21 (10:30am–12pm JST) | Monday, Oct 20 (9:30–11pm ET) in Hokkaido, Japan. SlideCraft uses AI to automatically generate presentation slides from research content, making technical communication faster and context-aware for scientists and professionals.

Bifröst: Peer-to-peer Load-balancing for Function Execution in Agentic AI Systems

Agentic AI systems rely on Large Language Models (LLMs) to execute complex tasks by invoking external functions. The efficiency of these systems depends on how well function execution is managed, especially under heterogeneous and high-variance workloads, where function execution times can range from milliseconds to several seconds. Traditional load-balancing techniques, such as round-robin, least-loaded, and Peak-EWMA (used in Linkerd), struggle in such settings: round-robin ignores load imbalance, least-loaded reacts slowly to rapid workload shifts, and Peak-EWMA relies on latency tracking, which is ineffective for workloads with high execution time variability. In this paper, we introduce Bifröst, a peer-to-peer load-balancing mechanism that distributes function requests based on real-time active request count rather than latency estimates. Instead of relying on centralized load-balancers or client-side decisions, Bifröst enables function-serving pods to dynamically distribute load by comparing queue lengths and offloading requests accordingly. This avoids unnecessary overhead while ensuring better responsiveness under high-variance workloads. Our evaluation on open-vocabulary object detection, multi-modal understanding, and code generation workloads shows that Bifröst improves function completion time by up to 20% when processing 13,700 requests from 137 AI agents on a 32-node Kubernetes cluster, outperforming both OpenFaaS and OpenFaaS with Linkerd. In an AI-driven insurance claims processing workflow, Bifröst achieves up to 25% faster execution.
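
The toy sketch below illustrates the core idea of queue-length-based peer offloading under simplifying assumptions (greedy peer selection and an offload margin of one request); it is not the paper's exact protocol.

    # Toy sketch of Bifröst-style peer-to-peer offloading: each function-serving
    # pod tracks its active request count and hands a new request to a lighter
    # peer when that peer's queue is shorter.
    class Pod:
        def __init__(self, name, active=0):
            self.name = name
            self.active = active      # in-flight function requests
            self.peers = []

        def admit(self, request_id):
            """Place the request on this pod or offload it to a lighter peer."""
            peer = min(self.peers, key=lambda p: p.active, default=None)
            if peer is not None and peer.active + 1 < self.active:
                return peer.admit(request_id)      # offload to the lighter peer
            self.active += 1                       # otherwise execute locally
            return self.name

    pods = [Pod("pod-a", active=6), Pod("pod-b", active=1), Pod("pod-c", active=2)]
    for pod in pods:
        pod.peers = [p for p in pods if p is not pod]

    placement = [pods[0].admit(i) for i in range(5)]   # all requests arrive at the busy pod
    print(placement)                                   # most spill over to pod-b / pod-c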

XPF: Agentic AI System for Business Workflow Automation

In this paper, we propose a novel agentic AI system called XPF, which enables users to create “agents” using just natural language, where each agent is capable of executing complex, real-world business workflows in an accurate and reliable manner. XPF provides an interface to develop and iterate over the agent creation process and then deploy the agent in production once satisfactory results are produced consistently. The key components of XPF are: (a) a planner, which leverages an LLM to generate a step-by-step plan that can further be edited by a human; (b) a compiler, which leverages an LLM to compile the plan into a flow graph; (c) an executor, which handles distributed execution of the flow graph (using LLMs, tools, RAG, etc.) on an underlying cluster; and (d) a verifier, which helps verify the output (through human-written tests or tests auto-generated using an LLM). We develop five different agents using XPF and conduct experiments to evaluate one particular aspect: the difference in accuracy and reliability between agents driven by “human-generated” vs. “auto-generated” plans. Our experiments show that responses for a business workflow are much more accurate and reliable when step-by-step instructions (in natural language) are given by a human familiar with the workflow, rather than letting the LLM figure out the execution plan steps. In particular, we observe that “human-generated” plans almost always give 100% accuracy, whereas “auto-generated” plans almost never do. In terms of reliability, ROUGE-L, BLEU, and METEOR scores show that the output from “human-generated” plans is much more reliable than that from “auto-generated” plans.
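
The sketch below illustrates the planner/compiler/executor/verifier flow on a toy workflow; the plan format, tool functions, and tests are hypothetical stand-ins rather than XPF's actual interfaces.

    # Schematic sketch of XPF's plan -> flow-graph -> execution -> verification
    # path; everything below is illustrative, not the system's real API.
    def planner(workflow_description):
        # In XPF an LLM drafts this plan and a human can edit it; here it is fixed.
        return [
            {"id": "fetch",   "tool": "fetch_claims",   "deps": []},
            {"id": "extract", "tool": "extract_fields", "deps": ["fetch"]},
            {"id": "report",  "tool": "write_report",   "deps": ["extract"]},
        ]

    def compiler(plan):
        # Compile the (possibly edited) plan into a dependency (flow) graph.
        return {step["id"]: step for step in plan}

    def executor(graph, tools):
        # Execute steps whose dependencies are satisfied, in topological order.
        results, done = {}, set()
        while len(done) < len(graph):
            for step_id, step in graph.items():
                if step_id not in done and all(d in done for d in step["deps"]):
                    inputs = [results[d] for d in step["deps"]]
                    results[step_id] = tools[step["tool"]](*inputs)
                    done.add(step_id)
        return results

    def verifier(results, tests):
        return all(test(results) for test in tests)

    tools = {
        "fetch_claims":   lambda: ["claim-1", "claim-2"],
        "extract_fields": lambda claims: {c: {"amount": 100} for c in claims},
        "write_report":   lambda fields: f"{len(fields)} claims processed",
    }
    results = executor(compiler(planner("process insurance claims")), tools)
    print(verifier(results, [lambda r: "report" in r]))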

Latency-driven Execution of LLM-generated Application Code on the Computing Continuum

Latency-critical applications demand quick responses. Ideally, detailed insights are preferable for the best decision making and response actions. However, when detailed insights cannot be provided quickly, even basic information goes a long way in tackling the situation effectively. For example, in a marine security application, it is critical to send a notification as soon as an unauthorized vessel is seen, so a timely response may be prioritized over one based on complete details. To address such latency-critical situations, in this paper we propose a novel system called DiCE-EC, which leverages an LLM to generate distributed code with speculative execution on the edge (a fast, simple response using resource-constrained hardware) and the cloud (a detailed response using powerful hardware, which may arrive quickly or slowly depending on network conditions). DiCE-EC breaks an application down into smaller components and executes them asynchronously across the edge-cloud computing continuum. As network conditions vary, we show through a real-world marine security application that DiCE-EC is effective in dynamically choosing the detailed insight from the cloud when it is received within the latency constraint, or falling back to the simple response from the edge to guarantee timely alert delivery. Without such dynamic selection of the response from edge or cloud, existing systems either always provide simple responses or drop alerts. We perform real network measurements in the Gulf of Pozzuoli in Naples, Italy along accessible areas (inland and on a ferry) and generate 1 million realistic measurements across four inaccessible regions, and demonstrate that DiCE-EC never misses an alert, while the baseline misses up to ~4% of alerts with real data and up to ~1% (10,000 alerts) with generated data.
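
A simplified sketch of the edge/cloud response selection is shown below: both paths run speculatively and the detailed cloud result is used only if it arrives within the latency budget. The timings, payloads, and function names are illustrative assumptions.

    # Simplified sketch of DiCE-EC-style latency-driven selection between a
    # speculative edge response and a detailed cloud response.
    import asyncio

    async def edge_inference(frame):
        await asyncio.sleep(0.05)                  # fast, resource-constrained model
        return {"alert": True, "detail": "vessel detected"}

    async def cloud_inference(frame):
        await asyncio.sleep(0.8)                   # detailed model plus network delay
        return {"alert": True, "detail": "unauthorized cargo vessel, track #1042"}

    async def respond(frame, latency_budget=0.3):
        edge_task = asyncio.create_task(edge_inference(frame))
        cloud_task = asyncio.create_task(cloud_inference(frame))
        try:
            # Prefer the detailed cloud answer if it arrives within the budget.
            return await asyncio.wait_for(asyncio.shield(cloud_task), latency_budget)
        except asyncio.TimeoutError:
            cloud_task.cancel()                    # fall back to the edge result
            return await edge_task

    print(asyncio.run(respond(frame=None)))        # edge alert delivered within the budget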

LLM-based Distributed Code Generation and Cost-Efficient Execution in the Cloud

The advancement of Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), is reshaping the software industry by automating code generation. Many LLM-driven distributed processing systems rely on serial code generation constrained by predefined libraries, limiting flexibility and adaptability. While some approaches enhance performance through parallel execution or optimize edge-cloud distributed processing for specific domains, they often overlook the cost implications of deployment, restricting scalability and economic feasibility across diverse cloud environments. This paper presents DiCE-C, a system that eliminates these constraints by starting directly from a natural language query. DiCE-C dynamically identifies available tools at runtime, programmatically refines LLM prompts, and employs a stepwise approach—first generating serial code and then transforming it into distributed code. This adaptive methodology enables efficient distributed execution without dependence on specific libraries. By leveraging high-level parallelism at the Application Programming Interface (API) level and managing API execution as services within a Kubernetes-based runtime, DiCE-C reduces idle GPU time and facilitates the use of smaller, cost-effective GPU instances. Experiments with a vision-based insurance application demonstrate that DiCE-C reduces cloud operational costs by up to 72% when using smaller GPUs (A6000 and A4000 GPU machines vs. A100 GPU machine) and by 32% when using identical GPUs (A100 GPU machines). This flexible and cost-efficient approach makes DiCE-C a scalable solution for deploying LLM-generated vision applications in cloud environments.
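
As a conceptual illustration of the serial-to-distributed transformation, the sketch below contrasts a serial pipeline with one that issues independent API calls concurrently; the service names and latencies are assumptions, and the remote calls are stubbed rather than real Kubernetes services.

    # Conceptual sketch of the serial -> distributed step: independent API calls
    # that a serial pipeline would chain are issued concurrently as services.
    import asyncio

    async def call_service(name, payload):
        # Stand-in for an HTTP call to a Kubernetes-hosted API service.
        await asyncio.sleep(0.2)
        return f"{name}({payload})"

    async def serial_pipeline(image):
        damage = await call_service("damage-detection", image)
        plate = await call_service("license-plate-ocr", image)
        return damage, plate                       # sequential: ~0.4 s end-to-end

    async def distributed_pipeline(image):
        # Independent API calls run in parallel, so each service (and its GPU)
        # is busy for less wall-clock time per request.
        return await asyncio.gather(
            call_service("damage-detection", image),
            call_service("license-plate-ocr", image),
        )

    print(asyncio.run(distributed_pipeline("claim-photo.jpg")))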

CAMTUNER: Adaptive Video Analytics Pipelines via Real-time Automated Camera Parameter Tuning

In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition operating on remote servers rely heavily on surveillance cameras to capture high-quality video streams to achieve high accuracy. Modern network cameras offer an array of parameters that directly influence video quality. While a few of these parameters, e.g., exposure, focus, and white balance, are automatically adjusted by the camera internally, the others are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this work, we first show that in a typical surveillance camera deployment, environmental condition changes can have a significant adverse effect on the accuracy of insights from the AUs, but such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. Second, since most end-users lack the skill or understanding to appropriately configure these parameters and typically use a fixed parameter setting, we present CAMTUNER, to our knowledge the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CAMTUNER is based on SARSA reinforcement learning and incorporates two novel components: a lightweight analytics quality estimator and a virtual camera, which drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that compared to a VAP using the default camera setting, CAMTUNER enhances VAP accuracy by detecting 15.9% additional persons and 2.6%–4.2% additional cars (without any false positives) in a large enterprise parking lot. CAMTUNER opens up new avenues for elevating video analytics accuracy, transcending the mere incremental enhancements achieved through refining deep-learning models.
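
For readers unfamiliar with SARSA, the sketch below shows the on-policy update in the spirit of CAMTUNER: states summarize environmental conditions, actions nudge one NAUTO parameter, and the reward comes from an analytics-quality estimate. The state discretization and reward value are illustrative assumptions, not the paper's configuration.

    # Minimal SARSA update sketch: epsilon-greedy action selection plus the
    # on-policy temporal-difference update of Q(state, action).
    import random
    from collections import defaultdict

    ACTIONS = [-1, 0, +1]                 # decrease / keep / increase one NAUTO parameter
    Q = defaultdict(float)                # Q[(state, action)]
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    def choose_action(state):
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def sarsa_step(state, action, reward, next_state):
        next_action = choose_action(next_state)
        td_target = reward + gamma * Q[(next_state, next_action)]
        Q[(state, action)] += alpha * (td_target - Q[(state, action)])
        return next_action

    # One illustrative transition: morning light, brightness raised, quality improved.
    state, action = ("morning", "brightness=40"), +1
    next_state = ("morning", "brightness=45")
    analytics_quality_reward = 0.12       # from the lightweight quality estimator
    next_action = sarsa_step(state, action, analytics_quality_reward, next_state)
    print(Q[(state, action)], next_action)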

DiCE-M: Distributed Code Generation and Execution for Marine Applications – An Edge-Cloud Approach

Edge computing has emerged as a transformative technology that reduces application latency, improves cost efficiency, enhances security, and enables large-scale deployment of applications across various domains. In environmental monitoring, systems such as MegaSense [49] use low-cost sensors to gather and process real-time air quality data through edge-cloud collaboration, highlighting the critical role of edge computing in enabling scalable, efficient solutions. Similarly, marine science increasingly requires real-time processing and analysis of marine data from remote, resource-constrained environments. In this paper, we extend the power of edge computing by integrating it with Generative Artificial Intelligence (GenAI), specifically large language models (LLMs), to address challenges in marine science applications. We propose DiCE-M (Distributed Code generation and Execution for Marine applications), a robust system that uses an LLM to generate distributed code for marine applications and then utilizes a runtime to efficiently execute it on an edge+cloud computing infrastructure. Specifically, DiCE-M leverages edge computing to execute lightweight AI models locally on unmanned surface vehicles (USVs) while offloading complex tasks to the cloud, thus balancing computational load and enabling real-time monitoring in marine environments. We use marine litter identification as an example application to demonstrate the utility of DiCE-M. Our results show that DiCE-M reduces latency by more than 2X when marine litter is not detected and cuts cloud computing costs by more than half compared to traditional cloud-based approaches. By selectively cropping and transmitting relevant image portions, DiCE-M further improves bandwidth efficiency, making it a reliable and cost-effective solution for deploying AI-driven applications on resource-constrained USVs in dynamic marine environments.
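
The sketch below illustrates the edge-side decision described above: a lightweight detector runs on the USV and only suspected-litter crops are sent to the cloud for detailed analysis. The detector, cropping, and upload calls are hypothetical stand-ins, not DiCE-M's actual components.

    # Illustrative sketch of DiCE-M's edge-side path: detect locally, and upload
    # only the relevant image crops when litter is suspected.
    import numpy as np

    def lightweight_detector(frame):
        # Returns candidate bounding boxes (x, y, w, h); empty when nothing is found.
        return [(120, 80, 64, 48)]

    def crop(frame, box):
        x, y, w, h = box
        return frame[y:y + h, x:x + w]

    def send_to_cloud(crops):
        # Stand-in for uploading only the cropped regions, saving bandwidth.
        return f"uploaded {len(crops)} crop(s) for detailed classification"

    def process_on_usv(frame):
        boxes = lightweight_detector(frame)
        if not boxes:
            return "no litter detected (handled entirely on the edge)"
        return send_to_cloud([crop(frame, b) for b in boxes])

    frame = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder camera frame
    print(process_on_usv(frame))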