Giuseppe Coviello NEC Labs America

Giuseppe Coviello is a Senior Researcher in the Integrated Systems Department at NEC Laboratories America. He received his undergraduate degree in Computer Science from the University of Naples ‘Parthenope’. Giuseppe’s research spans multiple areas at the intersection of artificial intelligence and systems engineering, with a particular focus on image and video processing, neural network architectures, and multimedia content analysis. A significant aspect of his work involves the design, development, and deployment of distributed computing systems. At NECLA, he leads and contributes to projects involving AI-enhanced media applications and real-time analytics systems designed for deployment across hybrid cloud and edge computing environments, all of which rely heavily on the principles and architectures of distributed computing. His work plays a critical role in designing and optimizing real-time analytics engines, edge computing platforms for the Internet of Things (IoT), and distributed, fault-tolerant computing frameworks. These efforts support the creation of secure, scalable, and efficient infrastructures that meet the demanding needs of enterprise software platforms, next-generation telecommunications networks, and mission-critical applications in infrastructure and public safety. By leveraging distributed computing systems, Giuseppe enables advanced data processing and AI capabilities at scale, ensuring robust performance and reliability across a diverse range of modern digital ecosystems.

Posts

TacTool: Tactical Tool usage in Agentic AI Systems

Large language models (LLMs) are becoming the centerpiece in the design and deployment of agentic artificial intelligence (AI) systems. AI agents typically have (a) reasoning ability to analyze and think through the given task, (b) context/memory to remember things in the short term and long term, and (c) tools at their disposal to interact with the outside world. While solving a given task, an agent must decide whether tool use is required; if so, it must then select the appropriate tool and invoke it with the correct parameters. Although LLMs have advanced considerably in recent years, their tool-use capabilities remain limited. Even OpenAI’s most capable model to date, GPT-5, continues to struggle with reliable tool usage. In this paper, we propose TacTool, which empowers AI agents with improved tool selection and tool call formulation using different LLMs. We conduct experiments using the Nestful and Berkeley Function Calling Leaderboard version 3 (BFCLv3) benchmarks and show that TacTool achieves approximately 27% and 3% improvement over GPT-4o on the Nestful and BFCLv3 datasets, respectively.

TalentScout: Multimodal AI-Driven Expert Finding in Organizations

Identifying subject-matter experts within organizations remains a challenging task due to the scale, heterogeneity, and unstructured nature of enterprise knowledge assets. We present TalentScout, an AI-driven expert identification system that constructs a unified, skill-centric knowledge graph by ingesting and analyzing diverse media, including research papers, reports, presentations, transcripts, and supervisor recommendations. TalentScout’s modular architecture integrates document parsing, audio/video transcription, metadata extraction, large language model-based skill extraction, multi-factor author disambiguation, and evidence-weighted skill attribution. At query time, TalentScout decomposes natural language queries into canonical skill requirements, traverses the constructed knowledge graph, and ranks experts based on aggregated skill weights, document quality, and endorsement signals, providing document-level justifications for each recommendation. We evaluate TalentScout on multiple public and internal enterprise datasets, including DBLP, TREC Enterprise, Tilburg, and ManConCorpus. Using standard information retrieval metrics such as Precision@5, Recall@5, nDCG@5, and Mean Reciprocal Rank (MRR), TalentScout consistently outperforms leading baselines, achieving up to 24% higher Precision@5 in early expert retrieval. The results highlight TalentScout’s scalability, transparency, and accuracy, establishing it as a practical solution for evidence-based expert discovery and organizational talent management.
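
For readers unfamiliar with the ranking metrics named in this abstract, the following is a minimal sketch of how Precision@k, Recall@k, nDCG@k, and MRR are commonly computed. It assumes binary relevance (an expert is either relevant to a query or not); the function names and the sample expert identifiers are illustrative assumptions and are not taken from TalentScout's evaluation code.

```python
import math

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k=5):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

# Hypothetical query result: five ranked experts, three truly relevant ones.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "z"}
print(precision_at_k(ranked, relevant), recall_at_k(ranked, relevant))
```

Precision@5 rewards placing relevant experts in the first five positions, which is why it is a natural headline metric for "early" expert retrieval.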

SlideCraft: Context-aware Slides Generation Agent

Creating effective slide presentations requires adapting both content and structure to match the communication context, e.g., whether the presentation is for summarizing to executives or reporting progress to research supervisors. In research and enterprise environments, this need for context-sensitive presentations often leads to repeated, manual reformatting of the same material to suit different audiences. Existing generative systems support slide creation but typically rely on structured inputs, assume a fixed format, and offer limited ability to iteratively refine outputs through natural language feedback. Moreover, they rarely accommodate organizational constraints such as formatting guidelines, domain-specific terminology, or branding requirements. We present SlideCraft, a context-aware generative agent that autonomously creates and edits slide presentations based on natural language instructions. SlideCraft infers the intended presentation context, such as an executive-facing summary or a project review summary for technical oversight, and selects the appropriate slide template. It then synthesizes content from input documents, enriches it with external knowledge and internal assets, assembles it into a structured intermediate representation, and generates a validated slide deck. SlideCraft supports both first-time slide creation and iterative updates, operating through familiar natural language interfaces like email or messaging tools. Our experiments demonstrate that SlideCraft consistently produces high-quality, context-aware presentations tailored to diverse communication settings, with minimal human input and reliable adherence to enterprise constraints.

Murugan Sankaradas presents TalentScout: Multimodal AI-Driven Expert Finding in Organizations at PICom2025 on October 21st

Murugan Sankaradas (presenting virtually) will present “TalentScout: Multimodal AI-Driven Expert Finding in Organizations” at the IEEE International Conference on Pervasive Intelligence and Computing (PICom2025) on Tuesday, October 21 (10:30am–12pm JST) | Monday, October 20 (9:30–11pm ET) in Hokkaido, Japan.

Bifröst: Peer-to-peer Load-balancing for Function Execution in Agentic AI Systems

Agentic AI systems rely on Large Language Models (LLMs) to execute complex tasks by invoking external functions. The efficiency of these systems depends on how well function execution is managed, especially under heterogeneous and high-variance workloads, where function execution times can range from milliseconds to several seconds. Traditional load-balancing techniques, such as round-robin, least-loaded, and Peak-EWMA (used in Linkerd), struggle in such settings: round-robin ignores load imbalance, least-loaded reacts slowly to rapid workload shifts, and Peak-EWMA relies on latency tracking, which is ineffective for workloads with high execution time variability. In this paper, we introduce Bifröst, a peer-to-peer load-balancing mechanism that distributes function requests based on real-time active request count rather than latency estimates. Instead of relying on centralized load-balancers or client-side decisions, Bifröst enables function-serving pods to dynamically distribute load by comparing queue lengths and offloading requests accordingly. This avoids unnecessary overhead while ensuring better responsiveness under high-variance workloads. Our evaluation on open-vocabulary object detection, multi-modal understanding, and code generation workloads shows that Bifröst improves function completion time by up to 20% when processing 13,700 requests from 137 AI agents on a 32-node Kubernetes cluster, outperforming both OpenFaaS and OpenFaaS with Linkerd. In an AI-driven insurance claims processing workflow, Bifröst achieves up to 25% faster execution.
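
The core idea described above, where function-serving pods compare active request counts and offload work to less-loaded peers, can be illustrated with a small in-process sketch. This is not Bifröst's implementation: the `Pod` class, the `threshold` parameter, and the single-hop offload rule are all assumptions made for illustration, and a real system would exchange counts over the network rather than read peer state directly.

```python
from dataclasses import dataclass, field

@dataclass
class Pod:
    """Hypothetical function-serving pod that tracks its active requests."""
    name: str
    active: int = 0                       # requests queued or executing here
    peers: list = field(default_factory=list, repr=False)

    def handle(self, request_id, threshold=1):
        """Serve the request locally or offload it to the least-loaded peer.

        Returns the name of the pod that ends up owning the request.
        The decision uses only active request counts, not latency estimates.
        """
        best = min(self.peers, key=lambda p: p.active, default=None)
        if best is not None and self.active - best.active >= threshold:
            best.active += 1              # single-hop offload (assumption)
            return best.name
        self.active += 1
        return self.name

    def complete(self):
        """Mark one request as finished."""
        self.active -= 1

# Usage: pod "a" is busy, so its next request moves to idle pod "b".
a, b = Pod("a", active=3), Pod("b")
a.peers, b.peers = [b], [a]
owner = a.handle(request_id=1)
print(owner, a.active, b.active)
```

Because the decision depends only on how many requests are currently in flight, a single slow multi-second call raises a pod's count immediately, whereas a latency average would only reflect it after the call finishes.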

XPF: Agentic AI System for Business Workflow Automation

In this paper, we propose a novel agentic AI system called XPF, which enables users to create “agents” using just natural language, where each agent is capable of executing complex, real-world business workflows accurately and reliably. XPF provides an interface to develop and iterate over the agent creation process and then deploy the agent in production once it consistently produces satisfactory results. The key components of XPF are: (a) a planner, which leverages an LLM to generate a step-by-step plan that can be further edited by a human; (b) a compiler, which leverages an LLM to compile the plan into a flow graph; (c) an executor, which handles distributed execution of the flow graph (using LLMs, tools, RAG, etc.) on an underlying cluster; and (d) a verifier, which helps verify the output (through human-generated tests or tests auto-generated by an LLM). We develop five different agents using XPF and conduct experiments to evaluate one particular aspect, i.e., the difference in accuracy and reliability of the five agents with “human-generated” vs. “auto-generated” plans. Our experiments show that we get a much more accurate and reliable response for a business workflow when step-by-step instructions (in natural language) are given by a human familiar with the workflow than when the LLM is left to figure out the execution plan steps. In particular, we observe that a “human-generated” plan almost always gives 100% accuracy, whereas an “auto-generated” plan almost never gives 100% accuracy. In terms of reliability, we observe through ROUGE-L, BLEU, and METEOR scores that the output from a “human-generated” plan is much more reliable than that from an “auto-generated” plan.

Latency-driven Execution of LLM-generated Application Code on the Computing Continuum

Latency-critical applications demand quick responses. Ideally, detailed insights are preferable for the best decision making and response actions. However, in situations where detailed insights cannot be provided quickly, even basic information goes a long way in tackling the situation effectively. For example, in a marine security application, it is critical to send a notification as soon as an unauthorized vessel is seen. Hence, a timely response may be prioritized over a response based on complete details. To address such latency-critical situations, in this paper, we propose a novel system called DiCE-EC, which leverages an LLM to generate distributed code with speculative execution on the Edge (fast and simple response using resource-constrained hardware) and the Cloud (detailed response using powerful hardware, which may be fast or slow depending on network conditions). DiCE-EC breaks down an application into smaller components and executes them asynchronously across the edge and cloud computing continuum. As network conditions vary, we show through a real-world marine security application that DiCE-EC is effective in dynamically choosing detailed insights from the cloud when they are received within the latency constraint, or falling back to the simple response from the edge to guarantee timely alert delivery. Without such dynamic selection of the response from edge or cloud, existing systems either always provide simple responses or drop alerts. We perform real network measurements in the Gulf of Pozzuoli in Naples, Italy along accessible areas (inland and on a ferry) and generate 1 million realistic measurements across four inaccessible regions, and demonstrate that DiCE-EC never misses an alert, while the baseline misses up to approximately 4% of alerts with real data and up to approximately 1% (10,000 alerts) with generated data.
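
The edge/cloud fallback behavior described above can be sketched with Python's `asyncio`. This is an illustrative sketch only, not code generated by or taken from DiCE-EC: the function names, the simulated delays, and the returned fields are assumptions, and the sketch further assumes the edge result is always ready well before the deadline.

```python
import asyncio

async def edge_detect(frame):
    """Stand-in for a small model on resource-constrained edge hardware."""
    await asyncio.sleep(0.01)
    return {"source": "edge", "alert": "vessel detected"}

async def cloud_detect(frame, network_delay):
    """Stand-in for upload plus inference with a larger cloud model."""
    await asyncio.sleep(network_delay)
    return {"source": "cloud", "alert": "vessel detected",
            "details": "vessel type and heading"}

async def respond(frame, deadline, network_delay):
    """Run edge and cloud speculatively; prefer cloud if it meets the deadline."""
    edge_task = asyncio.create_task(edge_detect(frame))
    cloud_task = asyncio.create_task(cloud_detect(frame, network_delay))
    try:
        detailed = await asyncio.wait_for(cloud_task, timeout=deadline)
        edge_task.cancel()                # the simple answer is no longer needed
        return detailed
    except asyncio.TimeoutError:
        # wait_for has cancelled the cloud task; fall back to the edge answer.
        return await edge_task

# Usage: a good network yields the detailed answer, a poor one the fallback.
print(asyncio.run(respond("frame", deadline=0.2, network_delay=0.01))["source"])
print(asyncio.run(respond("frame", deadline=0.2, network_delay=1.0))["source"])
```

Starting both computations at once is what makes the execution speculative: the edge result is computed even when the cloud response ends up being used, trading some redundant work for a guaranteed timely alert.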

LLM-based Distributed Code Generation and Cost-Efficient Execution in the Cloud

The advancement of Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), is reshaping the software industry by automating code generation. Many LLM-driven distributed processing systems rely on serial code generation constrained by predefined libraries, limiting flexibility and adaptability. While some approaches enhance performance through parallel execution or optimize edge-cloud distributed processing for specific domains, they often overlook the cost implications of deployment, restricting scalability and economic feasibility across diverse cloud environments. This paper presents DiCE-C, a system that eliminates these constraints by starting directly from a natural language query. DiCE-C dynamically identifies available tools at runtime, programmatically refines LLM prompts, and employs a stepwise approach—first generating serial code and then transforming it into distributed code. This adaptive methodology enables efficient distributed execution without dependence on specific libraries. By leveraging high-level parallelism at the Application Programming Interface (API) level and managing API execution as services within a Kubernetes-based runtime, DiCE-C reduces idle GPU time and facilitates the use of smaller, cost-effective GPU instances. Experiments with a vision-based insurance application demonstrate that DiCE-C reduces cloud operational costs by up to 72% when using smaller GPUs (A6000 and A4000 GPU machines vs. A100 GPU machine) and by 32% when using identical GPUs (A100 GPU machines). This flexible and cost-efficient approach makes DiCE-C a scalable solution for deploying LLM-generated vision applications in cloud environments.
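
The two-step idea mentioned above, first producing serial code and then transforming it so that independent API calls run in parallel, can be illustrated with a toy example. The two vision functions below are hypothetical placeholders, and a local thread pool stands in for the Kubernetes-hosted API services; this is a sketch of API-level parallelism in general, not output from DiCE-C.

```python
from concurrent.futures import ThreadPoolExecutor

def detect_damage(image):
    """Hypothetical vision API: returns a damage report for an image."""
    return f"damage-report({image})"

def read_license_plate(image):
    """Hypothetical vision API: returns the plate text for an image."""
    return f"plate({image})"

def process_claim_serial(image):
    """Serial version: each API call waits for the previous one."""
    damage = detect_damage(image)
    plate = read_license_plate(image)
    return {"damage": damage, "plate": plate}

def process_claim_parallel(image, pool):
    """Transformed version: independent API calls are submitted together."""
    damage_future = pool.submit(detect_damage, image)
    plate_future = pool.submit(read_license_plate, image)
    return {"damage": damage_future.result(), "plate": plate_future.result()}

# Usage: both versions return the same result; only the scheduling differs.
with ThreadPoolExecutor(max_workers=2) as pool:
    print(process_claim_parallel("claim-001.jpg", pool))
```

The transformation is safe here because the two calls share no data dependency; calls that consume another call's output would still have to run in sequence.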

CAMTUNER: Adaptive Video Analytics Pipelines via Real-time Automated Camera Parameter Tuning

In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition operating on remote servers rely heavily on surveillance cameras to capture high-quality video streams to achieve high accuracy. Modern network cameras offer an array of parameters that directly influence video quality. While a few of such parameters, e.g., exposure, focus and white balance, are automatically adjusted by the camera internally, the others are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this work, we first show that in a typical surveillance camera deployment, environmental condition changes can have significant adverse effect on the accuracy of insights from the AUs, but such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. Second, since most end-users lack the skill or understanding to appropriately configure these parameters and typically use a fixed parameter setting, we present CAMTUNER, to our knowledge, the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CAMTUNER is based on SARSA reinforcement learning and it incorporates two novel components: a light-weight analytics quality estimator and a virtual camera that drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that compared to a VAP using the default camera setting, CAMTUNER enhances VAP accuracy by detecting 15.9% additional persons and 2.6%-4.2% additional cars (without any false positives) in a large enterprise parking lot. CAMTUNER opens up new avenues for elevating video analytics accuracy, transcending mere incremental enhancements achieved through refining deep-learning models.
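
SARSA, the reinforcement learning method this abstract names, updates an action-value estimate using the action actually taken in the next state. The sketch below shows the standard tabular update with epsilon-greedy action selection. The state labels, the action list, and the hyperparameter values are hypothetical and are not CAMTUNER's actual state space, camera parameters, or settings.

```python
import random
from collections import defaultdict

# Hypothetical camera-parameter adjustments used as the action set.
ACTIONS = ["brightness+", "brightness-", "contrast+", "contrast-", "keep"]

def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise take the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]

# Usage with made-up values: the reward would come from an analytics
# quality estimate after applying the chosen adjustment.
q = defaultdict(float)
rng = random.Random(0)
q[("dusk", "keep")] = 1.0
sarsa_update(q, "noon", "contrast+", reward=0.5, s_next="dusk", a_next="keep")
print(epsilon_greedy(q, "noon", epsilon=0.0, rng=rng))
```

Unlike Q-learning, which bootstraps from the best next action, SARSA bootstraps from the next action the policy really chooses, so exploration steps affect the learned values.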

G-Litter Marine Litter Dataset Augmentation with Diffusion Models and Large Language Models on GPU Acceleration

Marine litter detection is crucial for environmental monitoring, yet the imbalance in existing datasets limits model performance in identifying various types of waste accurately. This paper presents an efficient data augmentation pipeline that combines generative diffusion models (e.g., Stable Diffusion) and Large Language Models (LLMs) to expand the G-Litter dataset, a marine litter dataset designed for autonomous detection in heterogeneous environments. Leveraging scalable diffusion models for image generation and Alpaca LLMs for diverse prompt generation, our approach augments underrepresented classes by generating over 200 additional images per class, significantly improving the dataset’s balance. Training YOLOv8 for object detection on the augmented G-Litter dataset increased detection performance, improving recall by 7.82% and mAP50 by 3.87% compared with baseline results. This study emphasizes the potential for combining generative AI with HPC resources to automate data augmentation on large-scale, unstructured datasets, particularly in edge computing contexts for real-time marine monitoring. The models were tested on real videos captured during simulated missions, demonstrating a superior ability to detect submerged objects in dynamic scenarios. These results highlight the potential of generative AI techniques to improve dataset quality and detection model performance, laying the foundation for further expansion in real-time marine monitoring.