XPF: Agentic AI System for Business Workflow Automation
Publication Date: 7/20/2025
Event: 3rd Workshop on AI for Systems (AI4Sys 2025), in conjunction with HPDC 2025
Reference: pp. 1-6, 2025
Authors: Kunal Rao, NEC Laboratories America, Inc.; Giuseppe Coviello, NEC Laboratories America, Inc.; Gennaro Mellone, NEC Laboratories America, Inc., University of Napoli, Parthenope; Ciro Giuseppe De Vita, NEC Laboratories America, Inc., University of Napoli, Parthenope; Srimat T. Chakradhar, NEC Laboratories America, Inc.
Abstract: In this paper, we propose a novel agentic AI system called XPF, which enables users to create “agents” using just natural language, where each agent is capable of executing complex, real-world business workflows accurately and reliably. XPF provides an interface to develop and iterate on the agent creation process and then deploy the agent in production once satisfactory results are produced consistently. The key components of XPF are: (a) a planner, which leverages an LLM to generate a step-by-step plan that can further be edited by a human; (b) a compiler, which leverages an LLM to compile the plan into a flow graph; (c) an executor, which handles distributed execution of the flow graph (using LLMs, tools, RAG, etc.) on an underlying cluster; and (d) a verifier, which helps verify the output (through human-generated tests or tests auto-generated using an LLM). We develop five different agents using XPF and conduct experiments to evaluate one particular aspect: the difference in accuracy and reliability between “human-generated” and “auto-generated” plans across the five agents. Our experiments show that a business workflow yields much more accurate and reliable responses when step-by-step instructions (in natural language) are given by a human familiar with the workflow, rather than letting the LLM figure out the execution-plan steps. In particular, we observe that the “human-generated” plan almost always gives 100% accuracy, whereas the “auto-generated” plan almost never does. In terms of reliability, ROUGE-L, BLEU and METEOR scores show that the output from the “human-generated” plan is much more reliable than that from the “auto-generated” plan.
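The reliability comparison above rests on standard text-overlap metrics. ROUGE-L, for instance, scores a candidate output against a reference by the length of their longest common subsequence. A minimal self-contained sketch of the idea (not the paper's evaluation code; it assumes simple whitespace tokenization and reports a plain F1 rather than the weighted F-beta of the full metric):

```python
def lcs_len(a, b):
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    # Simplified ROUGE-L: LCS-based precision/recall over whitespace tokens,
    # combined as an F1 score.
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

# Identical outputs score 1.0; disjoint outputs score 0.0.
print(rouge_l("approve the invoice", "approve the invoice"))  # → 1.0
```

A plan whose outputs vary little across runs keeps this score close to 1.0 against a fixed reference, which is the sense in which the paper's "human-generated" plans are more reliable.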
Publication Link: