Latency-driven Execution of LLM-generated Application Code on the Computing Continuum
Publication Date: 5/19/2025
Event: The Third Workshop on Urgent Analytics for Distributed Computing (QUICK25) at CCGrid 2025
Reference: pp. 17-25, 2025
Authors: Kunal Rao, NEC Laboratories America, Inc.; Giuseppe Coviello, NEC Laboratories America, Inc.; Ciro Giuseppe De Vita, NEC Laboratories America, Inc., University of Napoli, Parthenope; Gennaro Mellone, NEC Laboratories America, Inc., University of Napoli, Parthenope; Mohammad A. Khojastepour, NEC Laboratories America, Inc.; Srimat T. Chakradhar, NEC Laboratories America, Inc.
Abstract: Latency-critical applications demand quick responses. Ideally, detailed insights are preferable for the best decision-making and response actions. However, when detailed insights cannot be provided quickly, even basic information goes a long way in tackling the situation effectively. For example, in a marine security application, it is critical to send a notification as soon as an unauthorized vessel is seen. Hence, a timely response may be prioritized over a response based on complete details. To address such latency-critical situations, in this paper, we propose a novel system called DiCE-EC, which leverages an LLM to generate distributed code with speculative execution on Edge (fast and simple response using resource-constrained hardware) and Cloud (detailed response using powerful hardware, but may be fast or slow depending on network conditions). DiCE-EC breaks down the application into smaller components and executes them asynchronously across the edge and cloud computing continuum. As network conditions vary, we show through a real-world marine security application that DiCE-EC is effective in dynamically choosing detailed insights from the cloud when they are received within the latency constraint, or falling back to the simple response from the edge to guarantee timely alert delivery. Without such dynamic selection of the response from edge or cloud, existing systems either always provide simple responses or drop alerts. We perform real network measurements in the Gulf of Pozzuoli in Naples, Italy along accessible areas (inland and on a ferry) and generate 1 million realistic measurements across four inaccessible regions, and demonstrate that DiCE-EC never misses an alert, while the baseline misses up to ~4% of alerts with real data and up to ~1% (10,000 alerts) with generated data.
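The edge/cloud selection described in the abstract can be illustrated with a minimal Python sketch. This is not the DiCE-EC implementation or its LLM-generated code; the function names (`edge_detect`, `cloud_analyze`, `respond`), the simulated delays, and the result format are all hypothetical, chosen only to show the idea of launching both paths speculatively and preferring the cloud result when it arrives within the latency constraint. The sketch also assumes the edge path always finishes before the deadline.

```python
import asyncio


async def edge_detect(frame):
    # Hypothetical edge path: fast, simple result on constrained hardware.
    await asyncio.sleep(0.01)  # simulated local inference time
    return {"source": "edge", "alert": "vessel detected"}


async def cloud_analyze(frame, network_delay):
    # Hypothetical cloud path: detailed result, latency depends on the network.
    await asyncio.sleep(network_delay)  # simulated round trip plus inference
    return {"source": "cloud", "alert": "vessel detected", "details": "detailed insights"}


async def respond(frame, deadline, network_delay):
    # Start both paths at once (speculative execution).
    edge_task = asyncio.create_task(edge_detect(frame))
    cloud_task = asyncio.create_task(cloud_analyze(frame, network_delay))
    try:
        # Prefer the detailed cloud response if it arrives within the deadline.
        result = await asyncio.wait_for(cloud_task, timeout=deadline)
        edge_task.cancel()  # no-op if the edge task has already finished
        return result
    except asyncio.TimeoutError:
        # Cloud was too slow (wait_for cancels it); fall back to the edge response.
        return await edge_task
```

Calling `asyncio.run(respond("frame", deadline=0.2, network_delay=0.01))` in this sketch yields the cloud result, while a slow network (for example `network_delay=0.5` with `deadline=0.05`) yields the edge result, so an alert is delivered in both cases.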