Latency-driven Execution of LLM-generated Application Code on the Computing Continuum
Latency-critical applications demand quick responses. Ideally, detailed insights are preferable for the best decision making and response actions. However, when detailed insights cannot be provided quickly, even basic information goes a long way toward handling the situation effectively. For example, in a marine security application, it is critical to issue a notification as soon as an unauthorized vessel is detected; hence, a timely response may be prioritized over a fully detailed one. To address such latency-critical situations, in this paper we propose a novel system called DiCE-EC, which leverages an LLM to generate distributed code with speculative execution on the Edge (a fast, simple response on resource-constrained hardware) and the Cloud (a detailed response on powerful hardware, which may arrive quickly or slowly depending on network conditions). DiCE-EC breaks an application down into smaller components and executes them asynchronously across the edge-cloud computing continuum. As network conditions vary, we show through a real-world marine security application that DiCE-EC is effective in dynamically selecting the detailed cloud insight when it arrives within the latency constraint, and otherwise falling back to the simple edge response to guarantee timely alert delivery. Without such dynamic selection between edge and cloud responses, existing systems either always provide simple responses or drop alerts. We perform real network measurements in the Gulf of Pozzuoli in Naples, Italy, along accessible areas (inland and on a ferry), and generate 1 million realistic measurements across four inaccessible regions. We demonstrate that DiCE-EC never misses an alert, while the baseline misses up to ≈4% of alerts with real data and up to ≈1% (10,000 alerts) with generated data.
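To make the selection mechanism concrete, the following is a minimal Python sketch, not the paper's actual implementation, of speculative edge/cloud execution under a latency constraint. The function names (`edge_inference`, `cloud_inference`, `speculative_alert`), the latency values, and the asyncio-based design are illustrative assumptions.

```python
import asyncio

# Hypothetical stand-ins for the edge and cloud components; the paper's
# LLM-generated application code and deployment details are not shown.
async def edge_inference(frame):
    # Fast, coarse detection on resource-constrained edge hardware.
    await asyncio.sleep(0.05)                  # assumed ~50 ms edge latency
    return {"alert": "unauthorized vessel", "detail": "basic"}

async def cloud_inference(frame):
    # Detailed analysis on powerful cloud hardware; arrival time
    # varies with network conditions.
    await asyncio.sleep(1.2)                   # assumed network-dependent delay
    return {"alert": "unauthorized vessel", "detail": "full classification"}

async def speculative_alert(frame, deadline_s: float):
    """Run edge and cloud speculatively; prefer the detailed cloud
    response if it arrives within the latency constraint, otherwise
    fall back to the already-available edge response."""
    edge_task = asyncio.create_task(edge_inference(frame))
    cloud_task = asyncio.create_task(cloud_inference(frame))
    try:
        # Use the detailed cloud insight when the network permits.
        return await asyncio.wait_for(cloud_task, timeout=deadline_s)
    except asyncio.TimeoutError:
        # Cloud missed the deadline (wait_for cancels it); the fast
        # edge response guarantees a timely alert.
        return await edge_task

# Example: enforce a 500 ms end-to-end latency constraint per frame.
print(asyncio.run(speculative_alert(frame=None, deadline_s=0.5)))
```

With the assumed delays above, the cloud result misses the 500 ms deadline, so the edge alert is delivered; shortening the simulated cloud delay below the deadline would yield the detailed response instead.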