LLM-based Distributed Code Generation and Cost-Efficient Execution in the Cloud
Publication Date: April 6, 2025
Event: The Sixteenth International Conference on Cloud Computing, GRIDs, and Virtualization (Cloud Computing 2025)
Reference: pp. 114-121, 2025
Authors: Kunal Rao, NEC Laboratories America, Inc.; Giuseppe Coviello, NEC Laboratories America, Inc.; Gennaro Mellone, NEC Laboratories America, Inc., University of Napoli Parthenope; Ciro Giuseppe De Vita, NEC Laboratories America, Inc., University of Napoli Parthenope; Srimat T. Chakradhar, NEC Laboratories America, Inc.
Abstract: The advancement of Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), is reshaping the software industry by automating code generation. Many LLM-driven distributed processing systems rely on serial code generation constrained by predefined libraries, limiting flexibility and adaptability. While some approaches enhance performance through parallel execution or optimize edge-cloud distributed processing for specific domains, they often overlook the cost implications of deployment, restricting scalability and economic feasibility across diverse cloud environments. This paper presents DiCE-C, a system that eliminates these constraints by starting directly from a natural language query. DiCE-C dynamically identifies available tools at runtime, programmatically refines LLM prompts, and employs a stepwise approach: first generating serial code and then transforming it into distributed code. This adaptive methodology enables efficient distributed execution without dependence on specific libraries. By leveraging high-level parallelism at the Application Programming Interface (API) level and managing API execution as services within a Kubernetes-based runtime, DiCE-C reduces idle GPU time and facilitates the use of smaller, cost-effective GPU instances. Experiments with a vision-based insurance application demonstrate that DiCE-C reduces cloud operational costs by up to 72% when using smaller GPUs (A6000 and A4000 GPU machines vs. an A100 GPU machine) and by 32% when using identical GPUs (A100 GPU machines). This flexible and cost-efficient approach makes DiCE-C a scalable solution for deploying LLM-generated vision applications in cloud environments.
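To make the abstract's stepwise flow concrete, below is a minimal Python sketch of the idea: an LLM first produces serial code from a natural-language query plus the tools discovered at runtime, then a second, programmatically refined prompt rewrites that code so independent API calls run concurrently. All function names, prompts, and service endpoints here are hypothetical illustrations of the described approach, not the paper's implementation.

```python
# Illustrative sketch only -- not the authors' implementation.
# All names (discover_tools, call_llm, service URLs) are hypothetical.
import asyncio


def discover_tools() -> dict[str, str]:
    """Stand-in for runtime tool discovery: DiCE-C identifies available
    tools at runtime rather than assuming a predefined library."""
    return {
        "detect_damage": "http://damage-svc/api",      # assumed endpoints
        "read_license_plate": "http://plate-svc/api",  # for vision services
    }


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call that returns generated source code."""
    raise NotImplementedError("wire up an LLM provider here")


def generate_distributed_code(query: str) -> str:
    tools = discover_tools()
    # Step 1: programmatically refined prompt -> serial code for the query.
    serial_code = call_llm(
        f"Write Python code for: {query}\n"
        f"Use only these tool APIs: {sorted(tools)}"
    )
    # Step 2: transform the serial code into distributed code in which
    # independent tool-API calls execute concurrently.
    return call_llm(
        "Rewrite this code so that independent tool API calls run "
        "concurrently (e.g., via asyncio.gather):\n" + serial_code
    )


async def run_concurrently(coros):
    """API-level parallelism: overlapping independent service calls
    reduces idle time on the GPU-backed services behind them."""
    return await asyncio.gather(*coros)
```

The last helper captures where the cost savings reported in the abstract come from: when independent service calls overlap rather than running back-to-back, the Kubernetes-hosted services spend less time idle, which in turn makes smaller, cheaper GPU instances viable.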