LARA: Latency-Aware Resource Allocator for Stream Processing Applications
One of the key metrics of interest for stream processing applications is latency, which indicates the total time it takes for the application to process streaming input data and generate insights from it. For mission-critical video analytics applications like surveillance and monitoring, it is of paramount importance to report an incident as soon as it occurs so that necessary actions can be taken right away. Stream processing applications are typically developed as a chain of microservices and are deployed on container orchestration platforms like Kubernetes. The allocation of system resources like CPU and memory to individual application microservices has a direct impact on latency. Kubernetes does provide ways to allocate these resources, e.g., through fixed resource allocation or through the Vertical Pod Autoscaler (VPA); however, there is no straightforward way in Kubernetes to prioritize latency for an end-to-end application pipeline. In this paper, we present LARA, which is specifically designed to improve the latency of stream processing application pipelines. LARA uses a regression-based technique for resource allocation to individual microservices. We implement four real-world video analytics application pipelines, i.e., license plate recognition, face recognition, human attributes detection, and pose detection, and show that compared to fixed allocation, LARA is able to reduce latency by up to ≈2.8X and is consistently better than VPA. While reducing latency, LARA is also able to deliver over 2X throughput compared to fixed allocation and is almost always better than VPA.
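To make the idea of regression-based, latency-aware allocation concrete, the following is a minimal illustrative sketch, not LARA's actual algorithm: it fits a simple polynomial regression of per-microservice latency against CPU allocation from hypothetical profiling data, then greedily assigns CPU increments to whichever stage yields the largest predicted latency reduction. The profiling numbers, the `profiles`/`allocate` names, and the greedy budget-splitting heuristic are all assumptions made for illustration.

```python
# Illustrative sketch only; LARA's actual regression model and allocation
# procedure are described in the paper. All data and heuristics below are
# hypothetical.
import numpy as np

# Hypothetical per-stage profiles: observed latency (ms) at several CPU
# allocations (millicores) for a 3-stage pipeline.
profiles = {
    "decode":  {"cpu": [250, 500, 1000, 2000], "latency": [120, 70, 45, 38]},
    "detect":  {"cpu": [250, 500, 1000, 2000], "latency": [400, 210, 120, 90]},
    "publish": {"cpu": [250, 500, 1000, 2000], "latency": [30, 22, 18, 17]},
}

def fit_latency_model(cpu, latency, degree=2):
    """Fit a polynomial regression of stage latency vs. CPU allocation."""
    return np.poly1d(np.polyfit(cpu, latency, degree))

models = {name: fit_latency_model(p["cpu"], p["latency"])
          for name, p in profiles.items()}

def pipeline_latency(allocation, models):
    """End-to-end latency of a chained pipeline: sum of stage latencies."""
    return sum(models[name](cpu) for name, cpu in allocation.items())

def allocate(models, budget_mcores=3000, step=250):
    """Greedy split of a CPU budget: repeatedly give the next increment to
    the stage whose predicted latency drops the most (one simple heuristic,
    assumed here for illustration)."""
    alloc = {name: step for name in models}  # minimum allocation per stage
    remaining = budget_mcores - step * len(models)
    while remaining >= step:
        gains = {name: models[name](alloc[name]) - models[name](alloc[name] + step)
                 for name in models}
        best = max(gains, key=gains.get)
        alloc[best] += step
        remaining -= step
    return alloc

alloc = allocate(models)
print(alloc, f"predicted latency: {pipeline_latency(alloc, models):.1f} ms")
```

Under this kind of scheme, stages with steep latency-vs.-CPU curves (like the detector above) receive proportionally more of the budget, which is the intuition behind prioritizing end-to-end latency rather than allocating resources uniformly.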