Edge Computing is a distributed computing paradigm that involves processing and analyzing data closer to the source of data generation, rather than relying solely on centralized cloud servers or data centers. In edge computing, computing resources, including servers, storage, and networking equipment, are placed at or near the “edge” of a network, closer to the devices or sensors that produce data. This proximity allows for faster data processing, reduced latency, and more efficient use of bandwidth.

Posts

Scale Up while Scaling Out Microservices in Video Analytics Pipelines

Modern video analytics applications comprise multiple microservices chained together as pipelines and executed on container orchestration platforms like Kubernetes. Kubernetes automatically handles the scaling of these microservices for efficient application execution. There are two popular choices for scaling microservices in Kubernetes: scaling out using the Horizontal Pod Autoscaler (HPA) and scaling up using the Vertical Pod Autoscaler (VPA). Both have been studied independently, but there is little prior work on scaling them jointly. This paper investigates joint scaling, i.e., scaling up while scaling out (HPA) is in action. In particular, we focus on scaling up the CPU resources allocated to the application microservices. We show that allocating fixed resources does not work well across different workloads for video analytics pipelines. We also show that Kubernetes’ VPA in conjunction with HPA does not work well for varying application workloads. As a remedy, we propose DataX AutoScaleUp, which efficiently scales up the CPU resources allocated to microservices in video analytics pipelines while Kubernetes’ HPA is operational. DataX AutoScaleUp uses novel techniques to adjust the computing resources allocated to different microservices in video analytics pipelines to improve overall application performance. Through real-world video analytics applications like Face Recognition and Human Attributes, we show that DataX AutoScaleUp can achieve up to 1.45X improvement in application processing rate when compared to alternative approaches with fixed CPU allocation and dynamic CPU allocation using VPA.
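
For a concrete picture, the snippet below is a minimal sketch of the scale-up side of this idea, assuming the official kubernetes Python client and one Deployment per microservice: per-pod CPU requests and limits are patched while HPA independently continues to manage replica counts. The thresholds, step sizes, and function names are illustrative assumptions, not DataX AutoScaleUp’s actual implementation.

```python
# Minimal sketch of scaling *up* a microservice's CPU allocation while
# Kubernetes' HPA keeps scaling it *out*. Names, thresholds, and the
# decision rule are illustrative; they are not DataX AutoScaleUp's code.
from kubernetes import client, config


def patch_cpu(deployment: str, namespace: str, container: str, cpu_millicores: int) -> None:
    """Update the CPU request/limit of one container in a Deployment.

    HPA acts on replica counts, so changing per-pod CPU here composes with
    horizontal scaling rather than replacing it.
    """
    config.load_kube_config()  # or load_incluster_config() when running in-cluster
    apps = client.AppsV1Api()
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": container,
                        "resources": {
                            "requests": {"cpu": f"{cpu_millicores}m"},
                            "limits": {"cpu": f"{cpu_millicores}m"},
                        },
                    }]
                }
            }
        }
    }
    apps.patch_namespaced_deployment(deployment, namespace, patch)


def autoscale_up(observed_rate: float, target_rate: float, current_cpu_m: int,
                 step_m: int = 250, max_cpu_m: int = 4000) -> int:
    """Toy decision rule: grow per-pod CPU while the pipeline misses its target rate."""
    if observed_rate < target_rate and current_cpu_m + step_m <= max_cpu_m:
        return current_cpu_m + step_m
    return current_cpu_m
```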

Citizen Science for the Sea with Information Technologies: An Open Platform for Gathering Marine Data and Marine Litter Detection from Leisure Boat Instruments

Data crowdsourcing is an increasingly pervasive and lifestyle-changing technology due to the flywheel effect that results from the interaction between the Internet of Things and Cloud Computing. This paper presents the Citizen Science for the Sea with Information Technologies (C4Sea-IT) framework, an open platform for gathering marine data from leisure boat instruments. C4Sea-IT aims to provide a platform for gathering, moving, processing, exchanging, and sharing coastal marine data using the navigation instruments and sensors already aboard today’s leisure and professional vessels. In this work, a use case for the detection and tracking of marine litter is shown. The final goal is to augment weather/ocean forecasts with Artificial Intelligence prediction models trained on crowdsourced data.

DyCo: Dynamic, Contextualized AI Models

Devices with limited computing resources use smaller AI models to achieve low-latency inferencing. However, model accuracy is typically much lower than the accuracy of a bigger model that is trained and deployed in places where computing resources are relatively abundant. We describe DyCo, a novel system that ensures privacy of stream data and dynamically improves the accuracy of small models used in devices. Unlike knowledge distillation or federated learning, DyCo treats AI models as black boxes. DyCo uses a semi-supervised approach to leverage existing training frameworks and network model architectures to periodically train contextualized, smaller models for resource-constrained devices. DyCo uses a bigger, highly accurate model in the edge-cloud to auto-label data received from each sensor stream. Training in the edge-cloud (as opposed to the public cloud) ensures data privacy, and bespoke models for thousands of live data streams can be designed in parallel by using multiple edge-clouds. DyCo uses the auto-labeled data to periodically re-train stream-specific, bespoke small models. To reduce the periodic training costs, DyCo uses different policies that are based on stride, accuracy, and confidence information. We evaluate our system, and the contextualized models, by using two object detection models for vehicles and people, and two datasets (a public benchmark and a real-world proprietary dataset). Our results show that DyCo increases the mAP accuracy measure of small models by an average of 16.3% (and up to 20%) for the public benchmark and an average of 19.0% (and up to 64.9%) for the real-world dataset. DyCo also decreases the training costs for contextualized models by more than an order of magnitude.
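
A simplified sketch of the auto-labeling and periodic retraining loop is shown below. Both models are treated as opaque callables, and the confidence threshold and stride-based policy are placeholders for illustration, not DyCo’s API.

```python
# Simplified sketch of DyCo-style contextualized retraining: a large, accurate
# edge-cloud model auto-labels data from one sensor stream, and a small
# on-device model is periodically re-trained on those labels. Both models are
# opaque callables here; names, thresholds, and the stride policy are illustrative.
from typing import Callable, Iterable, List, Tuple


def auto_label_stream(frames: Iterable,
                      big_model: Callable[[object], List[Tuple[str, float]]],
                      min_confidence: float = 0.8) -> List[tuple]:
    """Label frames with the big model, keeping only confident detections."""
    dataset = []
    for frame in frames:
        labels = [(cls, conf) for cls, conf in big_model(frame) if conf >= min_confidence]
        if labels:
            dataset.append((frame, labels))
    return dataset


def maybe_retrain(dataset: list,
                  train_small_model: Callable[[list], None],
                  stride: int = 1000) -> bool:
    """Stride-based policy: re-train the bespoke small model only after `stride`
    new auto-labeled samples accumulate, to bound the periodic training cost."""
    if len(dataset) >= stride:
        train_small_model(dataset)
        dataset.clear()
        return True
    return False
```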

DataX Allocator: Dynamic resource management for stream analytics at the Edge

Serverless edge computing aims to deploy and manage applications so that developers are unaware of challenges associated with dynamic management, sharing, and maintenance of the edge infrastructure. However, this is a non-trivial task because the resource usage by various edge applications varies based on the content in their input sensor data streams. We present a novel reinforcement-learning (RL) technique to maximize the processing rates of applications by dynamically allocating resources (like CPU cores or memory) to microservices in these applications. We model applications as analytics pipelines consisting of several microservices, and a pipeline’s processing rate directly impacts the accuracy of insights from the application. In our unique problem formulation, the state space or the number of actions of RL is independent of the type of workload in the microservices, the number of microservices in a pipeline, or the number of pipelines. This enables us to learn the RL model only once and use it many times to improve the accuracy of insights for a diverse set of AI/ML engines like action recognition or face recognition and applications with varying microservices. Our experiments with real-world applications, i.e., face recognition and action recognition, show that our approach outperforms other widely-used alternative approaches and achieves up to 2.5X improvement in the overall application processing rate. Furthermore, when we apply our RL model trained on a face recognition pipeline to a different and more complex action recognition pipeline, we obtain a 2X improvement in processing rate, thus showing the versatility and robustness of our RL model to pipeline changes.
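
The sketch below illustrates one way such a workload-independent formulation can look: the agent reasons about one microservice at a time with a small fixed-size state and a fixed action set, so neither grows with the number of microservices or pipelines. The features, actions, and tabular Q-learning choice are assumptions for illustration, not the paper’s exact design.

```python
# Illustrative sketch of a fixed-size RL formulation for allocating CPU cores
# to pipeline microservices: the agent looks at one microservice at a time,
# so the state and action spaces stay constant regardless of how many
# microservices or pipelines exist. Features and actions are assumptions.
import random
from dataclasses import dataclass


@dataclass
class MicroserviceStats:
    name: str
    processing_rate: float   # items/sec currently achieved by this stage
    input_rate: float        # items/sec arriving from the upstream stage
    cpu_cores: int           # cores currently allocated


# Fixed action space, independent of pipeline size.
ACTIONS = ("add_core", "remove_core", "no_op")


def state_vector(ms: MicroserviceStats) -> tuple:
    """Small, fixed-size state: how backed up this stage is and how much CPU it holds."""
    backlog_ratio = ms.input_rate / max(ms.processing_rate, 1e-6)
    return (round(backlog_ratio, 2), ms.cpu_cores)


def choose_action(q_table: dict, state: tuple, epsilon: float = 0.1) -> str:
    """Epsilon-greedy choice over the fixed action set (tabular Q-learning sketch)."""
    if random.random() < epsilon or state not in q_table:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)
```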

Application-specific, Dynamic Reservation of 5G Compute and Network Resources by using Reinforcement Learning

5G services and applications explicitly reserve compute and network resources in today’s complex and dynamic infrastructure of multi-tiered computing and cellular networking to ensure application-specific service quality metrics, and the infrastructure providers charge the 5G services for the resources reserved. A static, one-time reservation of resources at service deployment typically results in extended periods of under-utilization of reserved resources during the lifetime of the service operation. This is due to a plethora of reasons like changes in content from the IoT sensors (for example, a change in the number of people in the field of view of a camera) or a change in the environmental conditions around the IoT sensors (for example, time of day, rain or fog can affect data acquisition by sensors). Under-utilization of a specific resource like compute can also be due to temporary inadequate availability of another resource like network bandwidth in a dynamic 5G infrastructure. We propose a novel Reinforcement Learning-based online method to dynamically adjust an application’s compute and network resource reservations to minimize under-utilization of requested resources, while ensuring acceptable service quality metrics. We observe that a complex application-specific coupling exists between the compute and network usage of an application. Our proposed method learns this coupling during the operation of the service, and dynamically modulates the compute and network resource requests to minimize under-utilization of reserved resources. Through experimental evaluation using a real-world video analytics application, we show that our technique is able to capture the complex compute-network coupling relationship in an online manner, i.e., while the application is running, and dynamically adapts and saves up to 65% compute and 93% network resources on average (over multiple runs), without significantly impacting application accuracy.
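
As a rough illustration of the reservation-adjustment idea (not the learned RL policy), the toy function below shrinks or grows compute and network reservations together, so that reduced usage of one resource also releases the coupled resource it would otherwise strand. The headroom factor and linear coupling constant are assumptions.

```python
# Toy sketch of joint reservation adjustment: compute and network reservations
# are modulated together based on observed utilization. The linear coupling
# factor and headroom are illustrative assumptions, not the learned coupling.
def adjust_reservations(cpu_used: float, cpu_reserved: float,
                        net_used: float, net_reserved: float,
                        coupling: float = 0.7,   # assumed compute-per-bandwidth ratio
                        headroom: float = 1.2) -> tuple:
    """Return new (cpu, network) reservations with bounded headroom over usage."""
    new_net = max(net_used * headroom, 1.0)
    # Compute demand tracks network throughput through the coupling factor,
    # so the compute reservation follows the (new) network reservation.
    new_cpu = max(cpu_used * headroom, new_net * coupling)
    return new_cpu, new_net
```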

ROMA: Resource Orchestration for Microservices-based 5G Applications

With the growth of 5G, Internet of Things (IoT), edge computing and cloud computing technologies, the infrastructure (compute and network) available to emerging applications (AR/VR, autonomous driving, industry 4.0, etc.) has become quite complex. There are multiple tiers of computing (IoT devices, near edge, far edge, cloud, etc.) that are connected with different types of networking technologies (LAN, LTE, 5G, MAN, WAN, etc.). Deployment and management of applications in such an environment is quite challenging. In this paper, we propose ROMA, which performs resource orchestration for microservices-based 5G applications in a dynamic, heterogeneous, multi-tiered compute and network fabric. We assume that only application-level requirements are known, and the detailed requirements of the individual microservices in the application are not specified. As part of our solution, ROMA identifies and leverages the coupling relationship between compute and network usage for various microservices and solves an optimization problem in order to appropriately identify how each microservice should be deployed in the complex, multi-tiered compute and network fabric, so that the end-to-end application requirements are optimally met. We implemented two real-world 5G applications in the video surveillance and intelligent transportation system (ITS) domains. Through extensive experiments, we show that ROMA is able to save up to 90%, 55% and 44% compute and up to 80%, 95% and 75% network bandwidth for the surveillance (watchlist) application and the transportation application (person and car detection), respectively. This improvement is achieved while honoring the application performance requirements, and it is over an alternative scheme that employs a static and overprovisioned resource allocation strategy by ignoring the resource coupling relationships.
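
To make the placement problem concrete, the sketch below shows a simple greedy assignment of microservices to compute tiers under capacity and latency constraints. ROMA solves a richer optimization that also accounts for compute-network coupling; the tier model, costs, and greedy rule here are illustrative assumptions only.

```python
# Minimal sketch of tiered placement: greedily place each microservice on the
# cheapest tier that still has capacity, while respecting an end-to-end latency
# budget. This is not ROMA's optimization; it only illustrates the setting.
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    cpu_capacity: float
    latency_ms: float          # latency added when traffic crosses this tier
    cost_per_cpu: float
    cpu_used: float = 0.0


def place_pipeline(microservices: list, tiers: list, latency_budget_ms: float) -> dict:
    """Return {microservice_name: tier_name} or raise if no feasible placement exists.

    `microservices` is a list of (name, cpu_cores) tuples in pipeline order.
    """
    placement, latency = {}, 0.0
    for name, cpu_need in microservices:
        for tier in sorted(tiers, key=lambda t: t.cost_per_cpu):
            fits = tier.cpu_used + cpu_need <= tier.cpu_capacity
            within_budget = latency + tier.latency_ms <= latency_budget_ms
            if fits and within_budget:
                tier.cpu_used += cpu_need
                latency += tier.latency_ms
                placement[name] = tier.name
                break
        else:
            raise RuntimeError(f"no feasible tier for {name}")
    return placement
```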

Edge-based fever screening system over private 5G

Edge computing and 5G have made it possible to perform analytics closer to the source of data and achieve super-low latency response times, which isn’t possible with centralized cloud deployment. In this paper, we present a novel fever screening system, which uses edge machine learning techniques and leverages private 5G to accurately identify and screen individuals with fever in real-time. Particularly, we present deep-learning based novel techniques for fusion and alignment of cross-spectral visual and thermal data streams at the edge. Our novel Cross-Spectral Generative Adversarial Network (CS-GAN) synthesizes visual images that have the key, representative object level features required to uniquely associate objects across the visual and thermal spectrum. Two key features of CS-GAN are a novel, feature-preserving loss function that results in high-quality pairing of corresponding cross-spectral objects, and dual bottleneck residual layers with skip connections (a new network enhancement) to not only accelerate real-time inference, but to also speed up convergence during model training at the edge. To the best of our knowledge, this is the first technique that leverages 5G networks and limited edge resources to enable real-time feature-level association of objects in visual and thermal streams (30 ms per full HD frame on an Intel Core i7-8650 4-core, 1.9GHz mobile processor). It is also the first system to achieve real-time operation, which has enabled fever screening of employees and guests in arenas, theme parks, airports and other critical facilities. By leveraging edge computing and 5G, our fever screening system is able to achieve 98.5% accuracy and is able to process ∼5X more people when compared to a centralized cloud deployment.
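
For context, the PyTorch snippet below shows a generic bottleneck residual block with a skip connection, the kind of building block the “dual bottleneck residual layers” refer to. The channel sizes and exact wiring inside CS-GAN are not given in the abstract, so this is an illustrative block, not the paper’s network.

```python
# Generic bottleneck residual block with a skip connection (PyTorch).
# CS-GAN's actual architecture is not reproduced here; this only illustrates
# the building block named in the abstract. Channel sizes are arbitrary.
import torch
import torch.nn as nn


class BottleneckResidual(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False),  # squeeze
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False),  # expand
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection keeps gradients flowing and speeds up convergence.
        return self.act(x + self.body(x))


# A "dual" arrangement can simply stack two such blocks.
dual = nn.Sequential(BottleneckResidual(64, 16), BottleneckResidual(64, 16))
```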

AppSlice: A system for application-centric design of 5G and edge computing applications

Applications that use edge computing and 5G to improve response times consume both compute and network resources. However, 5G networks manage only network resources without considering the application’s compute requirements, and container orchestration frameworks manage only compute resources without considering the application’s network requirements. We observe that there is a complex coupling between an application’s compute and network usage, which can be leveraged to improve application performance and resource utilization. We propose a new, declarative abstraction called app slice that jointly considers the application’s compute and network requirements. This abstraction leverages container management systems to manage edge computing resources, and 5G network stacks to manage network resources, while the joint consideration of coupling between compute and network usage is explicitly managed by a new runtime system, which delivers the declarative semantics of the app slice. The runtime system also jointly manages the edge compute and network resource usage automatically across different edge computing environments and 5G networks by using two adaptive algorithms. We implement a complex, real-world, real-time monitoring application using the proposed app slice abstraction, and demonstrate on a private 5G/LTE testbed that the proposed runtime system significantly improves the application performance and resource usage when compared with the case where the coupling between the compute and network resource usage is ignored.
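
The snippet below is a hedged sketch of what a declarative app slice specification might look like: the application states its compute and network needs in one place, and a runtime maps the compute fields onto the container orchestrator and the network fields onto the 5G stack. All field names and values are illustrative assumptions, not the actual abstraction’s schema.

```python
# Hypothetical, declarative "app slice" spec: compute and network requirements
# declared together. Field names and values are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSlice:
    name: str
    cpu_cores: float            # edge compute requirements
    memory_gb: float
    uplink_mbps: float          # 5G network requirements
    downlink_mbps: float
    max_latency_ms: float       # end-to-end application target


monitoring_slice = AppSlice(
    name="real-time-monitoring",
    cpu_cores=8.0,
    memory_gb=16.0,
    uplink_mbps=40.0,
    downlink_mbps=5.0,
    max_latency_ms=100.0,
)

# A runtime system would translate the compute fields into container resource
# requests and the network fields into 5G QoS/bandwidth reservations, then
# adapt both jointly as the observed coupling between them changes.
```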

F3S: Free Flow Fever Screening

Identification of people with elevated body temperature can reduce or dramatically slow down the spread of infectious diseases like COVID-19. We present a novel fever-screening system, F3S, that uses edge machine learning techniques to accurately measure core body temperatures of multiple individuals in a free-flow setting. F3S performs real-time sensor fusion of visual and thermal camera data streams to detect elevated body temperature, and it has several unique features: (a) visual and thermal streams represent very different modalities, and we dynamically associate semantically-equivalent regions across visual and thermal frames by using a new, dynamic alignment technique that analyzes content and context in real-time, (b) we track people through occlusions, identify the eye (inner canthus), forehead, face and head regions where possible, and provide an accurate temperature reading by using a prioritized refinement algorithm, and (c) we robustly detect elevated body temperature even in the presence of personal protective equipment like masks, sunglasses, or hats, all of which can be affected by hot weather and lead to spurious temperature readings. F3S has been deployed at over a dozen large commercial establishments, providing contact-less, free-flow, real-time fever screening for thousands of employees and customers in indoor and outdoor settings.
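
The sketch below illustrates the prioritized-refinement idea: read the temperature from the most reliable facial region available, falling back when occlusions or PPE hide the better regions. The region ordering follows the abstract; the correction offsets are hypothetical values for illustration only.

```python
# Sketch of prioritized region refinement for temperature readings: prefer the
# inner canthus, then forehead, face, and head. Offsets are hypothetical.
from typing import Dict, Optional

# Regions ordered from most to least reliable for core-temperature estimation.
REGION_PRIORITY = ("inner_canthus", "forehead", "face", "head")

# Hypothetical per-region corrections toward core body temperature (deg C).
REGION_OFFSET = {"inner_canthus": 0.0, "forehead": 0.3, "face": 0.5, "head": 0.8}


def estimate_core_temp(region_temps: Dict[str, float]) -> Optional[float]:
    """Pick the highest-priority region detected for this person and correct it."""
    for region in REGION_PRIORITY:
        if region in region_temps:
            return region_temps[region] + REGION_OFFSET[region]
    return None  # no usable region in this frame


# Example: only the forehead and head were visible behind a mask.
print(estimate_core_temp({"forehead": 36.4, "head": 35.9}))  # -> 36.7
```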