Md Yusuf Sarwar Uddin works at University of Missouri-Kansas City.


FactionFormer: Context-Driven Collaborative Vision Transformer Models for Edge Intelligence

Edge Intelligence has received attention in the recent times for its potential towards improving responsiveness, reducing the cost of data transmission, enhancing security and privacy, and enabling autonomous decisions by edge devices. However, edge devices lack the power and compute resources necessary to execute most Al models. In this paper, we present FactionFormer, a novel method to deploy resource-intensive deep-learning models, such as vision transformers (ViT), on resource-constrained edge devices. Our method is based on a key observation: edge devices are often deployed in settings where they encounter only a subset of the classes that the resource intensive Al model is trained to classify, and this subset changes across deployments. Therefore, we automatically identify this subset as a faction, devise on-the fly a bespoke resource-efficient ViT called a modelette for the faction and set up an efficient processing pipeline consisting of a modelette on the device, a wireless network such as 5G, and the resource-intensive ViT model on an edge server, all of which work collaboratively to do the inference. For several ViT models pre-trained on benchmark datasets, FactionFormer’s modelettes are up to 4× smaller than the corresponding baseline models in terms of the number of parameters, and they can infer up to 2.5× faster than the baseline setup where every input is processed by the resource-intensive ViT on the edge server. Our work is the first of its kind to propose a device-edge collaborative inference framework where bespoke deep learning models for the device are automatically devised on-the-fly for most frequently encountered subset of classes.

Chimera: Context-Aware Splittable Deep Multitasking Models for Edge Intelligence

Design of multitasking deep learning models has mostly focused on improving the accuracy of the constituent tasks, but the challenges of efficiently deploying such models in a device-edge collaborative setup (that is common in 5G deployments) has not been investigated. Towards this end, in this paper, we propose an approach called Chimera 1 for training (done Offline) and deployment (done Online) of multitasking deep learning models that are splittable across the device and edge. In the offline phase, we train our multi-tasking setup such that features from a pre-trained model for one of the tasks (called the Primary task) are extracted and task-specific sub-models are trained to generate the other (Secondary) tasks’ outputs through a knowledge distillation like training strategy to mimic the outputs of pre-trained models for the tasks. The task-specific sub-models are designed to be significantly lightweight than the original pre-trained models for the Secondary tasks. Once the sub-models are trained, during deployment, for given deployment context, characterized by the configurations, we search for the optimal (in terms of both model performance and cost) deployment strategy for the generated multitasking model, through finding one or multiple suitable layer(s) for splitting the model, so that inference workloads are distributed between the device and the edge server and the inference is done in a collaborative manner. Extensive experiments on benchmark computer vision tasks demonstrate that Chimera generates splittable multitasking models that are at least ~ 3 x parameter efficient than the existing such models, and the end-to-end device-edge collaborative inference becomes ~ 1.35 x faster with our choice of context-aware splitting decisions.