Publication Date: 10/2/2023
Event: ICCV 2023
Reference: pp. 5740-5751, 2023
Authors: Abhishek Aich, University of California, Riverside, NEC Laboratories America, Inc.; Samuel Schulter, NEC Laboratories America, Inc.; Amit K. Roy-Chowdhury, University of California, Riverside; Manmohan Chandraker, NEC Laboratories America, Inc.; Yumin Suh, NEC Laboratories America, Inc.
Abstract: We aim to train a multi-task model such that users can adjust the desired compute budget and relative importance of task performances after deployment, without retraining. This enables optimizing performance for dynamically varying user needs, without heavy computational overhead to train and save models for various scenarios. To this end, we propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable. Our key idea is to control the task importance by varying the capacities of task-specific decoders, while controlling the total computational cost by jointly adjusting the encoder capacity. This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures based on user’s constraints. Our training strategy involves a novel `Configuration-Invariant Knowledge Distillation’ loss that enforces backbone representations to be invariant under different runtime width configurations to enhance accuracy. Further, we present a simple but effective search algorithm that translates user constraints to runtime width configurations of both the shared encoder and task decoders, for sampling the sub-architectures. The key rule for the search algorithm is to provide a larger computational budget to the higher preferred task decoder, while searching a shared encoder configuration that enhances the overall MTL performance. Various experiments on three multi-task benchmarks (PASCALContext, NYUDv2, and CIFAR100-MTL) with diverse backbone architectures demonstrate the advantage of our approach. For example, our method shows a higher controllability by 33.5% in the NYUD-v2 dataset over prior methods, while incurring much less compute cost.