Publication Date: 8/29/2023
Authors: Christoph Reich, NEC Laboratories America, Inc., Technische Universität Darmstadt; Biplob Debnath, NEC Laboratories America, Inc.; Deep Patel, NEC Laboratories America, Inc.; Tim Prangemeier, Technische Universität Darmstadt; Srimat T. Chakradhar, NEC Laboratories America, Inc.
Abstract: Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the de facto standard, despite the availability of advanced (neural) compression approaches. Transmitting videos under dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec’s compression such that bandwidth constraints are satisfied and video distortion is minimized. Both standard video codecs and their rate control modules are developed to minimize video distortion with respect to human quality assessment. However, they do not consider preserving the downstream performance of deep vision models. In this paper, we present the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream vision performance without breaking existing standardization. We evaluate two common vision tasks (semantic segmentation and optical flow estimation) on two different datasets. On both, our deep codec control preserves downstream performance better than 2-pass average bit rate control, while meeting dynamic bandwidth constraints and adhering to standardizations.
Publication Link: https://arxiv.org/pdf/2308.16215.pdf