JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts its application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by 3.47 dB (PSNR) on average. For strong compression rates, we can even improve PSNR by 9.51 dB. Our diff. JPEG also yields strong adversarial attack results, demonstrating effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.
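To illustrate why JPEG is non-differentiable, consider the quantization step: rounding has a zero gradient almost everywhere. The sketch below shows one commonly used polynomial surrogate for rounding. It is an illustrative assumption, not the method proposed in the paper, and the function names `soft_round`, `soft_round_grad`, and `quantize_coefficient` are hypothetical.

```python
def soft_round(x: float) -> float:
    """Polynomial surrogate for rounding: round(x) + (x - round(x))**3.

    The forward value stays close to round(x), but unlike hard rounding
    the surrogate has a non-zero derivative away from integers.
    """
    r = round(x)
    return r + (x - r) ** 3


def soft_round_grad(x: float) -> float:
    """Derivative of the surrogate w.r.t. x, treating round(x) as locally constant."""
    r = round(x)
    return 3.0 * (x - r) ** 2


def quantize_coefficient(coeff: float, step: float) -> float:
    """Quantize and dequantize one DCT coefficient with the soft rounding surrogate.

    `step` stands in for the matching entry of a quantization table (assumed value).
    """
    return soft_round(coeff / step) * step


if __name__ == "__main__":
    # Hard rounding would map 23.0 with step 10.0 to exactly 20.0.
    print(quantize_coefficient(23.0, 10.0))
    print(soft_round_grad(2.3))
```

Because the surrogate depends smoothly on `coeff / step`, gradients can flow to both the coefficient and the quantization step, which is the property a differentiable JPEG needs at this stage.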
Deep Video Codec Control
Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the de facto standard, despite the availability of advanced (neural) compression approaches. Transmitting videos in the face of dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec’s compression such that bandwidth constraints are satisfied and video distortion is minimized. While both standard video codecs and their rate control modules are developed to minimize video distortion w.r.t. human quality assessment, preserving the downstream performance of deep vision models is not considered. In this paper, we present the first end-to-end learnable deep video codec control considering both bandwidth constraints and downstream vision performance, while not breaking existing standardization. We demonstrate for two common vision tasks (semantic segmentation and optical flow estimation) and on two different datasets that our deep codec control better preserves downstream performance than using 2-pass average bit rate control while meeting dynamic bandwidth constraints and adhering to standardizations.
Source-Free Video Domain Adaptation with Spatial-Temporal-Historical Consistency Learning
Source-free domain adaptation (SFDA) is an emerging research topic that studies how to adapt a pretrained source model using unlabeled target data. It is derived from unsupervised domain adaptation but has the advantage of not requiring labeled source data to learn adaptive models. This makes it particularly useful in real-world applications where access to source data is restricted. While there has been some SFDA work for images, little attention has been paid to videos. Naively extending image-based methods to videos without considering the unique properties of videos often leads to unsatisfactory results. In this paper, we propose a simple and highly flexible method for Source-Free Video Domain Adaptation (SFVDA), which extensively exploits consistency learning for videos from spatial, temporal, and historical perspectives. Our method is based on the assumption that videos of the same action category are drawn from the same low-dimensional space, regardless of the spatio-temporal variations in the high-dimensional space that cause domain shifts. To overcome domain shifts, we simulate spatio-temporal variations by applying spatial and temporal augmentations on target videos, and encourage the model to make consistent predictions from a video and its augmented versions. Due to the simple design, our method can be applied to various SFVDA settings, and experiments show that our method achieves state-of-the-art performance for all the settings.
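The idea of encouraging consistent predictions between a video and its augmented version can be sketched as a divergence between two class distributions. The example below is a minimal illustration that assumes a KL-divergence objective over softmax outputs; the abstract does not state the exact loss, and the names `log_softmax` and `consistency_loss` are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    log_total = math.log(sum(math.exp(v - m) for v in logits))
    return [v - m - log_total for v in logits]


def consistency_loss(logits_orig: List[float], logits_aug: List[float]) -> float:
    """KL(p_orig || p_aug) between predictions for a video and its augmented copy.

    The loss is zero when both predictions agree and grows as they diverge.
    The choice of KL divergence is an assumption made for illustration.
    """
    log_p = log_softmax(logits_orig)
    log_q = log_softmax(logits_aug)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    original = [2.0, 0.5, -1.0]   # logits for the unmodified target video
    augmented = [1.2, 0.9, -0.5]  # logits after a spatial or temporal augmentation
    print(consistency_loss(original, augmented))
```

Minimizing such a term over unlabeled target videos needs no source data or labels, which is what makes consistency objectives a natural fit for the source-free setting.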
Learning Higher-order Object Interactions for Keypoint-based Video Understanding
Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning-based networks often require significant computation and may capture scene context using various modalities, which further increases compute costs. Efficient methods such as those used for AR/VR often use only human-keypoint information but suffer from a loss of scene context that hurts accuracy. In this paper, we describe an action-localization method, KeyNet, that uses only keypoint data for tracking and action recognition. Specifically, KeyNet introduces the use of object-based keypoint information to capture context in the scene. Our method illustrates how to build a structured intermediate representation that allows modeling higher-order interactions in the scene from object and human keypoints without using any RGB information. We find that KeyNet is able to track and classify human actions at just 5 FPS. More importantly, we demonstrate that object keypoints can be modeled to recover any loss in context from using keypoint information on the AVA Action and Kinetics datasets.