Convolutional Neural Networks (CNNs) are a class of deep neural networks designed for tasks involving visual data, such as image and video recognition, object detection, and image classification. CNNs have been particularly successful in computer vision tasks due to their ability to automatically learn hierarchical features from the input data.

Posts

Retrospective: A Dynamically Configurable Coprocessor For Convolutional Neural Networks

In 2008, parallel computing posed significant challenges due to the complexities of parallel programming and the bottlenecks associated with efficient parallel execution. Networking and storage systems had achieved remarkable scalability in handling extensive packet traffic and persistent data, respectively, by leveraging best-effort service. Inspired by this, and having observed that a broad spectrum of existing and emerging computing workloads came from applications with an inherently forgiving nature [2], [5], we proposed a new and fundamentally different approach: best-effort computing. The new approach resulted in disproportionate gains in power, energy and latency, while improving performance. While contemplating the concept of best-effort computing [2], we noticed the resurgence of convolutional neural networks, which generated approximate but acceptable outcomes for numerous recognition, mining, and synthesis tasks. The lead author of this retrospective had conducted research on neural networks for his doctoral dissertation more than a decade earlier, and the reemergence of neural networks proved both surprising and exciting. Recognizing the connection between best-effort computing and convolutional neural networks, in 2008 we embarked on developing a programmable and dynamically reconfigurable convolutional neural network capable of performing best-effort computing for various machine learning tasks that inherently allow for multiple acceptable answers. This combination of our thoughts on best-effort computing and the gradual evolution of convolutional neural networks (deep neural networks emerged much later) culminated in our 2010 ISCA work on dynamically reconfigurable convolutional neural networks.

SplitBrain: Hybrid Data and Model Parallel Deep Learning

The recent success of deep learning applications has coincided with the wide availability of powerful computational resources for training sophisticated machine learning models on huge datasets. Nonetheless, training large models such as convolutional neural networks using model parallelism (as opposed to data parallelism) is challenging because the complex nature of communication between model shards makes it difficult to partition the computation efficiently across multiple machines with an acceptable trade-off. This paper presents SplitBrain, a high-performance distributed deep learning framework supporting hybrid data and model parallelism. Specifically, SplitBrain provides layer-specific partitioning that co-locates compute-intensive convolutional layers while sharding memory-demanding layers. A novel scalable group communication is proposed to further improve the training throughput with reduced communication overhead. The results show that SplitBrain can achieve nearly linear speedup while saving up to 67% of memory consumption for data- and model-parallel VGG over CIFAR-10.

DECODE: A Deep-learning Framework for Condensing Enhancers and Refining Boundaries with Large-scale Functional Assays

Motivation: Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have implemented enhancer discovery as a binary classification problem without accurate boundary detection, producing low-resolution annotations with superfluous regions and reducing the statistical power for downstream analyses (e.g. causal variant mapping and functional validations). Here, we addressed these challenges via a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). First, we employed direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution, we implemented a weakly supervised object detection framework for enhancer localization with precise boundary detection (to a 10 bp resolution) using Gradient-weighted Class Activation Mapping. Results: Our DECODE binary classifier outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework can condense enhancer annotations to only 13% of their original size, and these compact annotations have significantly higher conservation scores and genome-wide association study variant enrichments than the original predictions. Overall, DECODE is an effective tool for enhancer classification and precise localization.
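
The Gradient-weighted Class Activation Mapping (Grad-CAM) step mentioned above can be illustrated for a one-dimensional sequence input. This is a minimal sketch in plain Python, not DECODE's implementation: the function name `grad_cam_1d` and the toy numbers are assumptions, and the mapping from map positions to base pairs, as well as any threshold used to draw boundaries, is left out because the abstract does not specify it. Each channel's weight is the average gradient of the class score over positions, and the map is the ReLU of the weighted sum of activations.

```python
def grad_cam_1d(activations, gradients):
    """Compute a 1D Grad-CAM map.

    activations: feature maps of the last conv layer, indexed [channel][position]
    gradients:   d(class score)/d(activation), same shape as activations
    """
    n_positions = len(activations[0])
    # Channel weight = gradient averaged over all positions.
    weights = [sum(g) / len(g) for g in gradients]
    cam = []
    for p in range(n_positions):
        value = sum(w * a[p] for w, a in zip(weights, activations))
        cam.append(max(0.0, value))  # ReLU keeps only positive evidence
    return cam


# Toy example (hypothetical values): 2 channels, 3 positions.
acts = [[1.0, 2.0, 0.0],
        [0.0, 1.0, 3.0]]
grads = [[1.0, 1.0, 1.0],
         [-1.0, -1.0, -1.0]]
print(grad_cam_1d(acts, grads))
```

Positions with a high map value are those that support the positive class, which is what makes the map usable for localization even though training only used region-level labels.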

Ranking-based Convolutional Neural Network Models for Peptide-MHC Binding Prediction

T-cell receptors can recognize foreign peptides bound to major histocompatibility complex (MHC) class-I proteins, and thus trigger the adaptive immune response. Therefore, identifying peptides that can bind to MHC class-I molecules plays a vital role in the design of peptide vaccines. Many computational methods, for example, the state-of-the-art allele-specific method MHCflurry, have been developed to predict the binding affinities between peptides and MHC molecules. In this manuscript, we develop two allele-specific Convolutional Neural Network-based methods named ConvM and SpConvM to tackle the binding prediction problem. Specifically, we formulate the problem as optimizing the rankings of peptide-MHC bindings via ranking-based learning objectives. Such optimization is more robust and tolerant of measurement inaccuracy in binding affinities, and therefore enables more accurate prioritization of binding peptides. In addition, we develop a new position encoding method in ConvM and SpConvM to better identify the most important amino acids for the binding events. We conduct a comprehensive set of experiments using the latest Immune Epitope Database (IEDB) datasets. Our experimental results demonstrate that our models significantly outperform the state-of-the-art methods including MHCflurry with an average percentage improvement of 6.70% on AUC and 17.10% on ROC5 across 128 alleles.
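
To illustrate what a ranking-based objective looks like, here is a minimal sketch of a generic pairwise hinge loss in plain Python. It is not the objective used by ConvM or SpConvM, which the abstract does not spell out; the function name, the `margin` parameter, and the toy values are assumptions. The point it shows is that the loss uses the measured affinities only to decide which peptide of a pair should rank higher, so moderate noise in the affinity values leaves the loss unchanged as long as the ordering holds.

```python
def pairwise_hinge_loss(scores, affinities, margin=1.0):
    """Mean hinge loss over pairs (i, j) where peptide i binds more strongly than j.

    scores:     model outputs, one per peptide (higher = predicted stronger binder)
    affinities: measured binding strengths, used only for their ordering
    """
    total, count = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if affinities[i] > affinities[j]:
                # Penalize unless score i exceeds score j by at least the margin.
                total += max(0.0, margin - (scores[i] - scores[j]))
                count += 1
    return total / count if count else 0.0


# Toy example (hypothetical values): three peptides, strongest binder first.
affinities = [0.9, 0.5, 0.1]
print(pairwise_hinge_loss([3.0, 1.0, 0.0], affinities))  # correctly ordered scores
print(pairwise_hinge_loss([0.0, 1.0, 3.0], affinities))  # reversed scores
```

The first call gives zero loss because every pair is ordered correctly with the required margin; the second is penalized on all three pairs.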

Learning Structure-And-Motion-Aware Rolling Shutter Correction

An exact method of correcting the rolling shutter (RS) effect requires recovering the underlying geometry, i.e. the scene structures and the camera motions between scanlines or between views. However, the multiple-view geometry for RS cameras is much more complicated than its global shutter (GS) counterpart, with various degeneracies. In this paper, we first make a theoretical contribution by showing that RS two-view geometry is degenerate in the case of pure translational camera motion. In view of the complex RS geometry, we then propose a Convolutional Neural Network (CNN)-based method which learns the underlying geometry (camera motion and scene structure) from just a single RS image and performs RS image correction. We call our method structure-and-motion-aware RS correction because it reasons about the concealed motions between the scanlines as well as the scene structure. Our method learns from a large-scale dataset synthesized in a geometrically meaningful way where the RS effect is generated in a manner consistent with the camera motion and scene structure. In extensive experiments, our method achieves superior performance compared to other state-of-the-art methods for single image RS correction and subsequent Structure from Motion (SfM) applications.

Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences

Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do not optimally leverage the feature hierarchies learned in a Convolutional Neural Network (CNN), especially when applied to the task of geometric feature matching. While a metric loss applied to the deepest layer of a CNN is often expected to yield ideal features irrespective of the task, in fact the growing receptive field as well as striding effects cause shallower features to be better at high-precision matching tasks. We leverage this insight together with explicit supervision at multiple levels of the feature hierarchy for better regularization, to learn more effective descriptors in the context of geometric matching tasks. Further, we propose to use activation maps at different layers of a CNN as an effective and principled replacement for the multi-resolution image pyramids often used for matching tasks. We propose concrete CNN architectures employing these ideas and evaluate them on multiple datasets for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across datasets.
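
The effect of a growing receptive field and striding can be made concrete with the standard receptive-field recurrence for a stack of convolution layers. The sketch below is plain Python; the function name `receptive_fields` and the three-layer stack are hypothetical and not one of the architectures proposed in the paper. For each layer with kernel size k and stride s, the receptive field grows by (k - 1) times the current spacing between neighbouring outputs, measured in input pixels, and that spacing is then multiplied by s.

```python
def receptive_fields(layers):
    """Return (receptive_field, output_spacing) after each layer.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    Both values are measured in input pixels along one axis.
    """
    rf, spacing = 1, 1
    result = []
    for kernel, stride in layers:
        rf += (kernel - 1) * spacing  # each extra tap reaches `spacing` pixels further
        spacing *= stride             # striding spreads neighbouring outputs apart
        result.append((rf, spacing))
    return result


# Hypothetical stack: three 3x3 convolutions with strides 1, 2 and 2.
for depth, (rf, spacing) in enumerate(receptive_fields([(3, 1), (3, 2), (3, 2)]), start=1):
    print(f"layer {depth}: receptive field {rf} px, output spacing {spacing} px")
```

In this toy stack the deepest layer sees 9 input pixels per feature but places features only every 4 pixels, while the first layer sees 3 pixels and keeps single-pixel spacing, which is consistent with shallower features localizing matches more precisely.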