Tripping Through Time: Efficient Temporal Localization of Activities in Videos

Localizing moments in untrimmed videos using language queries is a new task that requires the ability to accurately ground language in video. Existing approaches are inefficient: they process the video, often more than once, to localize the activities. In this paper, we present TripNet, an end-to-end system that uses a gated attention architecture to model fine-grained textual and visual representations in order to align text and video content. Furthermore, TripNet uses reinforcement learning to efficiently localize relevant activity clips in long videos by learning how to skip around the video, saving feature extraction and processing time. In our evaluation on the Charades-STA and ActivityNet Captions datasets, we find that TripNet achieves high accuracy while processing only 32-41% of the entire video.
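The skip-and-predict loop can be sketched in a few lines of pure Python (a toy stand-in: the clip environment, the `greedy_policy` in place of the learned RL agent, and all numbers are hypothetical, not TripNet's actual components):

```python
def localize_by_skipping(n_clips, target, policy, max_steps=20):
    """Toy version of the skipping idea: start in the middle of the video
    and let a policy emit jump actions until it decides to predict,
    extracting features only for the clips it actually visits."""
    pos = n_clips // 2
    seen = set()
    for _ in range(max_steps):
        seen.add(pos)                       # feature extraction happens here
        action = policy(pos, target)
        if action == "predict":
            break
        pos = max(0, min(n_clips - 1, pos + action))
    return pos, len(seen) / n_clips         # prediction + fraction processed

def greedy_policy(pos, target):
    """Hypothetical stand-in for the learned agent: jump toward the
    queried moment in halving steps, predict once on top of it."""
    if pos == target:
        return "predict"
    step = max(1, abs(target - pos) // 2)
    return step if target > pos else -step

pos, frac = localize_by_skipping(100, target=70, policy=greedy_policy)
```

Even this naive policy touches only a small fraction of the 100 clips; the learned agent exploits the same mechanic to keep processing at 32-41% of the video.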

Learning To Simulate

Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. In contrast to prior art that hand-crafts simulation parameters or adjusts only a subset of them, our approach fully controls the simulator with the actual underlying goal of maximizing accuracy, rather than mimicking the real data distribution or randomly generating a large volume of data. We find that our approach (i) quickly converges to the optimal simulation parameters in controlled experiments and (ii) can indeed discover good sets of parameters for an image rendering simulator in actual computer vision applications.
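A minimal policy-gradient sketch of the idea (the peaked accuracy proxy, the Gaussian policy, and all hyperparameters are invented for illustration; the actual method optimizes a real simulator and a real trained model):

```python
import math
import random

random.seed(0)

def downstream_accuracy(theta):
    """Black-box stand-in for: synthesize data with simulator parameter
    theta, train a model on it, return validation accuracy. We pretend
    accuracy peaks at the (unknown) optimum theta* = 2."""
    return math.exp(-(theta - 2.0) ** 2)

# REINFORCE over a Gaussian policy on the parameter: the simulator itself
# is never differentiated, matching the non-differentiable setting.
mu, sigma, lr, baseline = 0.0, 1.0, 0.1, 0.0
for _ in range(2000):
    theta = random.gauss(mu, sigma)           # sample a simulation setting
    reward = downstream_accuracy(theta)       # train-and-evaluate proxy
    baseline = 0.9 * baseline + 0.1 * reward  # running mean, reduces variance
    # d/dmu log N(theta; mu, sigma^2) = (theta - mu) / sigma^2
    mu += lr * (reward - baseline) * (theta - mu) / sigma ** 2
```

After training, `mu` sits near the parameter value that maximizes the downstream accuracy proxy, without ever requiring simulator gradients.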

Unsupervised Domain Adaptation for Distance Metric Learning

Unsupervised domain adaptation is a promising avenue to enhance the performance of deep neural networks on a target domain, using labels only from a source domain. However, the two predominant methods, domain discrepancy reduction learning and semi-supervised learning, are not readily applicable when source and target domains do not share a common label space. This paper addresses the above scenario by learning a representation space that retains discriminative power on both the (labeled) source and (unlabeled) target domains while keeping representations for the two domains well-separated. Inspired by a theoretical analysis, we first reformulate the disjoint classification task, where the source and target domains correspond to non-overlapping class labels, as a verification task. To handle both within-domain and cross-domain verification, we propose a Feature Transfer Network (FTN) that separates the target feature space from the original source space while aligning it with a transformed source space. Moreover, we present a non-parametric multi-class entropy minimization loss to further boost the discriminative power of FTNs on the target domain. In experiments, we first illustrate how FTN works in a controlled setting of adapting from MNIST-M to MNIST with disjoint digit classes between the two domains and then demonstrate the effectiveness of FTNs through state-of-the-art performances on a cross-ethnicity face recognition problem.
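The entropy-minimization idea can be sketched as follows (pure Python; the softmax-over-similarities form and the temperature value are illustrative assumptions, not the exact FTN loss):

```python
import math

def mcem_loss(similarities, temperature=0.1):
    """Sketch of a non-parametric multi-class entropy minimization term:
    softmax over an unlabeled target feature's similarities to reference
    (e.g. exemplar) features, then the Shannon entropy of that
    distribution. Minimizing it pushes each target feature to commit
    confidently to one class/cluster."""
    scaled = [s / temperature for s in similarities]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A confident assignment incurs lower loss than an ambiguous one.
confident = mcem_loss([0.9, 0.1, 0.1])
ambiguous = mcem_loss([0.5, 0.5, 0.5])
```

The ambiguous case gives the uniform distribution, whose entropy is log(3); the gradient of this loss therefore drives target features away from class boundaries.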

Deep Co-Clustering

Co-clustering partitions instances and features simultaneously by leveraging the duality between them, and it often yields impressive performance improvement over traditional clustering algorithms. Recent developments in learning deep representations have demonstrated their advantage in extracting effective features. However, the research on leveraging deep learning frameworks for co-clustering is limited for two reasons: 1) current deep clustering approaches usually decouple feature learning and cluster assignment into two separate steps, which cannot yield task-specific feature representations; 2) existing deep clustering approaches cannot learn representations for instances and features simultaneously. In this paper, we propose a deep learning model for co-clustering called DeepCC. DeepCC utilizes a deep autoencoder for dimension reduction, and employs a variant of the Gaussian Mixture Model (GMM) to infer the cluster assignments. A mutual information loss is proposed to bridge the training of instances and features. DeepCC jointly optimizes the parameters of the deep autoencoder and the mixture model in an end-to-end fashion on both the instance and the feature spaces, which helps the deep autoencoder escape from local optima and lets the mixture model circumvent the Expectation-Maximization (EM) algorithm. To the best of our knowledge, DeepCC is the first deep learning model for co-clustering. Experimental results on various datasets demonstrate the effectiveness of DeepCC.
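The mutual-information bridge between the two clustering heads can be illustrated on an empirical joint distribution of instance-cluster and feature-cluster assignments (a sketch of the quantity involved, not DeepCC's exact training loss):

```python
import math

def mutual_information(joint):
    """Given an empirical joint distribution P(instance cluster,
    feature cluster) as a 2-D table of probabilities summing to 1,
    compute I(R; C). Training to increase this quantity couples the
    instance-side and feature-side cluster assignments."""
    rows = [sum(r) for r in joint]            # marginal over row clusters
    cols = [sum(c) for c in zip(*joint)]      # marginal over column clusters
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (rows[i] * cols[j]))
    return mi

# Perfectly co-aligned row/column clusters carry maximal information,
# while independent assignments carry none.
aligned = mutual_information([[0.5, 0.0], [0.0, 0.5]])
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
```

Here `aligned` equals log(2) while `independent` is zero, which is why maximizing the term ties instance clusters to feature clusters.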

Attentional Heterogeneous Graph Neural Network: Application to Program Reidentification

A program or process is an integral part of almost every IT/OT system. Can we trust the identity/ID (e.g., executable name) of the program? To avoid detection, malware may disguise itself using the ID of a legitimate program, and a system tool (e.g., PowerShell) used by attackers may carry the fake ID of another common, less sensitive software package. However, existing intrusion detection techniques often overlook this critical program reidentification problem (i.e., checking the program's identity). In this paper, we propose an attentional heterogeneous graph neural network model (DeepHGNN) to verify the program's identity based on its system behaviors. The key idea is to leverage the representation learning of the heterogeneous program behavior graph to guide the reidentification process. We formulate program reidentification as a graph classification problem and develop an effective attentional heterogeneous graph embedding algorithm to solve it. Extensive experiments — using real-world enterprise monitoring data and real attacks — demonstrate the effectiveness of DeepHGNN across multiple popular metrics and its robustness to normal dynamic changes such as program version upgrades.
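A toy sketch of attentional aggregation over a heterogeneous behavior graph (edge types, features, and scores are invented for illustration; in the actual model the attention weights come from learned scoring networks):

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [x / z for x in e]

def aggregate(typed_neighbors, type_scores):
    """Mean-pool neighbor features per edge type (e.g. process->file,
    process->socket), then mix the per-type summaries with attention
    weights. Different behavior types thus contribute unequally to the
    program's embedding."""
    types = sorted(typed_neighbors)
    pooled = []
    for t in types:
        feats = typed_neighbors[t]
        pooled.append([sum(col) / len(feats) for col in zip(*feats)])
    weights = softmax([type_scores[t] for t in types])
    dim = len(pooled[0])
    return [sum(w * p[k] for w, p in zip(weights, pooled)) for k in range(dim)]

# Hypothetical process with two file-open events and one network event;
# the "opens_file" relation is scored as more informative here.
emb = aggregate(
    {"opens_file": [[1.0, 0.0], [3.0, 0.0]], "connects": [[0.0, 2.0]]},
    {"opens_file": 2.0, "connects": 0.0},
)
```

The resulting embedding would then feed a graph-level classifier that accepts or rejects the claimed program ID.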

A Deep Spatio-Temporal Fuzzy Neural Network for Passenger Demand Prediction

In spite of its importance, passenger demand prediction is a highly challenging problem, because the demand is simultaneously influenced by the complex interactions among many spatial and temporal factors and other external factors such as weather. To address this problem, we propose a Spatio-TEmporal Fuzzy neural Network (STEF-Net) to accurately predict passenger demands incorporating the complex interactions of all known important factors. We design an end-to-end learning framework with different neural networks modeling different factors. Specifically, we propose to capture spatio-temporal feature interactions via a convolutional long short-term memory network and model external factors via a fuzzy neural network that handles data uncertainty significantly better than deterministic methods. To keep the temporal relations when fusing the two networks and emphasize discriminative spatio-temporal feature interactions, we employ a novel feature fusion method with a convolution operation and an attention layer. As far as we know, our work is the first to fuse a deep recurrent neural network and a fuzzy neural network to model complex spatio-temporal feature interactions with additional uncertain input features for predictive learning. Experiments on a large-scale real-world dataset show that our model achieves more than 10% improvement over the state-of-the-art approaches.
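The fusion step can be sketched as follows (a pure-Python toy; using the per-step feature sum as the attention score is a stand-in for the learned convolution and attention layers):

```python
import math

def attention_fuse(conv_lstm_out, fuzzy_out):
    """Toy sketch of temporal-relation-preserving fusion: keep the time
    axis, concatenate the two networks' features at each step, then
    reweight time steps with a softmax attention over a scalar score."""
    fused = [a + b for a, b in zip(conv_lstm_out, fuzzy_out)]  # per-step concat
    scores = [sum(f) for f in fused]          # stand-in for a scoring layer
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [[e / z * x for x in f] for e, f in zip(exps, fused)]

# Two time steps of spatio-temporal features fused with two steps of
# external (fuzzy) features; the time dimension survives the fusion.
out = attention_fuse([[1.0, 2.0], [3.0, 4.0]], [[0.5], [0.5]])
```

Note that the output still has one entry per time step, unlike naive flatten-and-concatenate fusion, which destroys the temporal ordering.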

Spectrally-Efficient 200G Probabilistically-Shaped 16QAM over 9000km Straight Line Transmission with Flexible Multiplexing Scheme

Flexible wavelength-multiplexing techniques have been deployed in backbone submarine networks to accommodate the trend toward variable-rate modulation formats. In this paper, we propose a new design of flexible-rate transponders for the flexible multiplexing scenario to achieve near-Shannon performance. Probabilistically shaped (PS) M-QAM can adjust the bit rate at a much finer granularity by adapting the entropy of the distribution matcher. Instead of delivering variable bit rates at a fixed baud rate, we demonstrate 200Gb/s PS-16QAM at various baud rates that fit into flexible-grid slots in multiples of 3.125GHz. This flexible baud rate conserves the limited optical bandwidth assigned by the flexible multiplexing scheme, improving bandwidth utilization. The 200G PS-16QAM signals are experimentally demonstrated over a 9000km straight-line testbed, achieving 3.05b/s/Hz~5.33b/s/Hz spectral efficiency (SE) with up to 4dB Q margin. In addition, high-baud-rate signals are used for lower-SE transmission while low-baud-rate signals target high-SE transmission to reduce the implementation penalty.
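The entropy knob can be illustrated with a Maxwell-Boltzmann-shaped 16QAM constellation (a standard shaping family used here as an illustrative assumption; the paper's distribution matcher may differ in detail):

```python
import math

def mb_entropy(nu):
    """Entropy (bits/symbol) of 16QAM under a Maxwell-Boltzmann
    distribution p(x) proportional to exp(-nu * |x|^2), over the 16
    points with I and Q coordinates in {-3, -1, 1, 3}. nu = 0 recovers
    uniform 16QAM at exactly 4 bits/symbol."""
    points = [(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
    w = [math.exp(-nu * (i * i + q * q)) for i, q in points]
    z = sum(w)
    probs = [x / z for x in w]
    return -sum(p * math.log2(p) for p in probs)
```

Increasing the shaping factor `nu` favors low-energy points and lowers the entropy, which is exactly the fine-grained rate adjustment the distribution matcher performs.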

Fiber Nonlinearity Compensation by Neural Networks

A neural network (NN) is proposed to work together with a perturbation-based nonlinearity compensation (NLC) algorithm by feeding it with intra-channel cross-phase modulation (IXPM) and intra-channel four-wave mixing (IFWM) triplets. Without prior knowledge of the transmission link or the signal pulse shaping/baud rate, the optimum NN architecture and its tensor weights are constructed entirely from a data-driven approach by exploring the training datasets. After trimming the unnecessary input tensors based on their weights, the complexity is further reduced by applying the trained NN model at the transmitter side, thanks to the limited alphabet size of the modulation formats. The performance advantage of Tx-side NN-NLC is experimentally demonstrated using both single-channel and WDM-channel 32Gbaud dual-polarization 16QAM over 2800km transmission.
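The shape of the NN's inputs can be sketched as follows (a simplified, hypothetical triplet construction for one symbol and one polarization; the experimental system works on windowed, dual-polarization sequences):

```python
def triplet_features(symbols, window):
    """For the center symbol at index t, collect the perturbation-style
    intra-channel triplet products A[t+m] * A[t+n] * conj(A[t+m+n]) for
    |m|, |n| <= window. These products are the NN inputs; the network
    then learns the perturbation weighting from data instead of from an
    analytic link model. Assumes len(symbols) >= 4*window + 1 around t."""
    t = len(symbols) // 2
    feats = []
    for m in range(-window, window + 1):
        for n in range(-window, window + 1):
            feats.append(
                symbols[t + m] * symbols[t + n] * symbols[t + m + n].conjugate()
            )
    return feats

# Toy sequence of identical unit symbols: (2*window + 1)^2 triplets.
feats = triplet_features([complex(1, 0)] * 9, window=2)
```

Trimming then amounts to dropping the (m, n) entries whose learned weights are negligible, shrinking this input tensor.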

Coupled-Core Fiber Design For Enhancing Nonlinearity Tolerance

Fiber nonlinearity is a major limitation on the achievable maximum capacity per fiber core. Digital signal processing (DSP) can be used to compensate nonlinear impairments directly, but with limited effectiveness. It is well known that fibers with higher chromatic dispersion (CD) reduce nonlinear impairments, and CD can be handled with DSP. Since the maximum CD is limited by the material dispersion of the fiber, we propose using strongly coupled multi-core fibers with large group delay (GD) between the cores. Nonlinear mitigation is achieved through strong mode coupling and group delay between the cores, which suppress the four-wave mixing interaction by inducing a large, albeit stochastic, phase mismatch. Through simulations we determine that the threshold GD required for noticeable nonlinearity suppression depends on the fiber CD. In particular, for dispersion-uncompensated links a large GD on the order of 1ns per 1000km is required to improve the optimum Q by 1dB. Furthermore, beyond this threshold, larger GD results in larger suppression without any sign of saturation.

PoLPer: Process-Aware Restriction of Over-Privileged Setuid Calls in Legacy Applications

Setuid system calls enable critical functions such as user authentication and modular privileged components. Such operations must only be executed after careful validation. However, current systems do not perform rigorous checks, allowing exploitation of privileges through memory corruption vulnerabilities in privileged programs. As a solution, understanding which setuid system calls can be invoked in what context of a process allows precise enforcement of least privilege. We propose a novel, comprehensive method to systematically extract and enforce least privilege for setuid system calls and prevent their misuse. Our approach learns the required process contexts of setuid system calls along multiple dimensions: process hierarchy, call stack, and parameters, in a process-aware way. Every setuid system call is then restricted to its per-process context by our kernel-level context enforcer. Previous approaches without process-awareness are too coarse-grained to control setuid system calls, resulting in over-privilege. Our method reduces available privileges even for identical code depending on whether it is run by a parent or a child process. We present our prototype, PoLPer, which systematically discovers only the required setuid system calls and effectively prevents real-world exploits targeting vulnerabilities of the setuid family of system calls in popular desktop and server software, with near-zero overhead.
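The learn-then-enforce idea can be sketched in a few lines (the context tuples and names are hypothetical; the real enforcer works in the kernel on process lineage, call stacks, and call parameters):

```python
# Learned whitelist of per-process setuid-call contexts. Each context is a
# tuple of (process lineage, call stack, call argument).
allowed = set()

def learn(context):
    """Profiling phase: record a legitimate setuid-call context."""
    allowed.add(context)

def enforce(context):
    """Runtime phase: permit the call only if this exact per-process
    context was observed during learning; anything else is denied."""
    return context in allowed

# Hypothetical profile: sshd's child drops privileges via do_setuid(1000).
learn((("sshd", "child"), ("main", "do_setuid"), 1000))

ok = enforce((("sshd", "child"), ("main", "do_setuid"), 1000))
# Exploit-style call: same program, but a different stack and setuid(0).
bad = enforce((("sshd", "child"), ("main", "system"), 0))
```

Because the context includes the process lineage, identical code invoked from a parent versus a child process can receive different privileges, which is the over-privilege reduction the abstract describes.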