Domain-oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment

Motivated by the success of pre-trained language models such as BERT across a broad range of natural language processing (NLP) tasks, recent research has sought to adapt these models to different application domains. Along this line, existing domain-oriented models have primarily followed the vanilla BERT architecture and use the domain corpus in a straightforward way. However, domain-oriented tasks usually require an accurate understanding of domain phrases, and such fine-grained, phrase-level knowledge is hard for existing pre-training schemes to capture. In addition, the semantic learning of pre-trained models, which is guided by word co-occurrences, can be substantially augmented by entity-level association knowledge. At the same time, doing so risks introducing noise, because ground-truth word-level alignments are lacking. To address these issues, we provide a generalized domain-oriented approach that leverages auxiliary domain knowledge to improve the existing pre-training framework in two respects. First, to preserve phrase knowledge effectively, we build a domain phrase pool as auxiliary knowledge and introduce an Adaptive Hybrid Masked Model to incorporate it. The model integrates two learning modes, word learning and phrase learning, and allows switching between them. Second, we introduce Cross Entity Alignment, which uses entity associations as weak supervision to augment the semantic learning of pre-trained models. To alleviate the potential noise in this process, we introduce an interpretable approach based on Optimal Transport to guide alignment learning. Experiments on four domain-oriented tasks demonstrate the superiority of our framework.
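The abstract does not state how the Optimal Transport problem is solved. A common choice is entropy-regularized OT computed with Sinkhorn iterations. The minimal sketch below assumes that choice, a cosine-distance cost between the token embeddings of two associated entities, and uniform marginals. The function names (`sinkhorn`, `align`) and the toy embeddings are hypothetical and do not come from the paper's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def sinkhorn(cost: Matrix, a: List[float], b: List[float],
             eps: float = 0.1, iters: int = 200) -> Matrix:
    """Entropy-regularized OT plan whose marginals approximate a (rows) and b (columns)."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate row and column scalings (Sinkhorn-Knopp)
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def align(src: Matrix, tgt: Matrix, eps: float = 0.1) -> Matrix:
    """Hypothetical helper: soft alignment between two token-embedding sets."""
    cost = [[1.0 - cosine(s, t) for t in tgt] for s in src]
    return sinkhorn(cost, [1.0 / len(src)] * len(src), [1.0 / len(tgt)] * len(tgt), eps)

# Toy example: source token 0 matches target token 1, and vice versa,
# so the transport mass concentrates on plan[0][1] and plan[1][0]
plan = align([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Each entry of the plan can be read as a soft alignment weight between a source token and a target token. This is one way an OT-based alignment stays interpretable. A smaller `eps` gives sharper alignments, but it can make `math.exp` underflow when costs are large.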

Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection

Discrete event sequences are ubiquitous; examples include ordered series of process interactions in Information and Communication Technology systems. Recent years have seen growing efforts to detect anomalies in discrete event sequences. However, the task remains extremely difficult because of several intrinsic challenges, including data imbalance, the discrete nature of events, and the sequential nature of the data. To address these challenges, we propose OC4Seq, a multi-scale one-class recurrent neural network for detecting anomalies in discrete event sequences. Specifically, OC4Seq integrates the anomaly detection objective with recurrent neural networks (RNNs) to embed discrete event sequences into latent spaces where anomalies can be easily detected. In addition, an anomalous sequence may be caused by individual events, by subsequences of events, or by the whole sequence. We therefore design a multi-scale RNN framework that captures different levels of sequential patterns simultaneously. We implement OC4Seq and evaluate it on three real-world system log datasets. The results show that OC4Seq consistently outperforms various representative baselines by a large margin. Moreover, both quantitative and qualitative analyses verify the importance of capturing multi-scale sequential patterns for event anomaly detection. To encourage reproducibility, we make the code and data publicly available.
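The following is a minimal sketch of the one-class idea. It assumes a Deep SVDD-style objective in which normal sequence embeddings are pulled toward a fixed center `c`, so the squared distance to `c` serves as an anomaly score. The abstract does not give OC4Seq's exact loss or how the scales are combined. Here the RNN hidden states are taken as given, and a single shared center is used for all three scales. The window size, the max-pooling, and the function names are illustrative assumptions.

```python
from typing import Dict, List, Sequence

Vec = Sequence[float]

def sq_dist(x: Vec, c: Vec) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, c))

def mean_vec(vs: List[Vec]) -> List[float]:
    return [sum(col) / len(vs) for col in zip(*vs)]

def one_class_loss(embeddings: List[Vec], center: Vec) -> float:
    """Deep SVDD-style objective: mean squared distance of normal embeddings to the center."""
    return sum(sq_dist(e, center) for e in embeddings) / len(embeddings)

def multi_scale_score(hidden: List[Vec], center: Vec, window: int = 3) -> Dict[str, float]:
    """Hypothetical scoring of one sequence's RNN hidden states at three scales."""
    # Event level: the single hidden state farthest from the center
    event = max(sq_dist(h, center) for h in hidden)
    # Subsequence level: the sliding window whose mean state is farthest from the center
    spans = [hidden[i:i + window] for i in range(max(1, len(hidden) - window + 1))]
    subsequence = max(sq_dist(mean_vec(s), center) for s in spans)
    # Sequence level: the final hidden state summarizes the whole sequence
    sequence = sq_dist(hidden[-1], center)
    return {"event": event, "subsequence": subsequence, "sequence": sequence}

# One outlying event: strong event-level signal, diluted at the subsequence level
scores = multi_scale_score([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [0.0, 0.0]], center=[0.0, 0.0])
```

In this toy case, the final-state score alone would miss the outlier. That is the kind of gap a multi-scale design is meant to close.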

SIGL: Securing Software Installations Through Deep Graph Learning

Many users implicitly assume that software can only be exploited after it is installed. However, recent supply-chain attacks demonstrate that application integrity must be ensured during installation itself. We introduce SIGL, a new tool for detecting malicious behavior during software installation. SIGL collects traces of system call activity and builds a data provenance graph, which it analyzes using a novel autoencoder architecture with a graph long short-term memory network (graph LSTM) as the encoder and a standard multilayer perceptron as the decoder. SIGL flags suspicious installations as well as the specific installation-time processes that are likely to be malicious. Using a test corpus of 625 malicious installers containing real-world malware, we demonstrate that SIGL has a detection accuracy of 96%, outperforming similar systems from industry and academia by up to 87% in precision and recall and 45% in accuracy. We also demonstrate that SIGL can pinpoint the processes most likely to have triggered malicious behavior, works on different audit platforms and operating systems, and is robust to training data contamination and adversarial attacks. It can be used with application-specific models, even in the presence of new software versions, as well as with application-agnostic meta-models that encompass a wide range of applications and installers.
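The graph LSTM encoder and MLP decoder are not reproduced here. Assume instead that they yield a reconstruction error for each process node of the provenance graph. The sketch below shows one plausible way to turn those errors into an installation-level verdict plus a ranked list of suspect processes. The mean-plus-`margin`-standard-deviations threshold, the function names, and the process names are assumptions, not SIGL's documented rule.

```python
from typing import Dict, List, Tuple

def fit_threshold(benign_errors: List[float], margin: float = 1.0) -> float:
    """Assumed rule: mean + margin * std of node errors on benign validation graphs."""
    n = len(benign_errors)
    mean = sum(benign_errors) / n
    var = sum((e - mean) ** 2 for e in benign_errors) / n
    return mean + margin * var ** 0.5

def triage(node_errors: Dict[str, float], threshold: float,
           top_k: int = 3) -> Tuple[bool, List[str]]:
    """Flag the installation if any process node is poorly reconstructed,
    and rank process nodes by reconstruction error (largest first)."""
    suspicious = bool(node_errors) and max(node_errors.values()) > threshold
    ranked = sorted(node_errors, key=node_errors.get, reverse=True)
    return suspicious, (ranked[:top_k] if suspicious else [])

threshold = fit_threshold([1.0, 2.0, 3.0])
verdict, suspects = triage({"installer.exe": 0.5, "dropper": 9.0, "helper": 3.0}, threshold)
```

Ranking by error is what lets a detector of this kind point at specific processes rather than only labeling the whole installation.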

Overcoming Poor Word Embeddings with Word Definitions

Modern natural language understanding models depend on pretrained subword embeddings, but applications may need to reason about words that were never or rarely seen during pretraining. We show that examples that depend critically on a rarer word are more challenging for natural language inference models. We then explore how a model could learn to use definitions, provided in natural text, to overcome this handicap. Our model’s understanding of a definition is usually weaker than a well-modeled word embedding, but it recovers most of the performance lost when a word is left completely untrained.
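The sketch below illustrates what "using a definition" can mean with a deliberately simple baseline: an out-of-vocabulary word is embedded as the average of the known-word embeddings in its definition. This is not the paper's model, which learns to read definitions. The vocabulary, the definitions, and the function names are hypothetical.

```python
from typing import Dict, List

def embed_from_definition(definition: str, vocab: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of the known words in a natural-text definition."""
    tokens = [t.strip(".,;:").lower() for t in definition.split()]
    known = [vocab[t] for t in tokens if t in vocab]
    if not known:
        return [0.0] * dim  # nothing usable: fall back to an uninformative vector
    return [sum(col) / len(known) for col in zip(*known)]

def lookup(word: str, vocab: Dict[str, List[float]],
           definitions: Dict[str, str], dim: int) -> List[float]:
    if word in vocab:  # well-modeled word: use its pretrained embedding
        return vocab[word]
    if word in definitions:  # rare or unseen word: read its definition
        return embed_from_definition(definitions[word], vocab, dim)
    return [0.0] * dim  # no definition available

vocab = {"small": [1.0, 0.0], "dog": [0.0, 1.0]}
definitions = {"puppy": "A small dog."}
```

For example, `lookup("puppy", vocab, definitions, 2)` averages the vectors for "small" and "dog", because "a" is not in the toy vocabulary.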

SkyHAUL: A Self-Organizing Gigabit Network In The Sky

We design and build SkyHaul, the first large-scale, self-organizing network of Unmanned Aerial Vehicles (UAVs) connected by a mmWave wireless mesh backhaul. The mmWave backhaul paves the way for a new class of bandwidth-intensive, latency-sensitive cooperative applications (e.g., LTE coverage during disasters). The network of UAVs, in turn, allows these applications to run at operating ranges far beyond the line-of-sight distances that limit individual UAVs today. Deploying and maintaining an airborne mmWave mesh backhaul that caters to dynamic applications is a challenging vision. To realize it, SkyHaul’s design incorporates several elements:

(i) role-specific UAV operations that simultaneously address application tracking and backhaul connectivity;
(ii) novel algorithms that jointly address deployment (position and yaw of UAVs) and traffic routing across the UAV network; and
(iii) a provably optimal solution for fast and safe reconfiguration of the UAV backhaul during application dynamics.

We evaluate the performance of SkyHaul through both real-world UAV flight operations and large-scale simulations.
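The abstract does not describe SkyHaul's routing algorithm, and the joint position and yaw optimization is out of scope here. As an illustration of routing over a mmWave mesh with heterogeneous link rates, the sketch below computes a widest path (the route with the largest bottleneck capacity) using a modified Dijkstra search. This is a standard technique swapped in for illustration. The topology, node names, and capacities are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, link capacity in Gbps)]

def widest_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Return (bottleneck capacity, path) maximizing the minimum link capacity."""
    best: Dict[str, float] = {src: float("inf")}
    prev: Dict[str, str] = {}
    heap = [(-float("inf"), src)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == dst:
            break
        if width < best.get(node, 0.0):
            continue  # stale entry: a wider route to this node was found later
        for nbr, cap in graph.get(node, []):
            w = min(width, cap)  # the path is only as wide as its narrowest link
            if w > best.get(nbr, 0.0):
                best[nbr] = w
                prev[nbr] = node
                heapq.heappush(heap, (-w, nbr))
    if dst not in best:
        return 0.0, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]

# Hypothetical mesh: ground station "gs" relaying through UAVs "u1" to "u3"
mesh: Graph = {"gs": [("u1", 2.0), ("u2", 1.0)], "u1": [("u3", 0.5)],
               "u2": [("u3", 1.0)], "u3": []}
```

In this mesh the route through `u2` wins (bottleneck 1.0 Gbps versus 0.5 Gbps through `u1`), even though its first hop is slower.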

DECODE: A Deep-learning Framework for Condensing Enhancers and Refining Boundaries with Large-scale Functional Assays

Motivation: Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have treated enhancer discovery as a binary classification problem without accurate boundary detection. This produces low-resolution annotations with superfluous regions and reduces the statistical power of downstream analyses (e.g., causal variant mapping and functional validation). Here, we address these challenges with a two-step model called DECODE (Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays). First, we use direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve annotation resolution, we implement a weakly supervised object detection framework for enhancer localization with precise boundary detection (to 10 bp resolution) using Gradient-weighted Class Activation Mapping (Grad-CAM).

Results: Our DECODE binary classifier outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework condenses enhancer annotations to only 13% of their original size. These compact annotations have significantly higher conservation scores and genome-wide association study (GWAS) variant enrichments than the original predictions. Overall, DECODE is an effective tool for enhancer classification and precise localization.
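The abstract does not spell out how Grad-CAM scores become boundaries. Assume one non-negative Grad-CAM score per 10 bp bin of a predicted enhancer. A simple rule is then to keep the contiguous run of bins around the peak whose scores stay at or above a fraction of the peak. The `frac` cutoff and the function name are illustrative assumptions, not DECODE's published procedure.

```python
from typing import List, Tuple

def condense(start_bp: int, saliency: List[float], bin_bp: int = 10,
             frac: float = 0.5) -> Tuple[int, int]:
    """Hypothetical rule: shrink an enhancer call to the high-saliency run around its peak bin."""
    peak = max(range(len(saliency)), key=saliency.__getitem__)
    cutoff = frac * saliency[peak]
    lo = hi = peak
    # Grow left and right while neighboring bins stay above the cutoff
    while lo > 0 and saliency[lo - 1] >= cutoff:
        lo -= 1
    while hi < len(saliency) - 1 and saliency[hi + 1] >= cutoff:
        hi += 1
    # Half-open genomic interval [start, end) at bin resolution
    return start_bp + lo * bin_bp, start_bp + (hi + 1) * bin_bp

# A 70 bp call starting at position 1000 with one salient core
core = condense(1000, [0.0, 0.1, 0.6, 1.0, 0.7, 0.2, 0.0])
```

Here the 70 bp call shrinks to the 30 bp interval from 1020 to 1050. This is the kind of condensation the abstract reports at scale.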

Hierarchical Imitation Learning with Contextual Bandits for Dynamic Treatment Regimes

Imitation learning has proven effective in mimicking experts’ behaviors from their demonstrations without access to explicit reward signals. Meanwhile, complex tasks such as dynamic treatment regimes for patients with comorbidities often exhibit significant variability in expert demonstrations, with multiple sub-tasks. In such cases, a single flat policy may struggle to handle tasks with hierarchical structure. In this paper, we propose a hierarchical imitation learning model, HIL, that jointly learns latent high-level policies and sub-policies (for individual sub-tasks) from expert demonstrations without prior knowledge. First, HIL learns sub-policies by imitating expert trajectories, with sub-task switching guided by the high-level policies. Second, HIL collects feedback from its sub-policies to optimize the high-level policies. These are modeled as a contextual multi-armed bandit that sequentially selects the best sub-policy at each time step based on contextual information derived from demonstrations. Compared with state-of-the-art baselines on real-world medical data, HIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting hierarchical structures in expert demonstrations.
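The abstract models the high-level policy as a contextual multi-armed bandit but does not name the algorithm. The sketch below uses disjoint LinUCB, one standard contextual bandit method, with each arm standing for a sub-policy. The context features, the reward signal, and the `alpha` value are assumptions for illustration.

```python
import math
from typing import List

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small A)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (here, per sub-policy)."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [[[float(i == j) for j in range(dim)] for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def select(self, x: List[float]) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            theta = solve(A, b)  # ridge estimate of the arm's reward weights
            ainv_x = solve(A, x)
            mean = sum(t * xi for t, xi in zip(theta, x))
            bonus = self.alpha * math.sqrt(sum(xi * y for xi, y in zip(x, ainv_x)))
            scores.append(mean + bonus)  # optimism in the face of uncertainty
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, x: List[float], reward: float) -> None:
        A, b = self.A[arm], self.b[arm]
        for i in range(len(x)):
            b[i] += reward * x[i]
            for j in range(len(x)):
                A[i][j] += x[i] * x[j]

bandit = LinUCB(n_arms=2, dim=2, alpha=1.0)
for _ in range(20):
    # Hypothetical feedback: sub-policy 0 suits context [1, 0], sub-policy 1 suits [0, 1]
    bandit.update(0, [1.0, 0.0], 1.0)
    bandit.update(1, [1.0, 0.0], 0.0)
    bandit.update(0, [0.0, 1.0], 0.0)
    bandit.update(1, [0.0, 1.0], 1.0)
```

After this feedback, `bandit.select` routes each context to the sub-policy that earned reward there. The exploration bonus shrinks as evidence accumulates.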

On Single-User Interactive Beam Alignment in Millimeter Wave Systems: Impact of Feedback Delay

Narrow beams are key to wireless communications in millimeter wave frequency bands. Beam alignment (BA) allows the base station (BS) to adjust the direction and width of the beam used for communication. During BA, the BS transmits a number of scanning beams covering different angular regions. The goal is to minimize the expected width of the uncertainty region (UR) that includes the angle of departure of the user. Conventionally, interactive BA assumes that the feedback for each scanning packet is received before the next one is transmitted. In practice, however, the feedback delay may be longer because of propagation or system constraints. This paper investigates BA strategies that operate under arbitrary fixed feedback delays. The problem is analyzed from a source coding perspective, in which the feedback sequences are viewed as source codewords. It is shown that these codewords form a codebook with a particular characteristic, which is used to define a new class of codes called d-unimodal codes. By analyzing the properties of these codes, a lower bound on the minimum achievable expected beamwidth is provided. The results reveal potential performance improvements over state-of-the-art BA methods that do not consider the effect of delay, measured as the BA duration needed to achieve a fixed expected UR width.
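As a reference point for the zero-delay case (d = 0), assume the following: the angle of departure ψ is uniformly distributed over an angular interval normalized to width 1, each scanning beam returns noiseless binary feedback (ACK or NACK), and k beams are sent. Each feedback sequence s ∈ {0,1}^k then identifies a region U_s of width w_s, and these regions partition the interval. Because Pr(ψ ∈ U_s) = w_s, the Cauchy-Schwarz inequality bounds the expected UR width:

```latex
\mathbb{E}\big[\,|\mathrm{UR}|\,\big]
  = \sum_{s \in \{0,1\}^k} \Pr(\psi \in U_s)\, w_s
  = \sum_{s \in \{0,1\}^k} w_s^2
  \;\ge\; \frac{\big(\sum_{s} w_s\big)^2}{2^k}
  = 2^{-k}.
```

Equality holds when every w_s = 2^{-k}, which bisection achieves when each feedback bit arrives before the next beam is chosen. This standard argument covers only immediate feedback. The delayed case is what the paper's analysis of d-unimodal codes addresses, and its bound is not reproduced here.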

An Efficient Approach for Placing Distributed Fiber Optic Sensors with Concurrent Sensing Capability

We propose an efficient approach for placing distributed fiber optic sensors (DFOS) with concurrent sensing capability. To cover the same network, it requires 5.7% to 9.5% fewer sensors than placement using DFOS without concurrent sensing.
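The abstract does not describe the placement algorithm. One generic way to frame sensor placement is as set cover: each candidate placement covers a set of network links. Here a sensor with concurrent sensing is modeled as a candidate that covers several links at once. The greedy heuristic below repeatedly picks the candidate covering the most uncovered links. It is a standard approximation, not the paper's method, and the candidate and link names are hypothetical.

```python
from typing import Dict, List, Set

def greedy_cover(links: Set[str], candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the placement covering the most uncovered links."""
    uncovered = set(links)
    chosen: List[str] = []
    while uncovered:
        # Ties are broken by dictionary order (max keeps the first maximal candidate)
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("uncoverable links: " + ", ".join(sorted(uncovered)))
        chosen.append(best)
        uncovered -= gain
    return chosen

# Hypothetical candidates: a sensor at site "C" senses three links concurrently
placement = greedy_cover({"AB", "BC", "CD", "DE"},
                         {"A": {"AB"}, "B": {"AB", "BC"}, "C": {"BC", "CD", "DE"}})
```

With these inputs two sensors suffice. A candidate that covers more links concurrently tends to reduce the count, which matches the direction of the savings the abstract reports.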

Field Trial of Cable Safety Protection and Road Traffic Monitoring over Operational 5G Transport Network with Fiber Sensing and On-Premise AI Technologies

We report field-trial results of distributed fiber sensing over an operational 5G transport network. A standard communication fiber is used, together with real-time AI processing, for cable self-protection, cable-cut threat assessment, and road traffic monitoring in a long-term continuous test.