Multi-source Inductive Knowledge Graph Transfer

Multi-source Inductive Knowledge Graph Transfer Large-scale information systems, such as knowledge graphs (KGs), enterprise system networks, often exhibit dynamic and complex activities. Recent research has shown that formalizing these information systems as graphs can effectively characterize the entities (nodes) and their relationships (edges). Transferring knowledge from existing well-curated source graphs can help construct the target graph of newly-deployed systems faster and better which no doubt will benefit downstream tasks such as link prediction and anomaly detection for new systems. However, current graph transferring methods are either based on a single source, which does not sufficiently consider multiple available sources, or not selectively learns from these sources. In this paper, we propose MSGT-GNN, a graph knowledge transfer model for efficient graph link prediction from multiple source graphs. MSGT-GNN consists of two components: the Intra-Graph Encoder, which embeds latent graph features of system entities into vectors, and the graph transferor, which utilizes graph attention mechanism to learn and optimize the embeddings of corresponding entities from multiple source graphs, in both node level and graph level. Experimental results on multiple real-world datasets from various domains show that MSGT-GNN outperforms other baseline approaches in the link prediction and demonstrate the merit of attentive graph knowledge transfer and the effectiveness of MSGT-GNN.

SIGL: Securing Software Installations Through Deep Graph Learning

SIGL: Securing Software Installations Through Deep Graph Learning Many users implicitly assume that software can only be exploited after it is installed. However, recent supply-chain attacks demonstrate that application integrity must be ensured during installation itself. We introduce SIGL, a new tool for detecting malicious behavior during software installation. SIGL collects traces of system call activity, building a data provenance graph that it analyzes using a novel autoencoder architecture with a graph long short-term memory network (graph LSTM) for the encoder and a standard multilayer perceptron for the decoder. SIGL flags suspicious installations as well as the specific installation-time processes that are likely to be malicious. Using a test corpus of 625 malicious installers containing real-world malware, we demonstrate that SIGL has a detection accuracy of 96%, outperforming similar systems from industry and academia by up to 87% in precision and recall and 45% in accuracy. We also demonstrate that SIGL can pinpoint the processes most likely to have triggered malicious behavior, works on different audit platforms and operating systems, and is robust to training data contamination and adversarial attack. It can be used with application-specific models, even in the presence of new software versions, as well as application-agnostic meta-models that encompass a wide range of applications and installers.

This is Why We Can’t Cache Nice Things: Lightning-Fast Threat Hunting using Suspicion-Based Hierarchical Storage

This is Why We Can’t Cache Nice Things: Lightning-Fast Threat Hunting using Suspicion-Based Hierarchical Storage Recent advances in the causal analysis can accelerate incident response time, but only after a causal graph of the attack has been constructed. Unfortunately, existing causal graph generation techniques are mainly offline and may take hours or days to respond to investigator queries, creating greater opportunity for attackers to hide their attack footprint, gain persistency, and propagate to other machines. To address that limitation, we present Swift, a threat investigation system that provides high-throughput causality tracking and real-time causal graph generation capabilities. We design an in-memory graph database that enables space-efficient graph storage and online causality tracking with minimal disk operations. We propose a hierarchical storage system that keeps forensically-relevant part of the causal graph in main memory while evicting rest to disk. To identify the causal graph that is likely to be relevant during the investigation, we design an asynchronous cache eviction policy that calculates the most suspicious part of the causal graph and caches only that part in the main memory. We evaluated Swift on a real-world enterprise to demonstrate how our system scales to process typical event loads and how it responds to forensic queries when security alerts occur. Results show that Swift is scalable, modular, and answers forensic queries in real-time even when analyzing audit logs containing tens of millions of events.

VESSELS: Efficient and Scalable Deep Learning Prediction on Trusted Processors

VESSELS: Efficient and Scalable Deep Learning Prediction on Trusted Processors Deep learning systems on the cloud are increasingly targeted by attacks that attempt to steal sensitive data. Intel SGX has been proven effective to protect the confidentiality and integrity of such data during computation. However, state-of-the-art SGX systems still suffer from substantial performance overhead induced by the limited physical memory of SGX. This limitation significantly undermines the usability of deep learning systems due to their memory-intensive characteristics.In this paper, we provide a systematic study on the inefficiency of the existing SGX systems for deep learning prediction with a focus on their memory usage. Our study has revealed two causes of the inefficiency in the current memory usage paradigm: large memory allocation and low memory reusability. Based on this insight, we present Vessels, a new system that addresses the inefficiency and overcomes the limitation on SGX memory through memory usage optimization techniques. Vessels identifies the memory allocation and usage patterns of a deep learning program through model analysis and creates a trusted execution environment with an optimized memory pool, which minimizes the memory footprint with high memory reusability. Our experiments demonstrate that, by significantly reducing the memory footprint and carefully scheduling the workloads, Vessels can achieve highly efficient and scalable deep learning prediction while providing strong data confidentiality and integrity with SGX.

A Generic Edge-Empowered Graph Convolutional Network via Node-Edge Mutual Enhancement

A Generic Edge-Empowered Graph Convolutional Network via Node-Edge Mutual Enhancement Graph Convolutional Networks (GCNs) have shown to be a powerful tool for analyzing graph-structured data. Most of previous GCN methods focus on learning a good node representation by aggregating the representations of neighboring nodes, whereas largely ignoring the edge information. Although few recent methods have been proposed to integrate edge attributes into GCNs to initialize edge embeddings, these methods do not work when edge attributes are (partially) unavailable. Can we develop a generic edge-empowered framework to exploit node-edge enhancement, regardless of the availability of edge attributes? In this paper, we propose a novel framework EE-GCN that achieves node-edge enhancement. In particular, the framework EE-GCN includes three key components: (i) Initialization: this step is to initialize the embeddings of both nodes and edges. Unlike node embedding initialization, we propose a line graph-based method to initialize the embedding of edges regardless of edge attributes. (ii) Feature space alignment: we propose a translation-based mapping method to align edge embedding with node embedding space, and the objective function is penalized by a translation loss when both spaces are not aligned. (iii) Node-edge mutually enhanced updating: node embedding is updated by aggregating embedding of neighboring nodes and associated edges, while edge embedding is updated by the embedding of associated nodes and itself. Through the above improvements, our framework provides a generic strategy for all of the spatial-based GCNs to allow edges to participate in embedding computation and exploit node-edge mutual enhancement. Finally, we present extensive experimental results to validate the improved performances of our method in terms of node classification, link prediction, and graph classification.

APTrace: A Responsive System for Agile Enterprise Level Causality Analysis

APTrace: A Responsive System for Agile Enterprise Level Causality Analysis While backtracking analysis has been successful in assisting the investigation of complex security attacks, it faces a critical dependency explosion problem. To address this problem, security analysts currently need to tune backtracking analysis manually with different case-specific heuristics. However, existing systems fail to fulfill two important system requirements to achieve effective backtracking analysis. First, there need flexible abstractions to express various types of heuristics. Second, the system needs to be responsive in providing updates so that the progress of backtracking analysis can be frequently inspected, which typically involves multiple rounds of manual tuning. In this paper, we propose a novel system, APTrace, to meet both of the above requirements. As we demonstrate in the evaluation, security analysts can effectively express heuristics to reduce more than 99.5% of irrelevant events in the backtracking analysis of real-world attack cases. To improve the responsiveness of backtracking analysis, we present a novel execution-window partitioning algorithm that significantly reduces the waiting time between two consecutive updates (especially, 57 times reduction for the top 1% waiting time).

You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis

You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis o subvert recent advances in perimeter and host security, the attacker community has developed and employed various attack vectors to make a malware much more stealthy than before to penetrate the target system and prolong its presence. The advanced malware, or stealthy malware, impersonates or abuses benign applications and legitimate system tools to minimize its footprints in the target system. One example of such stealthy malware is fileless malware, which resides its malicious logic mostly in the memory of well-trusted processes. It is difficult for traditional detection tools, such as malware scanners, to detect it, as the malware normally does not expose its malicious payload in a file and hides its malicious behaviors among the benign behaviors of the processes.In this paper, we present PROVDETECTOR, a provenance-based approach for detecting stealthy malware. The intuition behind PROVDETECTOR is that although a stealthy malware may impersonate or abuse a benign process, it still exposes its malicious behaviors in the OS (operating system) level provenance. Based on this intuition, PROVDETECTOR first employs a novel selection algorithm to identify possibly malicious parts in the OS level provenance data of a process. Then, it applies a neural embedding and machine learning pipeline to automatically detect any behavior that deviates significantly from normal behaviors. We evaluate our approach on a large provenance dataset from an enterprise network and demonstrate that it achieves very high detection performance (an average F1 score of 0.974) of stealthy malware. Further, we conduct thorough interpretability studies to understand the internals of the learned machine learning models.