SAQL: A Stream-based Query System for Real-Time Abnormal System Behavior Detection

Publication Date: 8/17/2018

Event: The 27th USENIX Security Symposium (USENIX Security 2018)

Reference: pp. 639-656, 2018

Authors: Peng Gao, Princeton University; Xusheng Xiao, Case Western Reserve University; Ding Li, NEC Laboratories America, Inc.; Zhichun Li, NEC Laboratories America, Inc.; Kangkook Jee, NEC Laboratories America, Inc.; Zhenyu Wu, NEC Laboratories America, Inc.; Chung Hwan Kim, NEC Laboratories America, Inc.; Sanjeev R. Kulkarni, Princeton University; Prateek Mittal, Princeton University

Abstract: Recently, advanced cyber attacks, which consist of a sequence of steps that involve many vulnerabilities and hosts, have compromised the security of many well-protected businesses. This has led to solutions that ubiquitously monitor system activities on each host (big data) as a series of events and search for anomalies (abnormal behaviors) to triage risky events. Since fighting these attacks is a time-critical mission to prevent further damage, these solutions face challenges in incorporating expert knowledge to perform timely anomaly detection over large-scale provenance data. To address these challenges, we propose a novel stream-based query system that takes as input a real-time event feed aggregated from multiple hosts in an enterprise and provides an anomaly query engine that queries the event feed to identify abnormal behaviors based on the specified anomalies. To facilitate the task of expressing anomalies based on expert knowledge, our system provides a domain-specific query language, SAQL, which allows analysts to express models for (1) rule-based anomalies, (2) time-series anomalies, (3) invariant-based anomalies, and (4) outlier-based anomalies. We deployed our system at NEC Labs America, in an environment comprising 150 hosts, and evaluated it using 1.1 TB of real system monitoring data (containing 3.3 billion events). Our evaluations on a broad set of attack behaviors and micro-benchmarks show that our system has a low detection latency (<2 s) and a high system throughput (110,000 events/s; supporting ~4000 hosts), and is more efficient in memory utilization than existing stream-based complex event processing systems.
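
To make the time-series category named in the abstract more concrete, the following is a minimal Python sketch of one way such a check could work: a value aggregated per time window is compared against the simple moving average of the preceding windows. This is not SAQL syntax and not the authors' engine. The function name `sma_anomalies`, the parameters `history` and `factor`, and the sample byte counts are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sma_anomalies(
    window_totals: Iterable[Tuple[int, float]],
    history: int = 3,      # assumed number of past windows in the baseline
    factor: float = 2.0,   # assumed alert threshold relative to the baseline
) -> Iterator[Tuple[int, float, float]]:
    """Yield (window_id, value, baseline) for each window whose value
    exceeds `factor` times the mean of the previous `history` windows."""
    past: Deque[float] = deque(maxlen=history)
    for window_id, value in window_totals:
        # Only compare once a full baseline of past windows is available.
        if len(past) == history:
            baseline = sum(past) / history
            if value > factor * baseline:
                yield window_id, value, baseline
        # The current window joins the history; the oldest one drops out.
        past.append(value)


# Hypothetical per-window byte counts sent by a single process.
totals = [(0, 100.0), (1, 120.0), (2, 110.0), (3, 105.0), (4, 900.0), (5, 115.0)]
alerts = list(sma_anomalies(totals))
for window_id, value, baseline in alerts:
    print(f"window {window_id}: {value:.0f} bytes vs. baseline {baseline:.1f}")
```

Because the generator keeps only the last `history` values, its state stays constant in size as the stream grows, which is the property that makes a stream-based evaluation of this kind practical. In this sketch an anomalous window is still added to the history, so a spike temporarily raises the baseline for the windows that follow it.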

Publication Link: