The University of Texas at Dallas (UT Dallas), established in 1969, is a top public university located in one of the nation’s fastest-growing metropolitan regions. With over 31,000 students across seven schools, UT Dallas is recognized for its educational excellence, innovative programs, distinguished faculty, and research that makes a meaningful impact. NEC Labs America partners with the University of Texas at Dallas to study graph-based AI, anomaly detection, and robust sensor data integration for cyber-physical systems. Please read about our latest news and collaborative publications with the University of Texas at Dallas.

Posts

APTrace: A Responsive System for Agile Enterprise Level Causality Analysis

While backtracking analysis has been successful in assisting the investigation of complex security attacks, it faces a critical dependency explosion problem. To address this problem, security analysts currently need to tune backtracking analysis manually with different case-specific heuristics. However, existing systems fail to fulfill two important system requirements to achieve effective backtracking analysis. First, there need flexible abstractions to express various types of heuristics. Second, the system needs to be responsive in providing updates so that the progress of backtracking analysis can be frequently inspected, which typically involves multiple rounds of manual tuning. In this paper, we propose a novel system, APTrace, to meet both of the above requirements. As we demonstrate in the evaluation, security analysts can effectively express heuristics to reduce more than 99.5% of irrelevant events in the backtracking analysis of real-world attack cases. To improve the responsiveness of backtracking analysis, we present a novel execution-window partitioning algorithm that significantly reduces the waiting time between two consecutive updates (especially, 57 times reduction for the top 1% waiting time).

You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis

To subvert recent advances in perimeter and host security, the attacker community has developed and employed various attack vectors to make malware much more stealthy than before to penetrate the target system and prolong its presence. The advanced malware, or stealthy malware, impersonates or abuses benign applications and legitimate system tools to minimize its footprints in the target system. One example of such stealthy malware is fileless malware, which resides its malicious logic mostly in the memory of well-trusted processes. It is difficult for traditional detection tools, such as malware scanners, to detect it, as the malware normally does not expose its malicious payload in a file and hides its malicious behaviors among the benign behaviors of the processes.In this paper, we present PROVDETECTOR, a provenance-based approach for detecting stealthy malware. The intuition behind PROVDETECTOR is that although a stealthy malware may impersonate or abuse a benign process, it still exposes its malicious behaviors in the OS (operating system) level provenance. Based on this intuition, PROVDETECTOR first employs a novel selection algorithm to identify possibly malicious parts in the OS level provenance data of a process. Then, it applies a neural embedding and machine learning pipeline to automatically detect any behavior that deviates significantly from normal behaviors. We evaluate our approach on a large provenance dataset from an enterprise network and demonstrate that it achieves very high detection performance (an average F1 score of 0.974) of stealthy malware. Further, we conduct thorough interpretability studies to understand the internals of the learned machine learning models.

LogLens: A Real-time Log Analysis System

Administrators of most user-facing systems depend on periodic log data to get an idea of the health and status of production applications. Logs report information, which is crucial to diagnose the root cause of complex problems. In this paper, we present a real-time log analysis system called LogLens that automates the process of anomaly detection from logs with no (or minimal) target system knowledge and user specification. In LogLens, we employ unsupervised machine learning based techniques to discover patterns in application logs, and then leverage these patterns along with the real-time log parsing for designing advanced log analytics applications. Compared to the existing systems which are primarily limited to log indexing and search capabilities, LogLens presents an extensible system for supporting both stateless and stateful log analysis applications. Currently, LogLens is running at the core of a commercial log analysis solution handling millions of logs generated from the large-scale industrial environments and reported up to 12096x man-hours reduction in troubleshooting operational problems compared to the manual approach.