Anomaly Detection on Web-User Behaviors through Deep Learning

The modern Internet has witnessed the proliferation of web applications that play a crucial role in the branding process among enterprises. Web applications provide a communication channel between potential customers and business products. However, web applications are also targeted by attackers due to the sensitive information they store. Among web-related attacks, a rising and stealthier class exists in which attackers first access a web application using stolen credentials, masquerading as legitimate users, and then follow a sequence of sophisticated steps to achieve their malicious purpose. Traditional security solutions fail to detect the resulting abnormal behaviors once attackers have logged in to the web application. To address this problem, we propose WebLearner, a novel system to detect abnormal web-user behaviors. As we demonstrate in the evaluation, WebLearner performs strongly: it effectively detects abnormal user behaviors with over 96% precision and recall using a reasonably small amount of normal training data.
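
The abstract does not spell out WebLearner's model, so the following is only a minimal sketch of one common design for sequence-based behavioral anomaly detection: an LSTM language model over user-action IDs, trained on normal sessions, that flags sessions whose actions it predicts poorly. All names and hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class NextActionLM(nn.Module):
    """Predicts each next user action from the session prefix."""
    def __init__(self, n_actions, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_actions, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, actions):               # actions: (batch, time) int IDs
        h, _ = self.lstm(self.embed(actions))
        return self.head(h)                   # logits: (batch, time, n_actions)

def session_score(model, actions):
    """Mean negative log-likelihood of each action given its prefix.
    Sessions scoring above a threshold tuned on normal data are flagged."""
    logits = model(actions[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), actions[:, 1:].reshape(-1)).item()
```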

VESSELS: Efficient and Scalable Deep Learning Prediction on Trusted Processors

Deep learning systems on the cloud are increasingly targeted by attacks that attempt to steal sensitive data. Intel SGX has proven effective at protecting the confidentiality and integrity of such data during computation. However, state-of-the-art SGX systems still suffer from substantial performance overhead induced by SGX's limited physical memory. This limitation significantly undermines the usability of deep learning systems due to their memory-intensive characteristics. In this paper, we provide a systematic study of the inefficiency of existing SGX systems for deep learning prediction, with a focus on their memory usage. Our study reveals two causes of inefficiency in the current memory usage paradigm: large memory allocation and low memory reusability. Based on this insight, we present Vessels, a new system that addresses the inefficiency and overcomes the limitation on SGX memory through memory usage optimization techniques. Vessels identifies the memory allocation and usage patterns of a deep learning program through model analysis and creates a trusted execution environment with an optimized memory pool, which minimizes the memory footprint with high memory reusability. Our experiments demonstrate that, by significantly reducing the memory footprint and carefully scheduling the workloads, Vessels achieves highly efficient and scalable deep learning prediction while providing strong data confidentiality and integrity with SGX.
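
As a rough illustration of the memory-pool idea (not Vessels' actual algorithm), the sketch below computes the smallest pool that can serve a model once tensor lifetimes are known from model analysis: the pool only needs to cover the peak of concurrently live tensors, since buffers can be reused after a tensor's last use. The liveness-interval representation is our assumption.

```python
def min_pool_bytes(tensors):
    """tensors: list of (size_bytes, first_layer, last_layer) liveness intervals.
    Returns the peak concurrent memory, i.e. the smallest pool that can host
    all tensors if buffers are reused as soon as a tensor dies."""
    events = []
    for size, first, last in tensors:
        events.append((first, size))       # tensor becomes live
        events.append((last + 1, -size))   # buffer reusable after last use
    peak = live = 0
    for _, delta in sorted(events):        # frees sort before allocs at a tie
        live += delta
        peak = max(peak, live)
    return peak

# Three intermediate tensors with overlapping lifetimes across layers 0-3:
print(min_pool_bytes([(4_000_000, 0, 1), (2_000_000, 1, 2), (4_000_000, 2, 3)]))
# -> 6000000: a ~6 MB pool suffices when buffers are reused across layers
```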

Voting Based Approaches For Differentially Private Federated Learning

Differentially Private Federated Learning (DPFL) is an emerging field with many applications. Gradient-averaging-based DPFL methods require costly communication rounds and hardly work with large-capacity models, due to the explicit dimension dependence in their added noise. In this work, inspired by the knowledge-transfer approach to non-federated private learning of Papernot et al. (2017, 2018), we design two new DPFL schemes that vote among the data labels returned by each local model, instead of averaging gradients, which avoids the dimension dependence and significantly reduces the communication cost. Theoretically, by applying secure multi-party computation, we can exponentially amplify the (data-dependent) privacy guarantees when the margin of the voting scores is large. Extensive experiments show that our approaches significantly improve the privacy-utility trade-off over the state of the art in DPFL.
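
As a hedged sketch of the voting idea (in the spirit of Papernot et al.'s PATE-style aggregation, not necessarily this paper's exact mechanism), each party labels a query with its local model and the server releases the noisy plurality label. The noise scale and the data-dependent accounting driven by large vote margins are where the paper's formal guarantees live; both are deliberately left abstract here.

```python
import numpy as np

def noisy_label_vote(local_labels, n_classes, noise_scale, rng=None):
    """local_labels: one predicted label per participating local model.
    Returns the noisy plurality label from a Laplace-noised vote histogram;
    note the noise dimension is n_classes, not the model dimension."""
    rng = rng or np.random.default_rng()
    votes = np.bincount(local_labels, minlength=n_classes).astype(float)
    votes += rng.laplace(scale=noise_scale, size=n_classes)
    return int(np.argmax(votes))

# Example: 10 local models vote on a 3-class query; a large margin for
# class 0 makes the released label robust to the added noise.
print(noisy_label_vote(np.array([0, 0, 1, 0, 2, 0, 0, 1, 0, 0]), 3, noise_scale=1.0))
```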

New Methods for Non-Destructive Underground Fiber Localization using Distributed Fiber Optic Sensing Technology

To the best of our knowledge, we present the first methods for detecting the position of underground fiber cables using distributed fiber optic sensing (DFOS) technology. Our results achieve meter-level localization accuracy.

3D Finger Vein Biometric Authentication with Photoacoustic Tomography

Biometric authentication is the recognition of human identity via unique anatomical features. The development of novel methods parallels widespread application by consumer devices, law enforcement, and access control. In particular, methods based on finger veins, as compared to face and fingerprints, obviate privacy concerns and degradation due to wear, age, and obscuration. However, they are two-dimensional (2D) and are fundamentally limited by conventional imaging and tissue-light scattering. In this work, for the first time to the best of our knowledge, we demonstrate a method of three-dimensional (3D) finger vein biometric authentication based on photoacoustic tomography. Using a compact photoacoustic tomography setup and a novel recognition algorithm, the advantages of 3D are demonstrated via biometric authentication of index finger vessels, with false acceptance, false rejection, and equal error rates of <1.23%, <9.27%, and <0.13%, respectively, when comparing one finger; a false acceptance rate improvement of >10× when comparing multiple fingers; and an equal error rate of <0.7% when rotating fingers ±30°.
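
For readers unfamiliar with the reported metrics, the sketch below shows how FAR, FRR, and the equal error rate are derived from genuine (same-finger) and impostor (different-finger) match scores by sweeping the decision threshold. The scoring function itself is the paper's recognition algorithm and is not reproduced here.

```python
import numpy as np

def far_frr_eer(genuine, impostor):
    """genuine/impostor: similarity scores for matching / non-matching pairs.
    Sweeps the decision threshold and returns the operating point where the
    false acceptance rate (FAR) and false rejection rate (FRR) cross."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # accepted impostors
    frr = np.array([(genuine < t).mean() for t in thresholds])    # rejected genuines
    i = np.argmin(np.abs(far - frr))
    return far[i], frr[i], (far[i] + frr[i]) / 2  # FAR, FRR, approximate EER
```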

Anomalous Event Sequence Detection

Anomaly detection has been widely applied in modern data-driven security applications to detect abnormal events/entities that deviate from the majority. However, less work has addressed detecting suspicious event sequences/paths, which discriminate between normal and abnormal behaviors better than single events/entities in complex systems such as cyber-physical systems. A key and challenging step in this endeavor is discovering those abnormal event sequences from millions of system event records efficiently and accurately. To address this issue, we propose NINA, a network diffusion-based algorithm for identifying anomalous event sequences. Experimental results on both static and streaming data show that NINA is efficient (processing about 2 million records per minute) and accurate.
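
NINA's diffusion algorithm is not detailed in the abstract; as an illustrative stand-in, the sketch below scores a candidate event sequence against a transition graph built from historical records, so that sequences traversing historically rare transitions receive high anomaly scores. The adjacency-count representation is our assumption.

```python
import numpy as np

def path_anomaly_score(counts, path):
    """counts[i, j]: how often event j followed event i in historical records.
    Builds a smoothed random-walk transition matrix and scores a sequence by
    the mean negative log-probability of its transitions; transitions with
    little historical support push the score up."""
    n = counts.shape[0]
    P = (counts + 1e-6) / (counts.sum(axis=1, keepdims=True) + 1e-6 * n)
    return float(np.mean([-np.log(P[a, b]) for a, b in zip(path, path[1:])]))

# Example with 3 event types: the transition 0 -> 2 is historically rare,
# so the second path scores as more anomalous than the first.
counts = np.array([[0., 9., 1.], [5., 0., 5.], [1., 9., 0.]])
print(path_anomaly_score(counts, [0, 1, 2]), path_anomaly_score(counts, [0, 2, 1]))
```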

RFGo: A Seamless Self-checkout System for Apparel Stores Using RFID

Retailers are aiming to enhance the customer experience by automating the checkout process. The key impediment is the effort of manually aligning each product's barcode with the scanner, which requires handling items sequentially without blocking the line-of-sight of the laser beam. While recent systems such as Amazon Go eliminate human involvement using an extensive array of cameras, we propose a privacy-preserving alternative, RFGo, that identifies products using passive RFID tags. Forgoing continuous monitoring of customers throughout the store, RFGo scans the products in a dedicated checkout area that is large enough for customers to simply walk in and stand until the scan is complete (in two seconds). Achieving such low-latency checkout is not possible with traditional RFID readers, which decode tags using one antenna at a time. To overcome this, RFGo includes a custom-built RFID reader that simultaneously decodes a tag's response from multiple carrier-level synchronized antennas, enabling a large set of tag observations in a very short time. RFGo then feeds these observations to a neural network that accurately distinguishes the products within the checkout area from those outside it. We build a prototype of RFGo and evaluate its performance in challenging scenarios. Our experiments show that RFGo is extremely accurate, fast, and well-suited for practical deployment in apparel stores.
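
The abstract does not give the network's architecture, so the following is only a plausible sketch: a small classifier over per-antenna tag observations (here, hypothetically, one RSSI and one phase value per antenna) that outputs whether a tag is inside the checkout area. The feature layout and layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical feature layout: for each of N_ANT synchronized antennas we take
# the tag's RSSI and unwrapped phase, giving a 2 * N_ANT vector per tag read.
N_ANT = 8

classifier = nn.Sequential(
    nn.Linear(2 * N_ANT, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),            # logits: [outside, inside checkout area]
)

features = torch.randn(32, 2 * N_ANT)              # a batch of tag observations
inside_prob = classifier(features).softmax(dim=-1)[:, 1]
```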

Redefining Passive in Backscattering with Commodity Devices

The recent innovation of frequency-shifted (FS) backscatter allows for backscattering with commodity devices, which are inherently half-duplex. However, the reliance on oscillators for generating the frequency-shifting signal on the tag forces such designs to incur the transient phase of the oscillator before steady-state operation. We show how the oscillator's transient phase poses a fundamental limitation for battery-less tags, resulting in significantly low bandwidth efficiencies and thereby limiting their practical usage. To this end, we propose a novel approach to FS-backscatter called xSHIFT that shifts the core functionality of FS away from the tag and onto the commodity device, thereby eliminating the need for on-tag oscillators altogether. The key innovation in xSHIFT lies in addressing the formidable challenges that arise in making this vision a reality. Specifically, xSHIFT's design is built on the construct of beating twin carrier tones through a non-linear device to generate the desired FS signal: while the twin RF carriers are generated externally through a careful embedding into the resource units of commodity WiFi transmissions, the beating is achieved through carefully-designed passive tag circuitry. We prototype xSHIFT's tag, which has the same form factor as RFID Gen 2 tags, and characterize its promising real-world performance. We believe xSHIFT demonstrates one of the first truly passive tag designs with the potential to bring commodity backscatter to consumer spaces.
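
The beating construct can be made concrete with a standard product-to-sum identity: passing twin tones at f_1 and f_2 through a square-law non-linearity (one simple model of a passive mixing element, used here as an assumption about the tag circuitry) yields, among other terms, a component at the difference frequency f_1 - f_2, which can serve as the frequency-shifting signal without any on-tag oscillator.

```latex
% Square-law mixing of twin carrier tones at f_1 and f_2: the cross term of
% the quadratic expands, via the product-to-sum identity, into components at
% the sum and difference frequencies.
\begin{align*}
  \big(\cos 2\pi f_1 t + \cos 2\pi f_2 t\big)^2
    &= 1 + \tfrac{1}{2}\cos 4\pi f_1 t + \tfrac{1}{2}\cos 4\pi f_2 t \\
    &\quad + \cos 2\pi (f_1 + f_2) t + \cos 2\pi (f_1 - f_2) t .
\end{align*}
```

The last term, at frequency f_1 - f_2, is the desired FS signal; the high-frequency terms can be rejected by the tag's frequency response.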

Prediction of Early Recurrence of Hepatocellular Carcinoma after Resection using Digital Pathology Images Assessed by Machine Learning

Hepatocellular carcinoma (HCC) is a representative primary liver cancer caused by long-term and repetitive liver injury. Surgical resection is generally selected as the radical cure treatment. Because early recurrence of HCC after resection is associated with low overall survival, predicting recurrence after resection is clinically important. However, the pathological characteristics of early HCC recurrence have not yet been elucidated. We attempted to predict the early recurrence of HCC after resection from digital pathology images of hematoxylin and eosin-stained specimens using machine learning with a support vector machine (SVM). A total of 158 HCC patients meeting the Milan criteria who underwent surgical resection were included in this study. The patients were categorized into three groups: Group I, patients with HCC recurrence within 1 year after resection (16 for training and 23 for testing); Group II, patients with HCC recurrence between 1 and 2 years after resection (22 and 28); and Group III, patients with no HCC recurrence within 4 years after resection (31 and 38). The SVM-based prediction method separated the three groups with 89.9% (80/89) accuracy. Prediction of Group I was correct for all cases, while one Group II case was predicted to be Group III, and 8 Group III cases were predicted to be Group II. Digital pathology coupled with machine learning can thus deliver highly accurate prediction of HCC recurrence after surgical resection, especially of early recurrence. Currently, in most cases after HCC resection, regular blood tests and diagnostic imaging are used for follow-up observation; the use of digital pathology coupled with machine learning offers potential as a method for objective postoperative follow-up observation.
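
The paper's feature extraction from the H&E slides is not described in the abstract; below is only a skeletal sketch of the three-group SVM classification step, with the train/test counts taken from the abstract and random placeholders standing in for the real pathology features. The feature dimension and kernel settings are assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder inputs: one feature vector per patient, which in the paper
# would come from the digitized H&E specimens. Train: 16 + 22 + 31 = 69
# patients; test: 23 + 28 + 38 = 89 patients, matching the abstract.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((69, 128)), rng.integers(0, 3, 69)
X_test = rng.random((89, 128))

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)        # labels: 0=Group I, 1=Group II, 2=Group III
pred = clf.predict(X_test)       # SVC handles the 3 classes via one-vs-one
```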

Node Classification in Temporal Graphs through Stochastic Sparsification and Temporal Structural Convolution

Node classification in temporal graphs aims to predict node labels based on historical observations. In real-world applications, temporal graphs are complex, with both graph topology and node attributes evolving rapidly, which poses a high overfitting risk to existing graph learning approaches. In this paper, we propose a novel Temporal Structural Network (TSNet) model, which jointly learns temporal and structural features for node classification from sparsified temporal graphs. We show that the proposed TSNet learns how to sparsify temporal graphs to favor the subsequent classification task and to prevent overfitting to complex neighborhood structures. Effective local features are then extracted by simultaneous convolutions in the temporal and spatial domains. Using standard stochastic gradient descent and backpropagation, TSNet iteratively optimizes sparsification and node representations for the subsequent classification task. An experimental study on public benchmark datasets demonstrates the competitive performance of the proposed model in node classification. In addition, TSNet has the potential to help domain experts interpret and visualize the learned models.
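
The abstract does not specify how TSNet samples sparsified graphs; one common way to make such sampling trainable with stochastic gradient descent, shown below as an assumption rather than the paper's method, is a Gumbel-softmax keep/drop decision per candidate edge, which lets the sparsification be optimized end-to-end with the classifier.

```python
import torch
import torch.nn.functional as F

def sparsify_edges(edge_logits, tau=0.5, hard=True):
    """edge_logits: (n_edges, 2) learned keep/drop scores per candidate edge.
    Draws a differentiable keep/drop decision per edge via Gumbel-softmax,
    so gradients from the downstream classifier can shape the sparsifier."""
    sample = F.gumbel_softmax(edge_logits, tau=tau, hard=hard)
    return sample[:, 0]   # 1.0 = keep edge, 0.0 = drop (straight-through grads)

# Usage: logits could come from an MLP over edge features at each snapshot.
edge_logits = torch.randn(10, 2, requires_grad=True)
keep_mask = sparsify_edges(edge_logits)   # differentiable 0/1 mask per edge
```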