Graph Mining is the process of extracting patterns, structures, or information from graph-structured data. In a graph, nodes represent entities, and edges represent relationships between these entities. Graph mining can uncover relationships, clusters, or anomalies within complex networked data.
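As a minimal illustration of the node and edge view described above, the following sketch stores a small undirected graph as an adjacency map and extracts its connected components, one of the simplest forms of cluster discovery. The function names and the toy edge list are hypothetical and chosen only for this example.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj):
    """Return the connected components as sorted node lists, found by BFS."""
    seen = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical toy data: entities a..e with three relationships.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
graph = build_graph(edges)
components = connected_components(graph)
```

Here `components` groups `a`, `b`, `c` together and `d`, `e` together, since no edge links the two groups.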

Posts

Mastering Long-Tail Complexity on Graphs: Characterization, Learning, and Generalization

In long-tail classification on graphs, most existing work focuses on developing model debiasing strategies that mitigate class imbalance and improve overall performance. Despite this notable success, very little literature provides a theoretical tool for characterizing the behavior of long-tail classes in graphs or for gaining insight into generalization performance in real-world scenarios. To bridge this gap, we propose a generalization bound for long-tail classification on graphs by formulating the problem as multi-task learning, i.e., each task corresponds to the prediction of one particular class. Our theoretical results show that the generalization performance of long-tail classification is dominated by the overall loss range and the task complexity. Building on these theoretical findings, we propose HierTail, a novel generic framework for long-tail classification on graphs. In particular, we start with a hierarchical task grouping module that assigns related tasks to hypertasks and thus controls the complexity of the task space; we then design a balanced contrastive learning module that adaptively balances the gradients of both head and tail classes to control the loss range across all tasks in a unified fashion. Extensive experiments demonstrate the effectiveness of HierTail in characterizing long-tail classes on real graphs, where it achieves up to 12.9% improvement in balanced accuracy over the leading baseline method.
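The abstract reports results in balanced accuracy, which is the mean of per-class recall and therefore weights head and tail classes equally. The sketch below contrasts it with plain accuracy; the labels are hypothetical toy data, not taken from the paper's experiments.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: class 0 is a head class (8 nodes), class 1 a tail class (2 nodes).
y_true = [0] * 8 + [1] * 2
# A degenerate classifier that always predicts the head class.
y_pred = [0] * 10

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
```

On this toy input, plain accuracy is 0.8 while balanced accuracy is 0.5, which shows why the latter is the more informative metric when tail classes are rare.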

Multi-source Inductive Knowledge Graph Transfer

Large-scale information systems, such as knowledge graphs (KGs) and enterprise system networks, often exhibit dynamic and complex activities. Recent research has shown that formalizing these information systems as graphs can effectively characterize the entities (nodes) and their relationships (edges). Transferring knowledge from existing well-curated source graphs can help construct the target graph of a newly deployed system faster and better, which in turn benefits downstream tasks such as link prediction and anomaly detection for the new system. However, current graph transfer methods either rely on a single source, and so do not sufficiently consider multiple available sources, or do not learn selectively from these sources. In this paper, we propose MSGT-GNN, a graph knowledge transfer model for efficient link prediction from multiple source graphs. MSGT-GNN consists of two components: the intra-graph encoder, which embeds latent graph features of system entities into vectors, and the graph transferor, which uses a graph attention mechanism to learn and optimize the embeddings of corresponding entities from multiple source graphs at both the node level and the graph level. Experimental results on multiple real-world datasets from various domains show that MSGT-GNN outperforms baseline approaches in link prediction and demonstrate the merit of attentive graph knowledge transfer.
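To make the idea of learning selectively from several sources concrete, here is a minimal sketch of generic dot-product attention over source embeddings. It is not the MSGT-GNN architecture, whose details are not given here; the function names, the scoring rule, and the two-dimensional vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(target, sources):
    """Weight each source embedding by its similarity to the target entity.

    Assumption: similarity is a plain dot product; a trained model would
    typically use learned projections instead.
    """
    weights = softmax([dot(target, s) for s in sources])
    combined = [
        sum(w * s[i] for w, s in zip(weights, sources))
        for i in range(len(target))
    ]
    return weights, combined


# Hypothetical embeddings of one entity in the target graph and in two source graphs.
target_embedding = [1.0, 0.0]
source_embeddings = [[1.0, 0.0], [0.0, 1.0]]
weights, combined = attend(target_embedding, source_embeddings)
```

The first source matches the target more closely, so it receives the larger attention weight and dominates the combined embedding.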

Anomalous Event Sequence Detection

Anomaly detection has been widely applied in modern data-driven security applications to detect abnormal events or entities that deviate from the majority. However, less work has addressed the detection of suspicious event sequences or paths, which are better discriminators than single events or entities for distinguishing normal from abnormal behavior in complex systems such as cyber-physical systems. A key challenge is discovering those abnormal event sequences among millions of system event records efficiently and accurately. To address this issue, we propose NINA, a network diffusion-based algorithm for identifying anomalous event sequences. Experimental results on both static and streaming data show that NINA is efficient (processing about 2 million records per minute) and accurate.
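The abstract does not describe how NINA performs diffusion, so the following is only a sketch of one common network diffusion scheme, random walk with restart, on a tiny directed event graph. The event names, the restart probability, and the step count are hypothetical.

```python
def diffuse(adj, seeds, restart=0.15, steps=50):
    """Random walk with restart: spread score from seed nodes along out-edges.

    adj maps every node to a list of successors; every successor must
    also appear as a key. Total score stays equal to 1 at each step.
    """
    nodes = list(adj)
    base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(base)
    for _ in range(steps):
        nxt = {n: restart * base[n] for n in nodes}
        for n in nodes:
            successors = adj[n]
            if not successors:
                # Dangling node: send its remaining mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * score[n] / len(seeds)
                continue
            share = (1 - restart) * score[n] / len(successors)
            for m in successors:
                nxt[m] += share
        score = nxt
    return score


# Hypothetical event graph: an edge means one event was followed by another.
events = {
    "login": ["read"],
    "read": ["upload"],
    "upload": [],
    "idle": [],
}
scores = diffuse(events, seeds={"login"})
```

Events reachable from the seed accumulate score, while the unreachable `idle` event stays at zero. A detector built on this idea could compare such scores along a path with what is typical, though that step is an assumption and not something the abstract specifies.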