Mastering Long-Tail Complexity on Graphs: Characterization, Learning, and Generalization

Publication Date: 8/28/2024

Event: 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024)

Reference: pp. 3045-3056, 2024

Authors: Haohui Wang, Virginia Tech; Baoyu Jing, University of Illinois Urbana-Champaign; Kaize Ding, Northwestern University; Yada Zhu, IBM Research; Wei Cheng, NEC Laboratories America, Inc.; Si Zhang, Meta; Yonghui Fan, Amazon AGI; Liqing Zhang, Virginia Tech; Dawei Zhou, Virginia Tech

Abstract: In the context of long-tail classification on graphs, the vast majority of existing work primarily revolves around the development of model debiasing strategies, intending to mitigate class imbalances and enhance the overall performance. Despite the notable success, there is very limited literature that provides a theoretical tool for characterizing the behaviors of long-tail classes in graphs and gaining insight into generalization performance in real-world scenarios. To bridge this gap, we propose a generalization bound for long-tail classification on graphs by formulating the problem in the fashion of multi-task learning, i.e., each task corresponds to the prediction of one particular class. Our theoretical results show that the generalization performance of long-tail classification is dominated by the overall loss range and the task complexity. Building upon the theoretical findings, we propose a novel generic framework HierTail for long-tail classification on graphs. In particular, we start with a hierarchical task grouping module that allows us to assign related tasks into hypertasks and thus control the complexity of the task space; then, we further design a balanced contrastive learning module to adaptively balance the gradients of both head and tail classes to control the loss range across all tasks in a unified fashion. Extensive experiments demonstrate the effectiveness of HierTail in characterizing long-tail classes on real graphs, which achieves up to 12.9% improvement over the leading baseline method in balanced accuracy.
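
Note on the metric: the abstract reports results in balanced accuracy. As a minimal sketch, assuming the standard definition (the unweighted mean of per-class recall), the snippet below shows why this metric is informative under long-tail class distributions. The function name and the toy labels are illustrative assumptions and are not taken from the paper or its code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)  # correctly predicted nodes per true class
    total = defaultdict(int)    # number of nodes per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: class 0 is a head class (8 nodes), class 1 a tail class (2 nodes).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a degenerate classifier that always predicts the head class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {bal_accuracy:.2f}")
```

In this toy case the always-predict-head classifier looks strong on plain accuracy but scores only chance level on balanced accuracy, because every class contributes equally regardless of its size.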

Publication Link: https://dl.acm.org/doi/10.1145/3637528.3671880