Self-Attention, also known as intra-attention, is a mechanism used in deep learning models, particularly in natural language processing tasks. It enables the model to weigh the importance of different words in a sequence relative to each other. Each word in the sequence contributes to the representation of every other word, allowing the model to capture relationships and dependencies within the input sequence.


Self-Attentive Attributed Network Embedding Through Adversarial Learning

Network embedding aims to learn the low-dimensional representations/embeddings of vertices which preserve the structure and inherent properties of the networks. The resultant embeddings are beneficial to downstream tasks such as vertex classification and link prediction. A vast majority of real-world networks are coupled with a rich set of vertex attributes, which could be potentially complementary in learning better embeddings. Existing attributed network embedding models, with shallow or deep architectures, typically seek to match the representations in topology space and attribute space for each individual vertex by assuming that the samples from the two spaces are drawn uniformly. The assumption, however, can hardly be guaranteed in practice. Due to the intrinsic sparsity of sampled vertex sequences and incompleteness in vertex attributes, the discrepancy between the attribute space and the network topology space inevitably exists. Furthermore, the interactions among vertex attributes, a.k.a cross features, have been largely ignored by existing approaches. To address the above issues, in this paper, we propose Nettention, a self-attentive network embedding approach that can efficiently learn vertex embeddings on attributed network. Instead of sample-wise optimization, Nettention aggregates the two types of information through minimizing the difference between the representation distributions in the low-dimensional topology and attribute spaces. The joint inference is encapsulated in a generative adversarial training process, yielding better generalization performance and robustness. The learned distributions consider both locality-preserving and global reconstruction constraints which can be inferred from the learning of the adversarially regularized autoencoders. Additionally, a multi-head self-attention module is developed to explicitly model the attribute interactions. Extensive experiments on benchmark datasets have verified the effectiveness of the proposed Nettention model on a variety of tasks, including vertex classification and link prediction.