Multi-Faceted Knowledge-Driven Pre-training for Product Representation Learning

Publication Date: 9/28/2022

Event: IEEE Transactions on Knowledge and Data Engineering

Reference: pp. 1-12, 2022

Authors: Denghui Zhang, Rutgers University; Yanchi Liu, NEC Laboratories America, Inc.; Zixuan Yuan, Rutgers University; Yanjie Fu, University of Central Florida; Hui Xiong, Rutgers University; Haifeng Chen, NEC Laboratories America, Inc.

Abstract: As a key component of e-commerce computing, product representation learning (PRL) benefits a variety of applications, including product matching, search, and categorization. Existing PRL approaches have limited language understanding ability because they cannot capture contextualized semantics. In addition, the representations learned by existing methods are not easily transferable to new products. Inspired by the recent advances of pre-trained language models (PLMs), we attempt to adapt PLMs for PRL to mitigate the above issues. In this article, we develop KINDLE, a Knowledge-drIven pre-trainiNg framework for proDuct representation LEarning, which robustly and flexibly preserves contextual semantics and multi-faceted product knowledge. Specifically, we first extend traditional one-stage pre-training to a two-stage pre-training framework and employ a dedicated knowledge encoder to ensure smooth knowledge fusion into the PLM. In addition, we propose a multi-objective heterogeneous embedding method to represent thousands of knowledge elements. This helps KINDLE automatically calibrate knowledge noise and sparsity by using these embeddings, rather than isolated classes, as training targets in knowledge acquisition tasks. Furthermore, an input-aware gating network is proposed to select the most relevant knowledge for different downstream tasks. Finally, extensive experiments demonstrate the advantages of KINDLE over state-of-the-art baselines across three downstream tasks.
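
For illustration, the sketch below shows one way an input-aware gating network of the kind mentioned in the abstract could be realized: the PLM's input representation produces softmax weights over several knowledge-facet embeddings, which are then mixed into a single knowledge vector. This is not the authors' implementation; the module names, dimensions, and number of facets are illustrative assumptions.

    # Minimal sketch (assumed, not the paper's code) of an input-aware gating
    # network: score knowledge facets conditioned on the input representation,
    # then take a softmax-weighted mixture of the facet embeddings.
    import torch
    import torch.nn as nn

    class InputAwareGate(nn.Module):
        def __init__(self, hidden_dim: int, num_facets: int):
            super().__init__()
            # Produces one relevance score per knowledge facet from the input vector.
            self.scorer = nn.Linear(hidden_dim, num_facets)

        def forward(self, input_repr: torch.Tensor, facet_embs: torch.Tensor) -> torch.Tensor:
            # input_repr: (batch, hidden_dim), e.g., the PLM's pooled input vector
            # facet_embs: (batch, num_facets, hidden_dim), one vector per knowledge facet
            gate = torch.softmax(self.scorer(input_repr), dim=-1)   # (batch, num_facets)
            fused = torch.einsum("bf,bfh->bh", gate, facet_embs)    # weighted sum of facets
            return fused

    # Usage with made-up shapes:
    gate = InputAwareGate(hidden_dim=768, num_facets=3)
    x = torch.randn(4, 768)        # input representations
    k = torch.randn(4, 3, 768)     # three knowledge-facet embeddings per example
    fused = gate(x, k)             # (4, 768) knowledge vector selected per input

The design choice sketched here (a softmax gate conditioned on the input) lets different inputs and downstream tasks draw on different knowledge facets; the actual KINDLE gating mechanism may differ in detail.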

Publication Link: https://ieeexplore.ieee.org/document/9869708