Large Language Models Can Be Contextual Privacy Protection Learners

Publication Date: 11/13/2024

Event: The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

Reference: pp. 14179–14201, 2024

Authors: Yijia Xiao, University of California, Los Angeles; Yiqiao Jin, Georgia Institute of Technology; Yushi Bai, Tsinghua University; Yue Wu, University of California, Los Angeles; Xianjun Yang, University of California, Santa Barbara; Xiao Luo, University of California, Los Angeles; Wenchao Yu, NEC Laboratories America, Inc.; Xujiang Zhao, NEC Laboratories America, Inc.; Yanchi Liu, NEC Laboratories America, Inc.; Quanquan Gu, University of California, Los Angeles; Haifeng Chen, NEC Laboratories America, Inc.; Wei Wang, University of California, Los Angeles; Wei Cheng, NEC Laboratories America, Inc.

Abstract: The proliferation of Large Language Models (LLMs) has driven considerable interest in fine-tuning them with domain-specific data to create specialized language models. Nevertheless, such domain-specific fine-tuning data often contains contextually sensitive personally identifiable information (PII). Directly fine-tuning LLMs on this data without privacy protection poses a risk of leaking sensitive PII at inference time. To address this challenge, we introduce Contextual Privacy Protection Language Models (CPPLM), a novel paradigm for fine-tuning LLMs that effectively injects domain-specific knowledge while safeguarding inference-time data privacy. Our work offers a theoretical analysis for model design and delves into techniques such as corpus curation, a penalty-based unlikelihood term in the training loss, and instruction-based tuning. Extensive experiments across diverse datasets and scenarios demonstrate the effectiveness of our approaches. In particular, instruction tuning with both positive and negative examples stands out as a promising method, effectively protecting private data while enhancing the model's knowledge. Our work underscores the potential for Large Language Models as robust contextual privacy protection learners.
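The penalty-based unlikelihood idea mentioned in the abstract can be illustrated with a minimal sketch: alongside the usual next-token cross-entropy loss, target tokens that fall inside PII spans receive an unlikelihood penalty that pushes down the probability of reproducing them. This is an illustrative assumption of how such a term could look, not the paper's exact formulation; the function name, the `pii_mask` convention, and the `penalty_weight` parameter are hypothetical.

```python
import torch
import torch.nn.functional as F

def penalized_lm_loss(logits, labels, pii_mask, penalty_weight=1.0):
    """Cross-entropy on ordinary tokens plus an unlikelihood penalty on PII tokens.

    logits:   (batch, seq_len, vocab) model outputs
    labels:   (batch, seq_len) target token ids
    pii_mask: (batch, seq_len) float mask, 1.0 where the target token is part of a PII span
    """
    log_probs = F.log_softmax(logits, dim=-1)
    target_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

    # Standard language-modeling loss, restricted to non-PII positions.
    non_pii = 1.0 - pii_mask
    lm_loss = -(target_log_probs * non_pii).sum() / non_pii.sum().clamp(min=1.0)

    # Unlikelihood term: penalize probability mass the model places on
    # reproducing PII tokens, so they are suppressed rather than memorized.
    target_probs = target_log_probs.exp()
    unlikelihood = -torch.log((1.0 - target_probs).clamp(min=1e-6))
    pii_penalty = (unlikelihood * pii_mask).sum() / pii_mask.sum().clamp(min=1.0)

    return lm_loss + penalty_weight * pii_penalty
```

In such a setup, the mask would presumably come from a PII tagger run over the fine-tuning corpus, and the penalty weight would trade off knowledge injection against privacy protection.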

Publication Link: https://aclanthology.org/2024.emnlp-main.785/