A Survey on Detection of LLMs-Generated Content

Publication Date: 11/13/2024

Event: The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

Reference: pp. 9786–9805, 2024

Authors: Xianjun Yang, University of California, Santa Barbara; Liangming Pan, University of California, Santa Barbara; Xuandong Zhao, University of California, Santa Barbara; Haifeng Chen, NEC Laboratories America, Inc.; Linda Petzold, University of California, Santa Barbara; William Yang Wang, University of California, Santa Barbara; Wei Cheng, NEC Laboratories America, Inc.

Abstract: The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey of detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, and we maintain a website that is regularly updated with the latest research as a guiding reference for researchers and practitioners.

Publication Link: https://aclanthology.org/2024.findings-emnlp.572.pdf