Posts

Provable Membership Inference Privacy

In applications involving sensitive data, such as finance and healthcare, the necessity of preserving data privacy can be a significant barrier to machine learning model development. Differential privacy (DP) has emerged as one canonical standard for provable privacy. However, DP's strong theoretical guarantees often come at the cost of a large drop in utility for machine learning, and DP guarantees themselves are difficult to interpret. In this work, we propose a novel privacy notion, membership inference privacy (MIP), as a step towards addressing these challenges. We give a precise characterization of the relationship between MIP and DP, and show that in some cases MIP can be achieved with less randomness than is required for guaranteeing DP, leading to a smaller drop in utility. MIP guarantees are also easily interpretable in terms of the success rate of membership inference attacks in a simple random subsampling setting. As a proof of concept, we also provide a simple algorithm for guaranteeing MIP without needing to guarantee DP.
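To make the random-subsampling interpretation concrete, the sketch below empirically estimates the success rate of a simple membership inference attack. This is purely illustrative and is not the paper's algorithm or its MIP mechanism; the synthetic data, the logistic regression model, and the loss-thresholding attack are all assumptions made for the example.

```python
# Illustrative sketch: estimating the success rate of a loss-thresholding
# membership inference attack in a random-subsampling setting.
# (Not the paper's algorithm; all modeling choices here are assumptions.)
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic candidate pool: each record may or may not end up in training data.
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)

# Random subsampling: each record is included in the training set with prob. 1/2.
member = rng.random(n) < 0.5
model = LogisticRegression(max_iter=1000).fit(X[member], y[member])

# Attack: guess "member" when the per-record loss is below a threshold,
# since training points tend to be fit more closely than held-out points.
p = model.predict_proba(X)[:, 1]
eps = 1e-12
loss = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
guess_member = loss < np.median(loss)  # simple illustrative threshold

# Attack success rate; 0.5 is the random-guessing baseline, and an MIP-style
# guarantee bounds how far above 0.5 this rate can go.
print(f"membership inference success rate: {np.mean(guess_member == member):.3f}")
```

In this setting, an MIP guarantee can be read directly as a cap on the attack success rate above the 50% baseline, which is what makes the notion easy to interpret.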

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

We present an approach for estimating the fraction of text in a large corpus that is likely to have been substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e., beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews that report lower confidence, that were submitted close to the deadline, and that come from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text that may be too subtle to detect at the individual level, and discuss the implications of such trends for peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
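As a rough illustration of the corpus-level maximum likelihood idea, the sketch below estimates the mixture fraction of AI-modified documents, assuming per-document likelihoods under "human" and "AI" reference distributions are already available. How those reference distributions are built from expert-written and AI-generated texts (the key ingredient of the paper) is not reproduced here, and the toy numbers are purely synthetic.

```python
# Simplified sketch of corpus-level maximum likelihood estimation of the
# fraction alpha of LLM-modified documents, given per-document log-likelihoods
# under human and AI reference distributions (assumed inputs, not derived here).
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_alpha(loglik_human, loglik_ai):
    """Maximize sum_i log((1 - a) * p_human_i + a * p_ai_i) over a in [0, 1]."""
    loglik_human = np.asarray(loglik_human)
    loglik_ai = np.asarray(loglik_ai)

    def neg_loglik(a):
        # Per-document mixture log-likelihood via log-sum-exp for stability.
        m = np.maximum(loglik_human, loglik_ai)
        mix = np.log((1 - a) * np.exp(loglik_human - m)
                     + a * np.exp(loglik_ai - m)) + m
        return -np.sum(mix)

    res = minimize_scalar(neg_loglik, bounds=(0.0, 1.0), method="bounded")
    return res.x

# Toy corpus: about 10% of documents are better explained by the AI reference
# distribution (synthetic numbers chosen only for illustration).
rng = np.random.default_rng(0)
is_ai = rng.random(5000) < 0.10
loglik_human = np.where(is_ai, -13.0, -8.0) + rng.normal(0, 0.5, 5000)
loglik_ai = np.where(is_ai, -8.0, -13.0) + rng.normal(0, 0.5, 5000)
print(f"estimated alpha: {estimate_alpha(loglik_human, loglik_ai):.3f}")
```

The point of estimating alpha at the corpus level, rather than classifying each review individually, is that aggregate shifts can be detected reliably even when no single document can be flagged with confidence.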

NEC Labs America Team Heading to NeurIPS23 in New Orleans

NEC Labs America is proud to be a Silver Sponsor for NeurIPS 2023 in New Orleans from December 10-16. Visit our booth to meet our team and learn about our intern opportunities in machine learning, data science, media analytics, and integrated systems. Also, our Vijay Kumar B.G., Samuel Schulter, and Manmohan Chandraker, along with Zaid Khan (Northeastern University) and Yun Fu (UC San Diego), will present a paper, Exploring Question Decomposition for Zero-Shot VQA.