Stanford University is a global leader in research and innovation, known for excellence in AI, medicine, and engineering. Its proximity to Silicon Valley fuels groundbreaking entrepreneurship and discovery. We have collaborated with Stanford University on cutting-edge AI research, including vision-language modeling, self-supervised learning, and neural architecture optimization. Our joint work advances the development of scalable, data-efficient AI systems for real-world deployment. Please read about our latest news and collaborative publications with Stanford University.

Posts

National Intern Day at NEC Laboratories America: Celebrating the Next Generation of Innovators

On National Intern Day, NEC Laboratories America celebrates the bright minds shaping tomorrow’s technology. Each summer, interns from top universities work side-by-side with our researchers on real-world challenges in AI, cybersecurity, data science, and more. From groundbreaking research to team-building events, our interns contribute fresh ideas and bold thinking that power NEC’s innovation engine.

Solving Inverse Problems via a Score-Based Prior: An Approximation-Free Posterior Sampling Approach

Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic approximations to the generative process. To exploit the generative capability of DMs and avoid the usage of such approximations, we propose an ensemble-based algorithm that performs posterior sampling without the use of heuristic approximations. Our algorithm is motivated by existing works that combine DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error between the true posterior distribution canbe bounded in terms of the training error of the pre-trained score function and the ]number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging to show that our method gives more accurate reconstructions compared to existing DM-based methods.

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.

Provable Membership Inference Privacy

In applications involving sensitive data, such as finance and healthcare, the necessity for preserving data privacy can be a significant barrier to machine learning model development.Differential privacy (DP) has emerged as one canonical standard for provable privacy. However, DP’s strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning; and DP guarantees themselves are difficult to interpret. In this work, we propose a novel privacy notion, membership inference privacy (MIP), as a steptowards addressing these challenges. We give a precise characterization of the relationship between MIP and DP, and show that in some cases, MIP can be achieved using less amountof randomness compared to the amount required for guaranteeing DP, leading to smaller drop in utility. MIP guarantees are also easily interpretable in terms of the success rate of membership inference attacks in a simple random subsampling setting. As a proof of concept, we also provide a simple algorithm for guaranteeing MIP without needing to guarantee DP.