NEC Labs America Attends CVPR 2026 in Denver, CO June 3-7, 2026

June 2, 2026 | by NEC Labs America | in Events | tags: 3d scene editing, AUTOPILOT workshop, CVF, IEEE, abhishek aich, ai, ali alshami, aniket roy, anomaly detection, autonomous driving, bingbing zhuang, computer vision, cvpr, cvpr 2026, deep patel, diffusion models, foundation models, francesco pittaluga, jingchen sun, knowledge distillation, large language models, ma publication, machine learning, manmohan chandraker, media analytics, nvidia, rutgers university, shaobo han, university at buffalo, wataru kohno, wayve, zaid tasneem, ziyu jiang, zoox

NEC Labs America headed to Denver for CVPR 2026, one of the most prestigious gatherings in artificial intelligence and computer science. The IEEE/CVF Conference on Computer Vision and Pattern Recognition brings together researchers, engineers, and innovators from around the world to share breakthroughs in computer vision, machine learning, and pattern recognition.

Running June 3 through June 7, CVPR 2026 is a premier destination for anyone working at the frontier of visual AI. The conference draws thousands of attendees across workshops, tutorials, demos, and an expansive expo floor, making it one of the most dynamic events in the field. For us, it represents a valuable opportunity to connect with the global research community, explore cutting-edge developments, and bring fresh insights back to the region. Stay tuned for updates, takeaways, and highlights from our time at CVPR 2026.

Presentations

AUTOPILOT Workshop

Autonomous Understanding Through Open-world Perception and Integrated Language Models for On-road Tasks

Workshop in conjunction with the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026 – Denver)
Manmohan Chandraker, Speaker
Ali K. AlShami, Organizing Committee
June 3, 2026
Website: https://www.autopilot-cvpr.net/

The AUTOPILOT Workshop at CVPR 2026 brought together some of the brightest minds in autonomous driving for a full-day event in Denver. Now in its third edition, it proved once again why it has become one of the most anticipated gatherings in the computer vision community. Attendees were treated to four outstanding keynotes from top industry leaders, including Jose M. Alvarez of NVIDIA, Bat El Shlomo of ZOOX, and Matthew Alun Brown of Wayve. NEC Labs America shone brightly, with Manmohan Chandraker delivering a standout keynote and Ali K. AlShami serving as a lead organizer, helping bring the entire event to life.

The technical program was equally impressive, with seven archival papers published in the official CVPR 2026 proceedings and 21 additional non-archival papers covering everything from vision-language models to real-time collision anticipation. Kaggle competition winners from institutions across three continents took the stage to share their approaches, making for some of the most engaging presentations of the day. From open-world hazard detection to on-vehicle deployment of foundation models, AUTOPILOT 2026 tackled the hardest problems in autonomous driving with rigor and creativity. It was a landmark moment for the field. Thanks to everyone who attended and presented. See you next year for our 4th edition!


Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models

Poster Presentation
Authors: Jingchen Sun (Presenter), Shaobo Han (Pictured, Presenter), Deep Patel, Wataru Kohno, Can Jin, Changyou Chen
Code is available at: https://github.com/Jingchensun/beta-kd

Abstract: Knowledge distillation establishes a learning paradigm that learns from both data supervision and teacher guidance. However, the optimal weighting between learning from data and learning from the teacher is hard to determine, as some samples are data-noisy while others are teacher-uncertain. This raises a pressing need to adaptively balance data and teacher supervision. We propose Beta-weighted Knowledge Distillation (β-KD), an adaptive, uncertainty-aware knowledge distillation framework that supports arbitrary distillation objectives under a unified Bayesian formulation. Specifically, we model teacher signals as a Gibbs prior over student activations and use amortized optimization to jointly infer activations and weighting parameters, leading to a closed-form, uncertainty-aware weighting. Extensive experiments distilling a 1.7B-parameter student from MobileVLM-7B demonstrate that β-KD consistently outperforms existing methods under different loss combination settings. Moreover, large-scale distillation and evaluations on six multimodal benchmarks further confirm the effectiveness of the proposed approach.
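
For readers new to the topic, the trade-off the abstract describes can be seen in the conventional fixed-weight distillation loss. The sketch below is not the paper's method: it is the standard baseline in which a single hand-picked weight (here called alpha, an illustrative name) balances the data term against the teacher term for every sample. The function names, temperature, and example logits are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Data supervision: negative log-likelihood of the true label."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Fixed-weight distillation loss (baseline, NOT the adaptive weighting).

    alpha is a hypothetical, hand-tuned weight: 0 uses only the label,
    1 uses only the teacher. The same value is applied to every sample,
    whether the label is noisy or the teacher is uncertain.
    """
    data_term = cross_entropy(student_logits, label)
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps gradient magnitudes comparable.
    teacher_term = temperature ** 2 * kl_divergence(teacher_probs, student_probs)
    return (1 - alpha) * data_term + alpha * teacher_term


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher = [1.5, 1.0, -0.5]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, kd_loss(student, teacher, label=0, alpha=alpha))
```

Because alpha is shared across all samples, a value that suits clean labels can hurt on noisy ones. That per-sample mismatch is the gap an adaptive, uncertainty-aware weighting is meant to close.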

Object-Aware 4D Human Motion Generation

Virtual Presentation
Presenter: Deep Patel
June 4, 2026
Paper: https://www.nec-labs.com/blog/object-aware-4d-human-motion-generation/

Abstract: Recent advances in video diffusion models have enabled the generation of high-quality videos. However, these videos still suffer from unrealistic deformations, semantic violations, and physical inconsistencies that are largely rooted in the absence of 3D physical priors. To address these challenges, we propose an object-aware 4D human motion generation framework grounded in 3D Gaussian representations and motion diffusion priors. With pre-generated 3D humans and objects, our method, Motion Score Distilled Interaction (MSDI), employs the spatial and prompt semantic information in large language models (LLMs) and motion priors through the proposed Motion Diffusion Score Distillation Sampling (MSDS). The combination of MSDS and LLMs enables our spatial-aware motion optimization, which distills score gradients from pre-trained motion diffusion models, to refine human motion while respecting object and semantic constraints. Unlike prior methods requiring joint training on limited interaction datasets, our zero-shot approach avoids retraining and generalizes to out-of-distribution object-aware human motions. Experiments demonstrate that our framework produces natural and physically plausible human motions that respect 3D spatial context, offering a scalable solution for realistic 4D generation.

Anomaly Detection with Foundation Models (ADFM) Workshop

Workshop in conjunction with the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026 – Denver)
Abhishek Aich, Organizing Committee
June 4, 2026
Website: https://adfmw.github.io/cvpr26/

Abhishek Aich of our Media Analytics department is serving as an organizer of the Anomaly Detection with Foundation Models (ADFM) Workshop at CVPR 2026 in Denver on June 4th. Foundation models have rapidly transformed fields ranging from healthcare and cybersecurity to finance and industrial systems. Yet one critical capability remains underexplored: the use of these powerful models for anomaly detection. As organizations increasingly rely on AI in high-stakes environments, the ability to identify unusual patterns, out-of-distribution inputs, and edge cases becomes essential to safety and reliability. The ADFM 2026 workshop addresses this gap directly by providing a dedicated forum for researchers and practitioners to share recent breakthroughs, examine technical and ethical implications, and explore paths toward more robust and explainable anomaly detection systems.

AV Simulation Team

The NEC Laboratories America AV simulation team, led by Manmohan Chandraker, will present multiple papers at CVPR on agentic simulation. Hit us up to chat about training and validation on the long tail: world models, 3D scene editing, diffusion models, and embodied and physical AI.

Team: Zaid Tasneem, Ziyu Jiang, Aniket Roy, and Francesco Pittaluga.

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents

Project Leads: Yun He (intern), Zaid Tasneem
Thursday, June 4th from 3:15 to 4:15 PM, SAD Workshop, Room 102/104
Project: https://yunhe24.github.io/langdrivectrl/

Abstract: LangDriveCTRL is a natural-language-controllable framework for editing real-world driving videos to synthesize diverse traffic scenarios. It represents each video as an explicit 3D scene graph, decomposing the scene into a static background and dynamic object nodes. To enable fine-grained editing and realism, it introduces a feedback-driven agentic pipeline. An Orchestrator converts user instructions into executable graphs that coordinate specialized multi-modal agents and tools. An Object Grounding Agent aligns free-form text with target object nodes in the scene graph; a Behavior Editing Agent generates multi-object trajectories from language instructions; and a Behavior Reviewer Agent iteratively reviews and refines the generated trajectories. The edited scene graph is rendered and harmonized using a video diffusion tool, and then further refined by a Video Reviewer Agent to ensure photorealism and appearance alignment. LangDriveCTRL supports both object node editing (removal, insertion, and replacement) and multi-object behavior editing from natural-language instructions. Quantitatively, it achieves nearly higher instruction alignment than the previous SoTA, with superior photorealism, structural preservation, and traffic realism.
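
To make the scene-graph idea concrete, here is a minimal sketch of a graph with one static background and a set of dynamic object nodes, supporting the three node edits the abstract names (removal, insertion, replacement). It is an illustration only, not the LangDriveCTRL implementation: the class names, fields, and the choice to keep the original trajectory on replacement are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """A dynamic object in the scene (hypothetical structure)."""
    node_id: str
    category: str                 # e.g. "sedan", "pedestrian"
    trajectory: list              # per-frame (x, y) positions


@dataclass
class SceneGraph:
    """Static background plus dynamic object nodes keyed by id."""
    background: str               # placeholder handle for the static scene
    objects: dict = field(default_factory=dict)

    def insert(self, node):
        """Add a new object node; ids must be unique."""
        if node.node_id in self.objects:
            raise KeyError(f"node {node.node_id!r} already exists")
        self.objects[node.node_id] = node

    def remove(self, node_id):
        """Delete an object node and return it."""
        return self.objects.pop(node_id)

    def replace(self, node_id, category):
        """Swap the object's category while keeping its trajectory.

        Keeping the trajectory is an assumption of this sketch.
        """
        old = self.objects[node_id]
        self.objects[node_id] = ObjectNode(node_id, category, old.trajectory)


if __name__ == "__main__":
    graph = SceneGraph(background="static_scene")
    graph.insert(ObjectNode("car_1", "sedan", [(0.0, 0.0), (1.0, 0.0)]))
    graph.replace("car_1", "truck")
    print(graph.objects["car_1"])
    graph.remove("car_1")
    print(len(graph.objects))
```

Keeping objects as separate nodes means an edit touches only the node it targets, so the background and the other objects are left unchanged.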

HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes

Project Leads: Mauricio Soroco (intern), Ziyu Jiang
Friday, June 5 from 7:00 to 8:30 AM, Findings Posters, ExHall A
Project: https://msoroco.github.io/horizonweaver/
Paper: https://www.nec-labs.com/blog/horizonweaver-generalizable-multi-level-semantic-editing-for-driving-scenes/

Abstract: Ensuring safety in autonomous driving requires scalable generation of realistic, controllable driving scenes beyond what real-world testing provides. Yet existing instruction-guided image editors, trained on object-centric or artistic data, struggle with dense, safety-critical driving layouts. We propose HorizonWeaver, which tackles three fundamental challenges in driving scene editing: (1) multi-level granularity, requiring coherent object- and scene-level edits in dense environments; (2) rich high-level semantics, preserving diverse objects while following detailed instructions; and (3) ubiquitous domain shifts, handling changes in climate, layout, and traffic across unseen environments. The core of HorizonWeaver is a set of complementary contributions across data, model, and training: (1) Data: Large-scale dataset generation, where we build a paired real/synthetic dataset from Boreas, nuScenes, and Argoverse2 to improve generalization; (2) Model: Language-Guided Masks for fine-grained editing, where semantics-enriched masks and prompts enable precise, language-guided edits; and (3) Training: Content preservation and instruction alignment, where joint losses enforce scene consistency and instruction fidelity. Together, HorizonWeaver provides a scalable framework for photorealistic, instruction-driven editing of complex driving scenes, collecting 255K images across 13 editing categories and outperforming prior methods in L1, CLIP, and DINO metrics, achieving +46.4% user preference and improving BEV segmentation IoU by +33%.

HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles

Project Leads: Yifan Wang (intern), Ziyu Jiang
Saturday, June 6th from 4:45 to 6:45 PM, Poster Session 4 (Main), ExHall A
Project: https://horizonforge.github.io/
Paper: https://www.nec-labs.com/blog/horizonforge-driving-scene-editing-with-any-trajectories-and-any-vehicles/
Collaborators: Matthias Zwicker, Chenyu You, Wuyang Chen, Abhishek Aich, Bingbing Zhuang.

Abstract: Controllable driving scene generation is critical for realistic and scalable autonomous driving simulation, yet existing approaches struggle to jointly achieve photorealism and precise control. We introduce HorizonForge, a unified framework that reconstructs scenes as editable Gaussian Splats and Meshes, enabling fine-grained 3D manipulation and language-driven vehicle insertion. Edits are rendered through a noise-aware video diffusion process that enforces spatial and temporal consistency, producing diverse scene variations in a single feed-forward pass without per-trajectory optimization. To standardize evaluation, we further propose HorizonSuite, a comprehensive benchmark spanning ego- and agent-level editing tasks such as trajectory modifications and object manipulation. Extensive experiments show that Gaussian-Mesh representation delivers substantially higher fidelity than alternative 3D representations, and that temporal priors from video diffusion are essential for coherent synthesis. Combining these findings, HorizonForge establishes a simple yet powerful paradigm for photorealistic, controllable driving simulation, achieving an 83.4% user-preference gain and a 25.19% FID improvement over the second-best state-of-the-art method.
