Making Video AI Fast Enough for the Real World

State-of-the-art video models are accurate but too slow for live deployment. This work transfers their knowledge into causal streaming models that process video frames in real time, achieving 4x lower latency with competitive accuracy across action detection and pedestrian intent tasks.
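The summary does not say which distillation objective is used. A common choice is soft-target distillation, sketched below in NumPy as an illustration rather than this work's actual loss. An offline teacher that sees the whole clip produces per-frame logits. The causal student sees only frames up to the current one and is trained to match the teacher's temperature-softened class distribution. The function names, toy shapes, and temperature value are assumptions.

```python
import numpy as np


def log_softmax(z, axis=-1):
    # Numerically stable log-softmax along the class axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Hypothetical per-frame soft-target loss.

    student_logits: (frames, classes) from the causal streaming student.
    teacher_logits: (frames, classes) from the offline (non-causal) teacher.
    Returns T^2 * mean over frames of KL(teacher || student).
    """
    log_p_t = log_softmax(np.asarray(teacher_logits, dtype=float) / temperature)
    log_p_s = log_softmax(np.asarray(student_logits, dtype=float) / temperature)
    kl_per_frame = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(temperature ** 2 * kl_per_frame.mean())


rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 5))  # toy data: 16 frames, 5 action classes
student = rng.normal(size=(16, 5))
print(distillation_loss(student, teacher))
```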

PhyCo: Learning Controllable Physical Priors for Generative Motion

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded control into video generation. Our approach integrates three key components: (i) a large-scale dataset of over 100K photorealistic simulation videos where friction, restitution, deformation, and force are systematically varied across diverse scenarios; (ii) physics-supervised fine-tuning of a pretrained diffusion model using a ControlNet conditioned on pixel-aligned physical property maps; and (iii) VLM-guided reward optimization, where a fine-tuned vision-language model evaluates generated videos with targeted physics queries and provides differentiable feedback. This combination enables a generative model to produce physically consistent outputs that can be controlled by varying physical attributes, without any simulator or geometry reconstruction at inference. On the Physics-IQ benchmark, PhyCo significantly improves physical realism over strong baselines, and human studies confirm clearer and more faithful control over physical attributes. Our results demonstrate a scalable path toward physically consistent, controllable generative video models that generalize beyond synthetic training environments.
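A pixel-aligned property map can be pictured as an image whose channels hold a physical attribute value at every pixel covered by an object. The sketch below shows one hypothetical way to build such maps from instance masks and per-object scalars. The channel order, the treatment of force as a single scalar (a real force also has direction), and the function name are assumptions. The summary does not say how PhyCo encodes these maps for its ControlNet.

```python
import numpy as np

# Assumed channel order; the summary names these four attributes but not a map layout.
PROPERTY_NAMES = ("friction", "restitution", "deformation", "force")


def property_maps(instance_masks, object_props, background=0.0):
    """Rasterize per-object physical properties into pixel-aligned maps.

    instance_masks: (N, H, W) boolean array, one mask per object.
    object_props:   list of N dicts mapping each name in PROPERTY_NAMES to a scalar.
    Returns a (C, H, W) float32 array with C = len(PROPERTY_NAMES).
    """
    masks = np.asarray(instance_masks, dtype=bool)
    n, h, w = masks.shape
    if len(object_props) != n:
        raise ValueError("expected one property dict per mask")
    maps = np.full((len(PROPERTY_NAMES), h, w), background, dtype=np.float32)
    for mask, props in zip(masks, object_props):
        for c, name in enumerate(PROPERTY_NAMES):
            # Later objects overwrite earlier ones where masks overlap.
            maps[c][mask] = props[name]
    return maps


masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :2, :2] = True  # object 0: top-left block
masks[1, 2:, 2:] = True  # object 1: bottom-right block
props = [
    {"friction": 0.5, "restitution": 0.25, "deformation": 0.0, "force": 1.0},
    {"friction": 0.75, "restitution": 0.5, "deformation": 0.5, "force": 0.0},
]
maps = property_maps(masks, props)
print(maps.shape)  # (4, 4, 4)
```

For video conditioning, a map like this would be produced per frame as objects move. That per-frame use is likewise an assumption here.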

Mix-CLAP: Adaptive Fusion of Knowledge-Distilled Audio Embeddings for Noise-Aware Audio-Language Models

Real-world deployment requires sound event and acoustic scene classification systems to remain reliable in noisy, diverse environments on resource-constrained devices. Although contrastive language-audio pretraining (CLAP) models with Transformer-based audio encoders achieve strong zero-shot performance, their computational cost hinders deployment. In this paper, we propose Mix-CLAP, a computationally efficient, noise-aware CLAP model with knowledge-distilled audio encoders. Our method includes: (1) a two-stage knowledge distillation from teacher embeddings to two lightweight student encoders, one distilled on clean audio and the other on noisy audio, and (2) adaptive inference that combines their embeddings through a fusion parameter and minimizes the resulting parameterized entropy at test time. Experiments show that Mix-CLAP with MobileNetV3-based audio encoders greatly improves computational efficiency while achieving an average accuracy of 52.58%, comparable to the 52.83% of the Transformer-based CLAP model, on versions of ESC-50 recorded with different devices, including microphones and fiber-optic distributed acoustic sensors, under diverse conditions. This makes it suitable for real-world, resource-constrained applications.
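The adaptive inference step can be illustrated with a simple stand-in. Fuse the two student embeddings as z = α·z_clean + (1 − α)·z_noisy, score the fused embedding against class text embeddings as in zero-shot CLAP, and choose the α that makes the prediction least uncertain. The grid search below replaces whatever optimizer Mix-CLAP actually uses. The temperature, grid size, toy embeddings, and function names are assumptions.

```python
import numpy as np


def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)


def zero_shot_probs(audio_emb, text_embs, temperature=0.07):
    # CLAP-style zero-shot scoring: cosine similarity to each class prompt embedding.
    logits = l2_normalize(text_embs) @ l2_normalize(audio_emb) / temperature
    logits = logits - logits.max()
    p = np.exp(logits)
    return p / p.sum()


def entropy(p):
    return float(-np.sum(p * np.log(p + 1e-12)))


def fuse_min_entropy(z_clean, z_noisy, text_embs, n_grid=21):
    """Pick alpha in [0, 1] minimizing the entropy of the fused zero-shot prediction."""
    best_h, best_alpha, best_p = np.inf, None, None
    for alpha in np.linspace(0.0, 1.0, n_grid):
        z = alpha * z_clean + (1.0 - alpha) * z_noisy
        p = zero_shot_probs(z, text_embs)
        h = entropy(p)
        if h < best_h:
            best_h, best_alpha, best_p = h, float(alpha), p
    return best_alpha, best_p


text_embs = np.eye(3)                  # toy prompt embeddings for 3 classes
z_clean = np.array([1.0, 0.0, 0.0])    # clean-audio student: confident
z_noisy = np.array([0.4, 0.35, 0.25])  # noisy-audio student: uncertain
alpha, probs = fuse_min_entropy(z_clean, z_noisy, text_embs)
print(alpha, probs.argmax())
```

In this toy case the clean-audio embedding is far more confident, so entropy minimization pushes α to its upper bound. On genuinely noisy input the noisy-audio student would be expected to win instead.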

Learning to Tune Optical WANs: A Field Deployment of Noise Models in Optical Networks

Accurately modeling optical signal transmission is critical for optimizing network performance, particularly in large-scale fiber optic networks operated by Internet Service Providers. In this work, we develop a Gaussian Noise model for a New York state ISP’s optical backbone. Our model accounts for all major network components, including amplifiers, fiber spans, reconfigurable optical add-drop multiplexers, and transceivers. By accurately predicting end-to-end signal-to-noise ratio, our model provides a foundation for network performance analysis and optimization. Then, we leverage hyperparameter search techniques, commonly used in machine learning, to identify amplifier gain settings that improve signal quality. By treating the model as an opaque box, we systematically search for amplifier configurations that maximize the predicted end-to-end SNR while maintaining practical network constraints. We validate our approach through a field deployment by applying optimized amplifier gain settings in a live ISP network. Our results show a significant improvement in optical signal quality, achieving a 2 dB increase in SNR on a single wavelength.
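The two steps, a Gaussian Noise style SNR estimate and an opaque-box gain search, can be sketched with a deliberately simplified model. Everything below is an assumption for illustration:

- per-span incoherent accumulation of amplifier ASE noise and P³ nonlinear interference;
- the physical constants and the nonlinear coefficient;
- gain bounds standing in for "practical network constraints";
- plain random search as the hyperparameter-search technique.

The deployed model also covers ROADMs and transceivers, which this toy omits.

```python
import numpy as np

H_PLANCK = 6.626e-34  # J*s
NU = 193.4e12         # optical carrier frequency in Hz (~1550 nm)
NOISE_BW = 32e9       # noise bandwidth in Hz (assumed, ~32 GBd channel)
ETA_NLI = 1e3         # toy per-span NLI coefficient in 1/W^2 (assumed)


def db_to_lin(x_db):
    return 10.0 ** (np.asarray(x_db, dtype=float) / 10.0)


def end_to_end_snr_db(gains_db, span_loss_db, nf_db=5.0, p_in_dbm=-20.0):
    """Toy incoherent GN-style estimate: 1/SNR = sum_i (P_ASE,i + P_NLI,i) / P_i.

    gains_db[i] is the gain of the amplifier launching into span i;
    p_in_dbm is the signal power at the first amplifier's input.
    """
    p_in = db_to_lin(p_in_dbm) * 1e-3  # W
    inv_snr = 0.0
    for g_db, loss_db in zip(gains_db, span_loss_db):
        g = db_to_lin(g_db)
        p_launch = p_in * g                                      # signal launched into the span
        p_ase = db_to_lin(nf_db) * H_PLANCK * NU * NOISE_BW * g  # high-gain ASE approximation
        p_nli = ETA_NLI * p_launch ** 3                          # GN model: NLI grows as P^3
        inv_snr += (p_ase + p_nli) / p_launch
        p_in = p_launch / db_to_lin(loss_db)                     # fiber attenuation
    return float(10.0 * np.log10(1.0 / inv_snr))


def random_search(span_loss_db, g_min=15.0, g_max=25.0, n_trials=2000, seed=0):
    """Opaque-box search over per-amplifier gains within [g_min, g_max] dB."""
    rng = np.random.default_rng(seed)
    # Baseline: each gain equals its span loss (clipped to the allowed range).
    best_g = np.clip(np.asarray(span_loss_db, dtype=float), g_min, g_max)
    best_snr = end_to_end_snr_db(best_g, span_loss_db)
    for _ in range(n_trials):
        cand = rng.uniform(g_min, g_max, size=len(span_loss_db))
        snr = end_to_end_snr_db(cand, span_loss_db)
        if snr > best_snr:
            best_g, best_snr = cand, snr
    return best_g, best_snr


spans = [20.0, 22.0, 18.5, 21.0, 19.5]  # hypothetical span losses in dB
baseline = end_to_end_snr_db(np.clip(spans, 15.0, 25.0), spans)
gains, snr = random_search(spans)
print(f"baseline {baseline:.2f} dB -> searched {snr:.2f} dB")
```

Because the search starts from the baseline configuration and only accepts improvements, the returned predicted SNR can never be worse than the baseline.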

Event Classification by Physics-Informed Inpainting for Distributed Multichannel Acoustic Sensor with Partially Degraded Channels

Distributed multichannel acoustic sensing (DMAS) enables large-scale sound event classification (SEC), but performance drops when many channels are degraded and when sensor layouts at test time differ from training layouts. We propose a learning-free, physics-informed inpainting frontend based on reverse time migration (RTM). In this approach, observed multichannel spectrograms are first back-propagated on a 3D grid using an analytic Green’s function to form a scene-consistent image, and then forward-projected to reconstruct inpainted signals before log–mel feature extraction and transformer-based classification. We evaluate the method on ESC-50 with 50 sensors and three layouts (circular, linear, right-angle), where per-channel SNRs are sampled from −30 to 0 dB. Compared with an AST baseline, scaling-sparsemax channel selection, and channel-swap augmentation, the proposed RTM frontend achieves the best or competitive accuracy across all layouts, improving accuracy by 13.1 points on the right-angle layout (from 9.7% to 22.8%). Correlation analyses show that spatial weights align more strongly with SNR than with channel–source distance, and that higher SNR–weight correlation corresponds to higher SEC accuracy. These results demonstrate that a reconstruct-then-project, physics-based preprocessing effectively complements learning-only methods for DMAS under layout-open configurations and severe channel degradation.
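The reconstruct-then-project idea can be sketched for narrowband STFT bins using a free-space Green's function G(r) = e^(−jkr)/(4πr). Back-propagation correlates each grid point's normalized Green's-function vector with the observed sensor spectra. Forward projection then re-radiates the resulting image to every sensor, producing spectra for all channels, including degraded ones. This is a minimal NumPy illustration under stated assumptions: propagation in air, uniform channel weighting, unit-normalized steering vectors, and toy geometry. It is not the paper's implementation. The spatial weights the summary mentions are omitted.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumes propagation in air


def green(dist, k):
    """Free-space Helmholtz Green's function exp(-j k r) / (4 pi r)."""
    return np.exp(-1j * k * dist) / (4.0 * np.pi * dist)


def steering(grid, sensors, freqs):
    """Green's function from every grid point to every sensor, shape (F, G, M)."""
    dist = np.linalg.norm(grid[:, None, :] - sensors[None, :, :], axis=-1)  # (G, M)
    k = 2.0 * np.pi * np.asarray(freqs, dtype=float) / SPEED_OF_SOUND        # (F,)
    g = green(dist[None, :, :], k[:, None, None])
    # Unit-norm steering vectors make a point source peak exactly at its location.
    return g / np.linalg.norm(g, axis=-1, keepdims=True)


def back_propagate(spec, g):
    """Time-reverse sensor STFTs (M, F, T) onto the grid: image of shape (F, G, T)."""
    return np.einsum("fgm,mft->fgt", np.conj(g), spec)


def forward_project(image, g):
    """Re-radiate the grid image (F, G, T) to the sensors: spectra of shape (M, F, T)."""
    return np.einsum("fgm,fgt->mft", g, image)


# Toy scene: 50 sensors on a 3 m circle, 5 x 5 x 3 grid, one 500 Hz point source.
angles = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
sensors = np.stack([3.0 * np.cos(angles), 3.0 * np.sin(angles), np.zeros(50)], axis=1)
xs = np.linspace(-1.0, 1.0, 5)
zs = np.array([0.0, 0.5, 1.0])
grid = np.stack(np.meshgrid(xs, xs, zs, indexing="ij"), axis=-1).reshape(-1, 3)
g = steering(grid, sensors, freqs=[500.0])
src = int(np.argmin(np.linalg.norm(grid - np.array([0.5, -0.5, 0.5]), axis=1)))
spec = g[:, src, :].T[:, :, None]  # (M, F=1, T=1) synthetic observation
image = back_propagate(spec, g)
recon = forward_project(image, g)
print(int(np.abs(image[0, :, 0]).argmax()) == src, recon.shape)
```

With unit-norm steering vectors, the Cauchy-Schwarz inequality guarantees that a single point source images most strongly at its own grid location. Inpainting would then substitute the reprojected spectra for degraded channels before log-mel extraction.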

Solving Inverse Problems via a Score-Based Prior: An Approximation-Free Posterior Sampling Approach

Diffusion models (DMs) have proven effective at modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic approximations to the generative process. To fully exploit the generative capability of DMs without resorting to such approximations, we propose an ensemble-based posterior sampling algorithm. Our algorithm is motivated by existing work that combines DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error between the true posterior and the empirical distribution of the generated samples can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging to show that our method gives more accurate reconstructions compared to existing DM-based methods.
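The paper derives its own PDE with a modified diffusion term and a reweighting term; that derivation is not reproduced here. As a generic illustration of how a stochastic weighted particle method combines these two kinds of update, the sketch below runs a standard tempered sequential Monte Carlo sampler on a one-dimensional problem with a Gaussian prior. It uses the analytic prior score in place of a pre-trained score network. The tempering schedule, Langevin step size, resampling rule, and toy problem are all assumptions, and this is not the authors' algorithm.

```python
import numpy as np


def prior_score(x):
    # Stand-in for a pre-trained score network: exact score of a N(0, 1) prior.
    return -x


def log_likelihood(x, y, sigma):
    # Gaussian observation model y = x + noise, noise ~ N(0, sigma^2).
    return -0.5 * ((y - x) / sigma) ** 2


def grad_log_likelihood(x, y, sigma):
    return (y - x) / sigma ** 2


def smc_posterior(y, sigma, n_particles=5000, n_temps=20, n_moves=5, step=0.01, seed=0):
    """Tempered SMC: reweight by L(x)^(beta_t - beta_{t-1}), then diffuse with Langevin moves."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)  # exact samples from the prior
    logw = np.zeros(n_particles)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Reweighting: incremental importance weight for moving from b_prev to b.
        logw += (b - b_prev) * log_likelihood(x, y, sigma)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample when the effective sample size drops below half the ensemble.
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            x = x[rng.choice(n_particles, size=n_particles, p=w)]
            logw = np.zeros(n_particles)
        # Diffusion: unadjusted Langevin steps toward prior(x) * L(x)^b.
        for _ in range(n_moves):
            drift = prior_score(x) + b * grad_log_likelihood(x, y, sigma)
            x = x + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(n_particles)
    w = np.exp(logw - logw.max())
    return x, w / w.sum()


particles, weights = smc_posterior(y=1.5, sigma=0.5)
# Conjugate check: the exact posterior mean is y / (1 + sigma^2) = 1.2.
print(np.sum(weights * particles))
```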

GNPy as a Benchmark for Open and Disaggregated Optical Networks

The evolution toward open and partially disaggregated optical networks has introduced new requirements on how transmission performance is evaluated and compared across technologies, vendors, and deployment scenarios. In this context, sound benchmarking practices are essential to ensure that quality-of-transmission (QoT) assessments are reproducible, transparent, and meaningful beyond isolated experimental demonstrations. QoT estimation plays a central role in these practices, as it directly impacts network planning, commissioning, automation, and long-term technology selection in heterogeneous optical infrastructures. This paper discusses benchmarking practices for optical transmission in open networks using the open-source GNPy library as a reference digital model. The contribution of this work lies in formalizing how a transparent, vendor-agnostic QoT estimator can be used as a common benchmarking baseline across research and industry. Representative experimental validations spanning short-reach, multiband, and multi-vendor flex-grid transmission scenarios are reviewed and reframed as benchmarking baselines, establishing evidence-based expectations on achievable accuracy and applicability limits under realistic operating conditions. Finally, the paper illustrates how reference QoT models are employed in industry-facing benchmarking workflows, including closed-loop interactions with standardization bodies, multi-vendor planning and automation, procurement processes, and strategic network evolution toward emerging architectures.
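One way to turn a reference QoT model into a benchmarking baseline is to report estimation-error statistics over many measured lightpaths. The summary does not list the exact metrics, so the sketch below is an assumption. It takes measured and predicted GSNR values (for example, predictions exported from GNPy, although no GNPy API is called here) and summarizes the error e = measured − predicted. The tolerance and the "margin" definition are illustrative choices.

```python
import numpy as np


def qot_error_stats(measured_gsnr_db, predicted_gsnr_db, tolerance_db=1.0):
    """Summarize e = measured - predicted (dB) across channels or lightpaths."""
    m = np.asarray(measured_gsnr_db, dtype=float)
    p = np.asarray(predicted_gsnr_db, dtype=float)
    err = m - p
    return {
        "mean_db": float(err.mean()),  # systematic bias of the estimator
        "std_db": float(err.std(ddof=1)) if err.size > 1 else 0.0,
        "max_abs_db": float(np.abs(err).max()),
        "within_tol": float(np.mean(np.abs(err) <= tolerance_db)),
        # Smallest margin that covers every case where the model over-estimated GSNR.
        "margin_db": float(max(0.0, -err.min())),
    }


# Hypothetical values for three channels.
stats = qot_error_stats([20.0, 19.5, 18.0], [19.5, 19.5, 18.5])
print(stats)
```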

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution (arXiv)

Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language with explicit control constructs (e.g., IF, GOTO, FORALL). Beyond syntactic and semantic verification of each step's output against that step's specific instruction, RunAgent autonomously derives and validates constraints at each step from the task description and the task instance. RunAgent also dynamically selects among LLM-based reasoning, tool usage, and code generation and execution (e.g., in Python), and incorporates error correction mechanisms to ensure correctness. Finally, RunAgent filters the context history, retaining only relevant information during the execution of each step. Evaluations on the Natural-plan and SciBench datasets demonstrate that RunAgent outperforms baseline LLMs and state-of-the-art PlanGEN methods.
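The explicit control constructs can be illustrated with a tiny interpreter. The representation below is a hypothetical stand-in for RunAgent's agentic language, whose syntax the summary does not give. A plan is a list of steps using DO, IF, GOTO, and FORALL operations, and each step can optionally carry a constraint check on its output. Plain Python callables replace the LLM reasoning, tool calls, and generated code that RunAgent dispatches to. A failed check simply raises an error instead of triggering error correction.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

State = dict


@dataclass
class Step:
    op: str                                          # "DO", "IF", "GOTO", or "FORALL"
    action: Optional[Callable[..., Any]] = None      # DO: f(state); FORALL: f(state, item)
    cond: Optional[Callable[[State], bool]] = None   # IF: jump when true
    target: Optional[int] = None                     # IF/GOTO: index of the next step
    items: Optional[Callable[[State], list]] = None  # FORALL: items to iterate over
    out: Optional[str] = None                        # state key receiving the result
    check: Optional[Callable[[Any], bool]] = None    # constraint on the step output


def run_plan(plan, state, max_steps=1000):
    pc, executed = 0, 0
    while pc < len(plan):
        executed += 1
        if executed > max_steps:
            raise RuntimeError("step budget exhausted (possible infinite GOTO loop)")
        step = plan[pc]
        if step.op == "GOTO":
            pc = step.target
            continue
        if step.op == "IF":
            pc = step.target if step.cond(state) else pc + 1
            continue
        if step.op == "DO":
            result = step.action(state)
        elif step.op == "FORALL":
            result = [step.action(state, item) for item in step.items(state)]
        else:
            raise ValueError(f"unknown op {step.op!r}")
        # Stepwise validation: reject outputs that violate the step's constraint.
        if step.check is not None and not step.check(result):
            raise ValueError(f"constraint violated at step {pc}: {result!r}")
        if step.out is not None:
            state[step.out] = result
        pc += 1
    return state


plan = [
    Step("DO", action=lambda s: [1, 2, 3], out="xs"),
    Step("IF", cond=lambda s: not s["xs"], target=4),  # nothing to do: jump past the end
    Step("FORALL", items=lambda s: s["xs"], action=lambda s, x: x * x,
         out="squares", check=lambda r: all(v >= 0 for v in r)),
    Step("DO", action=lambda s: sum(s["squares"]), out="total"),
]
print(run_plan(plan, {})["total"])  # 14
```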

How Our AI Contributed to NASA’s Artemis Missions

NEC Laboratories America’s AI research played a role in NASA’s Artemis missions, helping analyze complex spacecraft data at scale. Our System Invariant Analysis Technology enables faster insights, improved anomaly detection, and greater confidence in mission readiness for deep space exploration.

Rethinking Molecular Drug Design: From Generation to Control

Designing drug molecules is no longer just about generation, but control. NEC Laboratories America introduces MolDiffdAE, a diffusion-based framework that enables precise, multi-objective tuning of 3D molecular properties. By learning a semantic space, researchers can efficiently guide design, accelerating drug discovery and exploration of chemical space.