Wataru Kohno
NEC Labs America

Wataru Kohno is a Researcher in the Optical Networking and Sensing Department at NEC Laboratories America. He earned his Ph.D. in Physics from Hokkaido University in Japan, where he built a strong foundation in the physics and engineering principles underlying optical communications.

He has also authored a number of research papers in condensed matter physics, where he investigated fundamental problems in electronic structures, correlated electron systems, and quantum materials, contributing to a deeper theoretical understanding of how microscopic physical principles give rise to novel material properties; a selection of these works can be found on his Google Scholar profile. With his background in fundamental theoretical condensed matter physics, his work focuses on distributed acoustic sensing (DAS) and fiber-optic communication technologies, where he develops methods to transform optical fibers into highly sensitive, passive sensors capable of detecting vibration, movement, and pressure across vast geographic areas. These innovations enable real-time situational awareness without requiring active electronics at the sensing points, making large-scale monitoring systems both scalable and unobtrusive. His research supports a range of critical applications, including perimeter security, seismic monitoring, and infrastructure protection. His academic training continues to inform his applied research at NEC, where he bridges theoretical advances in photonics with real-world sensing challenges.

At NEC Laboratories America, he has contributed to multiple cutting-edge projects that expand the capabilities of distributed acoustic sensing. His recent work includes the development of advanced vibrometry techniques, recognition systems that adapt fiber sensing for downstream applications, and AI-enhanced methods for real-time detection in critical infrastructure such as power grids. By combining deep expertise in optics with practical engineering approaches, his research is helping create intelligent sensing platforms that can deliver reliable, real-time monitoring solutions for global security and infrastructure needs.

Posts

NEC Labs America Attends OECC June 28 – July 2, 2026

NEC Laboratories America is proud to participate in OECC 2026, the 31st Opto-Electronics and Communications Conference, taking place in Busan, South Korea. We look forward to connecting with the international photonics and communications community and sharing the work we’re doing to shape the next generation of optical networks.

NEC Labs America Attends CVPR 2026 in Denver, CO June 3-7, 2026

NEC Labs America headed to Denver for CVPR 2026, one of the most prestigious gatherings in computer vision, machine learning, and pattern recognition. The IEEE/CVF Conference on Computer Vision and Pattern Recognition brought innovators from around the world to share breakthroughs.

Training Small AI Models Without Blindly Trusting Big Teacher Models

Machine learning is shifting from learning from data alone to learning from both data and teacher models. Beta-KD uses uncertainty-aware Bayesian weighting to train compact multimodal AI without blindly trusting every teacher signal.

Mix-CLAP: Adaptive Fusion of Knowledge-Distilled Audio Embeddings for Noise-Aware Audio-Language Models

Real-world deployment requires sound event and acoustic scene classification systems to remain reliable in noisy, diverse environments on resource-constrained devices. Although contrastive language-audio pretraining (CLAP) models with Transformer-based audio encoders achieve strong zero-shot performance, their computational cost hinders deployment. In this paper, we propose Mix-CLAP, a computationally efficient, noise-aware CLAP model with knowledge-distilled audio encoders. Our method includes: (1) a two-stage knowledge distillation from teacher embeddings to two lightweight student encoders, one on clean audio and the other on noisy audio, and (2) adaptive inference that combines their embeddings with a fusion parameter and minimizes the parameterized entropy at test time. Experiments show that Mix-CLAP with MobileNetV3-based audio encoders greatly improves computational efficiency, while achieving a comparable average accuracy of 52.58% to the Transformer-based CLAP model at 52.83% on the recorded ESC50 datasets with different devices including microphones and fiber-optic distributed acoustic sensors under diverse conditions, making it suitable for real-world, resource-constrained applications.
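The adaptive inference step in (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the convex combination of the two embeddings, the cosine-similarity logits against class text embeddings, the temperature value, and the grid search over the fusion parameter are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def fuse_and_classify(clean_emb, noisy_emb, text_embs, temperature=0.07, steps=21):
    """Pick the fusion weight alpha in [0, 1] that minimizes prediction entropy.

    Assumption: the fused embedding is alpha * clean + (1 - alpha) * noisy,
    and class scores are cosine similarities to the class text embeddings.
    """
    best = None
    for i in range(steps):
        alpha = i / (steps - 1)
        fused = [alpha * c + (1.0 - alpha) * n for c, n in zip(clean_emb, noisy_emb)]
        probs = softmax([cosine(fused, t) / temperature for t in text_embs])
        h = entropy(probs)
        if best is None or h < best[0]:
            best = (h, alpha, probs)
    h, alpha, probs = best
    return alpha, probs.index(max(probs)), h
```

Calling `fuse_and_classify(clean_emb, noisy_emb, text_embs)` returns the selected fusion weight, the predicted class index, and the entropy at that weight. A grid search is used here only because it needs no gradients; any one-dimensional minimizer would serve the same purpose.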

Event Classification by Physics-Informed Inpainting for Distributed Multichannel Acoustic Sensor with Partially Degraded Channels

Distributed multichannel acoustic sensing (DMAS) enables large-scale sound event classification (SEC), but performance drops when many channels are degraded and when sensor layouts at test time differ from training layouts. We propose a learning-free, physics-informed inpainting frontend based on reverse time migration (RTM). In this approach, observed multichannel spectrograms are first back-propagated on a 3D grid using an analytic Green’s function to form a scene-consistent image, and then forward-projected to reconstruct inpainted signals before log–mel feature extraction and transformer-based classification. We evaluate the method on ESC-50 with 50 sensors and three layouts (circular, linear, right-angle), where per-channel SNRs are sampled from -30 to 0 dB. Compared with an AST baseline, scaling-sparsemax channel selection, and channel-swap augmentation, the proposed RTM frontend achieves the best or competitive accuracy across all layouts, improving accuracy by 13.1 points on the right-angle layout (from 9.7% to 22.8%). Correlation analyses show that spatial weights align more strongly with SNR than with channel–source distance, and that higher SNR–weight correlation corresponds to higher SEC accuracy. These results demonstrate that a reconstruct-then-project, physics-based preprocessing effectively complements learning-only methods for DMAS under layout-open configurations and severe channel degradation.
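The back-propagate-then-forward-project idea can be sketched for a single frequency bin. This is an illustrative sketch under stated assumptions, not the paper's frontend: it assumes a free-space Green's function exp(-ikr)/(4πr), uses the conjugate (adjoint) of that function for back-propagation, and takes optional per-channel weights as given. The function names and arguments are hypothetical, and sensors are assumed not to coincide with grid points.

```python
import cmath
import math


def green(a, b, k):
    # Assumed free-space Green's function between 3D points a and b
    # for wavenumber k; requires a != b.
    r = math.dist(a, b)
    return cmath.exp(-1j * k * r) / (4.0 * math.pi * r)


def back_propagate(obs, sensors, grid, k, weights=None):
    """Form an image on the grid from one complex observation per sensor."""
    if weights is None:
        weights = [1.0] * len(sensors)
    return [
        sum(w * green(s, x, k).conjugate() * o
            for w, s, o in zip(weights, sensors, obs))
        for x in grid
    ]


def forward_project(image, sensors, grid, k):
    """Re-synthesize one complex value per sensor from the grid image."""
    return [
        sum(green(s, x, k) * v for x, v in zip(grid, image))
        for s in sensors
    ]
```

In this sketch, a degraded channel could be given a small weight during back-propagation and then replaced by its forward-projected value, which is the sense in which the frontend "inpaints" channels. How the weights are chosen is outside the scope of the sketch.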

Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models

Knowledge distillation establishes a learning paradigm that leverages both data supervision and teacher guidance. However, determining the optimal balance between learning from data and learning from the teacher is challenging, as some samples may be noisy while others are subject to teacher uncertainty. This motivates the need for adaptively balancing data and teacher supervision. We propose Beta-weighted Knowledge Distillation (Beta-KD), an uncertainty-aware distillation framework that adaptively modulates how much the student relies on teacher guidance. Specifically, we formulate teacher–student learning from a unified Bayesian perspective and interpret teacher supervision as a Gibbs prior over student activations. This yields a closed-form, uncertainty-aware weighting mechanism and supports arbitrary distillation objectives and their combinations. Extensive experiments on multimodal VQA benchmarks demonstrate that distilling student Vision-Language Models from a large teacher VLM consistently improves performance. The results show that Beta-KD outperforms existing knowledge distillation methods.

Sound Event Classification meets Data Assimilation with Distributed Fiber-Optic Sensing

Distributed Fiber-Optic Sensing (DFOS) is a promising technique for large-scale acoustic monitoring. However, its wide variation in installation environments and sensor characteristics causes spatial heterogeneity. This heterogeneity makes it difficult to collect representative training data. It also degrades the generalization ability of learning-based models, such as fine-tuning methods, under a limited amount of training data. To address this, we formulate Sound Event Classification (SEC) as data assimilation in an embedding space. Instead of training models, we infer sound event classes by combining pretrained audio embeddings with simulated DFOS signals. Simulated DFOS signals are generated by applying various frequency responses and noise patterns to microphone data, which allows for diverse prior modeling of DFOS conditions. Our method achieves out-of-domain (OOD) robust classification without requiring model training. The proposed method achieved accuracy improvements of 6.42, 14.11, and 3.47 percentage points compared with conventional zero-shot and two types of fine-tuning methods, respectively. By employing the simulator in the framework of data assimilation, the proposed method also enables precise estimation of physical parameters from observed DFOS signals.
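The simulation step, applying a frequency response and a noise pattern to microphone data, might look like the following minimal sketch. The gain function, the white Gaussian noise model, and the SNR parameterization are assumptions for illustration and are not taken from the paper. A naive DFT keeps the example dependency-free; it is only practical for short signals.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def simulate_dfos(mic_signal, gain, snr_db, rng):
    """Shape a microphone signal with a frequency response, then add noise.

    gain: hypothetical function mapping normalized frequency in [0, 0.5]
          (cycles per sample) to a real gain.
    snr_db: signal-to-noise ratio of the added white Gaussian noise.
    """
    n = len(mic_signal)
    shaped = []
    for k, coeff in enumerate(dft(mic_signal)):
        # Use the same gain for bins k and n - k so the output stays real.
        freq = min(k, n - k) / n
        shaped.append(coeff * gain(freq))
    filtered = [v.real for v in idft(shaped)]
    power = sum(v * v for v in filtered) / n
    sigma = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [v + rng.gauss(0.0, sigma) for v in filtered]
```

Drawing many `gain` functions and `snr_db` values would produce a family of simulated signals, which is one way to read the "diverse prior modeling of DFOS conditions" described above.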

Multiple Sensor-head Phase-sensitive Optical Time-domain Laser Vibrometer

We propose a hybrid remote and distributed vibration sensing system based on phase-sensitive optical time-domain reflectometry with collimator-based sensor heads. We demonstrate dual-laser vibrometers that detect nm-scale displacements of remote targets.

Trainingless Adaptation of Pretrained Models for Environmental Sound Classification

Deep neural network (DNN)-based models for environmental sound classification are not robust against a domain to which training data do not belong, that is, out-of-distribution or unseen data. To utilize pretrained models for the unseen domain, adaptation methods, such as finetuning and transfer learning, are used with rich computing resources, e.g., the graphics processing unit (GPU). However, it is becoming more difficult for those with limited computing resources to keep up with research trends because state-of-the-art models are becoming computationally resource-intensive. In this paper, we propose a trainingless adaptation method for pretrained models for environmental sound classification. To introduce the trainingless adaptation method, we first propose an operation of recovering time–frequency-ish (TF-ish) structures in intermediate layers of DNN models. We then propose the trainingless frequency filtering method for domain adaptation, which, unlike widely used adaptation methods, does not rely on gradient-based optimization. The experiments conducted using the ESC-50 dataset show that the proposed adaptation method improves the classification accuracy by 20.40 percentage points compared with the conventional method.

Text-guided Device-realistic Sound Generation for Fiber-based Sound Event Classification

Recent advancements in unique acoustic sensing devices and large-scale audio recognition models have unlocked new possibilities for environmental sound monitoring and detection. However, applying pretrained models to non-conventional acoustic sensors results in performance degradation due to domain shifts, caused by differences in frequency response and noise characteristics from the original training data. In this study, we introduce a text-guided framework for generating new datasets to retrain models specifically for these non-conventional sensors efficiently. Our approach integrates text-conditional audio generative models with two additional steps: (1) selecting audio samples based on text input to match the desired sounds, and (2) applying domain transfer techniques using recorded impulse responses and background noise to simulate the characteristics of the sensors. We demonstrate this process by generating emulated signals for fiber-optic Distributed Acoustic Sensors (DAS), creating datasets similar to the recorded ESC-50 dataset. The generated signals are then used to train a classifier, which outperforms few-shot learning approaches in environmental sound classification.
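Step (2), domain transfer with a recorded impulse response and background noise, can be sketched as a convolution followed by noise mixing at a target SNR. This is a minimal sketch rather than the authors' pipeline: the truncation of the convolution to the input length, the looping of the noise recording, and the SNR-based scaling are assumptions, and the function names are hypothetical.

```python
import math


def convolve(signal, impulse_response):
    # Direct-form convolution, truncated to the length of the input signal.
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            if i + j < len(out):
                out[i + j] += s * h
    return out


def power(x):
    return sum(v * v for v in x) / len(x)


def emulate_sensor(dry, impulse_response, noise, snr_db):
    """Make a clean (generated) sound resemble a sensor recording.

    dry: clean audio samples, e.g. from a text-conditional generator.
    impulse_response: recorded sensor impulse response.
    noise: recorded background noise, looped if shorter than the signal.
    snr_db: assumed target signal-to-noise ratio after mixing.
    """
    wet = convolve(dry, impulse_response)
    noise_power = power(noise)
    if noise_power == 0.0:
        return wet
    scale = math.sqrt(power(wet) / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [w + scale * noise[i % len(noise)] for i, w in enumerate(wet)]
```

The emulated signals would then stand in for sensor recordings when training the classifier. For long recordings, an FFT-based convolution would be the more practical choice.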