Summer Interns 2024

Learn about the amazing group of interns who joined us at Princeton and San Jose campuses this summer. Their hard work, fresh perspectives, and dedication have truly made an impact across the board, from cutting-edge research projects to innovative software development initiatives.

Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries

Evaluating the performance of autonomous vehicle planning algorithms necessitates simulating long-tail safety-critical traffic scenarios. However, traditional methods for generating such scenarios often fall short in terms of controllability and realism; they also neglect the dynamics of agent interactions. To address these limitations, we introduce Safe-Sim, a novel diffusion-based controllable closed-loop safety-critical simulation framework. Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations. We develop a novel approach to simulate safety-critical scenarios through an adversarial term in the denoising process of diffusion models, which allows an adversarial agent to challenge a planner with plausible maneuvers while all agents in the scene exhibit reactive and realistic behaviors. Furthermore, we propose novel guidance objectives and a partial diffusion process that enables users to control key aspects of the scenarios, such as the collision type and aggressiveness of the adversarial agent, while maintaining the realism of the behavior. We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability. These findings affirm that diffusion models provide a robust and versatile foundation for safety-critical, interactive traffic simulation, extending their utility across the broader autonomous driving landscape.
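The core mechanism described above — adding an adversarial term to the denoising process so one agent is steered toward the ego vehicle while remaining anchored to realistic behavior — can be illustrated with a toy gradient-guidance loop. This is a minimal sketch, not the paper's implementation: the function names, the scalar prior, and the 2-D position state are all illustrative assumptions, standing in for the learned diffusion model and guidance objectives.

```python
def adversarial_cost(adv, ego):
    # Squared distance between adversary and ego positions (to be minimized
    # so the adversary challenges the planner).
    return sum((a - e) ** 2 for a, e in zip(adv, ego))

def guided_denoise_step(adv, ego, prior_mean, step=0.05, guide_w=0.5):
    """One toy denoising update: drift toward a realistic prior mean
    (realism) while descending the gradient of the adversarial cost
    (classifier-style guidance toward safety-critical behavior)."""
    out = []
    for a, e, m in zip(adv, ego, prior_mean):
        drift = m - a            # pull toward the data prior
        guide = -2.0 * (a - e)   # -d(cost)/da: pull toward the ego vehicle
        out.append(a + step * (drift + guide_w * guide))
    return out

# Toy rollout: the guided trajectory ends up closer to the ego vehicle
# than the initial state, while still being attracted to the prior.
adv, ego, prior = [0.0, 0.0], [10.0, 4.0], [5.0, 0.0]
for _ in range(200):
    adv = guided_denoise_step(adv, ego, prior)
```

The `guide_w` knob plays the role of the controllable "aggressiveness" parameter: larger values weight the adversarial objective more heavily relative to the realism prior.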

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between video segments and ASR-transcribed narration texts through contrastive learning. However, these methods fail to account for the alignment noise, i.e., narrations irrelevant to the instructional task in videos and unreliable timestamps in narrations. To address these challenges, this work proposes a novel training framework. Motivated by the strong capabilities of Large Language Models (LLMs) in procedure understanding and text summarization, we first apply an LLM to filter out task-irrelevant information and summarize task-related procedure steps (LLM-steps) from narrations. To further generate reliable pseudo-matching between the LLM-steps and the video for training, we propose the Multi-Pathway Text-Video Alignment (MPTVA) strategy. The key idea is to measure alignment between LLM-steps and videos via multiple pathways, including: (1) step-narration-video alignment using narration timestamps, (2) direct step-to-video alignment based on their long-term semantic similarity, and (3) direct step-to-video alignment focusing on short-term fine-grained semantic similarity learned from general video domains. The results from different pathways are fused to generate reliable pseudo step-video matching. We conducted extensive experiments across various tasks and problem settings to evaluate our proposed method. Our approach surpasses state-of-the-art methods in three downstream tasks: procedure step grounding, step localization, and narration grounding by 5.9%, 3.1%, and 2.8%, respectively.
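The fusion step at the heart of MPTVA — combining several step-to-video similarity estimates into one reliable pseudo-matching — can be sketched as a weighted average of per-pathway similarity matrices followed by confident argmax assignment. This is an illustrative simplification under assumed names (`fuse_pathways`, the equal-weight default, the 0.5 threshold), not the paper's exact procedure.

```python
def fuse_pathways(sims, weights=None, threshold=0.5):
    """Fuse step x segment similarity matrices from multiple alignment
    pathways into pseudo step-video matches.

    sims: list of matrices (lists of lists), one per pathway, each with
    entries normalized to [0, 1].
    """
    n_path = len(sims)
    weights = weights or [1.0 / n_path] * n_path
    n_steps, n_segs = len(sims[0]), len(sims[0][0])
    fused = [[sum(w * s[i][j] for w, s in zip(weights, sims))
              for j in range(n_segs)] for i in range(n_steps)]
    # Pseudo-match each step to its best segment, keeping only
    # confident assignments for training.
    matches = {}
    for i, row in enumerate(fused):
        j = max(range(n_segs), key=row.__getitem__)
        if row[j] >= threshold:
            matches[i] = j
    return fused, matches

# Two toy pathways over 2 steps x 2 segments: both agree on the diagonal.
fused, matches = fuse_pathways([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
])
```

Averaging across pathways is what damps the noise of any single source (e.g., unreliable narration timestamps), which is the stated motivation for the multi-pathway design.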

TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs

Traffic cameras are essential in urban areas, playing a crucial role in intelligent transportation systems. Multiple cameras at intersections enhance law enforcement capabilities, traffic management, and pedestrian safety. However, efficiently managing and analyzing multi-camera feeds poses challenges due to the vast amount of data. Analyzing such huge video data requires advanced analytical tools. While Large Language Models (LLMs) like ChatGPT, equipped with retrieval-augmented generation (RAG) systems, excel in text-based tasks, integrating them into traffic video analysis demands converting video data into text using a Vision-Language Model (VLM), which is time-consuming and delays the timely utilization of traffic videos for generating insights and investigating incidents. To address these challenges, we propose TrafficLens, a tailored algorithm for multi-camera traffic intersections. TrafficLens employs a sequential approach, utilizing overlapping coverage areas of cameras. It iteratively applies VLMs with varying token limits, using previous outputs as prompts for subsequent cameras, enabling rapid generation of detailed textual descriptions while reducing processing time. Additionally, TrafficLens intelligently bypasses redundant VLM invocations through an object-level similarity detector. Experimental results with real-world datasets demonstrate that TrafficLens reduces video-to-text conversion time by up to 4× while maintaining information accuracy.
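The control flow described above — iterating over overlapping camera views, feeding each VLM output into the next prompt, and skipping near-duplicate views via an object-level similarity check — can be sketched as follows. This is a hedged outline: `analyze_intersection`, the callable `run_vlm` and `similarity` parameters, and the 0.9 threshold are illustrative assumptions standing in for the actual VLM and detector components.

```python
def analyze_intersection(frames, run_vlm, similarity,
                         token_limits, sim_threshold=0.9):
    """Sequentially describe overlapping camera views of one intersection.

    Each VLM call gets the previous description as prompt context
    (so later cameras only add new detail), and near-duplicate views
    bypass the VLM entirely, reusing the previous description.
    """
    descriptions = []
    prev_frame, context = None, ""
    for frame, limit in zip(frames, token_limits):
        if prev_frame is not None and similarity(prev_frame, frame) >= sim_threshold:
            descriptions.append(descriptions[-1])  # redundant view: skip the VLM
            continue
        context = run_vlm(frame, prompt=context, max_tokens=limit)
        descriptions.append(context)
        prev_frame = frame
    return descriptions

# Usage with a stub VLM that records how often it is actually invoked:
calls = []
def fake_vlm(frame, prompt, max_tokens):
    calls.append(frame)
    return f"desc[{frame}]"

descs = analyze_intersection(
    ["a", "a", "b"], fake_vlm,
    lambda x, y: 1.0 if x == y else 0.0,
    token_limits=[50, 40, 30])
```

In this toy run the second (identical) view never reaches the VLM, which is the mechanism behind the reported reduction in video-to-text conversion time.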

Accelerating Distributed Machine Learning with an Efficient AllReduce Routing Strategy

We propose an efficient routing strategy for AllReduce transfers, which comprise the dominant traffic in machine-learning-centric datacenters, to achieve fast parameter synchronization in distributed machine learning, improving the average training time by 9%.
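For readers unfamiliar with the traffic pattern being routed: a standard ring AllReduce moves gradient chunks around the workers in two phases (reduce-scatter, then all-gather), so every worker ends with the full sum. The sketch below simulates that pattern in plain Python to show the communication structure the routing strategy must carry; it is a textbook illustration, not the proposed routing strategy itself.

```python
def ring_allreduce(worker_chunks):
    """Simulate ring AllReduce over n workers, each holding n chunks.
    Phase 1 (reduce-scatter): after n-1 steps, worker w owns the fully
    reduced chunk (w + 1) % n. Phase 2 (all-gather): the reduced chunks
    circulate so every worker ends with the complete sum."""
    n = len(worker_chunks)
    chunks = [list(c) for c in worker_chunks]
    # Reduce-scatter: at step s, worker w sends chunk (w - s) % n to w+1.
    for step in range(n - 1):
        sends = [(w, (w - step) % n, chunks[w][(w - step) % n])
                 for w in range(n)]
        for w, idx, val in sends:
            chunks[(w + 1) % n][idx] += val
    # All-gather: at step s, worker w forwards chunk (w + 1 - s) % n to w+1.
    for step in range(n - 1):
        sends = [(w, (w + 1 - step) % n, chunks[w][(w + 1 - step) % n])
                 for w in range(n)]
        for w, idx, val in sends:
            chunks[(w + 1) % n][idx] = val
    return chunks

# Three workers, three chunks each; every worker should end with the sum.
result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Each step sends a fixed-size chunk over a fixed neighbor link, which is why AllReduce traffic is highly structured and amenable to datacenter-level route optimization.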

Remote Sensing for Power Grid Fuse Tripping Using AI-Based Fiber Sensing with Aerial Telecom Cables

For the first time, we demonstrate remote sensing of pole-mounted fuse-cutout blowing in a power grid setup using telecom fiber cable. The proposed frequency-based AI model achieves over 98% detection accuracy using distributed fiber sensing data.

Measuring the Transceiver's Back-to-Back BER-OSNR Characteristic Using Only a Variable Optical Attenuator

We propose a transceiver back-to-back BER-OSNR characterization method that requires only a single VOA; it leverages the receiver SNR degradation caused by received power attenuation. Experiments using commercial transceivers show that the measurement error is less than 0.2 dB in the Q-factor.
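The Q-factor in which the 0.2 dB measurement error is reported is conventionally derived from the measured BER via the Gaussian-noise relation BER = ½·erfc(Q/√2). A small sketch of that standard conversion (not part of the proposed method itself) is below; since the Python standard library has `math.erfc` but no inverse, the inverse is solved here by bisection.

```python
import math

def q_factor_db(ber):
    """Convert a measured BER to a Q-factor in dB using the standard
    Gaussian-noise relation BER = 0.5 * erfc(Q / sqrt(2)),
    i.e. Q = sqrt(2) * erfc^-1(2 * BER), solved by bisection."""
    target = 2.0 * ber  # we need erfc(Q / sqrt(2)) == 2 * BER
    lo, hi = 0.0, 40.0  # erfc is monotonically decreasing on this range
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if math.erfc(mid / math.sqrt(2.0)) > target:
            lo = mid  # erfc too large -> Q must be bigger
        else:
            hi = mid
    q_linear = (lo + hi) / 2.0
    return 20.0 * math.log10(q_linear)

# The textbook reference point: BER = 1e-3 corresponds to Q ~ 9.8 dB.
q_at_1e3 = q_factor_db(1e-3)
```

Reporting errors on the Q scale (in dB) rather than directly on BER is common practice because Q is approximately linear in OSNR over the operating range.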

Machine Learning Model of an EDFA Predicting SHB Effects

Experiments show that a machine learning model of an EDFA can accurately capture spectral hole burning effects. As a result, it significantly outperforms black-box models that neglect inhomogeneous effects, achieving a record average RMSE of 0.0165 dB between model predictions and measurements.

First Field Demonstration of Hollow-Core Fibre Supporting Distributed Acoustic Sensing and DWDM Transmission

We demonstrate a method for measuring the backscatter coefficient of hollow-core fibre (HCF), and show the feasibility of distributed acoustic sensing (DAS) with simultaneous 9.6-Tb/s DWDM transmission over a 1.6-km field-deployed HCF cable.

Extension of the Local-Optimization Global-Optimization (LOGO) Launch Power Strategy to Multi-Band Optical Networks

We propose extending the LOGO strategy for launch power settings to multi-band scenarios, maintaining low complexity while addressing key inter-band nonlinear effects and accurate amplifier models. This methodology simplifies multi-band optical multiplex section control, providing an immediate, descriptive estimation of optimized launch power.