Real-Time Network-Aware Roadside LiDAR Data Compression

LiDAR technology has emerged as a pivotal tool in Intelligent Transportation Systems (ITS), providing unique capabilities that have significantly transformed roadside traffic applications. However, this transformation comes with a distinct challenge: LiDAR sensors generate immense volumes of data every second, enough to overwhelm the private and public 5G networks used to connect intersections and making it impractical to stream raw sensor data across multiple intersections. This paper proposes an efficient real-time compression method for roadside LiDAR data. Our approach exploits a special characteristic of roadside LiDAR data: the background points are consistent across all frames. We detect these background points and send them to edge servers only once. For each subsequent frame, we filter out the background points and compress only the remaining data. This process achieves significant temporal compression by eliminating redundant background data and substantial spatial compression by focusing only on the filtered points. Our method is sensor-agnostic, exceptionally fast, memory-efficient, and adaptable to varying network conditions. It offers a 2.5x increase in compression rates and improves application-level accuracy by 40% compared to current state-of-the-art methods.
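The abstract does not specify how background points are detected, so the following is a minimal sketch assuming a simple voxel-occupancy background model: voxels occupied in nearly every training frame are declared background, sent to the edge once, and removed from every later frame before spatial compression. The voxel size, threshold, and function names are illustrative.

```python
import numpy as np

VOXEL = 0.2  # voxel edge length in meters (illustrative choice)

def voxel_keys(points):
    """Map each (x, y, z) point to an integer voxel index triple."""
    return np.floor(points / VOXEL).astype(np.int64)

def build_background(frames, min_ratio=0.9):
    """Voxels occupied in >= min_ratio of training frames are background."""
    counts = {}
    for pts in frames:
        for key in set(map(tuple, voxel_keys(pts))):
            counts[key] = counts.get(key, 0) + 1
    thresh = min_ratio * len(frames)
    return {k for k, c in counts.items() if c >= thresh}

def filter_frame(points, background):
    """Keep only foreground points; only these are compressed and streamed."""
    keys = map(tuple, voxel_keys(points))
    mask = np.fromiter((k not in background for k in keys), dtype=bool,
                       count=len(points))
    return points[mask]
```

In this sketch, only the points surviving filter_frame would be handed to a spatial compressor for streaming, while the background set is transmitted to the edge server a single time.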

CAMTUNER: Adaptive Video Analytics Pipelines via Real-time Automated Camera Parameter Tuning

In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition operating on remote servers rely heavily on surveillance cameras to capture high-quality video streams to achieve high accuracy. Modern network cameras offer an array of parameters that directly influence video quality. While a few of these parameters (e.g., exposure, focus, and white balance) are adjusted automatically by the camera, the others are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this work, we first show that in a typical surveillance camera deployment, environmental condition changes can have a significant adverse effect on the accuracy of insights from the AUs, but such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. Second, since most end-users lack the skill or understanding to appropriately configure these parameters and typically use a fixed parameter setting, we present CAMTUNER, to our knowledge the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CAMTUNER is based on SARSA reinforcement learning and incorporates two novel components, a lightweight analytics quality estimator and a virtual camera, that drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that compared to a VAP using the default camera setting, CAMTUNER enhances VAP accuracy by detecting 15.9% additional persons and 2.6%-4.2% additional cars (without any false positives) in a large enterprise parking lot. CAMTUNER opens up new avenues for elevating video analytics accuracy, transcending mere incremental enhancements achieved through refining deep-learning models.
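As a hedged illustration of the on-policy learning loop such a framework might use, here is a minimal tabular SARSA sketch. The state/action definitions, the camera.adjust and sense_state hooks, and estimate_quality (standing in for the paper's analytics quality estimator) are all assumptions, not CAMTUNER's actual implementation.

```python
import random
from collections import defaultdict

ACTIONS = [("brightness", +1), ("brightness", -1),
           ("contrast", +1), ("contrast", -1), ("noop", 0)]

Q = defaultdict(float)          # Q[(state, action_index)]
alpha, gamma, eps = 0.1, 0.9, 0.1

def choose(state):
    if random.random() < eps:                        # explore
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])  # exploit

def sarsa_step(state, action, camera, sense_state, estimate_quality):
    camera.adjust(*ACTIONS[action])                  # apply a NAUTO tweak
    reward = estimate_quality(camera.frame())        # proxy for AU accuracy
    next_state = sense_state()                       # quantized env condition
    next_action = choose(next_state)
    # On-policy SARSA temporal-difference update
    td = reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]
    Q[(state, action)] += alpha * td
    return next_state, next_action
```

The virtual camera described in the abstract would supply the camera object here, letting many such steps run offline far faster than on physical hardware.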

A Smart Sensing Grid for Road Traffic Detection Using Terrestrial Optical Networks and Attention-Enhanced Bi-LSTM

We demonstrate the use of existing terrestrial optical networks as a smart sensing grid, employing a bidirectional long short-term memory (Bi-LSTM) model enhanced with an attention mechanism to detect road vehicles. The main idea of our approach is to deploy a fast, accurate, and reliable trained deep learning model in each network element, constantly monitoring the state of polarization (SOP) of data signals traveling through the optical line system (OLS). This deployment enables the creation of a smart sensing grid that can continuously monitor wide areas and respond with notifications/alerts for road traffic situations. The model is trained on a synthetic dataset and tested on a real dataset obtained from a deployed metropolitan fiber cable in the city of Turin. Our model achieves 99% accuracy on both the synthetic and real datasets.
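A minimal PyTorch sketch of an attention-enhanced Bi-LSTM classifier of this kind is shown below. The input dimension of 3 (assuming the SOP is represented by three Stokes parameters per sample), the window length, and the hidden size are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    """Bi-LSTM over an SOP time series with additive attention pooling."""
    def __init__(self, in_dim=3, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # score per time step
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, 3)
        h, _ = self.lstm(x)                        # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention weights over time
        ctx = (w * h).sum(dim=1)                   # weighted context vector
        return self.head(ctx)                      # vehicle / no-vehicle logits

model = AttnBiLSTM()
logits = model(torch.randn(8, 200, 3))             # 8 windows of 200 SOP samples
```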

Optimal Single-User Interactive Beam Alignment with Feedback Delay

Communication in the millimeter wave (mmWave) band relies on narrow beams due to directionality, high path loss, and shadowing. One can use beam alignment (BA) techniques to find and adjust the direction of these narrow beams. In this paper, BA at the base station (BS) is considered, where the BS sends a set of BA packets to scan different angular regions while the user listens to the channel and sends feedback to the BS for each received packet. It is assumed that the packets and feedback received at the user and BS, respectively, can be correctly decoded. Motivated by practical constraints such as propagation delay, a feedback delay for each BA packet is considered. At the end of the BA, the BS allocates a narrow beam, specified by its angle of departure, to the user for data transmission, and the objective is to maximize the resulting expected beamforming gain. A general framework for studying this problem is proposed, based on which a lower bound on the optimal performance as well as an optimality-achieving scheme are obtained. Simulation results reveal significant performance improvements over state-of-the-art BA methods in the presence of feedback delay.
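The optimal scheme is derived in the paper; as a simple baseline illustrating how feedback delay interacts with interactive BA, the sketch below simulates a pipelined exhaustive scan where each probe's feedback arrives a fixed number of slots later, so probes are sent back-to-back rather than waiting. Angles are normalized to [0, 1) and, following the common sectored-antenna abstraction, the beamforming gain is modeled as the reciprocal of the allocated beamwidth; all of these modeling choices are assumptions for illustration.

```python
import numpy as np

def exhaustive_scan(theta, n_probes=16, delay=3):
    """Pipelined scan: probe n_probes equal sectors of [0, 1); each ACK/NACK
    arrives `delay` slots after its probe, so the BS sends probes without
    waiting (non-adaptive within the delay window)."""
    edges = np.linspace(0.0, 1.0, n_probes + 1)
    acks, t = [], 0
    for i in range(n_probes):
        t += 1                                     # slot spent probing sector i
        acks.append((t + delay, edges[i] <= theta < edges[i + 1]))
    t = max(t, acks[-1][0])                        # wait for the last feedback
    hit = next(i for i, (_, ok) in enumerate(acks) if ok)
    width = edges[hit + 1] - edges[hit]
    return (edges[hit], edges[hit + 1]), 1.0 / width, t  # beam, gain, total slots

beam, gain, slots = exhaustive_scan(theta=0.37)
```

An interactive scheme improves on this baseline by narrowing later probes using early feedback, which is exactly what the delay complicates.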

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Visual reasoning (VR), which is crucial in many fields for enabling human-like visual understanding, remains highly challenging. Recently, compositional visual reasoning approaches, which leverage the reasoning abilities of large language models (LLMs) with integrated tools to solve problems, have shown promise as more effective strategies than end-to-end VR methods. However, these approaches face limitations, as frozen LLMs lack tool awareness in VR, leading to performance bottlenecks. While training LLMs for reasoning is common in other domains, such approaches are not directly applicable to VR due to limited training data, imperfect tools that introduce errors and reduce data-collection efficiency, and the difficulty of fine-tuning on noisy workflows. To address these challenges, we propose DWIM: i) Discrepancy-aware training Workflow generation, which assesses tool usage and extracts more viable workflows for training; and ii) Instruct-Masking fine-tuning, which guides the model to clone only effective actions, enabling the generation of more practical solutions. Our experiments demonstrate that DWIM achieves state-of-the-art performance across various VR tasks, exhibiting strong generalization on multiple widely-used datasets.
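As a sketch of what instruct-masking fine-tuning could look like at the loss level, the snippet below computes the token-level cross-entropy only where a mask marks tokens belonging to effective actions. How that mask is produced (the paper's discrepancy-aware workflow assessment) is outside this sketch, and the function is an illustrative stand-in rather than DWIM's actual code.

```python
import torch
import torch.nn.functional as F

def instruct_masked_loss(logits, targets, effective_mask):
    """Cross-entropy restricted to tokens of effective tool actions.

    logits:  (batch, seq, vocab) language-model outputs
    targets: (batch, seq) next-token ids
    effective_mask: (batch, seq) float tensor, 1.0 where the token belongs to
        an action judged effective by the workflow assessment, else 0.0.
    """
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    masked = ce * effective_mask                   # zero out ineffective tokens
    return masked.sum() / effective_mask.sum().clamp(min=1)
```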

400-Gb/s mode division multiplexing-based bidirectional free space optical communication in real-time with commercial transponders

In this work, for the first time, we experimentally demonstrate mode division multiplexing-based bidirectional free space optical communication in real-time using commercial transponders. As a proof of concept, using a Telecom Infra Project Phoenix-compliant commercial 400G transponder, 400-Gb/s data signals (56-Gbaud, DP-16QAM) are transmitted bidirectionally over pairs of Hermite-Gaussian modes (HG00, HG10, and HG01) error-free, i.e., with pre-FEC BERs below 1e-2, over approximately 1 m of free space.

Free-Space Optical Sensing Using Vector Beam Spectra

Vector beams are spatial modes that have spatially inhomogeneous states of polarization. Any light beam is a linear combination of vector beams, the coefficients of which comprise a vector beam “spectrum.” In this work, through numerical calculations, a novel method of free-space optical sensing is demonstrated using vector beam spectra, which are shown to be experimentally measurable via Stokes polarimetry. As proof of concept, vector beam spectra are numerically calculated for various beams and beam obstructions.
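To make the decomposition concrete, the following numerical sketch computes a vector beam spectrum as discrete overlap integrals against a small orthonormal basis of vector modes (low-order HG modes in x- or y-polarization). The grid, waist, and basis size are illustrative, and the paper obtains such spectra via Stokes polarimetry rather than the direct field access assumed here.

```python
import numpy as np

N, L = 256, 4e-3                       # grid points, window size (m)
x = np.linspace(-L / 2, L / 2, N)
X, Y = np.meshgrid(x, x)
w0 = 1e-3                              # beam waist (m), illustrative

def hg(m, n):
    """Discretely normalized Hermite-Gaussian HG_mn at the waist (orders 0/1)."""
    g = np.exp(-(X**2 + Y**2) / w0**2)
    hx = [np.ones_like(X), 2 * X / w0][m]
    hy = [np.ones_like(Y), 2 * Y / w0][n]
    f = hx * hy * g
    return f / np.sqrt(np.sum(np.abs(f)**2))

# A small orthonormal vector-mode basis: HG_mn in x- (0) or y- (1) polarization.
basis = [(hg(m, n), pol) for m, n in [(0, 0), (1, 0), (0, 1)]
         for pol in (0, 1)]

def spectrum(Ex, Ey):
    """Vector beam spectrum: overlap of the field with each vector mode."""
    E = (Ex, Ey)
    return np.array([np.sum(E[pol] * np.conj(u)) for u, pol in basis])

# Example: a radially-polarized-like beam, HG10 on x plus HG01 on y.
Ex, Ey = hg(1, 0) / np.sqrt(2), hg(0, 1) / np.sqrt(2)
coeffs = spectrum(Ex, Ey)
print(np.round(np.abs(coeffs)**2, 3))  # power in each vector mode
```

An obstruction placed in the beam path would reshape the field and hence redistribute this power spectrum, which is the signature the sensing method reads out.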

EdgeSync: Efficient Edge-Assisted Video Analytics via Network Contention-Aware Scheduling

With the advancement of 5G, edge-assisted video analytics has become increasingly popular, driven by the technology's ability to support low-latency, high-bandwidth applications. However, in scenarios where multiple clients compete for network resources, network contention poses a significant challenge. In this paper, we propose a novel scheduling algorithm that intelligently batches and aligns the offloading of multiple video analytics clients to optimize both network and edge server resource utilization while meeting the Service Level Objective (SLO). Experiments on a cellular network testbed show that our approach successfully processes 93% or more of inference requests from 7 different clients to the edge server while meeting the SLOs, whereas other approaches achieve lower success rates, ranging from 65% to 85%, under the same conditions.
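As a deliberately simplified sketch of the alignment idea, the following staggers client uploads so that at most one transfer occupies the uplink at a time, rejecting configurations whose SLO cannot absorb the wait. The real scheduler also batches server-side inference and adapts to measured contention, which this toy version omits; all names and numbers are illustrative.

```python
def assign_offsets(clients, uplink_ms):
    """Stagger client uploads so transfers never overlap on the uplink.

    clients: list of (name, period_ms, slo_ms), scheduled earliest-SLO-first.
    uplink_ms: time one frame transfer occupies the uplink.
    Returns {name: start_offset_ms}; raises if some SLO cannot be met.
    """
    offsets, t = {}, 0
    for name, period, slo in sorted(clients, key=lambda c: c[2]):
        if t + uplink_ms > slo:
            raise RuntimeError(f"{name}: no contention-free slot within SLO")
        offsets[name] = t
        t += uplink_ms                 # next client starts after this transfer
    return offsets

# Three 10-fps cameras sharing a 20 ms-per-frame uplink under an 80 ms SLO.
offsets = assign_offsets(
    [("cam1", 100, 80), ("cam2", 100, 80), ("cam3", 100, 80)], uplink_ms=20)
```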

Attribute-Centric Compositional Text-to-Image Generation

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency.
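The abstract does not give the loss's exact form, so as a generic stand-in the sketch below shows an InfoNCE-style contrastive loss between image embeddings and embeddings of their attribute compositions, one plausible instantiation of an attribute-centric contrastive objective; the pairing convention and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_emb, attr_emb, tau=0.07):
    """InfoNCE-style loss between image embeddings and embeddings of their
    attribute compositions; matching pairs share an index within the batch."""
    img = F.normalize(img_emb, dim=-1)
    attr = F.normalize(attr_emb, dim=-1)
    logits = img @ attr.t() / tau                  # (batch, batch) similarities
    labels = torch.arange(img.size(0), device=img.device)
    return F.cross_entropy(logits, labels)         # pull matches, push the rest
```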

G-Litter Marine Litter Dataset Augmentation with Diffusion Models and Large Language Models on GPU Acceleration

Marine litter detection is crucial for environmental monitoring, yet the imbalance in existing datasets limits model performance in identifying various types of waste accurately. This paper presents an efficient data augmentation pipeline that combines generative diffusion models (e.g., Stable Diffusion) and Large Language Models (LLMs) to expand the G-Litter dataset, a marine litter dataset designed for autonomous detection in heterogeneous environments. Leveraging scalable diffusion models for image generation and Alpaca LLMs for diverse prompt generation, our approach augments underrepresented classes by generating over 200 additional images per class, significantly improving the dataset's balance. Training YOLOv8 for object detection on the augmented G-Litter dataset improved detection performance, increasing recall by 7.82% and mAP50 by 3.87% over the baseline. This study emphasizes the potential of combining generative AI with HPC resources to automate data augmentation on large-scale, unstructured datasets, particularly in edge computing contexts for real-time marine monitoring. The models were tested on real videos captured during simulated missions, demonstrating a superior ability to detect submerged objects in dynamic scenarios. These results highlight the potential of generative AI techniques to improve dataset quality and detection model performance, laying the foundation for further expansion in real-time marine monitoring.
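A minimal sketch of such an augmentation pipeline using the diffusers library is shown below. The model checkpoint, class names, and the fixed prompt list (standing in for the Alpaca-generated prompts) are illustrative assumptions, not the paper's setup.

```python
import os
import torch
from diffusers import StableDiffusionPipeline

# Prompt variants per underrepresented class; in the paper these come from an
# Alpaca LLM, here a hand-written list stands in.
PROMPTS = {
    "plastic_bottle": [
        "a crushed plastic bottle floating underwater, murky sea, photo",
        "a plastic bottle on the seabed among rocks, diver's view",
    ],
}

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

def augment(cls, n_images=200, outdir="aug"):
    """Generate synthetic images for one class, cycling through its prompts."""
    os.makedirs(outdir, exist_ok=True)
    prompts = PROMPTS[cls]
    for i in range(n_images):
        image = pipe(prompts[i % len(prompts)]).images[0]
        image.save(f"{outdir}/{cls}_{i:04d}.png")
```

Generated images would still need labels (e.g., from a detector-assisted or manual pass) before they can extend a YOLOv8 training set.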