Local and Global Optimization Methods for Optical Line Control Based on Quality of Transmission

The ever-increasing demand for data traffic in recent decades has pushed network operators to prioritize infrastructure control in order to facilitate scalability and maximize capacity. A generic lightpath (LP) is deployed starting from a traffic request between a given pair of nodes in a network. LPs are operated in the network based on an estimate of the quality of transmission (QoT), which is derived from the physical layer characteristics of a selected route. Regardless of the model used to estimate QoT, it is necessary to calibrate the model to maximize its accuracy and define minimum design margins. The model calibration process depends significantly on the type of data that can be collected in the field (i.e., type of metric, resolution) and therefore on the available monitoring devices. In this work, a systematic evaluation of the QoT estimation is carried out on a multi-span erbium-doped-fiber-amplified optical line system (OLS), using only total power monitors in the first case and experimentally emulated optical channel monitors (OCMs) in the second. Given the type of monitoring devices available, three different physical models are calibrated, and six optimization methods are used to define the optimal configuration of the target gain and tilt parameters of the optical amplifiers, either jointly optimizing the working point of all amplifiers (global approach) or proceeding span by span (local approach). Subsequently, the OLS was set in each configuration obtained, and the generalized signal-to-noise ratio (GSNR) profile was measured at the end.

Multi-Agent Simulator for Carbon Neutrality: The Technology the World Has Been Waiting For

Today, every country, government, and enterprise is urged to take effective action against climate change; however, an efficient method has not yet been found. Even a way to accurately calculate Scope 3 carbon emissions has yet to be developed. Multi-agent simulator technology could be an essential step in solving worldwide challenges. We interviewed the researchers about the details of this technology.

iRAG: An Incremental Retrieval Augmented Generation System for Videos

Retrieval augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Use of RAG for combined understanding of multimodal data such as text, images and videos is appealing, but two critical limitations exist: one-time, upfront capture of all content in large multimodal data as text descriptions entails high processing times, and not all information in the rich multimodal data is typically in the text descriptions. Since the user queries are not known a priori, developing a system for multimodal to text conversion and interactive querying of multimodal data is challenging. To address these limitations, we propose iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of a large corpus of multimodal data. Unlike traditional RAG, iRAG quickly indexes large repositories of multimodal data, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the multimodal data to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long multimodal to text conversion times, overcomes information loss issues by doing on-demand query-specific extraction of details in multimodal data, and ensures high quality of responses to interactive user queries that are often not known a priori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of large, real-world multimodal data. Experimental results on real-world long videos demonstrate 23x to 25x faster video to text ingestion, while ensuring that quality of responses to interactive user queries is comparable to responses from a traditional RAG where all video data is converted to text upfront before any querying.
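
The incremental workflow described in this abstract can be illustrated with a minimal sketch. It is not the iRAG implementation: the class name `IncrementalVideoIndex`, the word-overlap ranking, and the `fake_extractor` stand-in for an expensive video-to-text model are all assumptions made for illustration. The sketch only shows the control flow: build a cheap index first, then run detailed extraction on the few segments a query points at, and cache the result.

```python
class IncrementalVideoIndex:
    """Toy incremental index: cheap descriptions up front, detail on demand."""

    def __init__(self, coarse_descriptions, extract_details):
        # segment id -> short, cheaply produced text description (hypothetical)
        self.coarse = dict(coarse_descriptions)
        # callable standing in for an expensive, query-time extraction model
        self.extract_details = extract_details
        # cache of detailed descriptions extracted so far
        self.details = {}

    @staticmethod
    def _overlap(query, text):
        # Crude relevance score: number of shared lowercase words.
        return len(set(query.lower().split()) & set(text.lower().split()))

    def query(self, question, top_k=2):
        # Rank segments using only the cheap index (ties keep insertion order).
        ranked = sorted(
            self.coarse,
            key=lambda seg: self._overlap(question, self.coarse[seg]),
            reverse=True,
        )
        context = []
        for seg in ranked[:top_k]:
            # Pay for detailed extraction only on selected, not-yet-seen segments.
            if seg not in self.details:
                self.details[seg] = self.extract_details(seg)
            context.append(self.details[seg])
        return context


extraction_calls = []


def fake_extractor(segment_id):
    # Records each call so the amount of expensive work is visible.
    extraction_calls.append(segment_id)
    return f"detailed description of segment {segment_id}"


index = IncrementalVideoIndex(
    {
        0: "street traffic cars",
        1: "kitchen person cooking",
        2: "street pedestrians crossing",
    },
    fake_extractor,
)
first = index.query("red cars on the street")
second = index.query("red cars on the street")  # served from the cache
```

In this toy run, segment 1 is never sent to the extractor, and the repeated query triggers no further extraction.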

Radio-Frequency Linear Analysis and Optimization of Silicon Photonic Neural Networks

Broadband analog signal processors utilizing silicon photonics have demonstrated a significant impact in numerous application spaces, offering unprecedented bandwidths, dynamic range, and tunability. In the past decade, microwave photonic techniques have been applied to neuromorphic processing, resulting in the development of novel photonic neural network architectures. Neuromorphic photonic systems can enable machine learning capabilities at extreme bandwidths and speeds. Herein, low-quality factor microring resonators are implemented to demonstrate broadband optical weighting. In addition, silicon photonic neural network architectures are critically evaluated, simulated, and optimized from a radio-frequency performance perspective. This analysis highlights the linear front-end of the photonic neural network, the effects of linear and nonlinear loss within silicon waveguides, and the impact of electrical preamplification.

Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former uses >50% of its compute only on the transformer encoder. This is due to the retention of a full-length token-level representation of all backbone feature scales at each encoder layer. With this observation, we propose a strategy termed PROgressive Token Length SCALing for Efficient transformer encoders (PRO-SCALE) that can be plugged in to Mask2Former-style segmentation architectures to significantly reduce the computational cost. The underlying principle of PRO-SCALE is to progressively scale the length of the tokens with the layers of the encoder. This allows PRO-SCALE to reduce computations by a large margin with minimal sacrifice in performance (~52% GFLOPs reduction with no drop in performance on the COCO dataset). We validate our framework on multiple public benchmarks.
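
To make the principle concrete, the sketch below compares the token length seen by each encoder layer under a full-length baseline and under a progressive schedule. Everything specific here is an assumption for illustration: the strides (32, 16, 8), the 512x512 input, the six-layer encoder, the 3/2/1 split of layers across stages, and the use of summed token lengths as a crude cost proxy. It does not reproduce the GFLOPs figure reported in the abstract.

```python
def tokens_per_scale(height, width, strides=(32, 16, 8)):
    """Tokens contributed by each backbone scale, coarsest scale first."""
    return [(height // s) * (width // s) for s in strides]


def progressive_schedule(scale_tokens, layers_per_stage):
    """Token length per encoder layer when one more scale is added per stage."""
    lengths = []
    active = 0
    for tokens, n_layers in zip(scale_tokens, layers_per_stage):
        active += tokens  # finer scale joins the token sequence at this stage
        lengths.extend([active] * n_layers)
    return lengths


def full_schedule(scale_tokens, num_layers):
    """Baseline: every layer processes the tokens of all scales."""
    return [sum(scale_tokens)] * num_layers


scale_tokens = tokens_per_scale(512, 512)
progressive = progressive_schedule(scale_tokens, layers_per_stage=(3, 2, 1))
baseline = full_schedule(scale_tokens, num_layers=6)

# Crude proxy: cost grows with the total number of tokens processed.
saving = 1 - sum(progressive) / sum(baseline)
```

Only the last layer of the progressive schedule sees the full token length, which is where the saving in this proxy comes from.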

Efficient Transformer Encoders for Mask2Former-style Models

Vision transformer based models bring significant improvements for image segmentation tasks. Although these architectures offer powerful capabilities irrespective of specific segmentation tasks, their use of computational resources can be taxing on deployed devices. One way to overcome this challenge is by adapting the computation level to the specific needs of the input image rather than the current one-size-fits-all approach. To this end, we introduce ECO-M2F or EffiCient TransfOrmer Encoders for Mask2Former-style models. Noting that the encoder module of M2F-style models incurs resource-intensive computations, ECO-M2F provides a strategy to self-select the number of hidden layers in the encoder, conditioned on the input image. To enable this self-selection ability and balance performance against computational efficiency, we present a three-step recipe. The first step is to train the parent architecture to enable early exiting from the encoder. The second step is to create a derived dataset of the ideal number of encoder layers required for each training example. The third step is to use the aforementioned derived dataset to train a gating network that predicts the number of encoder layers to be used, conditioned on the input image. Additionally, to change the computation-accuracy trade-off, only steps two and three need to be repeated, which significantly reduces retraining time. Experiments on the public datasets show that the proposed approach reduces expected encoder computational cost while maintaining performance, adapts to various user compute resources, is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
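
The second step of the recipe can be sketched as follows. The abstract does not say how the "ideal" number of layers is defined, so the rule used here (the earliest exit whose quality is within a tolerance of the best exit) is an assumption, as are the function names and the example quality scores. The resulting labels are what a gating network would be trained on in step three.

```python
def ideal_exit_layer(quality_per_layer, tolerance=0.01):
    """Earliest exit (1-indexed) whose quality is within `tolerance` of the best.

    `quality_per_layer[i]` is the quality obtained when exiting after layer i + 1
    of an encoder already trained for early exiting (step one).
    """
    best = max(quality_per_layer)
    for layer, quality in enumerate(quality_per_layer, start=1):
        if quality >= best - tolerance:
            return layer
    return len(quality_per_layer)  # not reached: the best layer always qualifies


def build_derived_dataset(per_example_quality, tolerance=0.01):
    """Map each training example to its target number of encoder layers."""
    return {
        example: ideal_exit_layer(qualities, tolerance)
        for example, qualities in per_example_quality.items()
    }


# Hypothetical per-exit quality scores for two training images.
per_example_quality = {
    "img_a": [0.40, 0.55, 0.56, 0.56],
    "img_b": [0.30, 0.35, 0.48, 0.60],
}

relaxed = build_derived_dataset(per_example_quality, tolerance=0.02)
strict = build_derived_dataset(per_example_quality, tolerance=0.0)
```

Changing `tolerance` changes the labels without touching the parent model, which mirrors the point that only steps two and three need to be repeated to move along the trade-off.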

Low-rank Constrained Multichannel Signal Denoising Considering Channel-dependent Sensitivity Inspired by Self-supervised Learning for Optical Fiber Sensing

Optical fiber sensing is a technology wherein audio, vibrations, and temperature are detected using an optical fiber; in particular, audio- and vibration-aware sensing is called distributed acoustic sensing (DAS). In DAS, the observed data, which are multichannel, suffer from severe noise caused by optical noise or the installation method. Conventional methods for denoising DAS data are based on either signal processing or deep neural networks (DNNs). The signal-processing-based methods are interpretable, i.e., they are not black boxes. The DNN-based methods allow network architectures and objective functions, that is, priors, to be designed flexibly. However, existing DAS studies do not balance interpretability against the flexibility of priors. The DNN-based methods also generally require a large amount of training data. To address these problems, we propose a DNN-structured, signal-processing-based denoising method in this paper. As priors for DAS, we employ spatial knowledge, namely low rank and channel-dependent sensitivity, using the DNN-based structure. The results of fiber-acoustic sensing show that the proposed method outperforms the conventional methods and is robust to the number of spatial ranks. Moreover, the optimized parameters of the proposed method reflect the channel sensitivity, which demonstrates its interpretability.
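
As a rough illustration of the two spatial priors named in this abstract, the sketch below denoises synthetic multichannel data with a plain rank-one approximation computed by power iteration. This is a classical signal-processing baseline, not the proposed DNN-structured method. The synthetic model (each channel observes one shared source scaled by a per-channel gain, plus Gaussian noise), the gain values, and all function names are assumptions.

```python
import math
import random


def _unit(vector):
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def _project(data, source):
    # Per-channel inner product with the source estimate (channel sensitivity).
    return [sum(x * s for x, s in zip(channel, source)) for channel in data]


def rank_one_denoise(data, iterations=50):
    """Best rank-one fit of `data` (channels x samples) via power iteration."""
    n_samples = len(data[0])
    # Initial source guess: sum over channels (assumes gains do not cancel).
    source = [sum(channel[t] for channel in data) for t in range(n_samples)]
    for _ in range(iterations):
        source = _unit(source)
        sensitivity = _project(data, source)
        source = [
            sum(g * channel[t] for g, channel in zip(sensitivity, data))
            for t in range(n_samples)
        ]
    source = _unit(source)
    sensitivity = _project(data, source)
    denoised = [[g * s for s in source] for g in sensitivity]
    return sensitivity, denoised


def squared_error(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Synthetic example: one shared source, three channels with different gains.
rng = random.Random(0)
true_gains = [1.0, 0.5, 2.0]
shared = [math.sin(2 * math.pi * t / 16) for t in range(64)]
clean = [[g * s for s in shared] for g in true_gains]
noisy = [[x + rng.gauss(0.0, 0.1) for x in channel] for channel in clean]

sensitivity, denoised = rank_one_denoise(noisy)
noisy_error = squared_error(noisy, clean)
denoised_error = squared_error(denoised, clean)
gain_ratio = sensitivity[2] / sensitivity[0]  # should be close to 2.0 / 1.0
```

The recovered `sensitivity` vector is directly readable as relative channel gain, which is the kind of interpretability the abstract attributes to signal-processing-based methods.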

Provable Membership Inference Privacy

In applications involving sensitive data, such as finance and healthcare, the necessity for preserving data privacy can be a significant barrier to machine learning model development. Differential privacy (DP) has emerged as one canonical standard for provable privacy. However, DP's strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning, and DP guarantees themselves are difficult to interpret. In this work, we propose a novel privacy notion, membership inference privacy (MIP), as a step towards addressing these challenges. We give a precise characterization of the relationship between MIP and DP, and show that in some cases, MIP can be achieved using less randomness than is required for guaranteeing DP, leading to a smaller drop in utility. MIP guarantees are also easily interpretable in terms of the success rate of membership inference attacks in a simple random subsampling setting. As a proof of concept, we also provide a simple algorithm for guaranteeing MIP without needing to guarantee DP.
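
The interpretation in terms of attack success rate can be illustrated with a toy membership inference game. The abstract does not give the formal definition of MIP, so the setup below is an assumption: a training set of fixed size is drawn uniformly at random from a known population, an algorithm releases some output, and an adversary guesses for each point whether it was in the training set. The two release functions are deliberately extreme (reveal nothing, reveal everything), and the baseline `max(k/n, 1 - k/n)` is the success rate of an adversary who sees no output.

```python
import random


def attack_success_rate(population, train_size, release, guess, trials=200, seed=0):
    """Fraction of correct membership guesses under random subsampling."""
    rng = random.Random(seed)
    correct = 0
    total = 0
    for _ in range(trials):
        train = set(rng.sample(population, train_size))
        output = release(train)
        # Average over every possible target point in the population.
        for point in population:
            correct += guess(point, output) == (point in train)
            total += 1
    return correct / total


population = list(range(20))
train_size = 10
baseline = max(train_size / len(population), 1 - train_size / len(population))

# Release that reveals nothing: the adversary can only guess a fixed answer.
blind_rate = attack_success_rate(
    population, train_size,
    release=lambda train: None,
    guess=lambda point, output: True,
)

# Release that publishes the training set: the adversary is always right.
leaky_rate = attack_success_rate(
    population, train_size,
    release=lambda train: frozenset(train),
    guess=lambda point, output: point in output,
)

blind_advantage = blind_rate - baseline
leaky_advantage = leaky_rate - baseline
```

A privacy guarantee stated as a bound on this kind of advantage reads directly as "no attack does much better than blind guessing", which is the interpretability the abstract refers to.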

Link Loss Analysis of Integrated Linear Weight Bank within Silicon Photonic Neural Network

Over the last decade, silicon photonic neural networks have demonstrated the possibility of photonic-enabled machine learning at the edge. These systems enable low-latency ultra-wideband classifications, channel estimations, and many other signal characterization tasks within wireless environments. While these proof-of-concept experiments have yielded promising results, poor device and architectural designs have resulted in sub-optimal bandwidth and noise performance. As a result, the application space of this technology has been limited to GHz bandwidths and input signals with a high signal-to-noise ratio. By applying a microwave photonic perspective to these systems, the authors demonstrate high-bandwidth operation while optimizing for RF performance metrics: instantaneous bandwidth, link loss, noise figure, and dynamic range. The authors explore the extended capabilities due to these improved metrics and potential architectures to continue further optimization. The authors introduce novel architectures and RF analysis for RF-optimized neuromorphic photonic hardware.

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

Visual program synthesis is a promising approach to exploit the reasoning abilities of large language models for compositional computer vision tasks. Previous work has used few-shot prompting with frozen LLMs to synthesize visual programs. Training an LLM to write better visual programs is an attractive prospect, but it is unclear how to accomplish this. No dataset of visual programs for training exists, and acquisition of a visual program dataset cannot be easily crowdsourced due to the need for expert annotators. To get around the lack of direct supervision, we explore improving the program synthesis abilities of an LLM using feedback from interactive experience. We propose a method where we exploit existing annotations for a vision-language task to improvise a coarse reward signal for that task, treat the LLM as a policy, and apply reinforced self-training to improve the visual program synthesis ability of the LLM for that task. We describe a series of experiments on object detection, compositional visual question answering, and image-text retrieval, and show that in each case, the self-trained LLM outperforms or performs on par with few-shot frozen LLMs that are an order of magnitude larger. Website: https://zaidkhan.me/ViReP/
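
The loop described in this abstract (sample programs from the policy, score them with a coarse reward built from existing annotations, keep the rewarded ones, and train on them) can be sketched generically. This is not the authors' code. The toy policy is a weighted choice between two hand-written "programs" rather than an LLM, and "fine-tuning" is a simple weight increment; all names are hypothetical.

```python
import random


def reinforced_self_training(sample_program, run_program, reward, finetune,
                             tasks, rounds=3, samples_per_task=4):
    """Generic grow-then-improve loop over (inputs, annotation) pairs."""
    for _ in range(rounds):
        kept = []
        for inputs, annotation in tasks:
            for _ in range(samples_per_task):
                program = sample_program(inputs)
                try:
                    prediction = run_program(program, inputs)
                except Exception:
                    continue  # a program that crashes earns no reward
                if reward(prediction, annotation) > 0:
                    kept.append((inputs, program))
        finetune(kept)  # train the policy only on rewarded programs


# Toy instantiation: the task is "count the red objects".
PROGRAMS = {
    "count_all": lambda objects: len(objects),
    "count_red": lambda objects: objects.count("red"),
}
weights = {"count_all": 1.0, "count_red": 1.0}  # stand-in for policy parameters
rng = random.Random(0)


def sample_program(inputs):
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]


def run_program(name, inputs):
    return PROGRAMS[name](inputs)


def reward(prediction, annotation):
    # Coarse reward from an existing annotation: exact match or nothing.
    return 1 if prediction == annotation else 0


def finetune(kept):
    # Stand-in for a gradient update: upweight every rewarded program.
    for _, name in kept:
        weights[name] += 1.0


tasks = [(["red", "blue"], 1), (["blue", "red", "red"], 2)]
reinforced_self_training(sample_program, run_program, reward, finetune, tasks)
```

Because `count_all` never matches the annotations in these tasks, its weight stays at its initial value, while `count_red` is reinforced each time it is sampled.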