Entries by NEC Labs America

Parametric Augmentation for Time Series Contrastive Learning

Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data augmentations. Due to patterns that are easily recognized by humans, this rule of thumb works well in the vision and language domains. However, it is impractical to visually inspect the temporal structures in time series. The diversity of time series augmentations at both the dataset and instance levels makes it difficult to choose meaningful augmentations on the fly. In this study, we address this gap by analyzing time series data augmentation using information theory and summarizing the most commonly adopted augmentations in a unified format. We then propose a contrastive learning framework with parametric augmentation, AutoTCL, which can be adaptively employed to support time series representation learning. The proposed approach is encoder-agnostic, allowing it to be seamlessly integrated with different backbone encoders. Experiments on univariate forecasting tasks demonstrate the highly competitive results of our method, with an average 6.5% reduction in MSE and 4.7% in MAE over the leading baselines. In classification tasks, AutoTCL achieves a 1.2% increase in average accuracy. The source code is available at https://github.com/AslanDing/AutoTCL.

Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty

Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text, typically in the form of (subject, relation, object) triples. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks due to two key issues. First, LLMs struggle to distinguish irrelevant context from relevant relations and generate structured output due to the restrictions on fine-tuning the model. Second, LLMs generate responses autoregressively based on probability, which makes the predicted relations lack confidence. In this paper, we assess the capabilities of LLMs in improving the OIE task. Particularly, we propose various in-context learning strategies to enhance LLM’s instruction-following ability and a demonstration uncertainty quantification module to enhance the confidence of the generated relations. Our experiments on three OIE benchmark datasets show that our approach holds its own against established supervised methods, both quantitatively and qualitatively.

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLMs to regenerate the new remaining parts. By analyzing the differences between the original and new remaining parts through N-gram analysis in black-box or probability divergence in white-box, we can clearly illustrate significant discrepancies between machine-generated and human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English and one German dataset, outperforming OpenAI’s own classifier, which is trained on millions of text. Additionally, our methods provide reasonable explanations and evidence to support our claim, which is a unique feature of explainable detection. Our method is also robust under the revised text attack and can additionally solve model sourcing.

Deep Learning-Based Real-Time Rate Control for Live Streaming on Wireless Networks

Providing wireless users with high-quality video content has become increasingly important. However, ensuring consistent video quality poses challenges due to variable encodedbitrate caused by dynamic video content and fluctuating channel bitrate caused by wireless fading effects. Suboptimal selection of encoder parameters can lead to video quality loss due to underutilized bandwidth or the introduction of video artifacts due to packet loss. To address this, a real-time deep learning-based H.264 controller is proposed. This controller leverages instantaneous channel quality data driven from the physical layer, along with the video chunk, to dynamically estimate the optimal encoder parameters with a negligible delay in real-time. The objective is to maintain an encoded video bitrate slightly below the available channel bitrate. Experimental results, conducted on both QCIF dataset and a diverse selection of random videos from public datasets, validate the effectiveness of the approach. Remarkably, improvements of 10-20 dB in PSNR with respect to the state-of-the art adaptive bitrate video streaming is achieved, with an average packet drop rate as low as 0.002.

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

Camouflaged object detection (COD) is the challenging task of identifying camouflaged objects visually blended into surroundings. Albeit achieving remarkable success, existing COD detectors still struggle to obtain precise results in some challenging cases. To handle this problem, we draw inspiration from the prey-vs-predator game that leads preys to develop better camouflage and predators to acquire more acute vision systems and develop algorithms from both the prey side and the predator side. On the prey side, we propose an adversarial trainingframework, Camouflageator, which introduces an auxiliary generator to generate more camouflaged objects that are harder for a COD method to detect. Camouflageator trains the generator and detector in an adversarial way such that the enhanced auxiliary generator helps produce a stronger detector. On the predator side, we introduce a novel COD method, called Internal Coherence and Edge Guidance (ICEG), which introduces a camouflaged feature coherence module to excavate the internal coherence of camouflaged objects, striving to obtain morecomplete segmentation results. Additionally, ICEG proposes a novel edge-guided separated calibration module to remove false predictions to avoid obtaining ambiguous boundaries. Extensive experiments show that ICEG outperforms existing COD detectors and Camouflageator is flexible to improve various COD detectors, including ICEG, which brings state-of-the-art COD performance.

CLAP: Cost and Latency-Aware Placement of Microservices on the Computing Continuum

For microservices-based real-time stream processing applications, computing at the edge delivers fast responses for low workloads, but as workload increases, the response time starts to slow down due to limited compute capacity. Abundant compute capacity in the cloud delivers fast responses even for higher workloads but incurs very high cost of operation. For applications which can tolerate latencies up to a certain limit, using either of them has one or the other drawback and for different applications and edge infrastructures, it is non-trivial to decide when to use only edge resources and when to leverage cloud resources. In this paper, we propose CLAP, which dynamically understands the relationship between workload and application latency, and automatically adjusts placement of microservices across edge and cloud computing continuum, with the goal of jointly reducing latency as well as cost of running microservices based streaming applications. CLAP leverages Reinforcement Learning (RL) technique to learn the optimal placement for a given workload and based on the learnings, adjusts placement of microservices as the application workload changes. We conduct experiments with real-world video analytics applications and show that CLAP adapts placement of microservices in response to varying workloads and achieves low latency for applications in a cost-efficient manner. Particularly, we show that for two real world video analytics applications i.e. human attributes and face recognition, CLAP is able to reduce average cost (across 4 days at different locations) by 47% and 58% for human attributes detection and face recognition application, respectively, while consistently maintaining latency below the tolerable limit.

Local and Global Optimization Methods for Optical Line Control Based on Quality of Transmission

The ever-increasing demand for data traffic in recent decades has pushed network operators to give importance to the aspect of infrastructure control to facilitate its scalability and maximize its capacity. A generic lightpath (LP) is deployed starting from a traffic request between a given pair of nodes in a network. LPs are operated in the network based on an estimate of the quality of transmission (QoT), which is derived from the physical layer characteristics of a selected route. Regardless of the model used to estimate QoT, it is necessary to calibrate themodel to maximize its accuracy and define minimum design margins. The model calibration process depends significantly on the type of data that can be collected in the field (i.e., type of metric, resolution) and therefore on the available monitoring devices. In this work, a systematic evaluation of the QoT estimation is carried out on a multi-span erbium-doped-fiber-amplified optical line system (OLS) using in the first case only total power monitors and in the second experimentally emulating optical channel monitors (OCMs). Given the type of monitoring devices available, three different physical models are calibrated, and six optimization methods are used to define the optimal configuration of the target gain and tilt parameters of the optical amplifiers, jointly optimizing the working point of all amplifiers (global approach) or proceeding span by span (local approach). Subsequently, the OLS was set in each configuration obtained, and the generalized signal-to-noise ratio (GSNR) profile was measured at the end.

Multi-Agent Simulator for Carbon Neutrality: The Technology the World Has Been Waiting For

Today, each country, government, and enterprise are urged to take effective action to fight against climate change; however, an efficient method has not been found. Even a way to accurately calculate Scope 3 carbon emissions has yet to be developed. The technology of a multi-agent simulator could be an essential step in solving worldwide challenges. We interviewed the researchers about the details of this technology.

iRAG: An Incremental Retrieval Augmented Generation System for Videos

Retrieval augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Use of RAG for combined understanding of multimodal data such as text, images and videos is appealing but two critical limitations exist: one-time, upfront capture of all content in large multimodal data as text descriptions entails high processing times, and not all information in the rich multimodal data is typically in the text descriptions. Since the user queries are not known apriori, developing a system for multimodal to text conversion and interactive querying of multimodal data is challenging.To address these limitations, we propose iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of large corpus of multimodal data. Unlike traditional RAG, iRAG quickly indexes large repositories of multimodal data, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the multimodal data to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long multimodal to text conversion times, overcomes information loss issues by doing on-demand query-specific extraction of details in multimodal data, and ensures high quality of responses to interactive user queries that are often not known apriori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of large, real-world multimodal data. Experimental results on real-world long videos demonstrate 23x to 25x faster video to text ingestion, while ensuring that quality of responses to interactive user queries is comparable to responses from a traditional RAG where all video data is converted to text upfront before any querying.