Publications Archives | Page 2 of 64

Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

April 24, 2025/in Publications/by NEC Labs America

A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former spends more than 50% of its compute on the transformer encoder alone. This is due to the retention of a full-length token-level representation of all backbone feature scales at each encoder layer. With this observation, we propose a strategy termed PROgressive Token Length SCALing for Efficient transformer encoders (PRO-SCALE) that can be plugged into the Mask2Former segmentation architecture to significantly reduce the computational cost. The underlying principle of PRO-SCALE is: progressively scale the length of the tokens with the layers of the encoder. This allows PRO-SCALE to reduce computations by a large margin with minimal sacrifice in performance (~52% encoder and ~27% overall GFLOPs reduction with no drop in performance on the COCO dataset). Experiments conducted on public benchmarks demonstrate PRO-SCALE's flexibility in architectural configurations and show potential for extension beyond segmentation tasks to object detection. Code available here: https://github.com/abhishekaich27/proscale-pytorch
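To make the principle above concrete, the following is a minimal sketch of how a progressively growing token length compares with a full-length encoder in terms of tokens processed. It is not the authors' implementation: the function names, the feature strides (32, 16, 8), the six-layer depth, and the even "coarsest scales first" schedule are all assumptions chosen for illustration, and the token count is only a rough proxy for compute.

```python
def scale_token_counts(height, width, strides=(32, 16, 8)):
    """Tokens contributed by each backbone feature scale (coarsest first).

    The strides are an assumption for illustration, not taken from the paper.
    """
    return [(height // s) * (width // s) for s in strides]


def progressive_schedule(num_layers, num_scales):
    """Hypothetical schedule: number of scales each encoder layer processes.

    Early layers see only the coarsest scale; later layers see more scales,
    so the token length grows with depth.
    """
    return [min(num_scales, 1 + (layer * num_scales) // num_layers)
            for layer in range(num_layers)]


def encoder_token_budget(height, width, num_layers=6, strides=(32, 16, 8)):
    """Return (progressive, full_length) totals of tokens processed.

    full_length models an encoder that keeps every scale at every layer.
    """
    counts = scale_token_counts(height, width, strides)
    schedule = progressive_schedule(num_layers, len(counts))
    progressive = sum(sum(counts[:k]) for k in schedule)
    full_length = num_layers * sum(counts)
    return progressive, full_length


if __name__ == "__main__":
    progressive, full_length = encoder_token_budget(512, 512)
    print(f"tokens processed: {progressive} vs {full_length} "
          f"({progressive / full_length:.0%} of the full-length encoder)")
```

The reduction printed by this sketch depends entirely on the assumed schedule and should not be read as a reproduction of the GFLOPs figures reported in the abstract.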

TSLA: Unified Time Series and Language Model

April 10, 2025/in Publications/by NEC Labs America

Real-world time series data often require analysis or interpretation from domain experts. Some tasks, like time series question answering, involve both time series and natural language questions, posing challenges for single-modality language models to understand their interaction. To this end, we present TSLA (Time Series Language Model), a framework designed to enhance the language model with the understanding of time series data for multi-modality tasks. TSLA comprises three key components. (1) Time Series Tokenizer learns how to represent time series data as discrete tokens, making them more manageable for language models. (2) Joint (Pre-)Training on task-agnostic time series and text data integrates time series tokens and text tokens to model the interplay between time series and language concepts. (3) Multi-task Instruction Tuning fine-tunes the pretrained TSLA for various downstream tasks relevant to user interests. For evaluation, we applied TSLA to time series data from human motions on four tasks: time series captioning, time series question answering, text-based time series synthesis, and text-based time series continuation. The results demonstrate TSLA's effectiveness in handling multiple time series analysis tasks, pointing the way for future research endeavors.

CLAP-S: Support Set Based Adaptation for Downstream Fiber-optic Acoustic Recognition

April 9, 2025/in Publications/by NEC Labs America

Contrastive Language-Audio Pretraining (CLAP) models have demonstrated unprecedented performance in various acoustic signal recognition tasks. Fiber-optic-based acoustic recognition is one of the most important downstream tasks and plays a significant role in environmental sensing. Adapting CLAP for fiber-optic acoustic recognition has become an active research area. Because the fiber is a non-conventional acoustic sensor, fiber-optic acoustic recognition presents a challenging, domain-specific, low-shot deployment environment with significant domain shifts due to unique frequency response and noise characteristics. To address these challenges, we propose a support-based adaptation method, CLAP-S, which linearly interpolates a CLAP Adapter with the Support Set, leveraging both implicit knowledge through fine-tuning and explicit knowledge retrieved from memory for cross-domain generalization. Experimental results show that our method delivers competitive performance on both laboratory-recorded fiber-optic ESC-50 datasets and a real-world fiber-optic gunshot-firework dataset. Our research also provides valuable insights for other downstream acoustic recognition tasks.
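The interpolation described above can be sketched as follows. This is only an illustration under stated assumptions, not the CLAP-S implementation: the function names, the use of cosine similarity to score stored support examples, and the mixing weight `alpha` are all hypothetical, and the adapter scores are taken as given rather than produced by a fine-tuned model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def support_scores(query, support_embeddings, support_labels, num_classes):
    """Explicit knowledge: per-class similarity to stored support examples.

    Scoring by summed non-negative cosine similarity is an assumption made
    for this sketch.
    """
    scores = [0.0] * num_classes
    for embedding, label in zip(support_embeddings, support_labels):
        scores[label] += max(cosine(query, embedding), 0.0)
    return scores


def interpolate(adapter_scores, support, alpha=0.5):
    """Linear interpolation of adapter scores with support-set scores.

    alpha = 0 uses only the adapter; alpha = 1 uses only the support set.
    """
    return [(1 - alpha) * a + alpha * s for a, s in zip(adapter_scores, support)]


if __name__ == "__main__":
    query = [1.0, 0.0]
    support = support_scores(query, [[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
    combined = interpolate([0.2, 0.8], support, alpha=0.5)
    print(combined.index(max(combined)))  # index of the predicted class
```

In this toy case the adapter alone favors class 1, while the retrieved support example pulls the combined score toward class 0, which is the kind of correction a low-shot support set is meant to provide.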

Text-guided Device-realistic Sound Generation for Fiber-based Sound Event Classification

April 9, 2025/in Publications/by NEC Labs America

Recent advancements in unique acoustic sensing devices and large-scale audio recognition models have unlocked new possibilities for environmental sound monitoring and detection. However, applying pretrained models to non-conventional acoustic sensors results in performance degradation due to domain shifts, caused by differences in frequency response and noise characteristics from the original training data. In this study, we introduce a text-guided framework for generating new datasets to retrain models specifically for these non-conventional sensors efficiently. Our approach integrates text-conditional audio generative models with two additional steps: (1) selecting audio samples based on text input to match the desired sounds, and (2) applying domain transfer techniques using recorded impulse responses and background noise to simulate the characteristics of the sensors. We demonstrate this process by generating emulated signals for fiber-optic Distributed Acoustic Sensors (DAS), creating datasets similar to the recorded ESC-50 dataset. The generated signals are then used to train a classifier, which outperforms few-shot learning approaches in environmental sound classification.
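Step (2) above, domain transfer with a recorded impulse response and background noise, can be sketched as a convolution followed by additive noise. This is a minimal illustration and not the authors' pipeline: the function names and the `noise_gain` parameter are hypothetical, and real signals would normally be processed with an FFT-based convolution rather than the direct loop used here.

```python
def convolve(signal, impulse_response):
    """Direct (full) linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def emulate_sensor(clean, impulse_response, noise, noise_gain=0.1):
    """Emulate a sensor recording from a clean generated signal.

    The clean audio is filtered by the sensor's recorded impulse response,
    then recorded background noise is added. noise_gain is a hypothetical
    mixing parameter; the noise recording is looped if it is too short.
    """
    filtered = convolve(clean, impulse_response)
    return [x + noise_gain * noise[n % len(noise)]
            for n, x in enumerate(filtered)]


if __name__ == "__main__":
    clean = [1.0, 2.0, 3.0]
    impulse_response = [1.0, 0.5]  # toy response, not a measured one
    noise = [0.2, -0.2]            # toy noise, not a recorded one
    print(emulate_sensor(clean, impulse_response, noise))
```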

Trainingless Adaptation of Pretrained Models for Environmental Sound Classification

April 9, 2025/in Publications/by NEC Labs America

Deep neural network (DNN)-based models for environmental sound classification are not robust against a domain to which training data do not belong, that is, out-of-distribution or unseen data. To utilize pretrained models for the unseen domain, adaptation methods, such as fine-tuning and transfer learning, are used with rich computing resources, e.g., the graphics processing unit (GPU). However, it is becoming more difficult for those with limited computing resources to keep up with research trends because state-of-the-art models are becoming computationally resource-intensive. In this paper, we propose a trainingless adaptation method for pretrained models for environmental sound classification. To introduce the trainingless adaptation method, we first propose an operation of recovering time-frequency-ish (TF-ish) structures in intermediate layers of DNN models. We then propose the trainingless frequency filtering method for domain adaptation, which does not rely on the widely used gradient-based optimization. The experiments conducted using the ESC-50 dataset show that the proposed adaptation method improves the classification accuracy by 20.40 percentage points compared with the conventional method.

On Synthesizing Data for Context Attribution in Question Answering

April 7, 2025/in Publications/by NEC Labs America

Question Answering (QA) accounts for a significant portion of LLM usage “in the wild”. However, LLMs sometimes produce false or misleading responses, also known as “hallucinations”. Therefore, grounding the generated answers in contextually provided information — i.e., providing evidence for the generated text — is paramount for LLMs’ trustworthiness. Providing this information is the task of context attribution. In this paper, we systematically study LLM-based approaches for this task, namely we investigate (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning of small LMs on synthetic data generated by larger LLMs. Our key contribution is SynQA: a novel generative strategy for synthesizing context attribution data. Given selected context sentences, an LLM generates QA pairs that are supported by these sentences. This leverages LLMs’ natural strengths in text generation while ensuring clear attribution paths in the synthetic training data. We show that the attribution data synthesized via SynQA is highly effective for fine-tuning small LMs for context attribution in different QA tasks and domains. Finally, with a user study, we validate the usefulness of small LMs (fine-tuned on synthetic data from SynQA) in context attribution for QA.

LLM-based Distributed Code Generation and Cost-Efficient Execution in the Cloud

April 6, 2025/in Publications/by NEC Labs America

The advancement of Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), is reshaping the software industry by automating code generation. Many LLM-driven distributed processing systems rely on serial code generation constrained by predefined libraries, limiting flexibility and adaptability. While some approaches enhance performance through parallel execution or optimize edge-cloud distributed processing for specific domains, they often overlook the cost implications of deployment, restricting scalability and economic feasibility across diverse cloud environments. This paper presents DiCE-C, a system that eliminates these constraints by starting directly from a natural language query. DiCE-C dynamically identifies available tools at runtime, programmatically refines LLM prompts, and employs a stepwise approach, first generating serial code and then transforming it into distributed code. This adaptive methodology enables efficient distributed execution without dependence on specific libraries. By leveraging high-level parallelism at the Application Programming Interface (API) level and managing API execution as services within a Kubernetes-based runtime, DiCE-C reduces idle GPU time and facilitates the use of smaller, cost-effective GPU instances. Experiments with a vision-based insurance application demonstrate that DiCE-C reduces cloud operational costs by up to 72% when using smaller GPUs (A6000 and A4000 GPU machines vs. A100 GPU machine) and by 32% when using identical GPUs (A100 GPU machines). This flexible and cost-efficient approach makes DiCE-C a scalable solution for deploying LLM-generated vision applications in cloud environments.

1.2 Tb/s/λ Real Time Mode Division Multiplexing Free Space Optical Communication with Commercial 400G Open and Disaggregated Transponders

April 3, 2025/in Publications/by NEC Labs America

We experimentally demonstrate real-time mode division multiplexing free space optical communication with commercial 400G open and disaggregated transponders. As proof of concept, using HG00, HG10, and HG01 modes, we transmit 1.2 Tb/s/λ (3×1λ×400 Gb/s) error free.

DiffOptics: A Conditional Diffusion Model for Fiber Optics Sensing Data Imputation

April 3, 2025/in Publications/by NEC Labs America

We present a generative AI framework based on a conditional diffusion model for distributed acoustic sensing (DAS) data imputation. The proposed DiffOptics model generates high-quality DAS data of various acoustic events using telecom fiber cables.

Dual Privacy Protection for Distributed Fiber Sensing with Disaggregated Inference and Fine-tuning of Memory-Augmented Networks

April 3, 2025/in Publications/by NEC Labs America

We propose a memory-augmented model architecture with disaggregated computation infrastructure for fiber sensing event recognition. By leveraging geo-distributed computing resources in optical networks, this approach empowers end-users to customize models while ensuring dual privacy protection.
