Machine Learning | Publications

MACHINE LEARNING

PROJECTS

PEOPLE

PATENTS

Publications

Solving Inverse Problems via a Score-Based Prior: An Approximation-Free Posterior Sampling Approach

June 5, 2025/https://arxiv.org

Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic

A Quantum Variational Autoencoder Utilizing Regularized Mixed-state Latent Representations

April 11, 2025/Physical Review A

A major challenge in near-term quantum computing is its application to large real-world datasets due to scarce quantum hardware resources. One approach to enabling tractable quantum models for such datasets involves finding low-dimensional representations that preserve essential information for downstream

On Synthesizing Data for Context Attribution in Question Answering

April 7, 2025/arXiv

Question Answering (QA) accounts for a significant portion of LLM usage “in the wild”. However, LLMs sometimes produce false or misleading responses, also known as “hallucinations”. Therefore, grounding the generated answers in contextually provided information — i.e., providing evidence for the generated

Attribute-Centric Compositional Text-to-Image Generation

March 13, 2025/International Journal of Computer Vision

Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing thedata distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions,which raises public concerns about their robustness and

Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation

March 4, 2025/The 39th Annual AAAI Conference on Artificial Intelligence

We consider the conditional generation of 3D drug-like molecules with explicit control over molecular properties such as drug-like properties (e.g., Quantitative Estimate of Druglikenessor Synthetic Accessibility score) and effectively binding to specific protein sites. To tackle this problem, we propose

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

March 4, 2025/WACV 2025

Action detection aims to detect (recognize and localize) human actions spatially and temporally in videos. Existing approaches focus on the closed-set setting where an action detector is trained and tested on videos from a fixed set of action categories. However, this constrained setting is not viable

Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization

March 3, 2025/WACV 2025

Unarguably deep learning models capable of generalizing to unseen domain data while leveraging a few labels are of great practical significance due to low developmental costs. In search of this endeavor we study the challenging problem of semi-supervised domain generalization (SSDG) where the goal is

Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation

February 25, 2025/https://arxiv.org

Multimodal Large Language Models (MLLMs) have shown impressive performance in vision and text tasks. However, hallucination remains a major challenge, especially in fields like healthcare where details are critical. In this work, we show how MLLMs may be enhanced to support Visual RAG (V-RAG), a retrieval-augmented

Discrete-Continuous Variational Optimization with Local Gradients

December 15, 2024/OPT2024: 16th Annual Workshop on Optimization for Machine Learning (part of NeurIPS 2024)

Variational optimization (VO) offers a general approach for handling objectives which may involve discontinuities, or whose gradients are difficult to calculate. By introducing a variational distribution over the parameter space, such objectives are smoothed, and rendered amenable to VO methods. Local

Subgroup Discovery with the Cox Model

December 15, 2024/NeurIPS 2024 Interpretable AI workshop

We study the problem of subgroup discovery with Cox regression models and introduce a method for finding an interpretable subset of the data on which a Cox model is highly accurate. Our method relies on two technical innovations: the emph (Unknown sysvar: (expected prediction entropy)), a novel metric

Understanding Transcriptional Regulatory Redundancy by Learnable Global Subset Perturbations

December 5, 2024/The 16th Asian Conference on Machine Learning

Transcriptional regulation through cis-regulatory elements (CREs) is crucial for numerous biological functions, with its disruption potentially leading to various diseases. It is well-known that these CREs often exhibit redundancy, allowing them to compensate for each other in response to external

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

December 1, 2024/https://arxiv.org

Action detection aims to detect (recognize and localize) human actions spatially and temporally in videos. Existing approaches focus on the closed-set setting where an action detector is trained and tested on videos from a fixed set of action categories. However, this constrained setting is not viable

Matching Confidences and Softened Target Occurrences for Calibration

November 27, 2024/Digital Image Computing: Techniques & Applications (DICTA 2024)

The problem of calibrating deep neural networks (DNNs) is gaining attention, as these networks are becoming central to many real-world applications. Different attempts have been made to counter the poor calibration of DNNs. Amongst others, train-time calibration methods have unfolded as an effective

A Variational Graph Partitioning Approach to Modeling Protein Liquid-liquid Phase Separation

November 20, 2024/Cell Press (journal family)

Graph neural networks (GNNs) have emerged as powerful tools for representation learning. Their efficacy depends on their having an optimal underlying graph. In many cases, the most relevant information comes from specific subgraphs. In this work, we introduce a GNN-based framework (graph-partitioned

Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024

November 15, 2024/Seventh Fact Extraction and VERification Workshop (FEVER)

Separating disinformation from fact on the web has long challenged both the search and the reasoning powers of humans. We show that the reasoning power of large language models (LLMs) and the retrieval power of modern search engines can be combined to automate this process and explainably verify claims.

Variational methods for Learning Multilevel Genetic Algorithms using the Kantorovich Monad

November 15, 2024/https://arxiv.org

Levels of selection and multilevel evolutionary processes are essential concepts in evolutionary theory, and yet there is a lack of common mathematical models for these core ideas. Here, we propose a unified mathematical framework for formulating and optimizing multilevel evolutionary processes and genetic

Exploring the Role of Reasoning Structures for Constructing Proofs in Multi-Step Natural Language Reasoning with Large Language Models

November 12, 2024/The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

When performing complex multi-step reasoning tasks, the ability of Large Language Models (LLMs) to derive structured intermediate proof steps is important for ensuring that the models truly perform the desired reasoning and for improving models’ explainability. This paper is centered around a focused

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

September 29, 2024/The 18th European Conference on Computer Vision ECCV 2024

Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between video segments and ASR-transcripted narration texts through contrastive

Predicting Spatially Resolved Gene Expression via Tissue Morphology using Adaptive Spatial GNNs (ECCB)

September 16, 2024/2024 European Conference on Computational Biology (ECCB)

Spatial transcriptomics technologies, which generate a spatial map of gene activity, can deepen the understanding of tissue architecture and its molecular underpinnings in health and disease. However, the high cost makes these technologies difficult to use in practice. Histological images co-registered

MCTR: Multi Camera Tracking Transformer

September 1, 2024/https://arxiv.org

Multi-camera tracking plays a pivotal role in various real-world applications. While end-to-end methods have gained significant interest in single-camera tracking, multi-camera tracking remains predominantly reliant on heuristic techniques. In response to this gap, this paper introduces Multi-Camera

Spatially Informed Gene Signatures for Response to Immunotherapy in Melanoma

August 15, 2024/Clinical Cancer Research

We aim to improve the prediction of response or resistance to immunotherapies in patients with melanoma. This goal is based on the hypothesis that current gene signatures predicting immunotherapy outcomes show only modest accuracy due to the lack of spatial information about cellular functions and molecular

zeta-QVAE: A Quantum Variational Autoencoder utilizing Regularized Mixed-state Latent Representations

August 1, 2024/https://arxiv.org

A major challenge in near-term quantum computing is its application to large real-world datasets due to scarce quantum hardware resources. One approach to enabling tractable quantum models for such datasets involves compressing the original data to manageable dimensions while still representing essential

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

July 21, 2024/The 41st International Conference on Machine Learning (ICML 2024)

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world

LLMs and MI Bring Innovation to Material Development Platforms

July 3, 2024/NEC Technical Journal, Special Issue on Revolutionizing Business Practices with Generative AI

In this paper, we introduce efforts to apply large language models (LLMs) to the field of material development. NEC is advancing the development of a material development platform. By applying core technologies corresponding to two material development steps, namely investigation activities (Read paper/patent)

Weakly-Supervised Temporal Action Localization with Multi-Modal Plateau Transformers

June 18, 2024/CVPR 2024 3rd Workshop on Learning with Limited Labelled Data for Image and Video Understanding

Weakly Supervised Temporal Action Localization (WSTAL) aims to jointly localize and classify action segments in untrimmed videos with only video level annotations. To leverage video level annotations most existing methods adopt the multiple instance learning paradigm where frame/snippet level action

Learning from Synthetic Human Group Activities

June 17, 2024/CVPR 2024

The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address the limitation, we introduce M3Act,

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

June 17, 2024/CVPR 2024

In this paper we explore the capability of an agent to construct a logical sequence of action steps thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual outcome as depicted in real-life instructional videos. Existing

Predicting Spatially Resolved Gene Expression via Tissue Morphology using Adaptive Spatial GNNs

June 3, 2024/https://www.biorxiv.org

Motivation Spatial transcriptomics technologies, which generate a spatial map of gene activity, can deepen the understanding of tissue architecture and its molecular underpinnings in health and disease. However, the high cost makes these technologies difficult to use in practice. Histological images

Improving Test-Time Adaptation For Histopathology Image Segmentation: Gradient-To-Parameter Ratio Guided Feature Alignment

May 28, 2024/21st IEEE International Symposium on Biomedical Imaging (ISBI 2024)

In the field of histopathology, computer-aided systems face significant challenges due to diverse domain shifts. They include variations in tissue source organ, preparation and scanningprotocols. These domain shifts can significantly impact algorithms performance in histopathology tasks, such as cancer

Impeller: A Path-based Heterogeneous Graph Learning Method for Spatial Transcriptomic Data Imputation

May 28, 2024/Bioinformatics

Recent advances in spatial transcriptomics allow spatially resolved gene expression measurements with cellular or even sub-cellular resolution, directly characterizing the complex spatiotemporal gene expression landscape and cell-to-cell interactions in their native microenvironments. Due to technology

Evaluating Cellularity Estimation Methods: Comparing AI Counting with Pathologists’ Visual Estimates

May 28, 2024/Diagnostics

The development of next-generation sequencing (NGS) has enabled the discovery of cancer-specific driver gene alternations, making precision medicine possible. However, accurategenetic testing requires a sufficient amount of tumor cells in the specimen. The evaluation of tumor content ratio (TCR) from

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

May 7, 2024/ICLR 2024

Camouflaged object detection (COD) is the challenging task of identifying camouflaged objects visually blended into surroundings. Albeit achieving remarkable success, existing COD detectors still struggle to obtain precise results in some challenging cases. To handle this problem, we draw inspiration

Provable Membership Inference Privacy

April 9, 2024/Transactions on Machine Learning Research

In applications involving sensitive data, such as finance and healthcare, the necessity for preserving data privacy can be a significant barrier to machine learning model development.Differential privacy (DP) has emerged as one canonical standard for provable privacy. However, DPs strong theoretical

Self-Consistent Decoding for More Factual Open Responses

February 29, 2024/https://arxiv.org

Self-consistency has emerged as a powerful method for improving the accuracy of short answers generated by large language models. As previously defined, it only concerns the accuracy of a final answer parsed from generated text. In this work, we extend the idea to open response generation, by integrating

Disentangled Wasserstein Autoencoder for T-Cell Receptor Engineering

December 10, 2023/NeurIPS 2023

In protein biophysics, the separation between the functionally important residues (forming the active site or binding surface) and those that create the overall structure (the fold) is a well-established and fundamental concept. Identifying and modifying those functional sites is critical for protein

Weakly-supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping

December 10, 2023/NeurIPS 2023

Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects well blended with surrounding environments using sparsely-annotated data for model training. It remains a challenging task since (1) it is hard to distinguish concealed objects from the background due to the intrinsic similarity

Source-Free Domain Adaptive Fundus Image Segmentation with Class-Balanced Mean Teacher

October 9, 2023/MICCAI 2023

This paper studies source-free domain adaptive fundus image segmentation which aims to adapt a pretrained fundus segmentation model to a target domain using unlabeled images. This is a challenging task because it is highly risky to adapt a model only using unlabeled data. Most existing methods tackle

Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion

October 2, 2023/ICCV 2023

Heterogeneous image fusion (HIF) aims to enhance image quality by merging complementary information of images captured by different sensors. Early model-based approaches have strong interpretability while being limited by non-adaptive feature extractors with poor generalizability.

Few-Shot Video Classification via Representation Fusion and Promotion Learning

October 2, 2023/ICCV 2023

Recent few-shot video classification (FSVC) works achieve promising performance by capturing similarity across support and query samples with different temporal alignment strategies or learning discriminative features via Transformer block within each episode. However, they ignore two important issues:

MSI: Maximize Support-Set Information for Few-Shot Segmentation

October 2, 2023/ICCV 2023

Few-Shot Segmentation FSS (Few-shot segmentation) aims to segment a target class using a small number of labeled images (support set). To extract information relevant to the target class, a dominant approach in best performing FSS methods removes background features using a support mask. We observe that

Personalized Semantics Excitation for Federated Image Classification

October 2, 2023/ICCV 2023

Federated learning casts a light on the collaboration of distributed local clients with privacy protected to attain a more generic global model. However, significant distribution shift in input/label space across different clients makes it challenging to well generalize to all clients, which motivates

Automatically Evaluating Opinion Prevalence in Opinion Summarization

August 7, 2023/ECNLP 6 (KDD 2023)

When faced with a large number of product reviews, it is not clear that a human can remember all of them and weight opinions representatively to write a good reference summary. Wepropose an automatic metric to test the prevalence of the opinions that a summary expresses, based on counting the number

Improving Cross-Domain Detection with Self-Supervised Learning

June 19, 2023/CVPR 2023, 2nd Workshop on Learning with Limited Labelled Data for Image and Video Understanding

Cross-Domain Detection (XDD) aims to train a domain-adaptive object detector using unlabeled images from a target domain and labeled images from a source domain. Existing approaches achieve this either by aligning the feature maps or the region proposals from the two domains, or by transferring the style

Camouflaged Object Detection with Feature Decomposition and Edge Reconstruction

June 18, 2023/CVPR 2023

Camouflaged object detection (COD) aims to address the tough issue of identifying camouflaged objects visually blended into the surrounding backgrounds. COD is a challenging task due to the intrinsic similarity of camouflaged objects with the background, as well as their ambiguous boundaries. Existing

Conditional Image-to-Video Generation with Latent Flow Diffusion Models

June 18, 2023/CVPR 2023

Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person’s face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal

Exploring Compositional Visual Generation with Latent Classifier Guidance

June 18, 2023/CVPR 2023 - Generative Models for Computer Vision Workshop

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion

Source-Free Video Domain Adaptation with Spatial-Temporal-Historical Consistency Learning

June 18, 2023/CVPR 2023

Source-free domain adaptation (SFDA) is an emerging research topic that studies how to adapt a pretrained source model using unlabeled target data. It is derived from unsupervised domain adaptation but has the advantage of not requiring labeled source data to learn adaptive models. This makes it particularly

Towards Realizing the Value of Labeled Target Samples: a Two-Stage Approach for Semi-Supervised Domain Adaptation

June 4, 2023/2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

Semi-Supervised Domain Adaptation (SSDA) is a recently emerging research topic that extends from the widely-investigated Unsupervised Domain Adaptation (UDA) by further having a few target samples labeled, i.e., the model is trained with labeled source samples, unlabeled target samples as well as a few

T-Cell Receptor Optimization with Reinforcement Learning and Mutation Polices for Precision Immunotherapy

April 16, 2023/RECOMB 2023

T cells monitor the health status of cells by identifying foreign peptides displayed on their surface. T-cell receptors (TCRs), which are protein complexes found on the surface of T cells, are able to bind to these peptides. This process is known as TCR recognition and constitutes a key step for immune

Adversarial Alignment for Source Free Object Detection

February 7, 2023/AAAI 2023

Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without seeing source data. While most existing SFOD methods generate pseudo labels via a source-pretrained model to guide training, these pseudo labels usually contain

Binding Peptide Generation for MHC Class I Proteins with Deep Reinforcement Learning

January 23, 2023/Bioinformatics

Motivation: MHC Class I protein plays an important role in immunotherapy by presenting immunogenic peptides to anti-tumor immune cells. The repertoires of peptides for various MHC Class I proteins are distinct, which can be reflected by their diverse binding motifs. To characterize binding motifs for

Real-time ConcealedWeapon Detection on 3D Radar Images forWalk-through Screening System

January 3, 2023/WACV 2023

This paper presents a framework for real-time concealed weapon detection (CWD) on 3D radar images for walk-through screening systems. The walk-through screening system aims to ensure security in crowded areas by performing CWD on walking persons, hence it requires an accurate and real-time detection

On TCR Binding Predictors Failing to Generalize to Unseen Peptides

January 2, 2023/Frontiers in Immunology

Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets, which include peptide sequences that are also included in the training set. In this work, we investigate how state

Attentive Variational Information Bottleneck for TCR–peptide interaction prediction

December 26, 2022/Bioinformatics

We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on

KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models

December 7, 2022/2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Knowledge Graphs (KGs) store information in the form of (head, predicate, tail)-triples. To augment KGs with new knowledge, researchers proposed models for KG Completion (KGC) tasks such as link prediction, i.e., answering (h, p, ?) or (?, p, t) queries. Such models are usually evaluated with averaged

COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality

October 17, 2022/ECCV 2022

Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects. We approach the task by modeling the video as tokens that represent the multi-scale semantic concepts in the video. We propose COMPOSER, a Multiscale

Unsupervised Anomaly Detection with Self-Training and Knowledge Distillation

October 16, 2022/IEEE International Conference in Image Processing

Anomaly Detection (AD) aims to find defective patterns or abnormal samples among data, and has been a hot research topic due to various real-world applications. While various AD methods have been proposed, most of them assume the availability of a clean (anomaly-free) training set, which, however, may

Analyzing Coreference and Bridging in Product Reviews

October 12, 2022/CRAC 2022, the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

Product reviews may have complex discourse including coreference and bridging relations to a main product, competing products, and interacting products. Current approaches to aspect-based sentiment analysis (ABSA) and opinion summarization largely ignore this complexity. On the other hand, existing systems

T-Cell Receptor-Peptide Interaction Prediction with Physical Model Augmented Pseudo-Labeling

August 14, 2022/KDD 2022

Predicting the interactions between T-cell receptors (TCRs) and peptides is crucial for the development of personalized medicine and targeted vaccine in immunotherapy. Current datasets for training deep learning models of this purpose remain constrained without diverse TCRs and peptides. To combat the

StyleT2I: Towards Compositional and High-Fidelity Text-to-Image Synthesis

June 21, 2022/CVPR 2022

Although progress has been made for text-to-image synthesis, previous methods fall short of generalizing to unseen or underrepresented attribute compositions in the input text. Lacking compositionality could have severe implications for robustness and fairness, e.g., inability to synthesize the face

Self-supervised Video Representation Learning with Cascade Positive Retrieval

June 19, 2022/CVPR: Workshop on Learning with Limited Labelled Data for Image and Video Understanding

Self-supervised video representation learning has been shown to effectively improve downstream tasks such as video retrieval and action recognition. In this paper, we present the Cascade Positive Retrieval (CPR) that successively mines positive examples w.r.t. the query for contrastive learning in a

Fast Few-shot Debugging for NLU Test Suites

May 26, 2022/Deep Learning Inside Out workshop at ACL 2022

We study few-shot debugging of transformer based natural language understanding models, using recently popularized test suites to not just diagnose but correct a problem. Given a few debugging examples of a certain phenomenon, and a held-out test set of the same phenomenon, we aim to maximize accuracy

AE-StyleGAN: Improved Training of Style-Based Auto-Encoders

January 4, 2022/WACV 2022

StyleGANs have shown impressive results on data generation and manipulation in recent years, thanks to its disentangled style latent space. A lot of efforts have been made in inverting a pretrained generator, where an encoder is trained ad hoc after the generator is trained in a two-stage fashion. In

SplitBrain: Hybrid Data and Model Parallel Deep Learning

January 3, 2022/arXiv

The recent success of deep learning applications has coincided with those widely available powerful computational resources for training sophisticated machine learning models with huge datasets. Nonetheless, training large models such as convolutional neural networks using model parallelism (as opposed

A Deep Generative Model for Molecule Optimization via One Fragment Modification

January 1, 2022/Nature Machine Intelligence

Molecule optimization is a critical step in drug development to improve the desired properties of drug candidates through chemical modification. We have developed a novel deep generative model, Modof, over molecular graphs for molecule optimization. Modof modifies a given molecule through the prediction

Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning

November 7, 2021/EMNLP 2021

Image captioning systems are expected to have the ability to combine individual concepts when describing scenes with concept combinations that are not observed during training. In spite of significant progress in image captioning with the help of the autoregressive generation framework, current approaches

Team Papelo at FEVEROUS: Multi-hop Evidence Pursuit

November 7, 2021/The Fourth Workshop on Fast Extraction and Verificiation (FEVER)

We develop a system for the FEVEROUS fact extraction and verification task that ranks an initial set of potential evidence and then pursues missing evidence in subsequent hops by trying to generate it, with a “next hop prediction module” whose output is matched against page elements in a predicted

Prediction of Non-Muscle Invasive Bladder Cancer Recurrence using Machine Learning of Quantitative Nuclear Features

October 29, 2021/Modern Pathology

Non-muscle invasive bladder cancer (NMIBC) generally has a good prognosis, however, recurrence after transurethral resection (TUR), the standard primary treatment, is a major problem. Clinical management after TUR has been based on risk classification using clinicopathological factors, but these classifications

Dual Projection Generative Adversarial Networks for Conditional Image Generation

October 11, 2021/ICCV 2021

Conditional Generative Adversarial Networks (cGANs) extend the standard unconditional GAN framework to learning joint data-label distributions from samples, and have been established as powerful generative models capable of generating high-fidelity imagery. A challenge of training such a model lies in

Learning Higher-order Object Interactions for Keypoint-based Video Understanding

October 11, 2021/SRVU - ICCV 2021 Workshop

Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning based networks often require significant computation and may capture scene context using various modalities that further

Towards Robustness of Deep Neural Networks via Networks via Regularization

October 11, 2021/ICCV 2021

Recent studies have demonstrated the vulnerability of deep neural networks against adversarial examples. In-spired by the observation that adversarial examples often lie outside the natural image data manifold and the intrinsic dimension of image data is much smaller than its pixel space dimension, we

Overcoming Poor Word Embeddings with Word Definitions

August 5, 2021/SEM 2021 Workshop at ACL-IJCNLP 2021

Modern natural language understanding models depend on pretrained subword embeddings, but applications may need to reason about words that were never or rarely seen during pretraining. We show that examples that depend critically on a rarer word are more challenging for natural language inference models.

DECODE: A Deep-learning Framework for Condensing Enhancers and Refining Boundaries with Large-scale Functional Assays

July 25, 2021/ISMB/ECCB 2021

MotivationMapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have

Disentangled Recurrent Wasserstein Auto-Encoder

May 4, 2021/ICLR 2021

Learning disentangled representations leads to interpretable models and facilitates data generation with style transfer, which has been extensively studied on static data such as images in an unsupervised learning framework. However, only a few works have explored unsupervised disentangled sequential

Hopper: Multi-hop Transformer for Spatio-Temporal Reasoning

May 4, 2021/ICLR 2021

This paper considers the problem of spatiotemporal object-centric reasoning in videos. Central to our approach is the notion of object permanence, i.e., the ability to reason about the location of objects as they move through the video while being occluded, contained or carried by other objects. Existing

Ranking-based Convolutional Neural Network Models for Peptide-MHC Binding Prediction

May 1, 2021/Frontiers in Molecular Biosciences - Biological Modeling and Simulation

T-cell receptors can recognize foreign peptides bound to major histocompatibility complex (MHC) class-I proteins, and thus trigger the adaptive immune response. Therefore, identifying peptides that can bind to MHC class-I molecules plays a vital role in the design of peptide vaccines. Many computational

A Multi-Scale Conditional Deep Model for Tumor Cell Ratio Counting

February 17, 2021/SPIE Medical Imaging 2021: MI109: Digital and Computational Pathology

We propose a method to accurately obtain the ratio of tumor cells over an entire histological slide. We use deep fully convolutional neural network models trained to detect and classify cells on images of H&E-stained tissue sections. Pathologists’ labels consisting of exhaustive nuclei locations and

Improving neural network robustness through neighborhood preserving layers

January 15, 2021/Manifold Learning from Euclid to Riemann: Workshop at ICPR 2021

One major source of vulnerability of neural nets in classification tasks is from overparameterized fully connected layers near the end of the network. In this paper, we propose a new neighborhood preserving layer which can replace these fully connected layers to improve the network robustness. Networks

Prediction of Early Recurrence of Hepatocellular Carcinoma after Resection using Digital Pathology Images Assessed by Machine Learning

September 18, 2020/Modern Pathology

Hepatocellular carcinoma (HCC) is a representative primary liver cancer caused by long-term and repetitive liver injury. Surgical resection is generally selected as the radical cure treatment. Because the early recurrence of HCC after resection is associated with low overall survival, the prediction

Model-Based Autoencoders for Imputing Discrete single-cell RNA-seq Data

September 16, 2020/Methods (Elsevier)

Deep neural networks have been widely applied for missing data imputation. However, most existing studies have been focused on imputing continuous data, while discrete data imputation is under-explored. Discrete data is common in real world, especially in research areas of bioinformatics, genetics, and

Tripping Through Time: Efficient Localization of Activities in Videos

September 11, 2020/BMVC 2020

Localizing moments in untrimmed videos via language queries is a new and interesting task that requires the ability to accurately ground language into video. Previous works have approached this task by processing the entire video, often more than once, to localize relevant activities. In the real world

Improving Disentangled Text Representation Learning with Information Theoretical Guidance

July 5, 2020/ACL 2020

Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature

15 Keypoints Is All You Need

June 14, 2020/CVPR 2020

Pose-tracking is an important problem that requires identifying unique human pose-instances and matching them temporally across different frames in a video. However, existing pose-tracking methods are unable to accurately model temporal relationships and require significant computation, often computing

S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation

June 14, 2020/CVPR 2020

We propose a sequential variational autoencoder to learn disentangled representations of sequential data (e.g., videos and audios) under self-supervision. Specifically, we exploit the benefits of some readily accessible supervision signals from input data itself or some off-the-shelf functional models

Generating Followup Questions for Interpretable Multi hop Question Answering

March 31, 2020/arXiv

We propose a framework for answering open domain multi hop questions in which partial information is read and used to generate followup questions, to finally be answered by a pretrained single hop answer extractor. This framework makes each hop interpretable, and makes the retrieval associated with later

Contextual Grounding of Natural Language Entities in Images

December 13, 2019/NeurIPS 2019 workshop on Visually Grounded Interaction and Language (ViGIL)

In this paper, we introduce a contextual grounding approach that captures the context in corresponding text entities and image regions to improve the grounding accuracy. Specifically, the proposed architecture accepts pre-trained text token embeddings and image object features from an off-the-shelf object

Contextual Grounding of Natural Language Phrases in Images

November 5, 2019/arXiv

In this paper, we introduce a contextual grounding approach that captures the context in corresponding text entities and image regions to improve the grounding accuracy. Specifically, the proposed architecture accepts pre-trained text token embeddings and image object features from an off-the-shelf object

On Novel Object Recognition: A Unified Framework for Discriminability and Adaptability

November 4, 2019/CIKM 2019

The rich and accessible labeled data fueled the revolutionary successes of deep learning in object recognition. However, recognizing objects of novel classes with limited supervision information provided, i.e., Novel Object Recognition (NOR), remains a challenging task. We identify in this paper two

Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective

October 27, 2019/ICCV 2019

Zero-shot learning (ZSL) aims to recognize instances of unseen classes solely based on the semantic descriptions of the classes. Existing algorithms usually formulate it as a semantic-visual correspondence problem, by learning mappings from one feature space to the other. Despite being reasonable, previous

Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis

August 10, 2019/IJCAI 2019

Developing conditional generative models for text-to-video synthesis is an extremely challenging yet an important topic of research in machine learning. In this work, we address this problem by introducing Text-Filter conditioning Generative Adversarial Network (TFGAN), a conditional GAN model with a

Learning K-way D-dimensional Discrete Embedding for Hierarchical Data Visualization and Retrieval

August 10, 2019/IJCAI 2019

Traditional embedding approaches associate a real-valued embedding vector with each symbol or data point, which is equivalent to applying a linear transformation to “one-hot” encoding of discrete symbols or data objects. Despite simplicity, these methods generate storage-inefficient representations

Tripping Through Time: Efficient Temporal Localization of Activities in Videos

May 16, 2019/CVPR 2019

Localizing moments in untrimmed videos using language queries is a new task that requires the ability to accurately ground language into video. Existing approaches process the video, often more than once, to localize the activities and are inefficient. In this paper, we present TripNet, an end-to-end

A Deep Spatio-Temporal Fuzzy Neural Network for Passenger Demand Prediction

May 2, 2019/SDM 2019

In spite of its importance, passenger demand prediction is a highly challenging problem, because the demand is simultaneously influenced by the complex interactions among many spatial and temporal factors and other external factors such as weather. To address this problem, we propose a Spatio-TEmporal

Visual Entailment: A Novel Task for Fine-Grained Image Understanding

February 15, 2019/arXiv

Existing visual reasoning datasets, such as Visual Question Answering (VQA), often suffer from biases conditioned on the question, image or answer distributions. The recently proposed CLEVR dataset addresses these limitations and requires fine-grained reasoning, but the dataset is synthetic and consists

Visual Entailment Task for Visually-Grounded Language Learning

December 7, 2018/NeurIPS 2018 workshop on Visually Grounded Interaction and Language (ViGIL)

We introduce a new inference task – Visual Entailment (VE) – which differs from traditional Textual Entailment (TE) tasks whereby a premise is defined by an image, rather than a natural language sentence as in TE tasks. A novel dataset SNLI-VE is proposed for VE tasks based on the Stanford Natural Language

Optimal Transport Classifier: Defending Against Adversarial Attacks by Regularized Deep Embedding

December 3, 2018/arXiv

Recent studies have demonstrated the vulnerability of deep convolutional neural networks against adversarial examples. Inspired by the observation that the intrinsic dimension of image data is much smaller than its pixel space dimension and the vulnerability of neural networks grows with the input dimension,

Leveraging Knowledge Bases for Future Prediction with Memory Comparison Networks

November 15, 2018/AI Communications

Making predictions about what might happen in the future is important for reacting adequately in many situations. For example, observing that Man kidnaps girl may have the consequence that Man kills girl. While this is part of common sense reasoning for humans, it is not obvious how machines

Learning Context-Sensitive Convolutional Filters for Text Processing

October 31, 2018/EMNLP 2018

Convolutional neural networks (CNNs) have recently emerged as a popular building block for natural language processing (NLP). Despite their success, most existing CNN models employed in NLP share the same learned (and static) set of filters for all input sentences. In this paper, we consider an approach

Team Papelo: Transformer Networks at FEVER

October 31, 2018/EMNLP 2018

We develop a system for the FEVER fact extraction and verification challenge that uses a high precision entailment classifier based on transformer networks pretrained with language modeling, to classify a broad set of potential evidence. The precision of the entailment classifier allows us to enhance

Teaching Syntax by Adversarial Distraction

October 31, 2018/EMNLP 2018

Existing entailment datasets mainly pose problems which can be answered without attention to grammar or word order. Learning syntax requires comparing examples where different grammar and word order change the desired classification. We introduce several datasets based on synthetic transformations of

Parametric t-Distributed Stochastic Exemplar-centered Embedding

September 10, 2018/ECML 2018

Parametric embedding methods such as parametric t-distributed Stochastic Neighbor Embedding (pt-SNE) enables out-of-sample data visualization without further computationally expensive optimization or approximation. However, pt-SNE favors small mini-batches to train a deep neural network but large mini-batches

DeepConf: Automating Data Center Network Topologies Management with Machine Learning

August 24, 2018/ACM SIGCOMM 2018 Workshop on Network Meets AI & ML (NetAI 2018)

In recent years, many techniques have been developed to improve the performance and efficiency of data center networks. While these techniques provide high accuracy, they are often designed using heuristics that leverage domain-specific properties of the workload or hardware.In this vision paper, we

Baseline Needs More Love: On SimpleWord-Embedding-Based Models and Associated Pooling Mechanisms

July 15, 2018/ACL 2018

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper,

Learning K-way D-dimensional Discrete Code For Compact Embedding Representations

July 10, 2018/ICML 2018

Conventional embedding methods directly associate each symbol with a continuous embedding vector, which is equivalent to applying a linear transformation based on a one-hot encoding of the discrete symbols. Despite its simplicity, such approach yields the number of parameters that grows linearly

Attend and Interact: Higher-Order Object Interactions for Video Understanding

June 18, 2018/CVPR 2018

Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation or pairwise object relationships. Furthermore, learning interactions

Adaptive Feature Abstraction for Translating Video to Text

February 2, 2018/The Thirty-Second AAAI Conference on Artificial Intelligence

Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable context-dependent semantics in the video may make it more appropriate to adaptively select features from the multiple CNN layers. We propose

Video Generation From Text

February 2, 2018/The Thirty-Second AAAI Conference on Artificial Intelligence

Generating videos from text has proven to be a significant challenge for existing generative models. We tackle this problem by training a conditional generative model to extract both static and dynamic information from text. This is manifested in a hybrid framework, employing a Variational Autoencoder

Adaptive Memory Networks

February 1, 2018/arXiv

Adaptive Memory Networks We present Adaptive Memory Networks (AMN) that processes input-question pairs to dynamically construct a network architecture optimized for lower inference times for Question Answering (QA) tasks. AMN processes the input story to extract entities and stores them in memory banks.