The University of Warwick, established in 1965, is a British campus university located on the outskirts of Coventry, England. Despite its relatively young age, Warwick is regarded as one of the country’s leading institutions, highly ranked for teaching quality and research, with strong links to top companies for internships and career opportunities. NEC Laboratories America and the University of Warwick collaborated on the development of generative models, with a focus on image quality and training stability. Our research refined the architecture and optimization of GANs, contributing to more reliable and adaptable synthetic image generation.

Posts

Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zero-shot Classification and Retrieval of Videos

We present an audio-visual multimodal approach to zero-shot learning (ZSL) for the classification and retrieval of videos. ZSL has been studied extensively in the recent past but has primarily been limited to the visual modality and to images. We demonstrate that both the audio and visual modalities are important for ZSL on videos. Since no dataset for studying this task is currently available, we also construct an appropriate multimodal dataset with 33 classes containing 156,416 videos, drawn from an existing large-scale audio event dataset. We empirically show that adding the audio modality improves performance on both zero-shot classification and retrieval when using multimodal extensions of embedding learning methods. We also propose a novel method to predict the 'dominant' modality using a jointly learned modality attention network. We learn the attention in a semi-supervised setting and thus do not require any additional explicit labelling for the modalities. We provide qualitative validation of the modality-specific attention, which also successfully generalizes to unseen test classes.
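The idea of weighting modalities by attention before matching a video to class embeddings can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of cosine similarity, the two-way softmax, and the toy 2-D embeddings are all assumptions made for illustration. In the described approach the attention weights would come from a jointly learned network; here they are simply passed in as logits.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(audio_emb, video_emb, class_embs, attention_logits):
    """Pick the class whose embedding best matches the attention-weighted
    audio and video embeddings.

    attention_logits is a hypothetical stand-in for the output of a learned
    modality attention network: [audio_logit, video_logit].
    """
    w_audio, w_video = softmax(attention_logits)
    scores = {
        name: w_audio * cosine(audio_emb, emb) + w_video * cosine(video_emb, emb)
        for name, emb in class_embs.items()
    }
    return max(scores, key=scores.get), scores


# Toy class embeddings (assumed; in practice these might be text embeddings
# of class names, including classes never seen during training).
class_embs = {"dog": [1.0, 0.0], "siren": [0.0, 1.0]}
audio_emb = [0.1, 0.9]  # the audio track resembles "siren"
video_emb = [0.6, 0.4]  # the visual track resembles "dog"

audio_dominant, _ = zero_shot_classify(audio_emb, video_emb, class_embs, [2.0, 0.0])
video_dominant, _ = zero_shot_classify(audio_emb, video_emb, class_embs, [0.0, 2.0])
```

In this toy setup the predicted class follows whichever modality receives the larger attention weight, which is the intuition behind predicting a dominant modality per video.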