We aim to recognize actions in drone videos by adapting classifiers learned mostly on third-person videos. We address both sources of domain difference between the two domains: (a) visual variation and (b) different label sets.
We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both existing fully annotated action recognition datasets and unannotated (or only sparsely annotated) videos from drones. To study the emerging problem of drone-based action recognition, we create a new dataset, NEC-Drone, containing 5,250 videos to evaluate the task. We tackle two problem settings, in which the source domain (e.g., the Kinetics dataset) and the target domain (drone videos) have 1) the same and 2) different action label sets. We present a combination of video-based and instance-based adaptation methods, paired with either a classifier or an embedding-based framework, to transfer knowledge from source to target. Our results show that the proposed adaptation approach substantially improves performance on these challenging and practical tasks.
Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from
Jinwoo Choi, Gaurav Sharma, Manmohan Chandraker, and Jia-Bin Huang
In IEEE Winter Conference on Applications of Computer Vision (WACV) 2020 [PDF][Bibtex]
The zip archive for the NEC-Drone dataset for action recognition from drones contains:
A readme.txt file
2,079 annotated videos (frames) and their annotations.