Self-Supervised Object Detection and Retrieval Using Unlabeled Videos.

CVPR Workshops(2020)

Citations: 11
Abstract
Learning an object detection or retrieval system requires a large dataset with manual annotations. Such data are expensive and time-consuming to create and therefore difficult to obtain at scale. In this work, we propose using the natural correlation between narrations and the visual presence of objects in video to learn an object detector and retriever without any manual labeling. We pose the problem as weakly supervised learning with noisy labels and propose a novel object detection and retrieval paradigm under these constraints. We handle background rejection using contrastive samples and address the high level of label noise with a new clustering score. Our evaluation is based on a set of ten objects with manual ground-truth annotations in almost 5000 frames extracted from instructional videos on the web. We demonstrate superior results compared to state-of-the-art weakly supervised approaches and also report a strongly labeled upper bound. While the focus of the paper is object detection and retrieval, the proposed methodology can be applied to a broader range of noisy, weakly supervised problems.
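The abstract says that training labels come from the correlation between what a narrator says and what is visible on screen. The sketch below shows one simple way such noisy per-frame labels could be derived. It is not the authors' pipeline. The `NarrationSegment` type, the `weak_labels` helper, and the rule "a frame is positive if it falls inside a narration segment that names the object" are all hypothetical assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class NarrationSegment:
    # Hypothetical transcript segment: start/end times in seconds plus spoken text.
    start: float
    end: float
    text: str


def weak_labels(segments, frame_times, object_name):
    """Assign a noisy label to each frame timestamp (illustrative assumption).

    1 -> the frame lies inside a narration segment that mentions object_name
         (a weak positive; the object may still be off-screen, hence the noise).
    0 -> no overlapping segment mentions it (a candidate contrastive/negative sample).
    """
    # Whole-word, case-insensitive match so "pan" does not match "panel".
    pattern = re.compile(rf"\b{re.escape(object_name.lower())}\b")
    positive_spans = [
        (seg.start, seg.end) for seg in segments if pattern.search(seg.text.lower())
    ]
    labels = []
    for t in frame_times:
        # Half-open interval [start, end) so adjacent segments do not overlap.
        inside = any(start <= t < end for start, end in positive_spans)
        labels.append(1 if inside else 0)
    return labels


if __name__ == "__main__":
    segs = [
        NarrationSegment(0.0, 5.0, "Now take the knife."),
        NarrationSegment(5.0, 10.0, "Stir the soup."),
    ]
    print(weak_labels(segs, [1.0, 4.9, 5.0, 7.0], "knife"))
```

Labels produced this way are noisy in exactly the sense the abstract describes: a narrator can name an object that is not in frame, or show one without naming it. This is why a mechanism for handling label noise, such as the paper's clustering score, is needed on top of such a labeling step.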
Keywords
self-supervised object retrieval, self-supervised object detection, weakly-supervised problems, supervised approaches, instructional videos, manual ground truth annotation, label noise, retrieval paradigm, novel object detection, noisy labels, weakly supervised learning, manual labeling, object detector, visual presence, natural correlation, manual annotations, data set, unlabeled videos