ISPRS-Annals

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9050

Copernicus Publications

Göttingen, Germany

10.5194/isprs-annals-IV-1-W1-125-2017

VISUAL TRACKING UTILIZING OBJECT CONCEPT FROM DEEP LEARNING NETWORK

Xiao ¹, Yilmaz ¹, Lia ¹ ²

¹ Photogrammetric Computer Vision Laboratory, The Ohio State University, USA

² Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

Published: 30 May 2017

Volume IV-1/W1, pages 125-132

2017

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/3.0/

This article is available from https://isprs-annals.copernicus.org/articles/IV-1-W1/125/2017/isprs-annals-IV-1-W1-125-2017.html

The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/IV-1-W1/125/2017/isprs-annals-IV-1-W1-125-2017.pdf

Although visual tracking has achieved good performance, it remains an open area of research, especially when the target undergoes severe appearance changes that are not captured by the appearance model. In this paper, we therefore replace the appearance model with a concept model learned from large-scale datasets using a deep learning network. The concept model combines high-level semantic information learned from myriads of objects with various appearances. In our tracking method, we generate the target’s concept by combining the object concepts learned in a classification task. We also demonstrate that the last convolutional feature map can be used to generate a heat map that highlights the possible location of the given target in new frames. Finally, in the proposed tracking framework, the target image, the search image cropped from the new frame, and their heat maps are fed into a localization network to find the final target position. Compared with other state-of-the-art trackers, the proposed method shows comparable, and at times better, performance while running in real time.
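The heat-map idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a class-activation-style formulation in which the target concept is a score-weighted blend of per-class channel weights, and the heat map is the concept-weighted sum of the last convolutional feature maps. The names `target_concept`, `heat_map`, `peak_location`, `class_weights` and `class_scores` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def target_concept(class_weights: Sequence[Sequence[float]],
                   class_scores: Sequence[float]) -> List[float]:
    """Blend per-class channel weights into one target concept vector.

    Assumption: class_weights[k][c] links feature channel c to class k, and
    class_scores[k] is the classifier's score for class k on the target image.
    """
    n_channels = len(class_weights[0])
    concept = [0.0] * n_channels
    for weights, score in zip(class_weights, class_scores):
        for c in range(n_channels):
            concept[c] += score * weights[c]
    return concept


def heat_map(feature_maps: Sequence[Matrix], concept: Sequence[float]) -> Matrix:
    """Concept-weighted sum of channels, clipped at zero and scaled to [0, 1]."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    heat = [[0.0] * cols for _ in range(rows)]
    for channel, weight in zip(feature_maps, concept):
        for y in range(rows):
            for x in range(cols):
                heat[y][x] += weight * channel[y][x]
    # Keep only positive evidence for the target (an assumed design choice).
    heat = [[max(v, 0.0) for v in row] for row in heat]
    peak = max(max(row) for row in heat)
    if peak > 0.0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


def peak_location(heat: Matrix) -> Tuple[int, int]:
    """Return the (row, column) index of the strongest heat-map response."""
    best_value, best_pos = heat[0][0], (0, 0)
    for y, row in enumerate(heat):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (y, x)
    return best_pos


# Toy example: two 2x2 feature channels and two classes.
features = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 2.0]]]
concept = target_concept([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])
heat = heat_map(features, concept)
location = peak_location(heat)
```

In a tracker along the lines of the abstract, such a heat map would be computed for both the target image and the search image and passed, together with the images, to the localization network. Here the peak is only read off directly for illustration.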