Person Re-identification and Tracking in Video Surveillance

Xie, Yanchun
(2020) Person Re-identification and Tracking in Video Surveillance. PhD thesis, University of Liverpool.

[img] Text
201203865_Jun2020.pdf - Unspecified

Download (26MB) | Preview


Video surveillance system is one of the most essential topics in the computer vision field. As the rapid and continuous increasement of using video surveillance cameras to obtain portrait information in scenes, it becomes a very important system for security and criminal investigations. Video surveillance system includes many key technologies, including the object recognition, the object localization, the object re-identification, object tracking, and by which the system can be used to identify or suspect the movements of the objects and persons. In recent years, person re-identification and visual object tracking have become hot research directions in the computer vision field. The re-identification system aims to recognize and identify the target of the required attributes, and the tracking system aims at following and predicting the movement of the target after the identification process. Researchers have used deep learning and computer vision technologies to significantly improve the performance of person re-identification. However, the study of person re-identification is still challenging due to complex application environments such as lightning variations, complex background transformations, low-resolution images, occlusions, and a similar dressing of different pedestrians. The challenge of this task also comes from unavailable bounding boxes for pedestrians, and the need to search for the person over the whole gallery images. To address these critical issues in modern person identification applications, we propose an algorithm that can accurately localize persons by learning to minimize intra-person feature variations. We build our model upon the state-of-the-art object detection framework, i.e., faster R-CNN, so that high-quality region proposals for pedestrians can be produced in an online manner. In addition, to relieve the negative effects caused by varying visual appearances of the same individual, we introduce a novel center loss that can increase the intra-class compactness of feature representations. The engaged center loss encourages persons with the same identity to have similar feature characteristics. Besides the localization of a single person, we explore a more general visual object tracking problem. The main task of the visual object tracking is to predict the location and size of the tracking target accurately and reliably in subsequent image sequences when the target is given at the beginning of the sequence. A visual object tracking algorithm with high accuracy, good stability, and fast inference speed is necessary. In this thesis, we study the updating problem for two kinds of tracking algorithms among the mainstream tracking approaches, and improve the robustness and accuracy. Firstly, we extend the siamese tracker with a model updating mechanism to improve their tracking robustness. A siamese tracker uses a deep convolutional neural network to obtain features and compares the new frame features with the target features in the first frame. The candidate region with the highest similarity score is considered as the tracking result. However, these kinds of trackers are not robust against large target variation due to the no-update matching strategy during the whole tracking process. To combat this defect, we propose an ensemble siamese tracker, where the final similarity score is also affected by the similarity with tracking results in recent frames instead of solely considering the first frame. Tracking results in recent frames are used to adjust the model for a continuous target change. Meanwhile, we combine adaptive candidate sampling strategy and large displacement optical flow method to improve its performance further. Secondly, we investigate the classic correlation filter based tracking algorithm and propose to provide a better model selection strategy by reinforcement learning. Correlation filter has been proven to be a useful tool for a number of approaches in visual tracking, particularly for seeking a good balance between tracking accuracy and speed. However, correlation filter based models are susceptible to wrong updates stemming from inaccurate tracking results. To date, little effort has been devoted to handling the correlation filter update problem. In our approach, we update and maintain multiple correlation filter models in parallel, and we use deep reinforcement learning for the selection of an optimal correlation filter model among them. To facilitate the decision process efficiently, we propose a decision-net to deal with target appearance modeling, which is trained through hundreds of challenging videos using proximal policy optimization and a lightweight learning network. An exhaustive evaluation of the proposed approach on the OTB100 and OTB2013 benchmarks show the effectiveness of our approach.

Item Type: Thesis (PhD)
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 14 Aug 2020 09:15
Last Modified: 19 May 2022 11:38
DOI: 10.17638/03090640