Salient Object Detection and Segmentation in Video Surveillance



Yu, Siyue
(2022) Salient Object Detection and Segmentation in Video Surveillance. PhD thesis, University of Liverpool.


Abstract

Video surveillance captures visual information about scenes and supports applications such as crime investigation, security systems, autonomous driving, and environmental monitoring. Deep-learning-based video surveillance has recently become an essential topic in computer vision, with specific tasks including object tracking, video object segmentation, salient object detection, and video salient object detection. This thesis studies salient object detection and segmentation in video surveillance, focusing mainly on video object segmentation and salient object detection.

In video object segmentation, we study the setting in which the mask of the first frame is given, and we aim to design a network that adapts to variations in object appearance. To this end, this thesis proposes a framework based on the non-local attention mechanism that localizes and segments the target object in the current frame by referring to both the first frame with its given mask and the previous frame with its predicted mask. Our approach achieves 86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at a speed of 0.11 s per frame.

For salient object detection, this thesis focuses on scribble annotations. Scribbles, however, do not contain enough information about the integral appearance of objects. To address this problem, a local saliency coherence loss is proposed to assist the partial cross-entropy loss and thereby help the network learn more complete object information. Furthermore, a self-consistency mechanism is designed to make the network insensitive to different input scales. Our method achieves results comparable to those of fully supervised methods and sets a new state of the art on six benchmarks (e.g., on the ECSSD dataset: F_β = 0.8995, E_ξ = 0.9079, and MAE = 0.0489).

Lastly, co-salient object detection is also studied. Recent methods explore both intra- and inter-image consistency through an attention mechanism, but we find that existing attention mechanisms can focus on only a limited number of related pixels. We therefore propose a new framework with a self-contrastive loss that mines more related pixels to obtain comprehensive features. Our method obtains a maximum F-measure of 0.598 on CoCA. Together, these methods address the tasks studied in this thesis and can serve as new baselines for future work.
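The video object segmentation framework localizes the target by matching the current frame against two references: the first frame with its given mask and the previous frame with its predicted mask. The sketch below illustrates that idea under stated assumptions. It uses plain dot-product non-local attention over per-pixel feature vectors, and it fuses the two references by simple averaging. The function names, the toy 2-D features, and the 0.5/0.5 fusion are hypothetical and are not the thesis's architecture.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_mask(query: Vector, ref_feats: Sequence[Vector], ref_mask: Sequence[float]) -> float:
    """Non-local read-out for one query pixel.

    Every reference pixel is weighted by the softmax of its dot-product
    similarity to the query, and the reference mask values are averaged
    with those weights."""
    sims = [sum(q * r for q, r in zip(query, feat)) for feat in ref_feats]
    weights = softmax(sims)
    return sum(w * m for w, m in zip(weights, ref_mask))


def foreground_scores(cur_feats, first_feats, first_mask, prev_feats, prev_mask) -> List[float]:
    # Hypothetical fusion: average the evidence read from the first frame
    # (given mask) and from the previous frame (predicted mask).
    return [
        0.5 * attend_mask(q, first_feats, first_mask)
        + 0.5 * attend_mask(q, prev_feats, prev_mask)
        for q in cur_feats
    ]


# Toy 2-pixel frames with 2-D features: pixel 0 looks like the object,
# pixel 1 looks like background.
first_feats, first_mask = [[5.0, 0.0], [0.0, 5.0]], [1.0, 0.0]
prev_feats, prev_mask = [[4.0, 1.0], [1.0, 4.0]], [0.9, 0.1]
cur_feats = [[4.0, 0.0], [0.0, 4.0]]
scores = foreground_scores(cur_feats, first_feats, first_mask, prev_feats, prev_mask)
print([round(s, 3) for s in scores])
```

Referring to the first frame keeps an unchanging, correctly labeled view of the object. The previous frame tracks gradual appearance changes, which is why both references are consulted.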
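Scribble supervision labels only a few pixels. The partial cross-entropy loss is therefore computed on the scribbled pixels alone, and a local coherence term spreads the training signal to unlabeled neighbors that look similar. The sketch below works on a 1-D row of pixels and rests on several assumptions. It uses a Gaussian affinity on intensity differences within a small window, and it penalizes the L1 difference between neighboring predictions. The window size, bandwidth, balancing weight, and function names are hypothetical, and the thesis defines its own kernel and weighting.

```python
import math
from typing import Optional, Sequence


def partial_cross_entropy(pred: Sequence[float], labels: Sequence[Optional[int]]) -> float:
    """Binary cross-entropy averaged over scribbled pixels only.

    labels[i] is 1 (foreground scribble), 0 (background scribble),
    or None (unlabeled, ignored)."""
    eps = 1e-7
    terms = []
    for p, y in zip(pred, labels):
        if y is None:
            continue  # unlabeled pixels contribute nothing to this term
        p = min(max(p, eps), 1.0 - eps)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(terms) / len(terms) if terms else 0.0


def local_coherence(pred: Sequence[float], intensity: Sequence[float],
                    window: int = 1, sigma: float = 0.1) -> float:
    """Penalize prediction differences between nearby pixels that look alike.

    The affinity is a Gaussian on the intensity difference (hypothetical
    choice), and the penalty is the L1 difference of predicted saliency."""
    total = 0.0
    n = len(pred)
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j == i:
                continue
            affinity = math.exp(-((intensity[i] - intensity[j]) ** 2) / (2 * sigma ** 2))
            total += affinity * abs(pred[i] - pred[j])
    return total / n


pred = [0.9, 0.8, 0.3, 0.1]          # predicted saliency per pixel
labels = [1, None, None, 0]          # scribbles on the first and last pixel only
intensity = [0.9, 0.85, 0.2, 0.2]    # pixel appearance used for the affinity
lsc_weight = 0.3                     # hypothetical balancing weight
total_loss = partial_cross_entropy(pred, labels) + lsc_weight * local_coherence(pred, intensity)
```

Pixels 1 and 2 receive no direct label. The coherence term still pulls pixel 1 toward pixel 0 because their intensities are close, and it barely links pixels 1 and 2 because their intensities differ.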
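The self-consistency mechanism aims to make predictions insensitive to input scale. One way to express this, sketched here under that assumption, is as follows. Run the model on a downscaled input, downscale the model's prediction on the original input, and penalize the difference between the two results. In the sketch, the 2x average pooling on a 1-D signal, the L1 penalty, and the toy models `pointwise` and `thresholding` are hypothetical stand-ins for image resizing, the thesis's loss, and a real network.

```python
from typing import Callable, List, Sequence


def downscale(row: Sequence[float]) -> List[float]:
    # 2x average pooling on a 1-D signal (stand-in for image resizing).
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def scale_consistency_loss(model: Callable[[Sequence[float]], List[float]],
                           image: Sequence[float]) -> float:
    # Compare "predict, then downscale" with "downscale, then predict".
    a = downscale(model(image))
    b = model(downscale(image))
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pointwise(image: Sequence[float]) -> List[float]:
    # A per-pixel linear "model": averaging commutes with it, so the loss is 0.
    return [0.5 * v for v in image]


def thresholding(image: Sequence[float]) -> List[float]:
    # A non-linear per-pixel "model": averaging does not commute with it.
    return [1.0 if v > 0.5 else 0.0 for v in image]


image = [0.2, 0.8, 0.9, 1.0]
print(scale_consistency_loss(pointwise, image), scale_consistency_loss(thresholding, image))
```

A model whose output changes with input resolution gets a non-zero penalty. Minimizing the penalty during training pushes the network toward scale-consistent predictions.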
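The abstract reports IoU (DAVIS calls this region similarity J), F_β, E_ξ, MAE, and maximum F-measure. The sketch below covers IoU, MAE, and F-measure on flattened masks. It assumes the β² = 0.3 weighting that is conventional in salient object detection, and it computes the maximum F-measure by sweeping 256 thresholds over [0, 1]. The function names and the threshold grid are illustrative choices, and E_ξ is omitted.

```python
from typing import Sequence


def iou(pred_mask: Sequence[int], gt_mask: Sequence[int]) -> float:
    # Intersection over union of two binary masks.
    inter = sum(1 for p, g in zip(pred_mask, gt_mask) if p and g)
    union = sum(1 for p, g in zip(pred_mask, gt_mask) if p or g)
    return inter / union if union else 1.0  # assumption: two empty masks match


def mae(saliency: Sequence[float], gt_mask: Sequence[int]) -> float:
    # Mean absolute error between a [0, 1] saliency map and the binary ground truth.
    return sum(abs(s - g) for s, g in zip(saliency, gt_mask)) / len(gt_mask)


def f_measure(saliency: Sequence[float], gt_mask: Sequence[int],
              threshold: float, beta2: float = 0.3) -> float:
    # Binarize at the threshold, then combine precision and recall.
    pred = [1 if s >= threshold else 0 for s in saliency]
    tp = sum(1 for p, g in zip(pred, gt_mask) if p and g)
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(gt_mask)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


def max_f_measure(saliency: Sequence[float], gt_mask: Sequence[int], steps: int = 256) -> float:
    # Sweep thresholds over [0, 1] and keep the best F-measure.
    return max(f_measure(saliency, gt_mask, t / (steps - 1)) for t in range(steps))


gt = [1, 1, 0, 0]
saliency = [0.9, 0.6, 0.4, 0.1]
print(iou([1, 0, 0, 0], gt), mae(saliency, gt), max_f_measure(saliency, gt))
```

Benchmarks compute these scores per image and then aggregate them over the dataset. How the aggregation is done (for example, averaging precision and recall across images before taking the maximum F-measure) varies between evaluation toolkits.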

Item Type: Thesis (PhD)
Uncontrolled Keywords: Video Object Segmentation, Salient Object Detection, Co-salient Object Detection, Pixel Matching, Attention Mechanism, Feature Mining
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 09 Nov 2022 15:28
Last Modified: 18 Jan 2023 20:41
DOI: 10.17638/03164778
Supervisors:
  • Lim, Eng
  • Xiao, Jimin
  • Goulermas, John
URI: https://livrepository.liverpool.ac.uk/id/eprint/3164778