Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation



Jin, Shuo, Yu, Siyue, Zhang, Bingfeng, Sun, Mingjie, Dong, Yi ORCID: 0000-0003-3047-7777 and Xiao, Jimin
(2025) Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation. In: 2025 IEEE/CVF International Conference on Computer Vision (ICCV), 2025-10-19 - 2025-10-25.

iccv.pdf - Author Accepted Manuscript
Available under License Creative Commons Attribution.

Abstract

Training-free open-vocabulary semantic segmentation has advanced with vision-language models like CLIP, which exhibit strong zero-shot abilities. However, CLIP's attention mechanism often wrongly emphasises specific image tokens, namely outliers, which results in irrelevant over-activation. Existing approaches struggle with these outliers, which arise in intermediate layers and propagate through the model, ultimately degrading spatial perception. In this paper, we propose a Self-adaptive Feature Purifier framework (SFP) to suppress propagated outliers and enhance semantic representations for open-vocabulary semantic segmentation. Specifically, based on an in-depth analysis of attention responses between image and class tokens, we design a self-adaptive outlier mitigator to detect and mitigate outliers at each layer for propagated feature purification. In addition, we introduce a semantic-aware attention enhancer to augment attention intensity in semantically relevant regions, which strengthens the purified feature to focus on objects. Further, we introduce a hierarchical attention integrator to aggregate multi-layer attention maps to refine spatially coherent feature representations for final segmentation. Our proposed SFP enables robust outlier suppression and object-centric feature representation, leading to more precise segmentation. Extensive experiments show that our method achieves state-of-the-art performance and surpasses existing methods by an average of 4.6% mIoU on eight segmentation benchmarks. The code is released at: https://github.com/Kimsure/SFP.

Item Type: Conference Item (Unspecified)
Uncontrolled Keywords: 46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation
Divisions: Faculty of Science & Engineering
Faculty of Science & Engineering > School of Computer Science & Informatics
Faculty of Science & Engineering > School of Computer Science & Informatics > Artificial Intelligence
Depositing User: Symplectic Admin
Date Deposited: 05 Dec 2025 08:41
Last Modified: 23 May 2026 10:47
DOI: 10.1109/iccv51701.2025.01887
URI: https://livrepository.liverpool.ac.uk/id/eprint/3195869