A failure to learn object shape geometry: Implications for convolutional neural networks as plausible models of biological vision



Heinke, Dietmar, Wachman, Peter, van Zoest, Wieske and Leek, E. Charles (ORCID: 0000-0001-9258-7504) (2021) A failure to learn object shape geometry: Implications for convolutional neural networks as plausible models of biological vision. Vision Research, 189, pp. 81-92.

Text: VR-20-214_R1.pdf - Author Accepted Manuscript (2MB)

Abstract

Here we examine the plausibility of deep convolutional neural networks (CNNs) as a theoretical framework for understanding biological vision in the context of image classification. Recent work on object recognition in human vision has shown that both global and local shape information are computed and integrated early during perceptual processing. Our goal was to compare how object shape information is processed by CNNs and by human observers. We tested the hypothesis that, unlike the human system, CNNs do not compute representations of global and local object geometry during image classification. To do so, we trained and tested six CNNs (AlexNet, VGG-11, VGG-16, ResNet-18, ResNet-50, GoogLeNet), and human observers, to discriminate geometrically possible and impossible objects. Completing this task requires computing a representational structure of shape that encodes both global and local object geometry, because the detection of impossibility derives from an incongruity between well-formed local feature conjunctions and their integration into a geometrically well-formed 3D global shape. Unlike human observers, none of the tested CNNs could reliably discriminate between possible and impossible objects. Detailed analyses of CNN image feature processing using gradient-weighted class activation mapping (Grad-CAM) showed that network classification performance was not constrained by object geometry. In contrast, when classification could be made based solely on local feature information in line drawings, the CNNs were highly accurate. We argue that these findings reflect fundamental differences between CNNs and human vision in terms of underlying image processing structure. Notably, unlike human vision, CNNs do not compute representations of object geometry. The results challenge the plausibility of CNNs as a framework for understanding image classification in biological vision systems.

Item Type: Article
Uncontrolled Keywords: Visual processing, Convolutional neural networks, Shape processing
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Population Health
Depositing User: Symplectic Admin
Date Deposited: 01 Nov 2021 11:35
Last Modified: 18 Jan 2023 21:25
DOI: 10.1016/j.visres.2021.09.004
URI: https://livrepository.liverpool.ac.uk/id/eprint/3142363