Are Large Kernels Better Teachers than Transformers for ConvNets?



Huang, T, Yin, L, Zhang, Z, Shen, L, Fang, M ORCID: 0000-0001-6745-286X, Pechenizkiy, M, Wang, Z and Liu, S
(2023) Are Large Kernels Better Teachers than Transformers for ConvNets? In: The Fortieth International Conference on Machine Learning, Hawaii Convention Center.

PDF: ICML2023_Large_Kernel_Distillation-camera.pdf (560kB)

Abstract

This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): serving as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets. While Transformers have delivered state-of-the-art (SOTA) performance in various fields with ever-larger models and labeled datasets, small-kernel ConvNets are considered more suitable for resource-limited applications thanks to their efficient convolution operations and compact weight sharing. KD is widely used to boost the performance of small-kernel ConvNets. However, previous research has shown that distilling knowledge (e.g., global information) from Transformers to small-kernel ConvNets is not very effective, presumably because of their disparate architectures. We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, owing to their more similar architectures. Our findings are backed up by extensive experiments on both logit-level and feature-level KD “out of the box”, with no dedicated architectural or training-recipe modifications. Notably, we obtain the best-ever pure ConvNet under 30M parameters, with 83.1% top-1 accuracy on ImageNet, outperforming current SOTA methods including ConvNeXt V2 and Swin V2. We also find that beneficial characteristics of large-kernel ConvNets, e.g., larger effective receptive fields, can be seamlessly transferred to students through this large-to-small-kernel distillation. Code is available at: https://github.com/VITA-Group/SLaK.
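For context, logit-level KD of the kind referenced above typically minimizes a temperature-softened KL divergence between the teacher's and student's logits alongside the usual cross-entropy loss. The following PyTorch sketch illustrates that standard loss only; the temperature T, weight alpha, and function name are illustrative assumptions and do not reflect the paper's exact training recipe.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Minimal sketch of standard logit-level KD (Hinton-style), not the paper's exact recipe."""
        # Soft targets from the (frozen) large-kernel teacher
        soft_teacher = F.softmax(teacher_logits / T, dim=-1)
        log_soft_student = F.log_softmax(student_logits / T, dim=-1)
        # Temperature-scaled KL term (scaled by T^2 to keep gradient magnitudes comparable)
        kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
        # Hard-label cross-entropy on the student's predictions
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kl + (1.0 - alpha) * ce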

Item Type: Conference or Workshop Item (Unspecified)
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 24 May 2023 08:45
Last Modified: 28 Oct 2023 22:46
URI: https://livrepository.liverpool.ac.uk/id/eprint/3170623