TY - GEN
T1 - Curriculum-based Teacher Ensemble for Robust Neural Network Distillation
AU - Panagiotatos, Georgios
AU - Passalis, Nikolaos
AU - Iosifidis, Alexandros
AU - Gabbouj, Moncef
AU - Tefas, Anastasios
PY - 2019/9
Y1 - 2019/9
AB - Neural network distillation is used to transfer knowledge from a complex teacher network into a lightweight student network, thereby improving the performance of the student network. However, neural distillation does not always lead to consistent results, with several factors affecting the efficiency of the knowledge distillation process. In this paper, it is experimentally demonstrated that the selected teacher can indeed have a significant effect on knowledge transfer. To overcome this limitation, we propose a curriculum-based teacher ensemble that allows for robust and efficient knowledge distillation. The proposed method is motivated by the way humans learn through a curriculum, and it is supported by recent findings that hint at the existence of critical learning periods in neural networks. The effectiveness of the proposed approach, compared to various distillation variants, is demonstrated on three image datasets and with different network architectures.
KW - neural network distillation
KW - knowledge transfer
KW - curriculum-based distillation
KW - lightweight deep learning
DO - 10.23919/EUSIPCO.2019.8903112
M3 - Conference contribution
SN - 978-1-5386-7300-3
T3 - European Signal Processing Conference
BT - 2019 27th European Signal Processing Conference (EUSIPCO)
PB - IEEE
T2 - European Signal Processing Conference
Y2 - September 2019
ER -
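
Note: the abstract above builds on the standard soft-target distillation objective of Hinton et al. (2015). Below is a minimal illustrative sketch of that base objective in PyTorch; it is not the paper's curriculum-based teacher ensemble, and the function name, temperature, and alpha weighting are assumptions chosen for the example.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Hinton-style distillation: match the student's softened output
    # distribution to the teacher's, plus the usual hard-label loss.
    # temperature and alpha are illustrative values, not from the paper.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # T^2 scaling keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

A teacher ensemble could, for instance, combine or schedule several teachers' logits before computing soft_teacher; the specific curriculum proposed by the paper is described in the publication itself.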