Neural network distillation is used for transferring the knowledge from a complex teacher network into a lightweight student network, improving in this way the performance of the student network. However, neural distillation does not always lead to consistent results, with several factors affecting the efficiency of the knowledge distillation process. In this paper it is experimentally demonstrated that the selected teacher can indeed have a significant effect on knowledge transfer. To overcome this limitation, we propose a curriculum-based teacher ensemble that allows for performing robust and efficient knowledge distillation. The proposed method is motivated by the way that humans learn through a curriculum, as well as supported by recent findings that hints to the existence of critical learning periods in neural networks. The effectiveness of the proposed approach, compared to various distillation variants, is demonstrated using three image datasets and different network architectures.
|Nimi||European Signal Processing Conference|
|Conference||EUROPEAN SIGNAL PROCESSING CONFERENCE|
|Ajanjakso||1/01/00 → …|