TY - BOOK
T1 - Feature Diversity in Neural Networks
T2 - Theory and Algorithms
AU - Laakom, Firas
PY - 2024
Y1 - 2024
AB - The main strength of neural networks lies in their ability to generalize to unseen data. Why and when they generalize well are two central questions for fully understanding this phenomenon and for developing better and more robust models. Several studies have explored these questions from different perspectives and proposed multiple measures and bounds that correlate well with generalization. This dissertation proposes a new perspective by focusing on the ‘feature diversity’ within the hidden layers. From this standpoint, a neural network is seen as a two-stage process: the first stage is feature (representation) learning through the intermediate layers, followed by the final prediction layer. Empirically, it has been observed that learning a rich and diverse set of features is critical for achieving top performance, yet no theoretical justification for this observation exists. In this dissertation, we tackle this problem by theoretically analyzing the effect of feature diversity on generalization performance. Specifically, we derive several rigorous Rademacher complexity-based bounds for neural networks in different contexts and demonstrate that, indeed, having more diverse features correlates well with better generalization performance. Moreover, inspired by these theoretical findings, we propose a new set of data-dependent diversity-inducing regularizers and present an extensive empirical study confirming that the proposed regularizers enhance the performance of several state-of-the-art neural network models on multiple tasks. Beyond standard neural networks, we also explore diversity-promoting strategies in other contexts, e.g., energy-based models, autoencoders, and bag-of-features pooling layers, and show that learning diverse features helps consistently.
M3 - Doctoral thesis
SN - 978-952-03-3354-6
T3 - Tampere University Dissertations - Tampereen yliopiston väitöskirjat
BT - Feature Diversity in Neural Networks
PB - Tampere University
CY - Tampere
ER -