
PhD defense - Yingyi Chen

Deep Learning Models: Duality, Robustness and Generalization Properties

Start: 21/05/2024, 17:00
Location: 04.112 - Auditorium ON5, Herestraat 49, Leuven

 

Neural networks have achieved remarkable success in domains such as computer
vision, natural language processing, reinforcement learning, robotics, and
autonomous driving. With the increase in computing power, neural networks have
evolved over the past decades from simple multi-layer perceptrons to
convolutional neural networks and, most recently, large language models based
on Transformers. However, the increasing architectural capacity of neural
networks can also lead to overconfident predictions, which might have severe
consequences in real-world situations. To this end, in addition to improving
their predictions in the clean-data scenario, in this thesis we consider neural
networks in settings resembling real-world scenarios, where noise resides in
the datasets. More specifically, we focus on improving the robustness and
generalization abilities of neural networks in different learning tasks.
In this thesis, we study the robustness of convolutional neural networks
against label noise, where a significant portion of the training labels are
incorrect. To address over-fitting to noisy labels, we introduce a compression
inductive bias by applying classical regularizations such as Dropout and Nested
Dropout to the networks. Additionally, we enhance performance by combining
these constraints with Co-teaching, a classical ensemble method for learning
with noisy labels. As theoretical validation, we provide a bias-variance
decomposition under compression regularization. Experimental results
demonstrate that our approach performs comparably to, or even better than,
state-of-the-art methods on benchmarks including datasets with real-world
label noise.
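To give a feel for the compression idea, here is a minimal plain-Python sketch of nested dropout on a single feature vector: a truncation index is drawn from a geometric distribution and every unit after it is zeroed, which pushes information into the leading dimensions. The function name, the parameter `p`, and the single-vector interface are illustrative only, not the thesis implementation.

```python
import random

def nested_dropout(vec, p=0.1, training=True):
    """Nested dropout sketch: sample a truncation index k from a
    geometric distribution and zero every unit after position k,
    encouraging the representation to compress information into its
    leading dimensions."""
    if not training:
        return list(vec)  # identity at evaluation time
    d = len(vec)
    # geometric sampling: count failures before the first success
    k = 0
    while random.random() > p and k < d - 1:
        k += 1
    return [v if i <= k else 0.0 for i, v in enumerate(vec)]
```

Because units are dropped as a contiguous tail rather than independently, the network is rewarded for ordering features by importance, which is the compression bias exploited in the thesis.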
Given the ubiquitous use of Transformers and their outstanding performance in
various tasks, our focus then shifts to the robustness and generalization of
Vision Transformers (ViTs). In particular, we explore leveraging the jigsaw
puzzle solving problem as a self-supervised auxiliary loss for a standard ViT,
named Jigsaw-ViT, to enhance the robustness and generalization of ViTs.
In addition to the standard classification flow during end-to-end training,
we introduce a jigsaw flow that predicts the absolute positions of input
patches by solving a classification problem. Despite its simplicity,
Jigsaw-ViT demonstrates improvements in both generalization and robustness
over the standard ViT, two properties that usually involve a trade-off. The
efficacy of Jigsaw-ViT is validated across benchmarks, including clean image
datasets, learning with noisy labels, and adversarial examples.
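The two-flow idea above can be sketched in a few lines: the jigsaw flow shuffles patch positions and asks the model to classify, per patch, its original position, and the total objective adds this auxiliary term to the standard classification loss. The function names and the weight `eta` are hypothetical placeholders, assumed here only to illustrate the structure of the objective.

```python
import random

def jigsaw_targets(num_patches, seed=None):
    """Shuffle patch indices; the jigsaw task is then to classify, for
    each shuffled patch, its absolute position in the original grid."""
    rng = random.Random(seed)
    perm = list(range(num_patches))
    rng.shuffle(perm)
    # perm[i] is the original position of the patch now at slot i,
    # so perm itself serves as the per-patch classification target
    return perm

def total_loss(cls_loss, jigsaw_loss, eta=0.5):
    """Jigsaw-ViT-style objective: standard classification loss plus a
    weighted self-supervised jigsaw term (eta is an assumed weight)."""
    return cls_loss + eta * jigsaw_loss
```

Since the jigsaw targets come for free from the shuffling itself, the auxiliary task requires no extra labels, which is what makes it self-supervised.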
Next, we delve into the core mechanism behind the Transformer's success,
namely the self-attention mechanism. Specifically, we provide a novel
perspective for interpreting self-attention through a primal-dual
representation based on asymmetric Kernel Singular Value Decomposition (KSVD),
which closes the gap between theory, where the asymmetry of attention has
commonly been dismissed, and implementation. In this unsupervised setup, we
propose to remodel self-attention in the primal representation of the duality,
named Primal-Attention, and to optimize it accordingly. The generalization
and efficiency of our new self-attention mechanism are validated on a series
of benchmarks, including time series, long sequence modelling, reinforcement
learning, image classification and language modelling.
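A minimal sketch of the primal view, under stated assumptions: instead of forming the N x N attention matrix, each token is mapped through query-side and key-side feature maps and projected onto two sets of weights (playing the role of the two sides of the asymmetric KSVD), and the concatenated projection scores serve as the output. All weight names and shapes here (`Wq`, `Wk`, `We`, `Wr`) and the L2-normalized feature maps are illustrative assumptions, not the thesis's exact formulation.

```python
import math

def matmul(A, B):
    """Plain-Python matrix multiply for the sketch below."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def l2_normalize(X):
    """Row-wise L2 normalisation, a simple stand-in for the feature maps."""
    return [[v / (math.sqrt(sum(x * x for x in row)) or 1.0) for v in row]
            for row in X]

def primal_attention(X, Wq, Wk, We, Wr):
    """Sketch of attention in the primal representation: rather than an
    N x N softmax attention matrix, tokens are mapped to two sets of
    projection scores (one per side of the asymmetric KSVD), whose
    concatenation is the output. We/Wr stand in for the primal
    projection weights (hypothetical shapes)."""
    phi_q = l2_normalize(matmul(X, Wq))  # query-side feature map
    phi_k = l2_normalize(matmul(X, Wk))  # key-side feature map
    e = matmul(phi_q, We)  # projection scores, query side
    r = matmul(phi_k, Wr)  # projection scores, key side
    return [er + rr for er, rr in zip(e, r)]  # concatenate per token
```

The practical appeal of such a primal formulation is that its cost scales linearly in the sequence length, since the N x N attention matrix is never materialized.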
Last, we find that, due to their large architectural capacity, Transformers
can be prone to poor robustness, producing erratic yet overconfident outputs.
To this end, we build uncertainty-aware Transformers so as to make them more
suitable for safety-critical tasks, which demand rational decisions under
uncertainty. Specifically, we propose a new self-attention mechanism for
Transformers based on Sparse Variational Gaussian Processes with kernel-eigen
features to obtain better uncertainty quantification, where the eigenvectors
and eigenvalues of the attention matrix are obtained via Primal-Attention.
This leads to our Kernel-Eigen Pair Sparse Variational Gaussian Processes
(KEP-SVGP).

 

Please fill in this form if you are going to attend:

To follow the defense online, please use the following Google Meet link: 

 

URL: https://www.kuleuven.be/doctoraatsverdediging/fiches/3E19/3E190094.htm

Organized by: Yingyi Chen