My current research is focused on understanding the benefits of over-parameterization in machine learning. In particular, I study implicit regularization and optimization in over-parameterized deep networks. My main goal is to apply this theory to simplify the construction and training of deep networks used in practice.

Over-parameterized models are those with more than enough parameters to perfectly interpolate the training dataset (i.e., achieve 100% training accuracy in classification or zero training error in regression). Classical learning theory suggests that over-parameterized models overfit the training data and thus generalize poorly (i.e., perform poorly on test data). However, deep learning models used in practice today are over-parameterized yet generalize well (see Zhang et al. 2017), a phenomenon referred to as implicit regularization. Recent work (Belkin et al. 2019) reconciled this gap between theory and practice by introducing the double descent curve, which shows that increasing model capacity well past the point of zero training error can actually improve generalization.
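To make the notion of interpolation concrete, here is a minimal sketch (my own toy example, not taken from the works cited above): a linear model with more parameters (features) than samples can always fit the training data exactly, and the pseudoinverse picks out the minimum-norm interpolating solution.

```python
import numpy as np

# Toy illustration: with more parameters (features) than samples,
# even a linear model can interpolate, i.e. reach exactly zero training error.
rng = np.random.default_rng(0)
n_samples, n_features = 5, 20          # over-parameterized: 20 > 5
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# Minimum-norm interpolating solution via the pseudoinverse.
w = np.linalg.pinv(X) @ y

train_error = np.max(np.abs(X @ w - y))
print(train_error)  # numerically zero: the model interpolates the data
```

Among the infinitely many interpolating solutions in the over-parameterized regime, gradient descent converges to this minimum-norm one for linear regression, which is one simple instance of implicit regularization.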

One goal of my work is to characterize implicit regularization in deep learning models. As an example, our recent work proves that over-parameterized autoencoders implement associative memory by storing training examples as attractors: iterating an autoencoder on a random input converges to a training example. We additionally extend this result to an implementation of associative memory for sequences of examples. Here is a link to a talk on this work: MLTeaTalk 2020.
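The attractor iteration described above can be sketched with a toy stand-in (my own illustration, not the paper's trained model): a map whose fixed points are the training examples. In the actual result, a trained over-parameterized autoencoder plays the role of `step` below.

```python
import numpy as np

# Toy sketch of attractor dynamics: a hand-built contractive map whose
# fixed points are the stored training examples. A trained over-parameterized
# autoencoder would replace `step` in the real setting.
train = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])  # "training examples"

def step(x, rate=0.5):
    # Contract toward the nearest stored training example.
    nearest = train[np.argmin(np.linalg.norm(train - x, axis=1))]
    return x + rate * (nearest - x)

rng = np.random.default_rng(1)
x = rng.standard_normal(2)             # random input
for _ in range(50):                    # iterate the map toward a fixed point
    x = step(x)

# x has converged to one of the stored examples (an attractor)
dist = np.min(np.linalg.norm(train - x, axis=1))
print(dist)
```

Iterating from a random input recovers one of the stored examples, which is the associative-memory behavior in miniature.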

In the past, I have worked on building interpretable deep networks for early cancer detection. I have also worked on combinatorial problems in causal inference.