
Department of Electrical and Computer Engineering

The Computer Engineering and Systems Group

Texas A&M University College of Engineering

CESG Seminar: Tomer Galanti

Posted on April 17, 2025 by Keshari Rijal

Friday, April 25, 2025
10:20 – 11:10 a.m. (CST), ETB 1020

Tomer Galanti
Assistant Professor, Computer Science & Engineering
Texas A&M University

Title: “SGD and Weight Decay Secretly Compress Your Neural Network”

Abstract:
Several empirical results have shown that replacing weight matrices with low-rank approximations results in only a small drop in accuracy, suggesting that the weight matrices at convergence may be close to low rank. In this talk, we will explore the origins of the bias in Stochastic Gradient Descent (SGD) that leads to learning low-rank weight matrices when training neural networks. Our findings demonstrate that training neural networks with SGD and weight decay introduces a bias toward rank minimization in the weight matrices. We show, both theoretically and empirically, that the rank of the weight matrices is controlled by the batch size, the learning rate, and the amount of regularization. Unlike prior work, our analysis does not rely on assumptions about the data, convergence, or optimality of the weight matrices, and it applies to a broad range of neural network architectures, regardless of width or depth. Finally, we will discuss connections between our analysis and related properties such as implicit regularization, generalization, and compression. This is joint work with Zachary Siegel, Aparna Gupte, and Tomaso Poggio.
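
The sketch below is not the speaker's code; it is a minimal, illustrative experiment of the kind the abstract describes: train a small network with SGD and weight decay, then replace the hidden weight matrices with low-rank SVD truncations and compare accuracy. The architecture, synthetic task, hyperparameters, and the chosen rank k are all assumptions made for illustration.

```python
# Minimal sketch (illustrative only): SGD + weight decay, then low-rank truncation.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic binary classification task (assumed for illustration).
X = torch.randn(2048, 64)
y = (X[:, :8].sum(dim=1) > 0).long()

model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      weight_decay=5e-4)  # weight decay contributes to the low-rank bias
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    idx = torch.randint(0, X.shape[0], (128,))  # small mini-batches (SGD noise)
    opt.zero_grad()
    loss_fn(model(X[idx]), y[idx]).backward()
    opt.step()

def accuracy():
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

print(f"accuracy with full-rank weights: {accuracy():.3f}")

# Truncate each hidden weight matrix to rank k via SVD and re-evaluate.
k = 8  # illustrative choice
with torch.no_grad():
    for layer in model:
        if isinstance(layer, nn.Linear) and layer.out_features == 256:
            U, S, Vh = torch.linalg.svd(layer.weight, full_matrices=False)
            layer.weight.copy_((U[:, :k] * S[:k]) @ Vh[:k, :])

print(f"accuracy with rank-{k} weights:  {accuracy():.3f}")
```

In this setup, if the trained weight matrices are indeed close to low rank, the accuracy after truncation should drop only slightly; varying the batch size, learning rate, or weight-decay coefficient changes how aggressively the matrices can be truncated.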

Biography:
Tomer Galanti is an Assistant Professor in the Department of Computer Science and Engineering at Texas A&M University. His research focuses on the theoretical and algorithmic foundations of deep learning and large language models. Combining theory and experimentation, his work addresses core challenges in deep learning efficiency — including reducing data requirements, designing compressible networks, enabling rapid adaptation to new tasks, accelerating inference, and improving training stability.

Prior to joining Texas A&M, he was a postdoctoral associate at MIT’s Center for Brains, Minds & Machines, where he worked with Tomaso Poggio. He received his Ph.D. in Computer Science from Tel Aviv University, advised by Lior Wolf. In 2021, he also interned as a Research Scientist at Google DeepMind, collaborating with Andras Gyorgy and Marcus Hutter.

