Friday, September 22, 2023
10:20 a.m. – 11:10 a.m. (CST)
ETB 1020, In-Person Presentation Only
Anshumali Shrivastava, Associate Professor, Department of Computer Science, Rice University
Title: “How We Pre-Trained GPT/LLM Models from Scratch on a CPU-Only Cluster: Democratizing the GenAI Ecosystem with Algorithms and Dynamic Sparsity”
- You can now pre-train and fine-tune GPTs without any GPU.
- AI/LLMs without GPUs are here.
- AI farming on CPUs.
The Neural Scaling Law informally states that increasing model size and data automatically improves AI. However, we have reached a tipping point where the cost and energy associated with AI have become prohibitive. The barrier to entry into AI is enormous and reserved for the few with access to expensive GPUs. Unfortunately, there is a severe shortage of GPUs, and it is unlikely to ease in the near future. This talk will demonstrate how algorithms and software can eliminate the need for GPUs altogether, allowing us to build (pre-train, fine-tune, and deploy) some of the most sophisticated AI software using widely available commodity CPUs.
This talk will demonstrate algorithmic progress that can exponentially reduce the compute and memory cost of pre-training, fine-tuning, and inference with LLMs. Our experiments with OPT models reveal that more than 99% of the floating-point operations associated with large neural networks result in zeros. Unfortunately, modern AI software stacks, which rely on dense matrix multiplications, are forced to spend almost all of their cycles and energy computing these zeros. In this talk, we will show how data structures can leverage this inherent "dynamic sparsity" efficiently and effectively. In particular, we will argue that randomized hash tables can be used to design an efficient "associative memory" that reduces the number of multiplications required to train neural networks by several orders of magnitude.

The implementation of this algorithm, in the form of ThirdAI's BOLT software, challenges the prevailing belief that specialized processors like GPUs are required for building GPT models. We will demonstrate the world's first GPT-2.5B, a generative model that was entirely pre-trained on standard CPU clusters and can be fine-tuned on a single commodity desktop. We will also show how we can build a CPU-only Retrieval-Augmented Generation (RAG) ecosystem that does not require any vector database management and surpasses the accuracy of some of the most sophisticated foundation models, with all computation running on laptops and desktops.
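To make the "randomized hash tables as associative memory" idea concrete, here is a minimal illustrative sketch in the spirit of locality-sensitive hashing for sparse neuron selection. It is not ThirdAI's BOLT implementation; the signed-random-projection (SimHash) scheme, the table sizes, and all variable names are assumptions for illustration. The key point it shows: instead of a dense matrix multiply over all N neurons, we hash the input, look up only the neurons that collide with it, and compute just those dot products.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N = 64, 1000      # input dimension, neurons in the layer
K, L = 8, 4          # hash bits per table, number of tables (illustrative choices)

W = rng.standard_normal((N, D))          # neuron weight vectors
planes = rng.standard_normal((L, K, D))  # random hyperplanes for SimHash

def simhash(v, t):
    """K-bit signed random-projection signature of vector v in table t."""
    bits = (planes[t] @ v) > 0
    return int(np.dot(bits, 1 << np.arange(K)))

# Build phase: insert every neuron id into each table under its signature.
# Neurons whose weight vectors point in similar directions share buckets.
tables = [dict() for _ in range(L)]
for t in range(L):
    for n in range(N):
        tables[t].setdefault(simhash(W[n], t), set()).add(n)

def sparse_forward(x):
    """Query phase: hash the input, union the colliding buckets, and
    compute dot products only for those candidate neurons."""
    candidates = set()
    for t in range(L):
        candidates |= tables[t].get(simhash(x, t), set())
    return {n: float(W[n] @ x) for n in candidates}

x = rng.standard_normal(D)
acts = sparse_forward(x)
# Only a small fraction of the N neurons is ever touched, which is the
# source of the multiplication savings the abstract describes.
```

Because SimHash collision probability grows with the angle similarity between the input and a neuron's weight vector, the retrieved candidates are biased toward the large-inner-product neurons, i.e., exactly the activations that would have survived the implicit sparsity anyway.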
Anshumali Shrivastava is an associate professor in the Department of Computer Science at Rice University. He is also the founder and CEO of ThirdAI Corp, a company that is democratizing AI on commodity hardware through software innovations. His broad research interests include probabilistic algorithms for resource-frugal deep learning. In 2018, Science News named him one of the Top 10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, a machine learning research award from Amazon, and a Data Science Research Award from Adobe. He has won numerous paper awards, including Best Paper Awards at NIPS 2014 and MLSys 2022, and the Most Reproducible Paper Award at SIGMOD 2019. His work on efficient machine learning technologies for CPUs has been covered by the popular press, including the Wall Street Journal, the New York Times, TechCrunch, and NDTV.
More on Anshumali Shrivastava: https://www.cs.rice.edu/~as143/
More on CESG Seminars: HERE
Please join on Friday, 9/22/23 at 10:20 a.m. in ETB 1020.