Friday, November 1, 2024
10:20 – 11:10 a.m. (CST)
ETB 1020
Dr. Saurabh Kadekodi
Senior Research Scientist in Storage Analytics
Google
Title: “Data-Driven IO Modeling and Optimizations in Cluster Storage Systems”
Abstract
In this talk, I will cover two recent data-driven research projects in cluster storage systems: Thesios and Morph. Thesios is a methodology for accurately synthesizing full-resolution representative IO traces and counterfactual “what-if” traces by carefully combining down-sampled IO traces collected from multiple disks attached to multiple storage servers. Representative traces help inform the design and configuration of storage systems on real-world workloads, and counterfactual traces help assess the impact of anticipated or hypothetical new storage policies or hardware prior to deployment. I will also discuss how Thesios can help academia obtain real-world traces, and the experience of open-sourcing synthesized traces comprising 2.5 billion IO requests.
Morph performs data-driven redundancy adaptation of files stored in cluster storage systems over their lifetimes, to address changes in data temperature and latency requirements. For newly ingested data, commonly stored via 3-way replication, Morph introduces a hybrid redundancy scheme that combines a replica with an erasure-coded (EC) stripe, reducing both ingest IO and capacity overheads while enabling free transcode to EC by deleting replicas. For subsequent transcodes to wider, more space-efficient EC configurations, Morph exploits Convertible Codes, which minimize the data read during EC transcode, and introduces new block placement policies to maximize their effectiveness. Morph is thus designed to optimize redundancy by taking a file-lifetime view and minimizing IO overheads without affecting performance.
Biography
Saurabh Kadekodi obtained his PhD from the Computer Science Department at Carnegie Mellon University (CMU) in 2020 as part of the Parallel Data Laboratory (PDL), under the guidance of Prof. Gregory Ganger and Prof. Rashmi Vinayak. After graduation, Saurabh joined Google as a Visiting Faculty Researcher and is currently a Senior Research Scientist in the Storage Analytics team. Saurabh is broadly interested in designing distributed systems, with a special focus on the performance and reliability of storage systems.
To learn more about Dr. Kadekodi, visit his homepage at https://www.cs.cmu.edu/~saukad.