Extreme Computing Group
Resource management plays a major role in the perceived quality of service of datacenter applications. In many scenarios, multiple criteria, such as equipment cost, latency, fault tolerance, and bandwidth, ought to be considered. Consequently, the design of resource allocation mechanisms poses novel challenges, especially since the underlying algorithms need to run fast, on large instances, and without generating too much churn in the datacenter.
In this talk, we present solutions for two different scenarios. First, we consider the problem of assigning physical servers to datacenter services. Unfortunately, there is an inherent tradeoff between achieving high fault tolerance for services and reducing bandwidth usage in the network core: spreading servers across fault domains improves fault tolerance but requires additional bandwidth, while deploying servers together reduces bandwidth usage but also decreases fault tolerance. We present a detailed analysis of a large-scale Web application and its communication patterns. Based on that analysis, we propose and evaluate a scalable optimization framework that both achieves high fault tolerance and significantly reduces bandwidth usage in the network core by exploiting the skewness in the observed communication patterns.
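The tradeoff can be illustrated with a toy calculation. The sketch below is not the framework from the talk; the server names, rack names, and traffic rates are made up, and the two metrics (the fraction of servers surviving the loss of the most-loaded fault domain, and the total traffic crossing fault domains) are simplifying assumptions chosen only to show how skewed communication lets a placement keep most of the fault tolerance at a fraction of the core bandwidth.

```python
from collections import Counter

def worst_case_survival(placement):
    """Fraction of servers left after the most-loaded fault domain fails."""
    counts = Counter(placement.values())
    return 1 - max(counts.values()) / len(placement)

def core_traffic(placement, traffic):
    """Total traffic between servers placed in different fault domains."""
    return sum(rate for (a, b), rate in traffic.items()
               if placement[a] != placement[b])

# Hypothetical, skewed traffic matrix: two heavy pairs and one light link.
traffic = {("s1", "s2"): 10, ("s3", "s4"): 10, ("s1", "s3"): 1}

# Three hypothetical placements of four servers onto racks (fault domains).
packed = {"s1": "A", "s2": "A", "s3": "A", "s4": "A"}
spread = {"s1": "A", "s2": "B", "s3": "C", "s4": "D"}
paired = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}  # heavy pairs kept together

for name, p in [("packed", packed), ("spread", spread), ("paired", paired)]:
    print(name, worst_case_survival(p), core_traffic(p, traffic))
```

In this toy instance, keeping the heavily communicating pairs together sends only the light link through the core while still surviving the loss of any single rack with half of the servers intact.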
Second, we describe recent efforts to reduce the latency of Bing queries. Such interactive services exhibit highly variable latencies because their processing consists of many sequential stages, parallelization across tens to thousands of servers, and aggregation of responses across the network. To improve the tail latency of these services, one can use a few building blocks, such as reissuing laggards elsewhere in the cluster and returning somewhat incomplete results. Combining these building blocks to reduce the overall latency is non-trivial because, for the same amount of resources (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. Kwiken decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Our simulations show sizable latency gains with even a small percentage of extra resources.
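To see why the split of a shared budget across stages matters, consider the following minimal sketch. It is not Kwiken's algorithm: it assumes a purely sequential pipeline whose end-to-end latency is the sum of per-stage latencies, uses a simple greedy rule, and relies on invented stage names and latency curves.

```python
def allocate_budget(latency_curves, budget):
    """Greedily give each budget unit to the stage with the largest latency drop.

    latency_curves[stage][k] is the (hypothetical) latency of `stage`
    when it receives k budget units, e.g. k reissues.
    """
    alloc = {stage: 0 for stage in latency_curves}
    for _ in range(budget):
        best_stage, best_gain = None, 0
        for stage, curve in latency_curves.items():
            k = alloc[stage]
            if k + 1 < len(curve):
                gain = curve[k] - curve[k + 1]  # marginal improvement of one more unit
                if gain > best_gain:
                    best_stage, best_gain = stage, gain
        if best_stage is None:  # no stage benefits from more budget
            break
        alloc[best_stage] += 1
    return alloc

def total_latency(latency_curves, alloc):
    """End-to-end latency, assumed here to be the sum over sequential stages."""
    return sum(latency_curves[s][alloc[s]] for s in alloc)

# Hypothetical latency (ms) per stage as a function of budget units received.
curves = {
    "frontend": [50, 45, 43, 42],
    "index": [200, 140, 110, 100],
    "ranking": [120, 100, 90, 85],
}

alloc = allocate_budget(curves, budget=3)
even = {stage: 1 for stage in curves}
print(alloc, total_latency(curves, alloc), total_latency(curves, even))
```

With these made-up curves, concentrating the three units on the stages that benefit most yields a lower total than giving one unit to each stage, which is the kind of imbalance an end-to-end view is meant to exploit.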
Ishai Menache received his PhD in Electrical Engineering from the Technion, Israel. Subsequently, he was a postdoctoral associate at the Laboratory for Information and Decision Systems at MIT, and a visiting researcher at Microsoft Research New England. He is currently a researcher in the Extreme Computing Group of Microsoft Research, Redmond. Ishai’s current research focuses on developing large-scale resource management and optimization frameworks for datacenters, as well as on the economics of cloud computing. More broadly, his areas of interest include systems and networking, optimization, machine learning, and game theory.