As much of the world’s computing continues to move into the cloud, the over-provisioning of computing resources to ensure the performance isolation of latency-sensitive tasks, such as web search, in modern datacenters is a major contributor to low machine utilization. Being unable to accurately predict performance degradation due to contention for shared resources on multicore systems has led to the heavy handed approach of simply disallowing the co-location of high-priority, latency-sensitive tasks with other tasks. Performing this precise prediction has been a challenging and unsolved problem.
In this paper, we present Bubble-Up, a characterization methodology that enables the accurate prediction of the performance degradation that results from contention for shared resources in the memory subsystem. By using a bubble to apply a tunable amount of “pressure” to the memory subsystem on processors in production datacenters, our methodology can predict the performance interference between co-locate applications with an accuracy within 1% to 2% of the actual performance degradation. Using this methodology to arrive at “sensible” co-locations in Google’s production datacenters with real-world large-scale applications, we can improve the utilization of a 500-machine cluster by 50% to 90% while guaranteeing a high quality of service of latency-sensitive applications.