Different workloads (i.e., online services and offline jobs) peak in resource usage at different times, which makes colocation possible and can effectively improve resource utilization and reduce cost. This talk introduces how to maximize resource utilization without violating online services' SLOs, by means of resource prediction, resource isolation, interference detection, offline job eviction, and related techniques. Moreover, even when users cannot provide latency metrics for their online services, interference can still be detected by collecting kernel-level metrics through eBPF. All of these techniques are built on native Kubernetes. The colocation system supports multiple scenarios, including containerized and non-containerized online services, as well as offline jobs in the Kubernetes and Hadoop ecosystems. At Tencent, it has been deployed on more than 40,000 machines with more than 2,000,000 cores, covering services such as advertising and Ceph storage, with an average 15% increase in utilization and hundreds of millions in cost savings.
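To make the idea of resource prediction and offline eviction more concrete, here is a minimal sketch in Python. It is not the system described in the talk: the `Node` type, the quantile-based predictor, the 10% safety margin, and the 1.5x interference tolerance are all hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one machine; field names are assumptions."""
    capacity_cores: float
    online_usage_samples: list  # recent online CPU usage, in cores


def predict_online_peak(samples, quantile=0.99):
    """Naive predictor: nearest-rank quantile of recent online usage."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def offline_allocatable(node, safety_margin=0.1):
    """Cores that could be lent to offline jobs.

    capacity - predicted online peak - reserved safety margin,
    clamped at zero so offline jobs are never promised negative cores.
    """
    reserved = node.capacity_cores * safety_margin
    free = node.capacity_cores - predict_online_peak(node.online_usage_samples) - reserved
    return max(0.0, free)


def should_evict_offline(current_metric, baseline_metric, tolerance=1.5):
    """Flag interference when a metric (for example a latency or a
    kernel-level scheduling delay) exceeds its baseline by the tolerance."""
    return current_metric > baseline_metric * tolerance


if __name__ == "__main__":
    node = Node(capacity_cores=64, online_usage_samples=[10, 12, 20, 18, 16])
    print(offline_allocatable(node))
    print(should_evict_offline(current_metric=3.2, baseline_metric=2.0))
```

In this sketch, the cores offered to offline jobs shrink as the predicted online peak grows, and eviction is triggered only by a relative degradation against a baseline, so it does not depend on which metric is used as the interference signal.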