December 9-10
The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit China 2021 - Virtual to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: Session times are shown in China Standard Time (UTC+8). The schedule is subject to change.
Thursday, December 9 • 12:10 - 12:45
How We Discover and Locate K8s Cluster Problems Before Users Do at Alibaba - Peng Nanguang, Alibaba

The ability to quickly discover and locate problems is the cornerstone of rapid system recovery. Only once a problem has been found and located can we talk about how to solve it and minimize user losses. So how can we truly discover and locate problems before users do in complex, large-scale scenarios? I will share some of our experience and practice in quickly discovering and locating problems while managing large-scale K8s clusters: how we met the stability challenges of large-scale clusters with KubeProbe, our self-developed tool that combines general-purpose link detection with directional inspection.

Link detection: Simulate generalized user behavior and detect whether the link and process are abnormal.
Directional inspection: Check the cluster for abnormal indicators and find risk points that already exist or may arise in the future.
System enhancements: Improve the efficiency and speed of problem discovery, root cause analysis after a problem is discovered, and ChatOps.

Nanguang Peng

Software Engineer, Alibaba Cloud
Nanguang Peng is a platform development engineer at Alibaba Cloud, currently focusing on large-scale Kubernetes cluster management and stability.

Thursday December 9, 2021 12:10 - 12:45 CST
KubeCon + CloudNativeCon Lecture Hall