Please note: This schedule is displayed in China Standard Time (UTC +8). The schedule is subject to change.
In the process of managing a k8s cluster, you may have encountered many etcd management and stability problems. For example, how do you manage a large number of etcd clusters through a visualization platform? How do you automatically discover potential hazards in etcd clusters, raise alarms in time, and even achieve self-healing? How do you smoothly migrate the k8s etcd to a high-performance etcd cluster with zero downtime? Tencent is a large-scale Internet company and cloud service provider. TKE (Tencent Kubernetes Engine) has rich experience in large-scale k8s cluster management and manages tens of thousands of k8s clusters on Tencent Cloud. TKE has implemented kstone, an open source visual etcd management platform that provides etcd cluster registration and management, inspection, optimization suggestions, backup, migration, data visualization, and more. Based on the kstone project, TKE efficiently manages tens of thousands of etcd clusters, which has significantly reduced operation and maintenance costs.
Chaofan Wang is a Senior Engineer on the TKE team at Tencent Cloud. He works on large-scale Kubernetes and etcd cluster management, and is responsible for the TKE etcd platform.
Cong Tang is a technical expert on the Tencent Cloud TKE team. He is an active etcd contributor and the founder of the open source project kstone (https://github.com/tkestack/kstone). He is responsible for the stability and cost optimization of Tencent Cloud's large k8s cluster and etcd...
The ability to quickly find and locate problems is the cornerstone of a fast recovery system. Only after a problem has been discovered and located can we talk about how to solve it and minimize user losses. So how can we find and locate problems before users do in complex, large-scale scenarios? I will share some of our experience and practice in quickly discovering and locating problems while managing large-scale K8s clusters: how we addressed the stability challenges of large-scale clusters by creating KubeProbe, a universal link detection + directional inspection tool.
Link detection: simulate generalized user behavior and detect whether the link and process are abnormal.
Directional inspection: check the cluster's abnormal indicators and find risk points that already exist or may appear in the future.
System enhancements: the efficiency and speed of problem discovery, root cause analysis after a problem is discovered, and Chat-Ops.
Nanguang Peng is a platform development engineer at Alibaba Cloud, currently focusing on large-scale Kubernetes cluster management and stability engineering.
As Windows containers mature, a large portion of Windows applications and services are moving to Kubernetes. Even with successful experience managing Linux workloads, managing Windows workloads at scale is challenging. Do you know about the scratch space for Windows workloads? Have you ever had a node crash caused by over-provisioned scratch space? How do we avoid orphan disks? Why are rolling updates prone to getting stuck? How do you gracefully shut down a DaemonSet? Group Managed Service Accounts (gMSA) are a more secure way to run tasks and applications on Windows; do you know how gMSA integrates into Windows clusters? Are you still struggling with integrating gMSA and Active Directory on Kubernetes?
She is a software engineer at VMware, currently focusing on K8s Windows-related technologies. She previously worked on IBM analytics-related solutions and now works on VMware Tanzu Kubernetes Grid Windows solutions.