Across the tens of thousands of Kubernetes clusters running on Alibaba Cloud, DNS lookup failure is one of the most common problems. The symptoms vary widely: some failures are intermittent and some are continuous; some break every kind of DNS lookup and some affect only a small fraction of queries. The root causes vary as well. Container network failures account for most DNS failures, while misconfiguration is another frequent cause.

In this session, Yuning Xie will introduce methods for observing DNS lookup failures in Kubernetes and diagnosing their root causes, especially the painful intermittent and unpredictable ones. The session will cover:
1. Common scenarios in which DNS lookup failures occur in Kubernetes
2. CoreDNS's built-in observability plugins, such as log/errors/trace/dump/metrics
3. A novel approach to monitoring and diagnosing CoreDNS lookup failures using the dnstap protocol and a context-based analyzer, as a replacement for traditional high-overhead methods such as tcpdump
4. An eBPF-based approach to diagnosing the root cause of DNS failures on the client side, without interfering with DNS servers
Yuning Xie is a software engineer on the Container Service for Kubernetes (ACK) team at Alibaba Cloud. He has devoted most of his time to container networking and to observability for container networks.