Developing a robot application for the real world is challenging. Developers have to handle heterogeneous sensors and hardware, each with its own SDK and data format and each running in a different software environment, which makes robot systems fragile and prone to failure. That was the dark age before ROS. ROS unified the programming interface and communication mechanism and brought the first light to robot application developers. We believe cloud native will make that light brighter. With containers and Kubernetes, developers can launch massive resources to run robot simulations in parallel and efficiently manage the whole life cycle of a robot application. In this talk, the speakers will introduce the "pre cloud native" state of robot development and show how cloud native makes robot developers' lives easier. The content includes:
- Porting robot applications to containers
- Running multiple robot simulations on the cloud with Kubernetes
- Deploying and managing the application on a real robot
- Robot Fleet Ops and Tele Ops
Zhen Ju works at the Open Source Competence Center of Huawei and focuses on DevOps and cloud native technologies. He is one of the early explorers of containers and translated the first book on Docker: The Docker Book. Zhen is now exploring applying cloud native technologies to robot...
How do real-world malicious attackers attack a K8s cluster? How can container escape be prevented? How can hackers be stopped from bypassing Pod Security Policy? How can lateral movement be prevented? This talk answers these questions. Developers and cluster administrators can learn how to build a secure, multi-tenant, large-scale Kubernetes cluster and how to protect the containers and data in it. Over the past few years, the speaker has shared how hackers attack infrastructure such as Kubernetes and Service Mesh at conferences including Blackhat, Hack In The Box, CIS, and WHT. The purpose of "researching attack techniques" is defense: this talk shares Tencent's experience and thinking on building security into multi-tenant Kubernetes clusters, and uses real-world attack cases to show the security risks and propose solutions.
# NEARGLE - 📒 https://github.com/neargle/ - 📮 nearg1e.com@gmail.com
1. Security Researcher @ Tencent Security Platform Department. Thanks to the Tencent Kubernetes Engine Team.
2. Published several security research topics about containers, Kubernetes and service mesh: * Kubernetes...
Kubernetes is now widely used in enterprises, and its rough edges are being smoothed out one by one. Graceful node shutdown is one of them. Support for graceful node shutdown was first added in Kubernetes 1.19, and the community has put a lot of effort into making the feature reliable. In this talk, a beginner who had long admired Kubernetes describes getting familiar with the community development environment, learning Kubernetes step by step, and then taking part in the development of a small feature: graceful node shutdown based on Pod priority. The talk covers:
- Why we need graceful node shutdown
- Taking part in the development of graceful node shutdown based on Pod priority
- Gradually coming to understand Kubernetes community development and collaboration
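As a rough illustration of the idea behind priority-based graceful node shutdown, the following Python sketch groups pods into priority bands, gives each band its own grace period, and lists the bands lowest first. The thresholds, pod names, and helper functions are hypothetical, and the lowest-first ordering is an assumption made for illustration; this is not the kubelet's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    priority: int


# Hypothetical table: (minimum priority of the band, grace period in seconds).
GRACE_BY_PRIORITY = [(0, 60), (10000, 120), (100000, 10)]


def band_for(priority, table):
    """Return the highest band threshold that does not exceed `priority`.

    Pods below every threshold fall into the lowest band.
    """
    thresholds = sorted(threshold for threshold, _ in table)
    band = thresholds[0]
    for threshold in thresholds:
        if threshold <= priority:
            band = threshold
    return band


def shutdown_plan(pods, table):
    """Group pods by band and return (band, grace_seconds, pod_names),
    ordered from the lowest band to the highest."""
    grace = dict(table)
    groups = {}
    for pod in pods:
        groups.setdefault(band_for(pod.priority, table), []).append(pod.name)
    return [(band, grace[band], sorted(names))
            for band, names in sorted(groups.items())]


if __name__ == "__main__":
    pods = [Pod("web", 500), Pod("dns", 100000), Pod("batch", 0), Pod("db", 20000)]
    for band, seconds, names in shutdown_plan(pods, GRACE_BY_PRIORITY):
        print(f"priority >= {band}: stop {names} with {seconds}s grace")
```

Keeping the table as plain data makes it easy to see that a pod's grace period depends only on which band its priority falls into, not on its exact priority value.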
Upgrading Kubernetes across multiple versions is risky. Many customers therefore choose cluster migration (that is, creating a new cluster on the higher version and then migrating applications from the lower-version cluster to it) instead of upgrading the cluster in place. However, migrating a cluster with zero downtime is a major challenge. This talk proposes a way to solve the problem.
Jing Gu is an engineer on the Kubernetes Service team at Alibaba Cloud and is a member of Kubernetes. She primarily works on Kubernetes AIOps and the cloud controller manager for Alibaba Cloud.
Bagua is a project developed by Kuaishou Technology and ETH Zürich to support high-performance distributed deep learning on Kubernetes without requiring special network devices or restrictive scheduling. Thanks to Bagua's innovative communication algorithms and its integration with Kubernetes, users can scale training horizontally with an excellent speedup guarantee on a Kubernetes cluster with just an ordinary Ethernet connection. Bagua's effectiveness has been validated in various scenarios and models, including ResNet on ImageNet, BERT-Large, and huge-scale industrial applications at Kuaishou such as recommendation model training with dozens of TB of parameters, video/image understanding with more than 1 billion images/videos, and ASR with TB-level datasets. In a production Kubernetes cluster, Bagua can outperform PyTorch-DDP, Horovod and BytePS in end-to-end training time by a significant margin (up to 1.95×) across a diverse range of tasks.
Xianghong Li currently serves as a senior architect at Kuaishou Technology, focusing on a cloud-native machine learning platform based on Kubernetes and on large-scale AI system performance acceleration solutions, in order to help algorithm engineers deploy production ready machine learning...
This talk is about the importance of tracking dependencies in a large project like Kubernetes and about "depstat", a tool created to track dependency updates to the Kubernetes codebase. The Kubernetes repository receives many pull requests each day, many of which bring dependency changes with them. Most of the time, the maintainers have to spot these changes manually, determine their effects on the overall dependency tree, and then ping the pull request authors to take action. depstat was created to avoid this and to help track dependency updates better. depstat is an upstream project that analyzes dependencies for projects that use Go modules. It currently runs as part of a Prow job in the Kubernetes code repository and provides four crucial dependency-related metrics. "depstat" can also analyze dependencies visually by creating a graph.
Arsh is a Developer Experience Engineer at Okteto. He is a CNCF Ambassador and was awarded the Kubernetes Contributor Award for his contributions in 2021. He also led the CI Signal Team for the Kubernetes 1.25 release. Previously, he worked at VMware and was also a contributor...
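To make this kind of analysis concrete, here is a minimal Python sketch that counts direct and transitive dependencies from an edge list in the two-column format printed by `go mod graph`. The sample modules and the function names are made up for illustration, and the three counts shown are illustrative metrics, not necessarily the ones depstat reports.

```python
from collections import defaultdict


def parse_graph(text):
    """Parse lines of the form '<module> <requirement>' into an adjacency map."""
    graph = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parent, child = line.split()
        graph[parent].add(child)
    return graph


def dependency_stats(graph, root):
    """Count direct, transitive, and total dependencies reachable from `root`."""
    direct = set(graph.get(root, ()))
    seen = set()
    stack = list(direct)
    while stack:
        node = stack.pop()
        if node in seen or node == root:
            continue  # already visited, or a cycle back to the root
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return {
        "direct": len(direct),
        "transitive": len(seen - direct),
        "total": len(seen),
    }


# Hypothetical sample input in `go mod graph` style.
SAMPLE = """
app lib/a@v1.0.0
app lib/b@v1.2.0
lib/a@v1.0.0 lib/c@v0.3.0
lib/b@v1.2.0 lib/c@v0.3.0
lib/c@v0.3.0 lib/d@v2.0.0
"""

if __name__ == "__main__":
    print(dependency_stats(parse_graph(SAMPLE), "app"))
```

Comparing such counts before and after a pull request is one simple way to see whether a change adds to the dependency tree.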
Which service mesh should I use and how do I get started? What are the different service meshes, and how do they contrast? Learn about the functionality of different service meshes and visually manipulate mesh configuration. This lightning talk introduces Meshery, an open source, multi-service mesh management plane that provisions different service meshes (five and counting) along with their sample applications, and benchmarks the performance of service mesh deployments. Meshery facilitates benchmarking of various Istio configuration scenarios and comparison of the performance of services (applications) on and off the mesh and across different meshes. It vets mesh and service configuration against deployment best practices. Some service mesh projects use Meshery as their performance benchmark tool for each release.
Anita is a Developer Advocate and technical writer with 3+ years of experience in Web development and DevRel on a global scale. She is passionate about educating the developer market about new tools and technologies. She champions topics around Documentation, Open source, DEI best...
Vivo is one of the biggest smartphone companies in the world. Hundreds of engineers and researchers in its AI Lab work in areas such as NLP, CV, recommendation, and speech, which bring varied and complicated cases of model training and serving. The AI computing platform was built to address two major challenges: 1. Provide efficient scheduling of resources for massively distributed model training and serving. 2. Achieve high utilization of computing resources, especially expensive GPU devices. Today the platform has several clusters in production, thousands of GPU nodes and hundreds of GPU nodes. Hundreds of services are deployed and hundreds of ML jobs are run every day. This session will cover how the platform is built with Kubernetes, kube-batch, kubeflow, and other OSS. It will also cover the issues the team ran into, their hard-earned best practices, and the contributions they made to the open-source community.
Ziyang is a staff engineer at vivo AI Lab and leads the engineering effort on the vivo AI computing platform. Prior to vivo, Ziyang worked for Rancher and Oracle. He is active in the cloud native community and is a contributor to kube-batch, tf-operator, and other projects.
Many big data workloads run on Kubernetes clusters. To let big data workloads running on different Kubernetes clusters access each other's data efficiently, a novel way is needed to establish simple, high-performance network communication between heterogeneous Kubernetes clusters. On a Layer 2 network, we chose host routing for communication to ensure network performance. The mainstream CNIs support this function. On a Layer 3 network, we chose the VXLAN tunnel technology supported by the mainstream CNIs to connect the networks. Clusters with heterogeneous CNIs can communicate directly on a Layer 2 network. On a Layer 3 network, however, their VNIs may differ, in which case a VXLAN tunnel cannot be created between the clusters. At least one CNI must therefore be extended programmatically to adapt to the other, ensuring that both CNIs use the same VNI to establish the VXLAN tunnel. We chose Antrea as the core CNI to support configurable VNIs.
Vicky Liu is a Sr. R&D manager in the Networking & Security BU at VMware. She has been working in the IT domain for 10+ years and now focuses on Kubernetes networking solutions. She leads a team contributing to the Antrea project, which was officially announced at KubeCon 2019 as an open sourced, light-weight...
Yang Li is a senior software engineer at Transwarp. He has focused on cloud networking for 9 years and has rich experience in the design and development of IaaS and PaaS network functions.
As Windows containers mature, a large portion of Windows applications and services are moving to Kubernetes. Even with successful experience managing Linux workloads, managing Windows workloads at scale is challenging. Do you know about scratch space for Windows workloads? Have you ever had a node crash caused by over-provisioned scratch space? How do we avoid orphan disks? Why are rolling updates prone to getting stuck? How do you gracefully shut down a DaemonSet? Group Managed Service Accounts (gMSA) are a more secure way to run tasks and applications on Windows; do you know how gMSA integrates into Windows clusters? Are you still struggling with gMSA integration with Active Directory on Kubernetes?
She is a software engineer at VMware, currently focusing on Windows-related K8s technologies. She previously worked on IBM analytics-related solutions and now works on VMware Tanzu Kubernetes Grid Windows solutions.
With the adoption of cloud-native edge computing, more and more edge devices need to collaborate with the cloud. In addition, with the development of various specialized chips, hardware acceleration cards, and tinyML technologies, many dedicated devices with limited general-purpose resources also require edge-cloud synergy. Lightweight container sandbox technology is therefore required to meet the requirements of low service overhead, fast startup, and service isolation. A unified cross-architecture runtime technology is also required to solve the problem of migrating edge devices across multiple architectures, achieving a unified application runtime and reducing development and maintenance costs.
Pengfei Jiang (姜鹏飞) works on the EulerOS team at Huawei's 2012 Laboratories and is a maintainer of the openEuler CloudNative SIG, focusing mainly on containers, WebAssembly sandboxes, virtualization, and related technologies.