


Virtual
December 9-10

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit China 2021 - Virtual to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in China Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.
Friday, December 10 • 12:10 - 12:45
DGL Operator: Distributed Graph Neural Network Training with DGL and K8s - Xiaoyu Zhai, Qihoo 360



DGL Operator (Xiaoyu Zhai, Qihoo 360): Many learning tasks require processing graph data that carries rich information about the relationships between elements. Automating the workflow and managing training workloads at a fine granularity can help DGL-based distributed graph neural network (GNN) training improve resource utilization, scale individual DGL components dynamically, reduce the system complexity of distributed training, and support MLOps practices. In this presentation, Xiaoyu Zhai will first walk through a GNN training example to introduce the background of GNNs and DGL, then discuss the native way to run DGL distributed training and the challenges it faces in production-scale clusters. Next, Xiaoyu Zhai will give the audience a big-picture view of the DGL Operator solution, briefly discuss the abstraction that turns each DGL component into a containerized workload, and finally dive into the implementation of DGL Operator, including multiple partitioning options and future design plans.
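Distributed GNN training splits the input graph into partitions before trainers start, and edges that cross partitions translate into network communication between workers. This is why the choice of partitioning option matters. The following is a minimal, stdlib-only sketch, not DGL's API and not DGL Operator's implementation, that compares two hypothetical assignment schemes on a toy graph: a round-robin scheme that ignores graph structure, and a contiguous-range scheme that happens to follow the toy graph's community structure. Real partitioners such as METIS search for balanced, low-cut partitions without relying on node IDs being ordered conveniently.

```python
def edge_cut(edges, assignment):
    """Count edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def modulo_partition(num_nodes, num_parts):
    """Hypothetical round-robin scheme: node i goes to partition i % num_parts."""
    return [i % num_parts for i in range(num_nodes)]


def range_partition(num_nodes, num_parts):
    """Hypothetical contiguous-range scheme: split node IDs into num_parts blocks."""
    size = -(-num_nodes // num_parts)  # ceiling division
    return [i // size for i in range(num_nodes)]


def partition_sizes(assignment, num_parts):
    """Number of nodes assigned to each partition (a load-balance check)."""
    counts = [0] * num_parts
    for p in assignment:
        counts[p] += 1
    return counts


# Toy graph: two 4-node cliques (nodes 0-3 and 4-7) joined by one bridge edge.
clique_a = [(u, v) for u in range(4) for v in range(u + 1, 4)]
clique_b = [(u, v) for u in range(4, 8) for v in range(u + 1, 8)]
edges = clique_a + clique_b + [(3, 4)]

for name, scheme in [("modulo", modulo_partition), ("range", range_partition)]:
    assignment = scheme(8, 2)
    print(name, partition_sizes(assignment, 2), edge_cut(edges, assignment))
```

Both schemes produce equally sized partitions (4 nodes each), but the round-robin scheme cuts 9 of the 13 edges while the range scheme cuts only the single bridge edge, so it would need far less cross-worker communication during training.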

Speakers

Xiaoyu Zhai

Senior Machine Learning Engineer, Qihoo 360
Xiaoyu Zhai is a senior machine learning engineer at Qihoo 360 and a member of the Kubeflow community. He works on distributed training and optimization for deep learning and machine learning frameworks.



Friday December 10, 2021 12:10 - 12:45 CST
KubeCon + CloudNativeCon Hall