December 9-10

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit China 2021 - Virtual to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in China Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.
Friday, December 10 • 14:05 - 14:40
ML Training Acceleration with Heterogeneous Resources in ByteDance - Deliang Fan & Tao Xin, ByteDance

ByteDance runs a vast pool of CPU and GPU resources to support a large number of deep learning training jobs. These resources come in multiple types and specifications. Using such heterogeneous resources effectively is a critical problem, especially for large-scale distributed models. This talk discusses how to accelerate model training from a system perspective by fully utilizing the heterogeneous resources in ByteDance. The main work includes: 1. Strengthening model training by fully utilizing GPU resources through multiple GPU sharing mechanisms. 2. A deep dive into NUMA-affinity resource allocation (covering CPU, memory, GPU, and NIC) for better training performance. 3. Integrating an RDMA CNI for high-throughput network communication using Intel SR-IOV technology.

Deliang Fan

Tao Xin

Software Engineer, ByteDance

Friday December 10, 2021 14:05 - 14:40 CST
KubeCon + CloudNativeCon Lecture Hall