ByteDance operates vast CPU/GPU resources to support a large number of deep learning training jobs. These resources span multiple types and specifications, and using them effectively is a critical issue, especially for large-scale distributed models. This talk describes how ByteDance accelerates model training from a system perspective by fully utilizing heterogeneous resources. The main work includes:

1. Empowering model training by fully utilizing GPU resources via multiple GPU-sharing mechanisms.
2. A deep dive into NUMA-affine resource allocation (covering CPU, memory, GPU, and NIC) for better training performance (see the first sketch below).
3. Integrating an RDMA CNI for high-throughput network communication using Intel SR-IOV technology (see the second sketch below).
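To make the NUMA-affinity idea concrete, here is a minimal sketch, not ByteDance's actual implementation: assuming a Linux host and a GPU whose PCI address is already known (the GPU_PCI_ADDR value below is a made-up placeholder), it looks up the GPU's NUMA node via standard sysfs files and pins the current process to the CPUs local to that node with os.sched_setaffinity.

```python
import os

# Hypothetical PCI address of the GPU assigned to this training worker;
# in practice it would come from the scheduler or device plugin.
GPU_PCI_ADDR = "0000:3b:00.0"

def numa_node_of_device(pci_addr):
    """Read the NUMA node a PCI device (e.g. a GPU or NIC) is attached to."""
    with open(f"/sys/bus/pci/devices/{pci_addr}/numa_node") as f:
        return int(f.read().strip())

def cpus_of_numa_node(node):
    """Parse the kernel's cpulist for a NUMA node, e.g. '0-15,32-47'."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpulist = f.read().strip()
    cpus = set()
    for part in cpulist.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

node = numa_node_of_device(GPU_PCI_ADDR)
if node >= 0:  # -1 means the kernel has no NUMA information for this device
    # Pin this process to the CPUs local to the GPU's NUMA node so that
    # host-side preprocessing and host-to-device copies use local memory.
    os.sched_setaffinity(0, cpus_of_numa_node(node))
```

The same sysfs lookup applies to a NIC's PCI address, which is how CPU, memory, GPU, and NIC affinity can be aligned on one NUMA node.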
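And as an illustration of how a training pod might request an SR-IOV RDMA virtual function through a CNI, the sketch below builds a pod manifest in Python. The network attachment name rdma-net, the resource name intel.com/sriov_rdma, and the container image are all hypothetical placeholders; the real names depend on the cluster's Multus and SR-IOV device-plugin configuration.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "trainer-worker-0",
        "annotations": {
            # Multus annotation attaching a secondary SR-IOV RDMA network;
            # "rdma-net" is a hypothetical NetworkAttachmentDefinition name.
            "k8s.v1.cni.cncf.io/networks": "rdma-net",
        },
    },
    "spec": {
        "containers": [
            {
                "name": "trainer",
                "image": "example/trainer:latest",  # placeholder image
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": "8",
                        # Hypothetical device-plugin resource: one VF per worker.
                        "intel.com/sriov_rdma": "1",
                    }
                },
            }
        ]
    },
}

print(json.dumps(pod, indent=2))
```

Requesting the virtual function as an extended resource lets the scheduler co-locate the RDMA interface with the GPUs it serves, which is what enables high-throughput gradient exchange between workers.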