Accelerating Distributed Machine Learning with an Efficient AllReduce Routing Strategy
Publication Date: 9/23/2024
Event: Frontiers in Optics 2024, Denver, CO
Reference: pp. 1-2, 2024
Authors: Zilong Ye, California State University; NEC Laboratories America, Inc.; Philip Ji, NEC Laboratories America, Inc.; Giovanni Milione, NEC Laboratories America, Inc.; Ting Wang, NEC Laboratories America, Inc.
Abstract: We propose an efficient routing strategy for AllReduce transfers, which comprise the dominant traffic in machine learning-centric datacenters, to achieve fast parameter synchronization in distributed machine learning, improving the average training time by 9%.