Accelerating Distributed Machine Learning with an Efficient AllReduce Routing Strategy

Publication Date: 9/23/2024

Event: Frontiers in Optics 2024, Denver, CO

Reference: pp. 1-2, 2024

Authors: Zilong Ye (California State University; NEC Laboratories America, Inc.), Philip Ji (NEC Laboratories America, Inc.), Giovanni Milione (NEC Laboratories America, Inc.), Ting Wang (NEC Laboratories America, Inc.)

Abstract: We propose an efficient routing strategy for AllReduce transfers, which constitute the dominant traffic in machine learning-centric datacenters. The strategy achieves fast parameter synchronization in distributed machine learning and reduces the average training time by 9%.

Publication Link: