Accelerating Distributed Machine Learning with an Efficient AllReduce Routing Strategy
We propose an efficient routing strategy for AllReduce transfers, which comprise the dominant traffic in machine-learning-centric datacenters, to achieve fast parameter synchronization in distributed machine learning. The strategy improves the average training time by 9%.