Accelerating Distributed Machine Learning with AllReduce Reconfiguration Based on Optical Circuit Switching
We propose applying optical circuit switching to dynamically reconfigure AllReduce and thereby accelerate distributed machine learning. Using simulated annealing-based optimization, the proposed AllReduce reconfiguration approach reduces average training time by 31% compared with existing solutions.
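To illustrate the kind of search that simulated annealing performs, the following is a minimal sketch under stated assumptions, not the optimization used in this work. It assumes a ring-style AllReduce whose cost is the sum of per-link costs around the ring, and it searches over worker orderings. The names `ring_cost` and `anneal_ring`, the cost matrix, the swap move, and the geometric cooling schedule are all hypothetical choices made for illustration.

```python
import math
import random


def ring_cost(order, cost):
    """Sum of link costs around the ring defined by `order` (assumed cost model)."""
    n = len(order)
    return sum(cost[order[i]][order[(i + 1) % n]] for i in range(n))


def anneal_ring(cost, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a low-cost ring ordering with simulated annealing.

    `cost[a][b]` is a hypothetical cost of the link from worker a to worker b.
    Requires at least two workers. Returns (best_order, best_cost).
    """
    rng = random.Random(seed)
    n = len(cost)
    order = list(range(n))
    current = ring_cost(order, cost)
    best_order, best = order[:], current
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        # Propose a neighbour by swapping two positions in the ring.
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]
        candidate = ring_cost(order, cost)
        delta = candidate - current
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if current < best:
                best, best_order = current, order[:]
        else:
            # Rejected: undo the swap.
            order[i], order[j] = order[j], order[i]
    return best_order, best
```

Accepting some cost-increasing moves early, while the temperature is high, is what lets the search escape poor local orderings; as the temperature falls, the search behaves more and more like greedy descent.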