Parallel SGD with Compression

Parallel algorithms interest me a lot because they have both a theoretical side and a numerical side. On the one hand, we want to design a parallel algorithm with a short span and low work. On the other hand, parallel algorithms are more sensitive to practical factors such as load balancing, cache efficiency, and communication cost. I am currently reading a paper about parallel SGD with compression, and I would like to discuss it briefly. Compared to other optimization papers, its theoretical analysis is not especially hard, but the paper is very useful because it addresses a real problem: when you parallelize SGD, you pay a communication cost. It also provides a convergence analysis for this setting. Now for the details. The paper presents a parallel algorithm for distributed learning in the data-parallel setting, where we assume that dat