The fastest and most energy efficient implementation of the breadth-first search algorithm on different single-node parallel architectures according to the Graph500 top
The breadth-first search (BFS) is one of the main graph bypass algorithms and the basis for many algorithms of higher-level graph analysis. Breadth-first search on graphs is a problem with irregular memory access and with irregular data dependency, that significantly complicates its parallelization on all existing architectures. The article will consider the implementation of the breadth-first search algorithm (the main test of Graph500 top) to process large graphs on different architectures: Intel x86, IBM Power8+, Intel KNL and NVidia GPU. The features of the algorithm implementation on shared memory and graph transformations, which allow to achieve record performance and energy efficiency using this algorithm among all single-node systems from Graph500 and GreenGraph500 top will be described in the article.
Parallel processing of graphs, BFS, CUDA, Power8, KNL, Graph500.
A.S. Kolganov. The fastest and most energy efficient implementation of the breadth-first search algorithm on different single-node parallel architectures according to the Graph500 top // Parallel computing technologies - XII international conference, Pavt ' 2018, Rostov-on-don, 2-6 April 2018 Short articles and poster descriptions., Chelyabinsk: Publishing center SUSU, 2018, P. 273-285