PCT 04/2018

The Parallel Computational Technologies (PCT) 2018 conference was held on April 2-6, 2018, at Don State Technical University in Rostov-on-Don, Russia.

A.S. Kolganov presented the following paper at the conference:

The optimization of image processing using GPU

The paper describes approaches to optimizing image processing on GPUs, using the median filtering algorithm as an example. The results are compared with a free CPU image processing library that uses AVX2 vector instructions. The achieved filtering rate of 100 GPixels/sec for a 3×3 square window on a Titan Pascal GPU, and the specific filtering rate of 10.2 GPixels/sec per 1 TFlops for a 3×3 square window in single precision, are the highest currently known in the world.
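As a minimal sketch of what a 3×3 median filter computes, the Python code below finds the median of each 3×3 window with a fixed sequence of 19 compare-exchange steps. Fixed exchange networks of this kind are a common building block for fast median filters because they map to plain min/max operations, but this is an illustrative assumption and not necessarily the method used in the paper. The names `median9` and `median_filter_3x3`, and the choice to copy border pixels unchanged, are hypothetical.

```python
# Compare-exchange network: after step (i, j) we have p[i] <= p[j].
# The first nine steps sort each triple (0,1,2), (3,4,5), (6,7,8);
# the rest combine the triples so that the median ends up in p[4].
_MEDIAN9_NETWORK = [
    (1, 2), (4, 5), (7, 8), (0, 1), (3, 4), (6, 7),
    (1, 2), (4, 5), (7, 8), (0, 3), (5, 8), (4, 7),
    (3, 6), (1, 4), (2, 5), (4, 7), (4, 2), (6, 4), (4, 2),
]


def median9(values):
    """Return the median of exactly nine comparable values."""
    p = list(values)
    if len(p) != 9:
        raise ValueError("median9 expects exactly nine values")
    for i, j in _MEDIAN9_NETWORK:
        if p[i] > p[j]:
            p[i], p[j] = p[j], p[i]
    return p[4]


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of numbers.

    Assumption for this sketch: border pixels are copied unchanged.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median9(window)
    return out
```

On a GPU, each output pixel would typically be handled by its own thread, and each compare-exchange becomes a pair of min/max instructions with no data-dependent branching.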

This article was written by A.S. Kolganov.

A.S. Kolganov presented the following paper at the conference:

The fastest and most energy efficient implementation of the breadth-first search algorithm on different single-node parallel architectures according to the Graph500 top

Breadth-first search (BFS) is one of the main graph traversal algorithms and the basis for many higher-level graph analysis algorithms. Breadth-first search on graphs is a problem with irregular memory access and irregular data dependencies, which significantly complicates its parallelization on all existing architectures. The article considers the implementation of the breadth-first search algorithm (the main test of the Graph500 list) for processing large graphs on different architectures: Intel x86, IBM Power8+, Intel KNL and NVidia GPU. The article describes the features of the algorithm's implementation on shared memory and the graph transformations that make it possible to achieve record performance and energy efficiency with this algorithm among all single-node systems in the Graph500 and GreenGraph500 lists.
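For orientation, a plain sequential, level-by-level BFS over a graph stored in compressed sparse row (CSR) form can be sketched as follows. This is only a baseline illustration: the graph transformations and architecture-specific optimizations described in the article are not reproduced here, and the function name `bfs_levels` and the CSR array names are assumptions made for the example.

```python
def bfs_levels(row_offsets, columns, source):
    """Return the BFS level of every vertex, or -1 if it is unreachable.

    The graph is in CSR form: the neighbours of vertex u are
    columns[row_offsets[u]:row_offsets[u + 1]].
    """
    vertex_count = len(row_offsets) - 1
    level = [-1] * vertex_count
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for k in range(row_offsets[u], row_offsets[u + 1]):
                v = columns[k]
                # level[v] is accessed at a data-dependent position:
                # this is the irregular memory access pattern that makes
                # BFS hard to parallelize efficiently.
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Small undirected example: edges 0-1, 0-2, 1-3, 2-3, 3-4; vertex 5 is isolated.
ROW_OFFSETS = [0, 2, 4, 6, 9, 10, 10]
COLUMNS = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
LEVELS = bfs_levels(ROW_OFFSETS, COLUMNS, 0)
```

In a parallel version, the vertices of one frontier are processed concurrently, so the check and update of `level[v]` must be made safe against simultaneous writes.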

This article was written by A.S. Kolganov.

A.S. Kolganov presented the following paper at the conference:

Static analysis of private variables in the system of automated parallelization of Fortran programs

The potential for parallelism in programs is concentrated mainly in loops. To execute a loop in parallel, it is necessary to ensure the absence of data dependences between its iterations. One type of such dependence involves scalar variables; these variables can be privatized for each iteration of the loop when they are used only within a single iteration. This article analyzes the problem of automatically detecting privatizable variables in loops and placing the corresponding specifications so that the loops can be executed in parallel in Fortran programs. An algorithm for detecting such variables, based on data flow analysis methods, is proposed together with its extension for interprocedural analysis. The article presents the results of testing the algorithm on several programs from the NASA test package and on a composite model of multicomponent filtration in the development of oil and gas fields.
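The core idea can be illustrated with a heavily simplified sketch in Python: within one loop iteration, a scalar is a privatization candidate if it is written in the iteration and never read before that write. The sketch assumes a straight-line loop body given as per-statement sets of defined and used scalars; the function name `privatizable_scalars` and this input format are hypothetical, and the algorithm proposed in the article (which handles control flow and procedure calls) is not reproduced here.

```python
def privatizable_scalars(body):
    """Return scalars that are always written before being read in an iteration.

    body: list of (defined_scalars, used_scalars) pairs, one per statement,
    in program order. Assumes a straight-line loop body (no branches).
    """
    defined = set()
    exposed = set()  # scalars read before any write in this iteration
    for defs, uses in body:
        # The right-hand side is evaluated before the assignment takes effect.
        exposed |= set(uses) - defined
        defined |= set(defs)
    return defined - exposed


# Sketch of the loop body
#     t = a(i) + b(i)
#     s = s + t
#     c(i) = t * 2
# with array references ignored and only scalars tracked.
LOOP_BODY = [
    ({"t"}, {"i"}),
    ({"s"}, {"s", "t"}),
    (set(), {"t", "i"}),
]
CANDIDATES = privatizable_scalars(LOOP_BODY)
```

Here `t` is a candidate because every iteration assigns it before using it, while `s` is not, since each iteration reads the value produced by the previous one. A complete analysis would also have to check whether the variable's value is needed after the loop.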

The work was funded by RFBR grants Nos. 17-01-00820 and 18-01-00851 A.

This article was written by A.S. Kolganov and N.N. Korolev.

A.S. Kolganov presented the following paper at the conference:

The transformation of serial C-programs for their parallelization

Parallelization often requires a significant transformation of the program, including at the source code level, and the need for such transformation is determined by the capabilities of the selected parallel programming technology. One important transformation is the inline substitution of procedures in C programs, that is, replacing a procedure call with the procedure's code, with all arguments substituted. This transformation reduces the overhead associated with procedure calls at run time, and at the stage of static analysis and compilation it makes it possible to apply various optimizations, including parallelizing ones, without true interprocedural analysis. Applying this optimization at the source code level will allow the SAPFOR system to determine more efficient parallelization schemes for the user's program. The corresponding module for the SAPFOR system was implemented in C++ using the LLVM and Clang infrastructure and tested on the NAS Parallel Benchmarks.
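The substitution step itself can be shown with a toy textual sketch in Python: a procedure whose body is a single expression is inlined by replacing each formal parameter with the parenthesized actual argument. This is purely illustrative and rests on strong assumptions (a single-expression body, arguments without side effects, no local variables); the SAPFOR module described above works on the Clang representation of C code, and the function name `inline_call` is hypothetical.

```python
import re


def inline_call(params, body, args):
    """Return `body` with each formal parameter replaced by its argument.

    params: formal parameter names, e.g. ["x", "y"]
    body:   the procedure body as a single expression string
    args:   actual argument expressions, in the same order as params
    """
    if len(params) != len(args):
        raise ValueError("argument count does not match parameter count")
    if not params:
        return body
    # Parenthesize arguments so operator precedence is preserved.
    binding = {p: "(" + a + ")" for p, a in zip(params, args)}
    # One pass over the body substitutes all parameters simultaneously,
    # so text coming from an argument is never substituted again.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    return pattern.sub(lambda match: binding[match.group(1)], body)


# For a procedure sq_add(x, y) whose body is "x * x + y",
# inline the call sq_add(a[i], y0 + 1).
INLINED = inline_call(["x", "y"], "x * x + y", ["a[i]", "y0 + 1"])
```

After the substitution the loop that contained the call no longer depends on another procedure, so an analyzer can examine its data dependences directly. Note that the argument `a[i]` now appears twice, which is why a real inliner must treat arguments with side effects differently.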

The work was funded by the Russian Foundation for Basic Research, projects 16-07-01067, 17-01-00820 and 18-01-00851.

This article was written by N.A. Kataev, A.S. Kolganov and Yu.G. Zykov.

V.A. Bakhtin presented the following paper at the conference:

The experience of solving applications with irregular grids using DVM-system

The DVM-system is designed for developing parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages use a single parallel programming model (DVMH) and extend the standard C and Fortran languages with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to develop efficient parallel programs for heterogeneous computing clusters with accelerators. The article describes the experience of using the DVM-system to parallelize applications that use irregular grids.

The research was funded by RFBR (scientific projects Nos. 16-07-01014, 16-07-01067 and 17-01-00820).

This article was written by V.A. Bakhtin, A.A. Ermichev, V.A. Krukov, N.V. Podderugina, M.N. Pritula and D.A. Zaharov.