News
-
XX All-Russian conference Scientific service on the Internet 2018 was held on September 17th - 22nd, 2018 in Russia, Novorossiysk, Abrau-Durso, at the boarding house 'Sailor'.
Presentations
The consistent representation of Fortran programs in the PARFOR compiler at different levels of abstraction
The parallelizing compiler PARFOR, a standalone toolkit from the automated parallelization system SAPFOR, is designed for developing parallel programs in Fortran and C with implicit parallelism. The aim of PARFOR development is the automatic transformation of program source code so that it runs on a computational cluster. Implicit parallel programming assumes that the properties of the source serial program can be described in order to simplify the search for the necessary program transformations. Parallelization is performed in the DVMH model, in the Fortran-DVMH and C-DVMH languages respectively. The LLVM compiler infrastructure is used for program analysis, and the transformations are based on the representation of programs as an abstract syntax tree (AST). The paper describes an approach that makes it possible to consider programs consistently at two levels of abstraction, the source code level and the low-level representation: the low-level representation is exploited for program analysis, while the information displayed to the user keeps a user-friendly form. The paper also considers the possibility of analyzing transformed LLVM IR to increase the accuracy of the analysis, and explains the main reasons for choosing LLVM as the instrumental base for developing the PARFOR compiler.
The work was funded by RFBR (projects 16-07-01014, 17-01-00820, 18-01-00851) and the program of the Presidium of RAS No. 26 “The fundamental principles of development of algorithms and software for advanced highly productive computing”.
Presented by N.A. Kataev.
Parallelization of software systems. Problems and prospects
When parallelizing real industrial computational programs, we have to deal with problems common to such programs: multi-modularity, multivariance, and multi-language code. The large amount of program code makes it difficult to perform the necessary parallelization transformations manually and to make decisions on a consistent distribution of data and computations. Therefore, it was proposed to implement partial, or incremental, parallelization for clusters in the SAPFOR system. In addition, special attention was paid to the automatic determination of the required transformations of a serial program and to their automatic execution. However, the problems caused by multi-modularity, multivariance, and multi-language code also need to be solved. The article discusses possible ways to overcome these problems.
The work was funded by RFBR (projects № 16-07-01067, 16-07-01014, 16-29-09550, 17-01-00820, 18-01-00851) and the program of the Presidium of RAS No. 26 “The fundamental principles of development of algorithms and software for advanced highly productive computing”.
Presented by A.S. Kolganov.
-
Russian ‒ German Conference: Supercomputing in Scientific and Industrial Problems ‒ 2018 was held on April 23rd - 27th, 2018 in Russia, Svetlogorsk, Hotel "Universal".
Presentations
DVM-system was developed at the Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences, with the active participation of graduate students and students of the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University. It is designed for creating parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages use the same parallel programming model (the DVMH model) and extend the standard C and Fortran languages with parallelism specifications implemented as compiler directives. The directives are invisible to standard compilers, so a programmer can maintain a single program for both sequential and parallel execution on computers of different architectures.
Presented by V.A. Bakhtin.
-
Parallel computational technologies (PCT) 2018 was held on April 2nd - 6th, 2018 in Russia, Rostov-on-Don, Don State Technical University.
Presentations
The paper describes approaches to optimizing image processing on GPUs, using the median filtering algorithm as an example. The results are compared with a free CPU image processing library that uses AVX2 vector instructions. The achieved filtering rate of 100 GPixels/sec for a 3×3 window on a Titan Pascal GPU and the specific filtering rate of 10.2 GPixels/sec per 1 TFlops for a 3×3 window in single precision are the highest known in the world at present.
Presented by A.S. Kolganov.
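As a point of reference for the 3×3 median filtering mentioned above, here is a minimal, unoptimized sketch in plain Python. It is not the GPU implementation from the paper; the function name `median_filter_3x3` and the border policy (clamping coordinates to the image edge) are assumptions made for illustration only.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of numbers.

    Border pixels are handled by clamping coordinates to the image edge
    (an assumption; the paper's border policy is not stated).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    window.append(image[yy][xx])
            # The median of the 9 window values becomes the output pixel.
            result[y][x] = median(window)
    return result


noisy = [
    [1, 1, 1],
    [1, 100, 1],
    [1, 1, 1],
]
filtered = median_filter_3x3(noisy)
```

Each output pixel depends only on its own 3×3 input window, so all pixels can be computed independently, which is what makes the algorithm a natural fit for GPUs and vector instructions.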
The fastest and most energy efficient implementation of the breadth-first search algorithm on different single-node parallel architectures according to the Graph500 top
Breadth-first search (BFS) is one of the main graph traversal algorithms and the basis for many higher-level graph analysis algorithms. Breadth-first search on graphs is a problem with irregular memory access and irregular data dependences, which significantly complicates its parallelization on all existing architectures. The article considers the implementation of the breadth-first search algorithm (the main benchmark of the Graph500 list) for processing large graphs on different architectures: Intel x86, IBM Power8+, Intel KNL and NVidia GPU. It describes the features of the shared-memory implementation and the graph transformations that make it possible to achieve record performance and energy efficiency with this algorithm among all single-node systems in the Graph500 and GreenGraph500 lists.
Presented by A.S. Kolganov.
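The breadth-first search discussed in the abstract above can be sketched in its level-synchronous form, which is the usual starting point for parallel versions. This is a serial illustration only, not the implementation from the article; the function name `bfs_levels` and the adjacency-dictionary graph format are assumptions.

```python
def bfs_levels(adjacency, source):
    """Return a dict mapping each reachable vertex to its BFS level.

    `adjacency` maps a vertex to an iterable of its neighbours
    (a hypothetical input format chosen for this sketch).
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # In a parallel version the vertices of the current frontier are
        # processed concurrently; here they are visited one by one.
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
levels = bfs_levels(graph, 0)
```

The neighbour lists have different lengths and point to arbitrary vertices, which is the source of the irregular memory access and data dependences the abstract refers to.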
Static analysis of private variables in the system of automated parallelization of Fortran programs
The parallelism available in programs is mainly concentrated in loops. To execute a loop in parallel, it is necessary to ensure the absence of data dependences. One class of such dependences involves scalar variables that can be privatized for each iteration of the loop, since they are used only within one iteration. This article analyzes the problem of automatically determining and placing privatized variables in loops for their parallel execution in Fortran programs. An algorithm for determining them, based on data flow analysis methods, is proposed together with its extension for interprocedural analysis. The article presents the results of testing the algorithm on some programs from the NASA test package and on the composite model of multicomponent filtration in the development of oil and gas fields.
The work was funded by RFBR grants Nos. 17-01-00820, 18-01-00851 A.
Presented by A.S. Kolganov.
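The idea of a privatizable scalar from the abstract above can be illustrated with a deliberately simplified check. This is not the algorithm from the article: it assumes a straight-line loop body given as an ordered list of hypothetical `("read" | "write", name)` accesses, and it ignores branches, interprocedural effects, and whether the variable is used after the loop.

```python
def privatizable_scalars(body):
    """Return scalars that are written before any read in one iteration.

    `body` is an ordered list of ("read" | "write", name) pairs describing
    the scalar accesses of a single loop iteration (a toy representation
    assumed for this sketch).
    """
    written = set()
    exposed = set()  # read before any write in the same iteration
    for kind, name in body:
        if kind == "read":
            if name not in written:
                exposed.add(name)
        elif kind == "write":
            written.add(name)
        else:
            raise ValueError(f"unknown access kind: {kind!r}")
    # A scalar whose value never flows in from a previous iteration
    # can be given a private copy per iteration.
    return written - exposed


# Models a loop body like:  t = a(i) * 2;  b(i) = t + s;  s = s + t
loop_body = [
    ("write", "t"),
    ("read", "t"),
    ("read", "s"),
    ("read", "s"),
    ("read", "t"),
    ("write", "s"),
]
private = privatizable_scalars(loop_body)
```

Here `t` is assigned before it is read in every iteration, so it can be privatized, while `s` carries a value from one iteration to the next and cannot.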
The transformation of serial C-programs for their parallelization
Parallelization often requires a significant transformation of the program, including at the source code level, and the need for such transformation is determined by the capabilities of the selected parallel programming technology. One of the important transformations is the substitution of procedures in C programs, namely, replacing a procedure call with the procedure body in which all arguments have been substituted. This transformation reduces the overhead of the procedure call at run time, and at the stage of static analysis and compilation it makes it possible to apply various optimizations, including parallelizing ones, without real interprocedural analysis. Using this optimization at the source code level will allow the SAPFOR system to determine more efficient parallelization schemes for the user program. The corresponding module for the SAPFOR system was implemented in C++ using the LLVM and Clang infrastructure and was evaluated on tests from the NAS Parallel Benchmarks.
The work was funded by the Russian Foundation for Basic Research, projects 16-07-01067, 17-01-00820 and 18-01-00851.
Presented by A.S. Kolganov.
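The effect of procedure substitution described in the abstract above can be shown on a toy before/after pair. The SAPFOR module works on C source through Clang; the sketch below uses Python only as illustration, and all function names in it are hypothetical.

```python
def scale(value, factor):
    """A small procedure that is called from inside a loop."""
    return value * factor


def with_calls(data, factor):
    """Original form: the loop body contains a procedure call."""
    out = []
    for x in data:
        out.append(scale(x, factor) + 1)
    return out


def with_substitution(data, factor):
    """After substitution: the call is replaced by the body of scale()
    with the actual arguments x and factor put in place of the parameters."""
    out = []
    for x in data:
        out.append(x * factor + 1)
    return out


sample = [1, 2, 3]
before = with_calls(sample, 10)
after = with_substitution(sample, 10)
```

After the substitution the loop body is self-contained, so an analysis of the loop no longer needs to look inside another procedure to see which data it reads and writes.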
The experience of solving applications with irregular grids using DVM-system
DVM-system is designed for developing parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages use a single parallel programming model (DVMH) and extend the standard C and Fortran languages with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to develop efficient parallel programs for heterogeneous computing clusters with accelerators. The article describes the experience of using DVM-system to parallelize applications that use irregular grids.
The research was funded by RFBR (scientific projects № 16-07-01014, 16-07-01067 and 17-01-00820).
Presented by V.A. Bakhtin.
-
Russian Supercomputing Days 2017 was held on September 25th - 26th, 2017 in Russia, Moscow, hotel Holiday Inn Moscow - Sokolniki.
Presentations
The experience of DVM-system using for solution of applied tasks
DVM-system is intended for developing parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages use a single parallel programming model (the DVMH model) and extend the standard C and Fortran languages with parallelism specifications implemented as compiler directives. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computational clusters whose nodes can use not only universal multi-core processors but also attached accelerators (GPUs or Intel Xeon Phi coprocessors) as computing devices. The article describes the experience of using DVM-system to parallelize various applications.
The work was funded by RFBR grants № 16-07-01014, 16-07-01067, 16-37-00266 and 17-01-00820.
Presented by V.A. Bakhtin.
-
XIX All-Russian scientific conference Scientific service on the Internet 2017 was held on September 18th - 23rd, 2017 in Russia, Novorossiysk, Abrau-Durso, at the boarding house 'Sailor'.
Presentations
Incremental parallelization for clusters in the SAPFOR system
Experience with the SAPFOR system has shown that large programs and program systems have to be parallelized for a cluster incrementally: starting from the most time-consuming fragments and adding new fragments step by step until the desired level of parallel program performance is reached. The article considers the principles of incremental parallelization of program systems.
The work was funded by RFBR (projects No. 16-07-01067, 16-07-01014, 17-01-00820).
Presented by N.A. Kataev.
The SAPFOR system is primarily a system for automated parallelization of programs, which involves interaction with the user to make some of the parallelization decisions. The existing version of the system has tools for visualizing automatically made decisions and also allows the user to participate in parallelization by placing special instructions in the source code of the program. The capabilities provided by these tools are limited: the user must either explicitly specify all the features of the program before its parallelization or repeatedly restart all components of the system after each small refinement of the properties. This article discusses a new approach to organizing interaction with the user in the SAPFOR system, which will make it possible to provide the user with information about the parallelization process and to take the user’s recommendations into account while the system is working.
The work was funded by the Russian Foundation for Basic Research, project 17-01-00820-a.
Presented by N.A. Kataev.
Development of the method of comparative debugging of DVM-programs
Parallel program debugging is a time-consuming and non-trivial task. To automate this process, the DVM system provides a comparative debugging mechanism that detects differences between the intermediate results of parallel and serial execution of a DVMH program. Comparative debugging in the DVM system is implemented by tracing the following events during program execution: reading and modification of variables, loop iterations, etc. The intermediate results obtained during parallel execution are compared with reference ones, which are usually the results of serial execution previously saved as trace files. But when debugging real programs, the size of these files can significantly exceed the capabilities of the file system. For such cases, another way of organizing comparative debugging is required: starting the serial and parallel executions of the program simultaneously and comparing their intermediate results “on the fly”. This article describes the principles of implementing this debugging mode in the DVM system.
The work was funded by the Russian Foundation for Basic Research, projects 16-07-01067 and 17-01-00820.
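The "on the fly" comparison described in the abstract above can be sketched with Python generators, which produce trace events lazily so that no trace file is ever stored. This is a toy model, not the DVM system's debugger: the event format `(kind, variable, iteration, value)`, the function names, and the injected error are all assumptions made for illustration.

```python
import math
from itertools import zip_longest


def serial_run(n):
    """Hypothetical reference execution that yields one event per write."""
    total = 0.0
    for i in range(n):
        total += i * 0.5
        yield ("write", "total", i, total)


def faulty_run(n, bad_iteration):
    """Stand-in for a parallel execution with an error at one iteration."""
    total = 0.0
    for i in range(n):
        total += i * 0.5
        if i == bad_iteration:
            total += 1.0  # injected error, for demonstration only
        yield ("write", "total", i, total)


def compare_on_the_fly(reference, candidate, rel_tol=1e-9):
    """Return (index, ref_event, got_event) for the first mismatch, else None.

    Both arguments are consumed event by event, so the traces are never
    materialized in memory or on disk.
    """
    for index, (ref, got) in enumerate(zip_longest(reference, candidate)):
        if ref is None or got is None:
            return index, ref, got  # one run produced fewer events
        same_event = ref[:3] == got[:3]
        same_value = math.isclose(ref[3], got[3], rel_tol=rel_tol)
        if not (same_event and same_value):
            return index, ref, got
    return None


clean = compare_on_the_fly(serial_run(5), serial_run(5))
mismatch = compare_on_the_fly(serial_run(5), faulty_run(5, 3))
```

The comparison stops at the first differing event, which in this sketch pinpoints the iteration where the two executions diverge.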