DVM-system of parallel program development
The DVM-system gives the programmer the following means of specifying how a parallel program is executed:
- distribution of array elements over the processors;
- specification of loops whose iterations can be executed in parallel (the loop iterations are mapped onto the processors in accordance with the data distribution);
- specification of program sections executed in parallel (parallel tasks) and their mapping onto the processors;
- organization of efficient access to remote data (data located on other processors);
- organization of efficient execution of reduction operations, that is, global operations on data located on different processors (such as summing values or finding their maximum and minimum);
- specification of regions: special language constructs consisting of sequential code and parallel loops that may be executed on accelerators.
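As an illustration of how these specifications might look in practice, the following is a minimal sketch of a Jacobi relaxation written as ordinary C with C-DVMH-style directives. The directive spellings (`distribute`, `align`, `region`, `parallel ... on`, `reduction`, `shadow_renew`, `actual`, `get_actual`) are written from memory of the C-DVMH language and should be checked against the language description; the grid size `N` and the function name `jacobi` are assumptions made for this example. A standard C compiler ignores the unknown `#pragma dvm` lines (possibly with a warning), so the code below also runs as a plain sequential program.

```c
#include <assert.h>

#define N 8  /* grid size; small value chosen only for illustration */

/* Distribute A by blocks in both dimensions, with shadow edges of width 1. */
#pragma dvm array distribute[block][block], shadow[1:1][1:1]
double A[N][N];
/* Align B with A so that B[i][j] is placed together with A[i][j]. */
#pragma dvm array align([i][j] with A[i][j])
double B[N][N];

/* Runs `iters` Jacobi sweeps and returns the largest change seen in the last one. */
double jacobi(int iters)
{
    int i, j, it;
    double eps = 0.0;
    assert(iters > 0);

    /* Region: code that may be executed on an accelerator. */
    #pragma dvm region
    {
    /* Parallel loop: iteration (i, j) runs where A[i][j] is located. */
    #pragma dvm parallel([i][j] on A[i][j])
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = 0.0;
            B[i][j] = (i == 0 || j == 0 || i == N - 1 || j == N - 1)
                          ? 0.0 : 1.0 + i + j;
        }
    }

    for (it = 0; it < iters; it++) {
        eps = 0.0;
        #pragma dvm actual(eps)  /* the host value of eps is the current one */
        #pragma dvm region
        {
        /* Reduction: eps is the global maximum over all processors. */
        #pragma dvm parallel([i][j] on A[i][j]) reduction(max(eps))
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++) {
                double d = B[i][j] - A[i][j];
                if (d < 0.0) d = -d;
                if (d > eps) eps = d;
                A[i][j] = B[i][j];
            }
        /* Remote access: refresh the shadow edges of A before reading neighbours. */
        #pragma dvm parallel([i][j] on B[i][j]) shadow_renew(A)
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++)
                B[i][j] = (A[i - 1][j] + A[i][j - 1] +
                           A[i][j + 1] + A[i + 1][j]) / 4.0;
        }
        #pragma dvm get_actual(eps)  /* bring the reduced eps back to the host */
    }
    return eps;
}
```

Calling `jacobi(1)` from a `main` function and compiling with an ordinary C compiler executes the loops sequentially. Nothing in the loop bodies refers to processes or messages; the mapping is described entirely by the directives.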
The C-DVMH and Fortran-DVMH compilers convert the input program into a parallel program that uses the standard programming technologies MPI, OpenMP and CUDA.
Program debugging proceeds in stages.
First, the program is debugged as an ordinary sequential program using conventional debugging tools. Then the program is executed in a special mode that checks the parallelism specifications for correctness and completeness. At the next step, the program may be executed in a mode that compares the intermediate results of its parallel execution with reference results obtained, for example, from its sequential execution. For programs that use graphics processors as accelerators, a mode is provided that compares the results of region execution on the CPU and on the GPU. Trace accumulation tools can also be used for debugging.
Performance analysis tools give the user information about the main performance characteristics of the program (or of its parts).
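The idea of comparing results with reference values can be sketched in a few lines of C. This is not the DVM debugger's implementation; the function name `first_mismatch` and the relative tolerance `tol` are assumptions made for this illustration. A tolerance is used because parallel execution may perform floating-point operations (reductions in particular) in a different order than sequential execution, so the results can differ in the last digits.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element of `got` that differs from `ref`
   by more than `tol` relative to the larger magnitude, or -1 if all match.
   A NaN in either array is reported as a mismatch. */
long first_mismatch(const double *ref, const double *got, size_t n, double tol)
{
    size_t k;
    assert(tol >= 0.0);
    for (k = 0; k < n; k++) {
        double diff = ref[k] - got[k];
        double a = ref[k] < 0.0 ? -ref[k] : ref[k];
        double b = got[k] < 0.0 ? -got[k] : got[k];
        double scale = a > b ? a : b;
        if (diff < 0.0) diff = -diff;
        /* Written as !(<=) so that NaN values fail the comparison. */
        if (!(diff <= tol * scale)) return (long)k;
    }
    return -1;
}
```

Here `ref` would hold values saved from the sequential (or CPU) run and `got` the corresponding values from the parallel (or GPU) run.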
2 Goals of DVM-system development
- Simplicity of parallel program development.
- High performance of program execution on computers of different architectures.
2.1 C-DVMH and Fortran-DVMH parallel program development
With these languages, the programmer does not need to represent the program as a set of interacting processes; instead, the behavior of the parallel program is defined in a global name space (global address space).
An important advantage of the DVM approach is that the parallelism specifications (DVMH-directives) are implemented as special comments and are “transparent” to standard compilers. This simplifies adoption of the new parallel languages, because the programmer knows that the program can be executed in sequential mode, without any changes, on any computer.
The DVM-system includes advanced tools for functional debugging and performance debugging of DVMH-programs.
2.2 High program performance on computers of different architectures
DVMH-program performance can be increased by the following means:
- group asynchronous communications between the processors (simultaneous execution of several reductions and exchanges for several arrays);
- overlapping of group asynchronous communications with computations;
- automatic reordering of loop iterations so that computations overlap with data transfers;
- automatic distribution of computations among all computational devices of a cluster node, taking their performance into account;
- automatic data reordering to provide efficient access to GPU memory.
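Performance-aware distribution of work among devices can be pictured with a small sketch. The function below is hypothetical and is not taken from Lib-DVMH: it assumes that each device has a known relative performance `perf[d]` and splits `n` loop iterations proportionally, giving the rounding remainder to the fastest device.

```c
#include <assert.h>

/* Splits n iterations among ndev devices in proportion to perf[d].
   counts[d] receives the share of device d; the shares sum to n. */
void split_iterations(long n, int ndev, const double *perf, long *counts)
{
    double total = 0.0;
    long assigned = 0;
    int d, fastest = 0;

    assert(n >= 0 && ndev > 0);
    for (d = 0; d < ndev; d++) {
        assert(perf[d] > 0.0);
        total += perf[d];
        if (perf[d] > perf[fastest]) fastest = d;
    }
    for (d = 0; d < ndev; d++) {
        counts[d] = (long)((double)n * perf[d] / total);  /* rounded down */
        assigned += counts[d];
    }
    counts[fastest] += n - assigned;  /* leftover goes to the fastest device */
}
```

For example, with one CPU rated 1.0 and one GPU rated 3.0, 100 iterations would be split as 25 and 75. A real runtime would also have to account for data transfer costs and measured rather than assumed performance.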
3 The Content of DVM-system
The DVM-system includes the following software packages:
- Fortran-DVMH compiler
- C-DVMH compiler
- Lib-DVMH runtime support library
- Tools of DVMH-program functional debugging
- Tools of DVMH-program performance debugging
These packages perform the following functions:
- The Fortran-DVMH compiler converts a parallel program into a Fortran program that uses the standard programming technologies MPI, OpenMP and CUDA and is extended with Lib-DVMH function calls.
- The C-DVMH compiler converts a parallel program into a C program that uses the standard programming technologies MPI, OpenMP and CUDA and is extended with Lib-DVMH function calls.
- The Lib-DVMH library is a run-time support system for the execution of DVMH-programs (Fortran-DVMH or C-DVMH programs). Lib-DVMH functions use the standard communication system MPI and CUDA technology.
- The functional debugging tools execute the program on a workstation in a special mode that checks the DVM-directives, and execute it on a parallel computer in a special mode in which intermediate results are compared with reference results (for example, the results of sequential execution). In addition, for programs that use graphics processors as accelerators, a mode is provided that compares the results of region execution on the CPU and on the GPU.
- The performance analyzer is a browser for the performance characteristics of a parallel DVMH-program. It presents performance information both for the whole DVMH-program and for its individual fragments, at a specified level of detail.