Execution performance of NAS NPB 2.3

Results obtained in July 2001.

The tables below report the sizes and performance of the MPI-programs and DVM-programs for the NAS tests.

Compared with the sequential program, the DVM-program is on average 5% larger, whereas the MPI-program is on average 40% larger. Note that the DVM-program grows only because special comments are inserted, and these do not depend on array sizes or on the number of processors. The additional code in the MPI-program is a complicated system of routines that manage message passing, and it does depend on array sizes and on the number of processors.

The performance of the DVM-programs and MPI-programs is comparable. In some cases, however, the DVM-program is 50-60% slower. There are two reasons. First, the DVM system does not use MPI collective operations, which some parallel systems perform more efficiently than an equivalent implementation built from point-to-point communications. Second, the MPI versions of some tests are parallelized along two dimensions of the processor grid, whereas the DVM versions of all tests currently run only on a line of processors. Work to eliminate both causes is under way.
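The second reason can be illustrated with a toy surface-to-volume model. This is only a sketch under stated assumptions, not the communication pattern of any particular NAS test: it assumes a square n x n grid, a nearest-neighbour boundary exchange, and an interior process (one that has all its neighbours). The function name `halo_per_process` and the layout labels are hypothetical.

```python
import math


def halo_per_process(n: int, p: int, layout: str) -> float:
    """Boundary elements one interior process exchanges per step (toy model).

    '1d': p strips on a line of processors; two neighbours, n elements each.
    '2d': sqrt(p) x sqrt(p) blocks; four neighbours, n / sqrt(p) elements each.
    """
    if layout == "1d":
        return 2.0 * n
    if layout == "2d":
        q = math.isqrt(p)
        if q * q != p:
            # Assumption of this sketch: only square processor grids are modelled.
            raise ValueError("the 2d layout in this sketch needs a perfect-square p")
        return 4.0 * n / q
    raise ValueError(f"unknown layout: {layout!r}")


if __name__ == "__main__":
    n = 1024
    for p in (4, 16, 64):
        one_d = halo_per_process(n, p, "1d")
        two_d = halo_per_process(n, p, "2d")
        print(f"p={p:3d}  line: {one_d:7.0f}  grid: {two_d:7.0f}")
```

In this model the exchange volume per process stays constant on a line of processors but shrinks as the two-dimensional grid grows, which is consistent with the gap widening at larger processor counts in some of the tables.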

Table 1. Sizes of NAS NPB 2.3 sources (in lines)
Test SEQ MPI DVM MPI/SEQ DVM/SEQ
BT 4059 5744 4146 1.41 1.02
CG 1108 1793 1118 1.62 1.01
EP 641 670 649 1.04 1.01
FT 1500 2352 1605 1.57 1.07
IS 925 1218 1085 1.32 1.17
LU 4189 5497 4269 1.31 1.02
MG 1898 2857 1992 1.50 1.05
SP 3361 5020 3580 1.49 1.06
Σ 17681 25151 18444 1.42 1.04
SEQ serial code
MPI parallel code in Fortran77 or C (IS) + MPI
DVM parallel code in FORTRAN-DVM or C-DVM (IS)
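The ratio columns and the average increases quoted in the text can be recomputed from the line counts in Table 1. The following is a minimal sketch; the names `SIZES`, `ratios`, `totals` and `mean_increase_percent` are chosen here for illustration, and the averages are assumed to be unweighted means of the per-test ratios.

```python
# Line counts copied from Table 1: test -> (SEQ, MPI, DVM)
SIZES = {
    "BT": (4059, 5744, 4146),
    "CG": (1108, 1793, 1118),
    "EP": (641, 670, 649),
    "FT": (1500, 2352, 1605),
    "IS": (925, 1218, 1085),
    "LU": (4189, 5497, 4269),
    "MG": (1898, 2857, 1992),
    "SP": (3361, 5020, 3580),
}


def ratios(sizes):
    """Per-test (MPI/SEQ, DVM/SEQ) ratios."""
    return {t: (mpi / seq, dvm / seq) for t, (seq, mpi, dvm) in sizes.items()}


def totals(sizes):
    """Column sums (SEQ, MPI, DVM), i.e. the last row of the table."""
    return tuple(sum(column) for column in zip(*sizes.values()))


def mean_increase_percent(sizes):
    """Unweighted mean size increase over SEQ, in percent, as (MPI, DVM)."""
    r = ratios(sizes)
    mpi = sum(m for m, _ in r.values()) / len(r)
    dvm = sum(d for _, d in r.values()) / len(r)
    return (mpi - 1.0) * 100.0, (dvm - 1.0) * 100.0


if __name__ == "__main__":
    for test, (m, d) in ratios(SIZES).items():
        print(f"{test}  MPI/SEQ={m:.2f}  DVM/SEQ={d:.2f}")
    print("totals:", totals(SIZES))
    print("mean increase, %:", mean_increase_percent(SIZES))
```

The recomputed ratios agree with the printed ones to within 0.01, and the column sums reproduce the last row of the table.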
Performance of MPI-programs and DVM-programs for NAS NPB 2.3
NCI-cluster Pentium III/500 + Myrinet, Windows NT, MPI-FM, Visual C++ 6.0, Digital Fortran 5.0
RCC-cluster Pentium III/500 + SCI, Red Hat Linux release 6.1 (Cartman), ScaMPI, Portland Group C compiler, Portland Group F77 compiler
MVS-1000/16 Pentium III/800 + Fast Ethernet, Red Hat Linux release 7.0 (Guinness), Router, LAM-MPI, GNU C compiler version 2.96, GNU Fortran compiler version 2.96
Table 2. BT test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 2548,5
2
4 656,9 716,7 1,09 606,1 712,3 1,17 568,2 571,1 1,00
8 446,3 390,4 0,87 284,7 380,6 1,34 314,8 303,5 0,96
16 271,4 270,8 1,00 220,8 231,2 1,04 208,9
Table 3. CG test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 43,7 45,4 1,04 41,4 42,9 1,04 30,6 30,9 1,01
2 22,0 24,9 1,13 28,3 22,8 0,81 16,7 19,4 1,16
4 12,0 14,0 1,17 11,7 13,6 1,16 12,0 13,1 1,09
8 6,4 9,0 1,41 6,3 9,1 1,44 7,3 9,9 1,36
16 5,0 8,9 1,78 5,0 7,0 1,40 8,6
Table 4. EP test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 434,3 414,4 0,95 389,3 393,1 1,01 306,7 305,7 0,99
2 217,1 207,3 0,95 179,7 196,7 1,09 153,2 153,0 1,00
4 108,6 103,7 0,95 97,7 98,4 1,01 77,4 77,3 1,00
8 54,3 51,9 0,95 48,9 49,3 1,01 38,7 38,9 1,01
16 28,0 26,9 0,96 24,5 25,0 1,02 21,1
Table 5. FT test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 130,2 136,1 1,04
2 88,2 75,8 0,88 58,1
4 47,5 45,9 0,97 42,5 42,6 1,00 33,7 32,9 0,98
8 27,1 24,7 0,91 21,2 26,0 1,23 19,8 19,8 1,00
16 21,2 14,8 0,70 13,3 14,5 1,09 13,5
Table 6. IS test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 18,3 19,6 1,07 15,7 19,5 1,24 10,1 13,2 1,31
2 11,7 14,9 1,27 10,7 13,5 1,26 11,9 14,8 1,24
4 7,7 8,6 1,12 5,2 7,2 1,38 8,3 9,0 1,09
8 5,0 4,6 0,92 2,9 3,9 1,34 5,4 5,0 0,92
16 3,8 3,2 0,84 2,3 3,4 1,48 3,3
Table 7. LU test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 1796,6 1739,7 0,97 1581,5 1886,0 1,19 1186,2
2 911,9 820,5 0,90 989,5 974,4 0,98 617,5 624,9 1,01
4 452,8 448,9 0,99 361,5 512,3 1,41 323,4 349,6 1,08
8 202,4 248,5 1,23 265,9 265,9 1,60 172,9 198,6 1,15
16 111,3 172,2 1,55 143,2 143,2 1,69 141,4
Table 8. MG test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 77,7 71,5 0,92
2 47,9 36,5 0,76 33,0 30,5 0,92
4 20,7 22,2 1,07 22,2 18,8 0,85 18,2 16,1 0,88
8 9,3 13,5 1,45 9,7 10,5 1,08 9,5 9,1 0,96
16 5,9 9,7 1,64 7,0 6,7 0,96 6,5
Table 9. SP test execution times in seconds (class A)
NP NCI-cluster(Peking) RCC-cluster(MSU) MVS-1000/16(KIAM)
MPI DVM DVM/MPI MPI DVM DVM/MPI MPI DVM DVM/MPI
1 1681,0 2040,0 1,21 1670,7 2132,2 1,28 1616,5 1534,4 0,95
2
4 435,4 562,4 1,29 435,2 616,6 1,42 472,4 450,3 0,95
8 271,9 309,5 1,14 207,7 311,7 1,50 274,9 258,2 0,94
16 150,2 222,7 1,48 201,6 201,6 1,38 196,8
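The DVM/MPI columns in the timing tables are the ratio of the two execution times, so values above 1 mean the DVM-program is slower. As a hedged illustration, the sketch below recomputes that ratio, plus the speedup relative to one processor, for the NCI-cluster columns of Table 3 (CG, class A). The helper names `parse_time` and `summarize` are hypothetical, and the cells are kept with their decimal commas as printed.

```python
def parse_time(cell: str) -> float:
    """Convert a table cell such as '43,7' (decimal comma) to seconds."""
    return float(cell.replace(",", "."))


# NCI-cluster columns of Table 3: NP -> (MPI cell, DVM cell)
CG_NCI = {
    1: ("43,7", "45,4"),
    2: ("22,0", "24,9"),
    4: ("12,0", "14,0"),
    8: ("6,4", "9,0"),
    16: ("5,0", "8,9"),
}


def summarize(rows):
    """Per processor count: DVM/MPI time ratio and speedup over the 1-processor run."""
    base_mpi, base_dvm = (parse_time(cell) for cell in rows[1])
    summary = {}
    for nproc, (mpi_cell, dvm_cell) in rows.items():
        mpi, dvm = parse_time(mpi_cell), parse_time(dvm_cell)
        summary[nproc] = {
            "dvm_over_mpi": dvm / mpi,
            "mpi_speedup": base_mpi / mpi,
            "dvm_speedup": base_dvm / dvm,
        }
    return summary


if __name__ == "__main__":
    for nproc, s in summarize(CG_NCI).items():
        print(
            f"NP={nproc:2d}  DVM/MPI={s['dvm_over_mpi']:.2f}  "
            f"MPI speedup={s['mpi_speedup']:.2f}  DVM speedup={s['dvm_speedup']:.2f}"
        )
```

Speedup is not listed in the tables; it is included here only as a convenient way to compare how the two versions scale.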