A parallel implementation of the COLUMBUS multireference configuration interaction program

Matthias Schüler; Thomas Kovar; Hans Lischka; Ron Shepard; Robert J. Harrison

doi:10.1007/BF01126612

Chemistry and Biochemistry

Research output: Contribution to journal › Article › peer-review

37 Scopus citations

Abstract

In this work a parallel implementation of the COLUMBUS MRSDCI program system is presented. A coarse-grained parallelization approach using message passing via the portable toolkit TCGMSG is used. The program is highly portable and runs on shared-memory machines such as the Cray Y-MP, Alliant FX/2800 and Convex C2, and on distributed-memory machines such as the iPSC/860. Further implementations on a network of workstations and on the Intel Touchstone Delta are in progress. Overall, results are quite satisfactory considering the complexity and the prodigious requirements, especially the I/O bandwidth, of MRCI programs in general. For our largest test case we obtain a speedup of a factor of 7.2 on an eight-processor Cray Y-MP for the section of the program that has been parallelized (the Hamiltonian matrix times trial vector product). The speedup for one complete diagonalization iteration amounts to 5.9. An absolute speed close to 1 GFLOPS is found. Results for the iPSC/860 show that ordinary disk I/O is not sufficient to guarantee satisfactory performance. As a solution to that problem, the implementation of a fully asynchronous distributed-memory model for certain data files is in preparation.
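The two speedup figures quoted in the abstract can be related with a short back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes a simple Amdahl-type model in which everything outside the parallelized matrix-vector product runs serially and unchanged, and the function names are hypothetical. Under that assumption, the reported numbers would imply a parallel efficiency of about 0.9 for the parallelized section, and that this section accounts for roughly 96% of the serial time of one diagonalization iteration.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Speedup divided by processor count (1.0 would be ideal scaling)."""
    return speedup / processors


def parallelized_fraction(section_speedup: float, overall_speedup: float) -> float:
    """Fraction p of serial run time spent in the parallelized section.

    Assumes Amdahl's law with an unchanged serial remainder:
        1 / S_overall = (1 - p) + p / S_section
    which rearranges to p = (1 - 1/S_overall) / (1 - 1/S_section).
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / section_speedup)


# Figures quoted in the abstract for the eight-processor Cray Y-MP.
SECTION_SPEEDUP = 7.2   # Hamiltonian matrix times trial vector product
OVERALL_SPEEDUP = 5.9   # one complete diagonalization iteration
PROCESSORS = 8

efficiency = parallel_efficiency(SECTION_SPEEDUP, PROCESSORS)
fraction = parallelized_fraction(SECTION_SPEEDUP, OVERALL_SPEEDUP)

print(f"parallel efficiency of the parallelized section: {efficiency:.2f}")
print(f"implied parallelized fraction of one iteration:  {fraction:.3f}")
```

The model ignores communication and I/O contention in the serial remainder, so the implied fraction is only an estimate, not a measured value.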

Original language	English
Pages (from-to)	489-509
Number of pages	21
Journal	Theoretica Chimica Acta
Volume	84
Issue number	6
DOIs	https://doi.org/10.1007/BF01126612
State	Published - Feb 1993

Keywords

COLUMBUS program system
Multireference CI
Parallel computing

Cite this

@article{90ef35368d3f4d8b8e158f999e0e45d6,

title = "A parallel implementation of the {COLUMBUS} multireference configuration interaction program",

abstract = "In this work a parallel implementation of the COLUMBUS MRSDCI program system is presented. A coarse grain parallelization approach using message passing via the portable toolkit TCGMSG is used. The program is very well portable and runs on shared memory machines like the Cray Y-MP, Alliant FX/2800 or Convex C2 and on distributed memory machines like the iPSC/860. Further implementations on a network of workstations and on the Intel Touchstone Delta are in progress. Overall, results are quite satisfactory considering the complexity and the prodigious requirements, especially the I/O bandwidth, of MRCI programs in general. For our largest test case we obtain a speedup of a factor of 7.2 on an eight processor Cray Y-MP for that section of the program (hamiltonian matrix times trial vector product) which has been parallelized. The speedup for one complete diagonalization iteration amounts to 5.9. An absolute speed close to 1 GFLOPS is found. Results for the iPSC/860 show that ordinary disk I/O is certainly not sufficient in order to guarantee a satisfactory performance. As a solution for that problem, the implementation of a fully asynchronous distributed-memory model for certain data files is in preparation.",

keywords = "COLUMBUS program system, Multireference CI, Parallel computing",

author = "Matthias Sch{\"u}ler and Thomas Kovar and Hans Lischka and Ron Shepard and Harrison, {Robert J.}",

year = "1993",

month = feb,

doi = "10.1007/BF01126612",

language = "English",

volume = "84",

pages = "489--509",

journal = "Theoretica Chimica Acta",

issn = "0040-5744",

number = "6",

}

TY - JOUR

T1 - A parallel implementation of the COLUMBUS multireference configuration interaction program

AU - Schüler, Matthias

AU - Kovar, Thomas

AU - Lischka, Hans

AU - Shepard, Ron

AU - Harrison, Robert J.

PY - 1993/2

Y1 - 1993/2

AB - In this work a parallel implementation of the COLUMBUS MRSDCI program system is presented. A coarse grain parallelization approach using message passing via the portable toolkit TCGMSG is used. The program is very well portable and runs on shared memory machines like the Cray Y-MP, Alliant FX/2800 or Convex C2 and on distributed memory machines like the iPSC/860. Further implementations on a network of workstations and on the Intel Touchstone Delta are in progress. Overall, results are quite satisfactory considering the complexity and the prodigious requirements, especially the I/O bandwidth, of MRCI programs in general. For our largest test case we obtain a speedup of a factor of 7.2 on an eight processor Cray Y-MP for that section of the program (hamiltonian matrix times trial vector product) which has been parallelized. The speedup for one complete diagonalization iteration amounts to 5.9. An absolute speed close to 1 GFLOPS is found. Results for the iPSC/860 show that ordinary disk I/O is certainly not sufficient in order to guarantee a satisfactory performance. As a solution for that problem, the implementation of a fully asynchronous distributed-memory model for certain data files is in preparation.

KW - COLUMBUS program system

KW - Multireference CI

KW - Parallel computing

UR - http://www.scopus.com/inward/record.url?scp=0007086903&partnerID=8YFLogxK

U2 - 10.1007/BF01126612

DO - 10.1007/BF01126612

M3 - Article

AN - SCOPUS:0007086903

SN - 0040-5744

VL - 84

SP - 489

EP - 509

JO - Theoretica Chimica Acta

JF - Theoretica Chimica Acta

IS - 6

ER -
