March 9, 2022
Present
Tony Hey, Jeyan Thiyagalingam, Geoffrey Fox, Juri Papay, Gregg Barrett, Farzana Yasmin Ahmad, Aristeidis Tsaris, Murali Emani, Piotr Luszczek
Apologies: Arjun Shankar, Christine Kirkpatrick, Gregor von Laszewski
Tentative Agenda
- New member introductions
- Update on possible new benchmarks by Geoffrey
- Further comments on FAIR metadata (not done)
- Continuation of discussion of portability of benchmarks. How much effort is involved in deploying a benchmark?
- Clarification on what each benchmark is going to measure.
- AOB (Performance and Portability)
Update on possible new benchmarks
Geoffrey went through new benchmarks that could be added to our collection (slides: "Science WG of MLCommons", March 9, 2022).
- LLNL (Lawrence Livermore National Laboratory)
- FastML
- DOE-SBI project (Biophysics, Performance, Nano Engineering)
- RAL Projects
The Livermore ICF (inertial confinement fusion) surrogate benchmark should be available soon.
Current Benchmarks
- We agreed to complete our four benchmarks by May 22, 2022, to announce at the ISC BOF. ISC 2022 is held in Frankfurt, May 29 - June 2, 2022 (see the BOF sessions page on the ISC High Performance 2022 website). We need to add logging and deposit the data and reference models in the MLCommons repository. We also agreed to complete the paper "MLCommons Science Benchmarks" by May 2022.
- Murali reported that the ANL benchmark was ready, with a science metric.
Performance and Portability
- Juri presented some initial results on moving the STEM-DL benchmark from a Horovod-based implementation to PyTorch Lightning. The single-GPU performance of the PyTorch Lightning version is a concern: it is far slower than the Horovod-based implementation. Jeyan and Juri proposed to investigate this issue further.
- There was a discussion of various deep learning communication libraries: NCCL, Gloo, MPI, and Horovod.