April 6, 2022

Present

Tony Hey, Arjun Shankar, Gregor von Laszewski, Juri Papay, Gregg Barrett, Farzana Yasmin Ahmad, Aristeidis Tsaris, Piotr Luszczek, Cade Brown, Geoffrey Fox
Apologies: Jeyan Thiyagalingam

Tentative Agenda

New member introductions
Status of Benchmarks given ISC May 22 deadline
Status of Paper https://docs.google.com/document/d/1WwcS0gjVoz5Bf0G05xKIgoh2WEBxmNQM8VmkHNP67ag/edit?usp=sharing given ISC May 22 deadline
AOB

Discussion

Juri described performance evaluation on a Graphcore system, which has good libraries and support but small memory. On Summit he is running CloudMask and STEMDL using Horovod. There are temporary(?) problems with TensorFlow. The SciML group will extend the study to look at the Cerebras machine at Edinburgh
This is different from science performance, but it is still important to understand how performance depends on system architecture
It was noted that we don’t have a “PRAM”-like model for AI systems
A glitch with the CloudMask Amazon data source was noted
Gregor noted some early difficulties with storage performance (shared storage much worse than dedicated disks) on some Virginia systems.
He also pointed out the need for a trivial benchmark to verify a new machine setup in a short time. The SciML-Bench MNIST is a possible choice
Aris described the HPC working group's progress in creating a process for adding new benchmarks and a new BERT-based benchmark. Discussions with MLCommons Power are making progress. Tony hoped for a mixed-precision UTK benchmark
We discussed who will be at the ISC BoF. Jeyan will be there in person
We re-emphasized the need to finish the paper and benchmarks by early May. We agreed that each organization would think deeply and communicate to the group what will be possible by that time.