April 6, 2022
Present
Tony Hey, Arjun Shankar, Gregor von Laszewski, Juri Papay, Gregg Barrett, Farzana Yasmin Ahmad, Aristeidis Tsaris, Piotr Luszczek, Cade Brown, Geoffrey Fox
Apologies: Jeyan Thiyagalingam
Tentative Agenda
- New member introductions
- Status of Benchmarks given ISC May 22 deadline
- Status of Paper https://docs.google.com/document/d/1WwcS0gjVoz5Bf0G05xKIgoh2WEBxmNQM8VmkHNP67ag/edit?usp=sharing given ISC May 22 deadline
- AOB
Discussion
- Juri described performance evaluation on a Graphcore system, which has good libraries and support but small memory. On Summit he is running CloudMask and STEMDL using Horovod. There are temporary(?) problems with TensorFlow. The SciML group will extend the study to look at the Cerebras machine at Edinburgh
- This is different from Science performance, but it is still important to understand how performance depends on system architecture
- It was noted that we don’t have a “PRAM”-like model for AI systems
- A glitch with the CloudMask Amazon data source was noted
- Gregor noted some early difficulties with storage performance (shared much worse than dedicated disks) on some Virginia systems.
- He also pointed out the need for a trivial benchmark to verify a new machine setup in a short time. The SciML-Bench MNIST is a possible choice
- Aris described the HPC working group's progress in creating a process for adding new benchmarks and a new BERT-based benchmark. Discussions with MLCommons Power are making progress. Tony hoped for a mixed-precision UTK benchmark
- We discussed who will be at the ISC BoF. Jeyan will be there in person
- We re-emphasized the need to finish the paper and benchmarks by early May. We agreed that each organization would think deeply and communicate to the group what will be possible by that time.