April 6, 2022

Present

Tony Hey, Arjun Shankar, Gregor von Laszewski, Juri Papay, Gregg Barrett, Farzana Yasmin Ahmad, Aristeidis Tsaris, Piotr Luszczek, Cade Brown, Geoffrey Fox
Apologies: Jeyan Thiyagalingam

Tentative Agenda

Discussion

  • Juri described performance evaluation on a Graphcore system, which has good libraries and support but small memory. On Summit he is running CloudMask and STEMDL using Horovod. There are temporary(?) problems with TensorFlow. The SciML group will extend the study to the Cerebras machine at Edinburgh.
  • This is different from science performance, but it is still important to understand how performance depends on system architecture.
  • It was noted that we don’t have a PRAM-like model for AI systems.
  • A glitch with the CloudMask Amazon data source was noted.
  • Gregor noted some early difficulties with storage performance on some Virginia systems (shared storage was much worse than dedicated disks).
  • He also pointed out the need for a trivial benchmark that can verify a new machine setup in a short time. The SciML-Bench MNIST is a possible choice.
  • Aris described the HPC working group's progress on a process for adding new benchmarks and on a new BERT-based benchmark. Discussions with MLCommons Power are making progress. Tony hoped for a mixed-precision UTK benchmark.
  • We discussed who will be at the ISC BoF. Jeyan will be there in person.
  • We re-emphasized the need to finish the paper and benchmarks by early May. We agreed that each organization would think deeply and communicate to the group what will be possible by that time.