July 10, 2024

Present

Geoffrey Fox, Rashadul Kabir, Tues Day, Indra Priyadarsini, Victor Lu, Wes Brewer, Armstrong Foundjem, Gregor von Laszewski, Piotr Luszczek, Juri Papay, Sujata Goswami

Apologies

Christine Kirkpatrick, Gregg Barrett

Tentative Agenda

Any New Members Introduction
Status of Papers (delayed as Christine absent)
Status of Benchmarks
Science Foundation Models
Any Other Business

New Members

Tues Day: I studied political science and communications at George Washington University, and then I spent the last 20+ years working in media, entertainment and hospitality in Los Angeles and NYC. I'm a pianist, audio engineer and independent researcher in AI/ML. I'm a member of several Google Cloud programs and am currently in a Google cohort pursuing my professional certification in Cloud Architecture engineering.
Indra Priyadarsini: https://www.linkedin.com/in/indra-ipd/?originalSubdomain=jp Research Scientist at IBM Research - Tokyo. Works on AI for materials science and is active in the AI Alliance.
Rashadul Kabir: https://www.linkedin.com/in/rashadulkabir/ PhD candidate at Colorado State University, working on workload scheduling in exascale data centers.

Foundation Models

Geoffrey mentioned that he is giving a short talk on Foundation Models and Time Series Foundation Models and Patterns for Science Time Series on July 15, 2024, at the IEEE Space Mission Challenges for Information Technology - IEEE Space Computing Conference (IEEE SMC-IT/SCC 2024).

Any Other Business

Gregor discussed OSMIBench and the HPE SmartSim software
Need to get the PyTorch equivalent of TensorFlow Serving, which is TorchServe: https://pytorch.org/serve/
See examples at https://github.com/CrayLabs/SmartSim-Zoo
This is Andrew Shao’s initial SmartSim application focused on climate simulations: https://github.com/CrayLabs/NCAR_ML_EKE
The University of Virginia is interested in using Radical Pilot rather than SmartSim
Juri had a question on Wes’s paper on Digital Twins for SC24, which covers power and cooling. That was settled offline
He noted that a major Frontier software upgrade will affect running programs
Juri noted that the achievable flop rate depends on cooling, because running at high clock frequency makes chips too hot
Piotr is back from vacation, working remotely at MIT but still living in Tennessee
He noted that HPCG benchmark performance varies from 1% to 10% of peak
This suggests using memory movement as a performance measure
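
A rough illustration of the last two points, not something presented at the meeting: a minimal roofline-style sketch of why a memory-bound code reaches only a few percent of peak. The machine numbers and the flops-per-byte value below are hypothetical, chosen only for illustration, and are not measurements of HPCG or of any real system.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: the lower of the compute peak and what memory can feed."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def fraction_of_peak(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable performance expressed as a fraction of the compute peak."""
    return attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte) / peak_gflops


# Hypothetical machine (assumed values, not a real system).
PEAK_GFLOPS = 1000.0    # peak floating-point rate, GFLOP/s
BANDWIDTH_GBS = 200.0   # sustained memory bandwidth, GB/s

# Hypothetical arithmetic intensity for a sparse, memory-bound kernel.
FLOPS_PER_BYTE = 0.25

if __name__ == "__main__":
    frac = fraction_of_peak(PEAK_GFLOPS, BANDWIDTH_GBS, FLOPS_PER_BYTE)
    print(f"Attainable fraction of peak: {frac:.1%}")
```

With these assumed numbers the bound is set entirely by memory bandwidth, so the bytes moved per second describe the run better than the peak flop rate does.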