May 17, 2023

Present

Gregg Barrett, Geoffrey Fox, Juri Papay, Mallikarjun Shankar, Wesley Brewer, Gregor von Laszewski, Piotr Luszczek, Christine Kirkpatrick, Aristeidis Tsaris, Murali Emani

Tentative Agenda

Any new members
We're live! New ML Benchmarks for Scientific Discovery | MLCommons
Discussion of new Benchmarks (Continued) with talk by Wes Brewer MLCommons-Brewer-05-16-23.pdf
Using Benchmarking Data to Inform Decisions Related to Machine Learning Resource Efficiency (Continued) mlcommons_data_energy_usage_paper
Benchmark Carpentry benchmark-carpentry
AI Readiness of MLCommons Science (Continued) MLCommons Science FAIR Concept Paper
AOB

We’re Live

Our benchmarks were launched by MLCommons (New ML Benchmarks for Scientific Discovery | MLCommons), and we need to propagate the news
Christine will advertise through SDSC and NCSA
SDSC Innovators newsletter and ask if NCSA will do the same.
And Christine’s RCN on ML and FAIR.
Campus Champions and Henry Neeman are good lists
cc-all@campuschampions.org
hneeman@ou.edu
Arjun suggested hpc-announce at Argonne
send email to hpc-announce@mcs.anl.gov
Juri will advertise in Europe

Towards Developing/Benchmarking Digital Twins of/on HPC

Wes Brewer presented a fascinating talk on building a digital twin of Frontier
Towards AI and Mixed Reality for the Smart Facility and its Digital Twin https://www.researchgate.net/profile/Benjamin-Hernandez/publication/358970366_Towards_AI_and_Mixed_Reality_for_the_Smart_Facility_and_its_Digital_Twin
SC21 Best paper on Summit https://ieeexplore.ieee.org/document/9910121
Wes presented a useful definition of digital twins with 5 levels, the ultimate level 5 being an “autonomous twin”
Apollo 13 was perhaps the first digital twin
HPE and NVIDIA with DOE and TSC Finland are building a virtual prototype of OLCF-6 (Frontier's successor; Frontier is OLCF-5 and Summit is OLCF-4), including scheduling, network, cooling, and power.
Different components have different timescales
Reinforcement Learning optimization used at level 5
Frontier is 21 MW, with 37888 GPUs on blades that have 2 CPUs and 8 GPUs each
Using the Modelica modelling language, building on the Oak Ridge Fusion team's expertise with this technology
Using AR/VR with Meta Quest 2 and Microsoft HoloLens 3500S and Unreal gaming visualization
Build community DT4HPC around HPC Digital Twins with SC23 BOF
Discussed interoperability of digital twin components
Wes then discussed HPC-AI workflow benchmarks, going back to the old DARPA HPCS program.
Presented Jha’s execution motifs as a way of classifying different benchmark areas
Discussed SMIBench as component interoperability benchmarking with Open Surrogate Model Inference (OSMI) Benchmark
Gregg suggested Melissa Romanus at LBL would be interested
Tom Gibbs noted:
Ironic that I missed the meeting due to finalizing the session I was chairing for invited talks at ISC on AI for Science.
One of the sessions was on Digital Twins for Science, where we had talks on twins for Fusion Reactors and Climate Models.
Looking over Wes’s talk I’m coming to the conclusion that it might be as challenging to benchmark the twin as to build it.
It basically comes down to evaluating a complex workflow where some of the components have latency constraints

Power Paper

mlcommons_data_energy_usage_paper
Christine has advanced this with comments from Gregg and Arjun
Stress value of ontologies
Perhaps there is a better source of data than Tom’s hardware; https://www.spec.org/benchmarks.html may be useful

Any Other Business

Summit very expensive to maintain past formal end time