November 13, 2024

Present

Geoffrey Fox, Gregor von Laszewski, Juri Papay, Gary Mazzaferro, Wes Brewer, Marisa Ahmad, Armstrong Foundjem, Piotr Luszczek, Victor Lu, Andy Cheng, Shirley Moore, Claus Weiland, Gregg Barrett, Satoshi Iwata, Piero Coronica, Azza Ahmed, Steve Farrell, Murali Emani, Sharma Lee, Christine Kirkpatrick, Gavin Mitchell Farrell

Tentative Agenda

Any New Members Introduction
Implications of the merger of HPC and Science working groups
Meeting cadence weekly?
Plan going forward
Choosing Up-to-date Benchmarks
White Papers
Possibly Energy Paper
Benchmark Carpentry https://docs.google.com/document/d/15YIlAWOBA2_xjXkTnAZmaw003Jh4eqURVZYQHhdGYdQ/edit#heading=h.fa0u4qc1plw5
AI Readiness of MLCommons Science https://docs.google.com/document/d/1NbL-VdkrY9jzPxveOys2RCK8TdEJ7O5wgnxjAgzK-rE/edit?usp=sharing
Any Other Business

New Members

Azza Ahmed is a Senior HPC Engineer at TU Delft, the Netherlands. She supports the computer science departments (INSY and ST) in carrying out their computations on the university's HPC clusters (DAIC https://daic.tudelft.nl/ and DelftBlue https://www.tudelft.nl/en/dhpc/system), and she is involved in needs mapping and planning for research and education infrastructure at the university.
Satoshi Iwata is with Fujitsu in Japan and is an HPC working group veteran. https://www.linkedin.com/in/satoshi-iwata-ba412791/
Piero Coronica https://www.mpcdf.mpg.de/person/112609 is with the Max Planck Computing and Data Facility https://www.mpcdf.mpg.de/ (MPCDF, formerly known as RZG), a cross-institutional competence centre of the Max Planck Society that supports computational and data sciences, located in Garching, Germany, near Munich. The new supercomputer Viper of the Max Planck Society is now open for early user operation. This is the first phase of the deployment, consisting of 768 CPU nodes (based on 4th Gen AMD EPYC “Genoa” CPUs) and an IBM SpectrumScale storage solution, delivered by Eviden/Atos. In the course of 2024, a second phase with ca. 340 GPU nodes comprising 680 AMD Instinct MI300A APUs will be deployed.
Gavin Michael Farrell (Dipartimento di Matematica “Tullio Levi-Civita”) is a Ph.D. student at Padua University. He has a bioinformatics background with the ELIXIR EU project https://www.linkedin.com/in/gavin-farrell97/?originalSubdomain=uk. His Ph.D. is in the biocomputing group.

White Papers

Christine will circulate the white paper on energy, which has been restored to roughly its original form and sent to Gregor
We will spend up to 20 minutes on the AI Readiness paper at the next meeting, which is the day before Thanksgiving
Christine noted the relevance of open models and data to AI readiness
We will also spend up to 20 minutes on the Benchmark Carpentry paper at the next meeting
Shirley Moore will add material on benchmark tools
Add a survey of benchmark projects
Authors should add their names and material

Other Discussion

The plan for this meeting was changed at the last moment because the merger of the Science and HPC working groups was postponed while other futures for the HPC group are considered. As a result, the meeting was not properly prepared, and Geoffrey apologized for this.
Juri noted that Summit is closing very soon
Armstrong can get access to UVA computers if necessary
Marisa noted the MLCommons SC happy hour, 7-9 pm (following the SC ’24 MLPerf BoF session from 5:15 to 6:45 pm), at Atlantucky, 170 Northside Dr SW Suite 96, Atlanta, GA 30313
Christine noted that Google just ordered 7 small nuclear reactors, but they will take 5-10 years to come online
The waste “only” has a 300-year half-life, as it is low density
Christine noted that some software is particularly power-intensive and that awareness of this needs to be raised

Benchmarks

Geoffrey noted the goal of identifying up-to-date benchmarks in areas, such as weather, that have seen many recent deep learning innovations. Gary noted NOAA’s initiative https://sab.noaa.gov/wp-content/uploads/4.0-DL4NWP_NOAAResponse_Nov2024.pdf in this area; Shirley’s comments are given separately below.
Gavin noted his experience with bioinformatics benchmarks
Activity in materials science foundation models was noted
DOE and the AI Alliance are active
Benchmarks should be improved so that they are easier to analyze, with complete ontologies and no free text
Energy analysis should be part of benchmarks
Gregor and Wes noted that the MOM6 benchmark has been delayed by HPE but should be completed soon

Comments from Gary Mazzaferro

DevOps: DMTF Redfish data center specification
CXL InfiniBand controller
https://en.wikipedia.org/wiki/Comparison_of_open-source_configuration_management_software
https://en.wikipedia.org/wiki/Redfish_(specification)
https://redfish.dmtf.org/
https://computeexpresslink.org/cxl-specification/
And LangGraph: https://www.langchain.com/langgraph
CXL is supported by Redfish

Comments from Shirley Moore

Here are links to the weather forecasting benchmarks and survey article on GNNs for the heterogeneous catalysis problem that I mentioned during the November 13 meeting.
Deep Learning Weather Prediction benchmark: https://github.com/amazon-science/dlwp-benchmark/?tab=readme-ov-file
NVIDIA Modulus (not a benchmark suite, but includes example codes for weather forecasting and other areas such as CFD, healthcare, molecular dynamics, and additive manufacturing):
https://developer.nvidia.com/modulus
https://github.com/nvidia/modulus
https://docs.nvidia.com/deeplearning/modulus/modulus-core/api/modulus.models.html#model-zoo
Review article on GNNs for computational heterogeneous catalysis: https://pubs.aip.org/aip/jcp/article/161/17/171001/3318459/Application-of-graph-neural-network-in