July 10, 2024
Present
Geoffrey Fox, Rashadul Kabir, Tues Day, Indra Priyadarsini, Victor Lu, Wes Brewer, Armstrong Foundjem, Gregor von Laszewski, Piotr Luszczek, Juri Papay, Sujata Goswami
Apologies
Christine Kirkpatrick, Gregg Barrett
Tentative Agenda
- Any New Members Introduction
- Status of Papers (delayed as Christine absent)
- Status of Benchmarks
- Science Foundation Models
- Any Other Business
New Members
- Tues Day: I studied political science and communications at George Washington University, and then spent the last 20+ years working in media, entertainment and hospitality in Los Angeles and NYC. I'm a pianist, audio engineer and independent researcher in AI/ML. I'm a member of several Google Cloud programs and am currently in a Google cohort pursuing my professional certification in Cloud Architecture engineering.
- Indra Priyadarsini: https://www.linkedin.com/in/indra-ipd/?originalSubdomain=jp Research Scientist at IBM Research - Tokyo. Works on AI for materials science and is active in the AI Alliance.
- Rashadul Kabir: https://www.linkedin.com/in/rashadulkabir/ PhD candidate at Colorado State University, working on workload scheduling in exascale data centers.
Foundation Models
- Geoffrey mentioned he was giving a short talk on Foundation models and Time Series Foundation Models and Patterns for Science Time Series on July 15, 2024 at the IEEE Space Mission Challenges for Information Technology - IEEE Space Computing Conference (IEEE SMC-IT/SCC 2024)
Any Other Business
- Gregor discussed OSMIBench and the HPE SmartSim software
- Need to get TorchServe, the PyTorch equivalent of TensorFlow Serving: https://pytorch.org/serve/
- See examples at https://github.com/CrayLabs/SmartSim-Zoo
- This is Andrew Shao’s initial SmartSim application focused on climate simulations: https://github.com/CrayLabs/NCAR_ML_EKE
- The University of Virginia is interested in using Radical Pilot rather than SmartSim
- Juri had a question about Wes’s paper on Digital Twins for SC24, which covers power and cooling. That was settled offline
- He noted that a major Frontier software upgrade will affect running programs
- Juri noted that the achievable flop rate depends on cooling, since high clock frequencies make chips too hot
- Piotr is back from vacation, working remotely at MIT but still living in Tennessee
- He noted that the HPCG benchmark varies from 1% to 10% of peak
- This suggests using memory movement as a performance measure
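One common way to reason about the HPCG observation above is a roofline-style bound, where attainable performance is limited either by peak compute or by memory bandwidth times arithmetic intensity (flops per byte moved). The sketch below is illustrative only and was not presented at the meeting: the function name and all machine numbers are hypothetical, chosen just to show how a memory-bound kernel can land at a few percent of peak.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: the lower of peak compute and bandwidth * flops-per-byte."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


# Hypothetical machine parameters (assumptions, not measured values).
peak = 10.0e12       # peak compute rate, flop/s
bandwidth = 1.0e12   # memory bandwidth, bytes/s
intensity = 0.25     # flops per byte, assumed for a sparse, memory-bound kernel

bound = attainable_flops(peak, bandwidth, intensity)
fraction = bound / peak
print(f"Attainable: {bound:.2e} flop/s ({fraction:.1%} of peak)")
```

With these assumed numbers the kernel is limited by memory bandwidth rather than by compute, which is why memory movement can be a more informative measure than the flop rate for such workloads.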