
June 26, 2024

Present

Geoffrey Fox, Juri Papay, Gregor von Laszewski, Gregg Barrett, Wes Brewer, Victor Lu, Hector Hernandez Corzo, Javier Toledo, Armstrong Foundjem, Tom Gibbs, Ali Hashmi

Apologies

Christine Kirkpatrick

Tentative Agenda

New Member

Status of Benchmarks

  • We started discussing OSMI-Bench, where our work is led by Wes and Gregor. Deployment options include SmartSim (see the December 13, 2023 minutes and https://github.com/CrayLabs/SmartSim), SimAI-Bench from Argonne (presented at PASC; PASC24_presentation.pdf), and the Cloudmesh Experiment Executor (from Gregor)
  • Difficulties with TensorFlow Serving
  • Deployed with Docker or Singularity
  • We hope to complete this before the next meeting
  • Wes suggested raising Frontier deployment difficulties at the users' meeting, e.g., that each user must install PyTorch
  • Wes noted the paper "AI-coupled HPC Workflow Applications, Middleware and Performance", written with Shantenu Jha. He gave a related presentation on June 27 (OSMI-Bench Brewer.pdf)
  • Gregor, Wes, and Juri discussed how best to package benchmarks
  • Each application needs its own directory/environment
  • Customize the environment for each benchmark
  • HPE software is customized to their machines
  • Software version clashes are difficult to avoid. The DGX A100 workstation used by Javier is easier because it is more dedicated
  • Systems cannot install software across heterogeneous targets that need different drivers
  • NVIDIA can help here
  • Juri is running the OLMo language model on Frontier
  • Juri is also running a vision transformer for weather
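The packaging point that each application needs its own directory and environment can be illustrated with Python's standard `venv` module. This is a minimal sketch under stated assumptions, not the group's actual tooling: the benchmark names, version pins, and directory layout are hypothetical placeholders, and packages are not installed (the environments are created without pip so the step stays offline).

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical benchmark names and version pins, for illustration only;
# they are not the actual OSMI-Bench or OLMo requirements.
BENCHMARKS = {
    "osmi": ["tensorflow==2.15.*"],
    "olmo": ["torch==2.3.*"],
}


def create_benchmark_env(root: Path, name: str, requirements: list[str]) -> Path:
    """Create root/name/ holding its own virtual environment and requirements.txt.

    with_pip=False keeps this step offline; on Linux a real setup would then run
    root/name/env/bin/python -m pip install -r root/name/requirements.txt.
    """
    bench_dir = root / name
    bench_dir.mkdir(parents=True, exist_ok=True)
    # A separate environment per benchmark means, for example, TensorFlow
    # and PyTorch pins never share one site-packages directory.
    venv.EnvBuilder(with_pip=False, clear=True).create(bench_dir / "env")
    (bench_dir / "requirements.txt").write_text("\n".join(requirements) + "\n")
    return bench_dir


def setup_all(root: Path) -> dict[str, Path]:
    """Create one isolated directory/environment for every benchmark."""
    return {name: create_benchmark_env(root, name, reqs)
            for name, reqs in BENCHMARKS.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in setup_all(Path(tmp)).items():
            print(f"{name}: {path}")
```

A virtual environment only isolates Python packages; containers (Docker or Singularity) also isolate system libraries, but GPU containers still rely on the host's drivers, so the heterogeneous-driver problem noted above is not solved by either approach alone.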

Any Other Business