March 20, 2024

Present

Geoffrey Fox, Gregor von Laszewski, Gregg Barrett, Wes Brewer, Piotr Luszczek, Xavier Coubez, Armstrong Foundjem, Victor Lu, Elie Alhajjar, Sarah Chan, Adarsh Agrawal, Craig Peters

Apologies

Christine Kirkpatrick, Vijay Janapa Reddi, Tom Gibbs

Tentative Agenda

Any New Members Introduction
News from NVIDIA GTC
Science Foundation Models
Updates on Papers and Projects
Any Other Business

New Members

Sarah Chan https://www.linkedin.com/in/queserasarah/ sarah.chan@sagebase.org PhD in Molecular Biology and 12 years of global business development experience in taking early-stage life science technologies to market. Moved from Singapore to Seattle in 2023 to work at Sage Bionetworks https://sagebionetworks.org/, which aims to encourage deeper, more fruitful collaborations to advance bioscience.
Adarsh Agrawal, Stony Brook University CS Masters, August 2023 - December 2024, interested in LLMs
Craig Peters https://www.linkedin.com/in/craiglpeters/ craiglpeters@github.com is a product manager at GitHub. He is interested in contributing to reproducible science and works on an open specification for reproducible environments called Dev Containers documented at https://containers.dev
Dev Containers work well with VS Code environments

GTC Keynote

Piotr described his experiences at the NVIDIA GTC meeting
CEO Jensen Huang gave a great talk introducing the new Blackwell architecture https://nvidianews.nvidia.com/news/nvidia-blackwell-platform-arrives-to-power-a-new-era-of-computing
Elizabeth Blackwell was the first woman in America to receive a medical degree, but this chip is named after David Harold Blackwell, a mathematician who specialized in game theory and statistics and the first Black scholar inducted into the National Academy of Sciences
The B (Blackwell) architecture replaces the previous H (Hopper) architecture
This architecture has two dies on a single chip: "Packed with 208 billion transistors, Blackwell-architecture GPUs are manufactured using a custom-built 4NP TSMC process with two-reticle limit GPU dies connected by 10 TB/second chip-to-chip link into a single, unified GPU."
Custom low precision floating point with FP4 for inference
Fifth-generation NVLink with 1.8 TB/s bidirectional throughput per GPU
The Grace CPU will link to Blackwell
NIMS: NVIDIA NIM is a set of easy-to-use microservices for accelerating the deployment of foundation models on any cloud or data center and helps keep your data secure.
TSMC link for fabrication

Discussion

Discussion of digital twins with Wes commenting on his team’s model for Frontier
Digital Twin workshop on and using HPC https://sites.google.com/view/sc24digitaltwinsforhpc/home
Wes has a six-month Frontier job log. Frontier is liquid cooled on each blade
Frontier access is restricted now; Summit is available
Craig Peters noted work on reproducible computing: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2023EA003458
Christine Kirkpatrick noted: We started this in EarthCube and it resulted in Notebook Directory | EarthCube. AGU's effort picks up where we left off and hopefully brings it into use with a journal(s). We expect/hope to do a similar effort with NIH later this year.
Comment from Wes that reinforcement learning is not used in practical scheduling