December 1, 2021
Present
Tony Hey, Geoffrey Fox, Gregor von Laszewski, Bala Desinghu, Juri Papay, Junqi Yin, Aristeidis Tsaris, Gregg Barrett, Hai Ah Nam, Christine Kirkpatrick, Murali Emani
Apologies: Mallikarjun Shankar, Jeyan Thiyagalingam
Tentative Agenda
- Any new member introductions (None)
- Debriefing on SC21 BOF https://docs.google.com/presentation/d/1zn2EdBpXc6oNnyS-Ljy-BddRs3osjEkUGrFa98hb1pk/edit?usp=sharing
- Making Benchmark release more solid https://drive.google.com/file/d/1BeefJTj4ZZL4Wa5c3zNz1l5nzQN-ktGR/view?usp=sharing
- Discussion of our paper https://docs.google.com/document/d/1WwcS0gjVoz5Bf0G05xKIgoh2WEBxmNQM8VmkHNP67ag/edit?usp=sharing
- Any other business
SC21 BOF
We noted somewhat disappointing attendance, though this partly reflected the way the virtual event was organized. Bala could not join the BOF, and Gregor noted there was less advertising than in previous years.
Website
Aristeidis Tsaris noted the staging link:
https://stagingscience--mlcommons.netlify.app/en/groups/research-science/
versus the main site, which is: Science Working Group
Geoffrey asked Peter what to do for small changes, but no simple immediate solution was available.
MLCommons Requirements/Processes
Gregg summarized discussion as:
- Confirm completion of CLA
- Do we have the science target metrics confirmed for each benchmark?
- We need to confirm the needed logging for each benchmark. We don’t think MLCommons RCP mechanism is needed
Emani writes:
- Update reference implementations to include mlperf-logging library calls. https://github.com/mlcommons/logging
- An example from HPC benchmark suite is at https://github.com/mlcommons/hpc/blob/main/open_catalyst/ocpmodels/trainers/mlperf_forces_trainer.py
- Having an mlperf-logging skeleton in place would be a good first step; we can decide on what metrics to capture in our group meetings (a hedged sketch follows this list).
- Once the reference implementations of the benchmarks are ready, we need to move them to the official mlcommons github repo. David Kanter or Peter Mattson can help create a branch for us.
- There are existing scripts for the package checker, compliance checker, etc., in the logging repo (https://github.com/mlcommons/logging). We may need to tweak those in a separate branch for the science benchmarks. We did have a separate branch for the HPC v1.0 benchmarks.
- Also, getting the CLA form signed by benchmark contributors (owners) initially, and later by submitters, is important. One CLA form is enough per participating organization. You can get more details on this by emailing support@mlcommons.org and copying David Kanter. As far as I am aware, ANL, ORNL, and LBNL have already signed these.
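As a starting point for the logging items above, the following is a minimal mlperf-logging skeleton in the style of the HPC reference implementations. It is a hedged sketch only: the benchmark name, the keys logged, and the training-loop placeholders are illustrative, not an agreed Science WG format.

```python
# Hedged sketch of an mlperf-logging skeleton (https://github.com/mlcommons/logging).
# The benchmark name, logged keys, and training-loop placeholders are illustrative
# only; the group still needs to agree on exactly which metrics to capture.
from mlperf_logging import mllog

mllog.config(filename="science_benchmark.log")  # where the :::MLLOG lines are written
mllogger = mllog.get_mllogger()

num_epochs = 3  # placeholder

def train_one_epoch(epoch):
    # Placeholder for the benchmark's real training step; returns a fake accuracy.
    return 0.1 * (epoch + 1)

mllogger.event(key=mllog.constants.SUBMISSION_BENCHMARK, value="cloudmask")  # example name
mllogger.start(key=mllog.constants.RUN_START)

for epoch in range(num_epochs):
    mllogger.start(key=mllog.constants.EPOCH_START, metadata={"epoch_num": epoch})
    accuracy = train_one_epoch(epoch)
    mllogger.event(key=mllog.constants.EVAL_ACCURACY, value=accuracy,
                   metadata={"epoch_num": epoch})
    mllogger.end(key=mllog.constants.EPOCH_STOP, metadata={"epoch_num": epoch})

mllogger.end(key=mllog.constants.RUN_STOP, metadata={"status": "success"})
```

Once such a log is produced, the package-checker and compliance-checker scripts mentioned above could be pointed at it, possibly after tweaks in a science-specific branch.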
We agreed to see what we can do in the next two weeks (i.e., by December 15) and complete by the New Year.
Laboratory MLCommons Membership
This is for US and UK laboratories. The fee and voting status need to be clarified; Tony later asked David/Peter.
The Association (MLCommons) is organized and shall be operated as a non-stock membership operating not for profit under the General Corporation Law of the State of Delaware.
Paper
https://docs.google.com/document/d/1WwcS0gjVoz5Bf0G05xKIgoh2WEBxmNQM8VmkHNP67ag/edit?usp=sharing with comments from the November 3 meeting
Tony suggested submitting to the BenchCouncil journal. This idea seemed to be received positively.
November 3 2021
Present
Geoffrey Fox, Murali Emani, Juri Papay, Junqi Yin, Aristeidis Tsaris, Gregg Barrett, Hai Ah Nam, Mallikarjun Shankar, Feiyi Wang, Christine Kirkpatrick
Apologies: Tony Hey, Jeyan Thiyagalingam
Tentative Agenda
- Any new member introductions (None)
- Updates on initial benchmarks; status of documentation
- Discussion of the SC21 submission process and timescale
- website
- Any other business
Presentation on HPC FAIR Project
- Murali opened the meeting with a presentation on the HPC FAIR project from Stony Brook, Argonne, LLNL, and NCSU. This can be found at the HPCFAIR directory or at MLCommons HPCFAIR paper or HPCFAIR presentation. It will be presented at the SC21 workshop Machine Learning in HPC Environments - Program
- Slides 5 and 6 review other approaches
- Slide 7 states the research challenges
- Slide 9 gives the system architecture
- HPC ontology
- Caching models
- Workflow synthesizer
- Metadata Info
- Front end
- Back end store
- Uses the JSON-LD format, MLCube, and containers.
- Slide 12 gives an MNIST example
- Slide 13 describes available APIs
- Slide 14 reviews the capabilities of the system
- The Candle benchmark UNO and its workflow are used
- Juri will evaluate using it at RAL
- Arjun wondered if the system was restricted to AI applications. Could it be used for simulations?
- Christine noted the possible relevance of FAIR Digital Objects Forum with work on adding metadata without changing dataset files. It will be presented at an RDA FAIURF Birds of a Feather
- Feiyi asked about the infrastructure assumptions
Last Steps on Benchmark Release
- Juri noted that the Cloudmask benchmark was complete and he was looking at running Oak Ridge’s STEMDL.
- Our benchmark approach is that only the data and target metrics are "fixed". In the MLCommons closed division, even hyperparameter optimization is strongly constrained. Last time we suggested both open and closed divisions for science benchmarks, but the open division matters more here than it does for other MLCommons benchmarks.
- There are MLCommons processes that we do not have time to fully address before SC21. For example, if you want to contribute code, you or your organization must sign the CLA. Contact support@mlcommons.org for more information. Within DOE this took up to 2 months to get a signature.
- From https://mlcommons.org/en/get-involved/
- We agreed to go ahead with SC21 announcements with appropriate caveats. Geoffrey will hopefully check with Peter and David
SC21 Activities and Priorities
- Geoffrey will make a presentation at the SC BoF session: Wednesday, 17 November 2021 5:15 pm - 6:45 pm CST https://sc21.supercomputing.org/presentation/?id=bof118&sess=sess365
- He will make a draft over the weekend and ask for improvements
- Due to SC21, the November 17 working group meeting is canceled
Any Other Business
- Geoffrey noted that LLNL was working on 2 more benchmarks. Gregg noted the OECD workshop on "AI and the productivity of science" at https://www.oecd.org/sti/inno/ai-productivity-of-science.htm
October 20 2021
Present
Geoffrey Fox, Juri Papay, Junqi Yin, Aristeidis Tsaris, Gregg Barrett, Hai Ah Nam, Mallikarjun Shankar, Feiyi Wang, Christine Kirkpatrick
Apologies: Tony Hey, Jeyan Thiyagalingam
Tentative Agenda
- Any new member introductions (None)
- Updates on initial benchmarks; status of documentation
- Discussion of the SC21 submission process and timescale
- website
- Any other business
Updates on initial benchmarks
- We collected the current descriptions of the 4 benchmarks into a single document MLCommonsScienceBenchmarks. This can serve as the basis of a paper to be generated to support the SC21 announcements.
- The paper starting point is at Paper on MLCommonsScienceBenchmarks. Mallikarjun Shankar and Geoffrey Fox will coordinate
- Christine will look up material
Benchmark Release at SC21
- We suggested changing the term “reference implementation” to example or exemplar.
- The “examples” should satisfy our submission requirements. These include
- Open-source Code
- Algorithm description
- Scientific objective
- Optional field for FAIRness
- They should record the hardware used and running logs (see the hedged sketch after this list)
- We need to build on MLCommons policies (mlcommons/policies: General policies for MLPerf™ including submission rules, coding standards, etc.). We need to fork their GitHub repository, and Aris will look into this.
- Web page should explain everything well
- Explain difference between Science and HPC working groups
- We discussed the difference between scientific-discovery benchmarks, which aim to improve capability and are similar to the current MLCommons open division, and traditional hardware performance benchmarks, which are like the closed division.
- We agreed to call for both and, as an example, agreed that running example implementations on a nifty new machine like Cerebras was in scope.
- Current MLCommons logging checker won’t work in our open division.
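To make the "record hardware used" item above concrete, here is a hedged Python sketch that captures basic system information as a small JSON record; the field names and output file are illustrative and not an agreed MLCommons schema.

```python
# Hedged sketch: capture basic "hardware used" information as a small JSON record.
# The field names and output file are illustrative, not an agreed MLCommons schema.
import json
import platform
import subprocess

record = {
    "hostname": platform.node(),
    "os": platform.platform(),
    "python": platform.python_version(),
    "cpu": platform.processor(),
}

try:
    # If NVIDIA GPUs are present, record their names (assumes nvidia-smi is installed).
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    record["gpus"] = [line for line in out.splitlines() if line.strip()]
except (FileNotFoundError, subprocess.CalledProcessError):
    record["gpus"] = []

with open("system_record.json", "w") as f:
    json.dump(record, f, indent=2)
```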
Website update
- We agreed that the current "staging" website was in good shape and we asked MLCommons to put it on the production site; Peter Mattson has agreed and this is in process
Any Other Business
- Christine noted that there will be no virtual booth at SC21 for SDSC
October 6 2021
Present
Geoffrey Fox, Juri Papay, Murali Emani, Tony Hey, Junqi Yin, Aristeidis Tsaris, Gregg Barrett, Gregor von Laszewski, Hai Ah Nam, Grigori Fursin, Mallikarjun Shankar, Bala Desinghu, Feiyi Wang, Amit Ruhela
Apologies: Jeyan Thiyagalingam (new baby girl today!)
Tentative Agenda
- Any new member introductions
- Updates on initial benchmarks; status of documentation
- Discussion of the submission process and timescale
- website
- Any other business
Welcome to new members
Hai Ah Nam from NERSC introduced herself. With a nuclear physics background, she had experience at several DOE laboratories. Currently, she was helping NERSC add ML to its program. Feiyi Wang reintroduced himself; he leads the ORNL activities in our area.
Updates on initial benchmarks
- Juri Papay reviewed his collection of benchmark details using STEMDL as an example. This has two separate tasks - Classification and Reconstruction. One should specify a minimal scientific performance (as seen in a reference implementation) and describe science accuracy goals. The raw system performance -- a classic MLPerf measurement -- can also be reported.
- We discussed defining dependencies such as versions of PyTorch or Tensorflow
- This is linked to the ease of running reference implementations on other machines.
- We need to get our logging in agreement with MLCommons and put our benchmarks in their GitHub. Murali will help here
Website update
Juri will coordinate with Aris to update our website which is still in the MLCommons staging area.
Any Other Business
- It was suggested that we investigate/reconcile the different software approaches in SciML-Bench https://github.com/stfc-sciml/sciml-bench versus MLCommons
- We should also delineate better the relationship between Science and HPC working groups. Perhaps we could hold a common meeting after SC21 or New Year.
- Distinguish in SC21 BOF
Action Items
- Murali summarized these as
- Create an MLCommons GitHub repository for our benchmark reference implementations. These can be added via Git's submodules feature, though it may be good to create another branch in the original repository to isolate the MLCommons-specific versions.
- An example from the HPC group: https://github.com/mlcommons/hpc/
- Update reference implementations to include mlperf-logging library calls. mlcommons/logging: MLPerf™ logging library
- An example from HPC benchmark suite is at https://github.com/mlcommons/hpc/blob/main/open_catalyst/ocpmodels/trainers/mlperf_forces_trainer.py
- Move the current website with the benchmark suite details to MLCommons domain.
- A roadmap on the submission and review process with associated timelines.
September 22 2021
Present
Geoffrey Fox, Jeyan Thiyagalingam, Christine Kirkpatrick, David Kanter, Juri Papay, Murali Emani, Tony Hey, Junqi Yin, Aristeidis Tsaris, Wahid Bhimji, Gregg Barrett, Gregor von Laszewski
Tentative Agenda
- Any new member introductions
- Updates on initial benchmarks
- Quantification of the submission process and timescale
- website
- Note on possible LLNL benchmarks
- SC21 MLCommons BOF
- September 30 Community meeting https://docs.google.com/spreadsheets/d/1nZI063pDZupkPznCn-UaoY92C_hJTpxVFlm6VjIdpv8/edit#gid=533252977
- Any other business
Welcome to new members
Wahid Bhimji from NERSC, who leads the data analytics group there, introduced himself (Steve Farrell from the MLCommons HPC group is part of this unit).
Update on Benchmark Status
- Juri Papay has collected nearly all the information around the benchmarks (descriptions, data pointers), but some details are still to be completed. He will share the document with Geoffrey soon after the final information is in.
- David Kanter suggested that Juri Papay join other groups in MLCommons and share common practices.
- Murali (UNO) and Junqi (STEMDL) have provided the most relevant descriptions to Juri and will furnish the remaining information soon.
- CloudMask benchmark has been integrated into the SciML-Bench suite. STEMDL can also be run and will be integrated into the SciML-Bench soon.
- Aris shared summary https://tsaris.github.io/Science-Benchmarks-Working-Group/#cloudmask-specific-benchmark-targets
- Murali asked about acknowledgments and Tony suggested that we check with the labs to see their take.
Quantification of the submission process and timescale
- Geoffrey asked about running our own benchmarks on different machines as this was an early goal without as much progress as hoped.
- Jeyan and Juri mentioned their plans on running these on RAL systems. Christine mentioned that computer cycles at SDSC are available for these purposes.
- A common format to report results was discussed, but as of now, there is no such common format.
- It was also highlighted that there is a need for instructions on running these benchmarks and all benchmark owners are required to provide a clear set of instructions around this.
Website
We discussed the website including the status of the staging site https://stagingscience--mlcommons.netlify.app/en/groups/research-science/ with respect to the real site https://mlcommons.org/en/groups/research-science/ .
Note on possible LLNL benchmarks
LLNL talked to Geoffrey and will hopefully share additional benchmarks selected from their Open Data Initiative | Data Science Institute and talk Creating Open Data sets and Benchmarks. We discussed choosing benchmarks that illustrated new areas and new methods and having clear science goals.
SC21 MLCommons BOF
- Most of the MLCommons activities at SC21 will be virtual.
- Geoffrey asked Murali about the BOF: the current BOF with the HPC Benchmarks is scheduled for November 17th (but may happen anytime between the 16th and 18th). A longer (75-minute), purely virtual session has been requested.
- The overall plan of the BOF is to have a set of talks from HPC, Science and Training, with approximately 15 minutes each, and a common discussion at the end.
- Geoffrey mentioned that it is an opportunity for gathering more members and more benchmarks.
- David Kanter mentioned that a joint press release on Science and HPC Benchmarks going forward can be made. Credits to SDSC and STFC ECHO are to be included in the acknowledgment and announcement. The WG will work on a draft for this press release. David's BOF press release will have the HPC results as well as our announcement of the availability of our Science benchmarks. We need to prepare material to support this.
- Official announcements about benchmarks can be made in SC, and we can announce the intentions in the community meeting.
- Christine has reached out to Geoffrey and Tony for a lightning talk at the SDSC SC21 virtual exhibit.
MLCommons Community Meeting
- There is a September 30 Community meeting. This time there will be a guest speaker from IBM, Ruchir Puri, on AI for writing code; signups have been circulated by email. https://docs.google.com/spreadsheets/d/1nZI063pDZupkPznCn-UaoY92C_hJTpxVFlm6VjIdpv8/edit#gid=533252977
- However, since the last meeting there have not been many changes to the contents of the slides. David Kanter mentioned that similar presentations can be expected and there is nothing wrong in reusing them, as this provides an opportunity for additional exposure. WG chairs will have access to the slides.
- Gregg noted MLCommons Discord server https://discord.gg/Vwx7tSCu
- The July 15 presentation copied and renamed for September 30 can be found at Science part of MLCommons Community Meeting September 30 2021
- The updated slides are incorporated into the full community deck 2021.09.30 MLCommons Community Meeting
Datasets and Storage
- Jeyan will be sharing a write-up of our plans for datasets with the WG for further comments, and the overall plan can be shared in the community meeting. The write-up will cover the overall mirroring aspects, storage notions, downloading mechanisms, and other relevant details.
- Christine presented example documentation of how to use a dataset on the OSN. See page 2 which has command-line instructions. It's a PDF because we gave it out at our AGU booth. https://www.openstoragenetwork.org/wp-content/uploads/2021/05/NCSA_Terra_OSN_flyer.pdf
- The licensing model for the datasets was discussed, and David Kanter suggested CC BY (rather than CC0). This will match other datasets in MLCommons and will simplify a number of things. Christine also suggested the same CC BY model.
September 8 2021
Present
Geoffrey Fox, Christine Kirkpatrick, Juri Papay, Aristeidis Tsaris, Gregg Barrett, Gregor von Laszewski, Junqi Yin, Murali Emani, Tony Hey, Arjun Shankar
Apologies Jeyan Thiyagalingam,
Tentative Agenda
- Any new member introductions
- Updates on initial benchmarks
- Quantification of the submission process and timescale
- MLCommons 2022 Calendar (not discussed)
- website
- Any other business
Welcome to new members
None
Quantification of the submission process and timescale: Data storage
- We agreed that all data should be stored in at least one, and possibly more, locations, with well-defined access illustrated by an example; Globus is a possible access method
- Sites are RAL, SDSC, Oak Ridge
- RAL benchmarks are available at RAL and SDSC
- Christine gave an example command line for a bucket on the SDSC OSN pod: aws s3 ls s3://TFusion/ --endpoint-url https://ncsa.osn.xsede.org/ (see the hedged boto3 sketch after this list)
- Here's an example of a dataset page we could add: https://www.openstoragenetwork.org/global-scale-remote-sensing/
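For completeness, here is a hedged Python (boto3) equivalent of the aws-cli command above; the bucket and endpoint are the ones Christine quoted, while anonymous read access and the object key are assumptions for illustration.

```python
# Hedged boto3 equivalent of the aws-cli example above. The bucket and endpoint
# are the ones quoted in the meeting; anonymous (unsigned) read access and the
# object key are assumptions for illustration only.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="https://ncsa.osn.xsede.org/",
    config=Config(signature_version=UNSIGNED),  # drop this if credentials are required
)

# List the first page of objects in the bucket
for obj in s3.list_objects_v2(Bucket="TFusion").get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download a single object (the key name here is hypothetical)
# s3.download_file("TFusion", "example/data.h5", "data.h5")
```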
Quantification of the submission process and timescale: Benchmarks
- We agreed that benchmarks should have
- Scientific Objective or measure of success that one is challenged to improve on
- Datasets and how to get from sites such as RAL, SDSC, Oak Ridge
- Reference implementation(s)
- Hardware systems explored so far with performance achieved
- Action Item: Complete above for current benchmarks
Website
- We have progress!
- Aris's draft site: https://tsaris.github.io/Science-Benchmarks-Working-Group/
- Current official site https://mlcommons.org/en/groups/research-science/
- Staging version of the public MLCommons site above:
- https://stagingscience--mlcommons.netlify.app/en/groups/research-science/
- This is in a Github branch of the mlcommons/website repo here:
- https://github.com/mlcommons/website/tree/stagingscience/
- Action Item: Update discussion on website, of benchmarks as discussed above. Include a summary table
Any Other Business
None
AUGUST 25 2021
Present
Jeyan Thiyagalingam, Geoffrey Fox, Christine Kirkpatrick, Brian Spears (LLNL), Brian Van Essen (LLNL), Juri Papay, Aristeidis Tsaris, Gregg Barrett, Gregor von Laszewski, James Warren, Junqi Yin, Michael Golomon (LLNL), Rushi Anirudh (LLNL), Murali Emani, Bala Desinghu
Apologies Tony Hey, Arjun Shankar
Tentative Agenda
- Any new member introductions
- Presentation from Lawrence Livermore National Laboratory by Brian Spears
- Updates on initial benchmarks
- Continued discussion of the submission process
- website
- Any other business
Welcome to new members
- Brian Spears, Rushi Anirudh, Michael Golomon, Brian Van Essen
Presentation from LLNL (Lawrence Livermore National Laboratory)
Brian Spears led this with contributions from other LLNL members. The important talk can be found at Creating Open Data sets and Benchmarks. It describes the AI for Science activities at LLNL and their open data initiative in collaboration with SDSC. They have 10 existing benchmarks and several others in the pipeline. Discussions highlighted the large-dataset issue raised in the previous meeting (July 28). We should bring this up with MLCommons, perhaps via the research working group.
- Geoffrey noted the overlap between SciML and LLNL and stated that a contribution from LLNL towards MLCommons would be great.
- Brian Van Essen raised a point about the expectations around benchmarks and datasets. Geoffrey responded with the science discovery aspects of our benchmark goals
- Jeyan raised a point about the license model of the benchmarks from Brian (it is MIT/BSD). There is an export-release issue, but mostly Creative Commons or open licenses are used.
- Brian mentioned that he is keen to jointly explore distributing large datasets.
- Jeyan mentioned that there are lots of overlaps between what LLNL and the UK SciML are trying to do, and stated that it would be good to collaborate.
- Google Colab benchmark versions were discussed
Updates on initial benchmarks
- University of Virginia: (Geoffrey) Time series/ earthquake: good progress in adding Google TFT temporal fusion transformer
- STFC: (Jeyan & Juri) datasets and the code is already open
- ANL: (Murali) Candle - all in place
- ORNL: Junqi - No further update in terms of release
Website
- Aris is continuing to engage with David and Peter on this issue. Geoffrey will try to accelerate progress.
Any Other Business
None
JULY 28 2021
Present
Geoffrey Fox, Jeyan Thiyagalingam, Aristeidis Tsaris, Arjun Shankar, Gregg Barrett, Junqi Yin, Feiyi Wang, Tony Hey, Christine Kirkpatrick
Apologies Juri Papay,
Tentative Agenda
- Any new member introductions
- Updates on initial benchmarks
- Continued discussion of the submission process
- Repositories; large datasets
- Website
- Any other business
Welcome to new members
- No new members
Updates on initial benchmarks
- STFC: JT mentioned that they have refactored the Cloud benchmark from the SciML-Bench suite, and it is now available as a separate repository. There are questions around licensing and contributing.
- ORNL: Metrics for the benchmarks are being considered and they are working on those aspects (including inference aspects) and parameter search. These are on the web pages already
- GF: The time series benchmark will be ready for the release by SC'21.
- ANL: No one present from ANL.
Submission Process
- GF insisted that benchmarks addressing science discovery metrics can be considered under an open division.
- Discussions around announcing the competition at SC'21 (BOF - see below), outlining the science challenges for each benchmark (with time to solution as one metric).
- Arjun proposed a fact-sheet-based competition process (where a fact sheet includes the rules of engagement, metrics, expected outcomes, specimen results, and the scope of the competition).
- Arjun will provide a sample fact sheet.
- The idea is to target an exercise run by mid-October 2021 to prepare for a dry run at SC21
- The overall objective is to increase the algorithmic (model) tweaking for improved science outcomes
Repositories; large datasets
- There was a lot of discussion around distributing large datasets
- JT and TH, along with Christine, proposed adopting the current bucket-based distribution mechanism used by SciML-Bench.
- Christine agreed to provide mirroring support towards this.
- JT will write up a short proposal around this, to be discussed within the WG and presented at the BoF.
- A DOI-based mechanism was proposed by Arjun. In theory, locations can be replicated and changed, but some clarification around this is needed.
Website
- Aris is continuing to engage with David and Peter on this space and this will be resolved fairly soon.
- In the interim, we will use the internal page github.io designed by Aris.
Any Other Business
- Arjun mentioned that Murali and Steve are pushing the BOF efforts at SC, where MLCommons Science will be a part of it. They are focusing on the separation between HPC and Science at the BOF, and comments may be needed around it.
- Christine is trying to put together an experiment (take the system information from GitHub repositories and do feature selection to make data more FAIR). This is mostly at the theoretical level, but she will investigate further. She is looking at several machine learning schemas: one that came out of the W3C ML Schema Core Specification, the Agile Knowledge Engineering and Semantic Web (AKSW) MEX vocabulary, and the data mining OntoDM ontology
- Gregg is engaging with the National Institute of Physics to involve them with the WG.
- GF spoke to Livermore Labs; they have some data science projects and are interested in getting involved.
JULY 14 2021
Present
Geoffrey Fox, Jeyan Thiyagalingam, Aristeidis Tsaris, Arjun Shankar, Gregg Barrett, Junqi Yin, Feiyi Wang, Tony Hey, Bala Desinghu,
Apologies Juri Papay,
Tentative Agenda
- Any new member introductions
- Updates on initial benchmarks
- Firm up Community Presentation Thursday (see above)
- Continue Web Site and Metrics discussions
- Any other business
Welcome to new members
- No new members
Updates on initial benchmarks and labs
- STFC/RAL: Jeyan provided an update around the next release of the SciMLBench suite, new benchmarks, and issues around distributed learning, particularly around Horovod. Junqi offered to help Jeyan. STFC will package the CloudMask code separately.
- ORNL: Working with Juri on STEMDL benchmark.
- ANL: no updates
- Indiana: Geoffrey mentioned that their benchmark is containerized. TEvolOp uses either an LSTM model or a modification of the original Transformer model. He prefers the latest temporal fusion transformer (TFT) model from Google. There is a potential scalability issue, particularly with transformer models, where the memory requirements can be quite demanding. It can meet the deadline of being ready for release at SC21.
Community Meeting
There is an MLCommons Community meeting Thursday, July 15 -- please update the science WG presentation https://docs.google.com/presentation/d/1IgRZeY-Re9cXD6LauVHvlzmkYa9HFV3F7V1pOXpA5RA/edit?usp=sharing where I added placeholders for Web Site and Science metrics to our April 22 presentation
For background note that the full April 22 community meeting presentation is 2021.04.22 MLCommons Community Meeting and the Science part is Science part of MLCommons Community Meeting April 22 2021
- Geoffrey has agreed to present at the community meeting.
- STFC has a few updates to make, and Jeyan has added some changes.
- Arjun agreed to update things around STEMDL benchmarks
- Tony raised an issue about metadata and ontology of the WG. Geoffrey mentioned that the benchmark results need to have ontologies.
Continuation of Website Discussion from June 30 meeting
Arjun summarized his discussions with David Kanter on how the Science WG differs from HPC. David Kanter promised to come back to us about an official website where we can host this. For now, we will use the current github.io page. Arjun encouraged all to contribute to it. Potentially this can also be hosted on the MLCommons website. Gregg suggested that we keep this at MLCommons. Aris will send an email to David Kanter about this.
Metrics and Benchmarks
This topic is added to tomorrow’s presentation and discussed below
Any Other Business
- Tony raised an issue about when the benchmarks will be ready for competition.
- Geoffrey mentioned that we need to draw up a plan to make submissions ready with steps we need to take
- We can potentially announce the competition at SC'21 and Tony agreed that this is the sensible way forward.
- Arjun suggested having a BOF at SC'21 for the Science WG to announce the benchmarks. Geoffrey suggested being in touch with Murali to get this arranged. Aris will act on this.
- Arjun stressed that science targets are to be defined asap for each benchmark (such as minimum target accuracy, time to solution, etc.).
- Arjun suggested that we bring this up in the next meeting to discuss this more solidly.
- Arjun: We also need to make sure that there are Github pages to all benchmarks and each group to try each other's benchmarks.
- Arjun: Re-emphasized the potential significance of TEvolOp being mapped to the transformer model.
Action Items
- All: Get updates on the community meeting presentation
- Geoffrey will update the website https://tsaris.github.io/Science-Benchmarks-Working-Group/ with Earthquake benchmark
- Geoffrey: Send info on SC21 BOF
- Jeyan: Will contact Junqi about Horovod
- Jeyan: Will set up a separate repo for Cloud with instructions for running.
- Aris will contact Kanter about the website.
JUNE 30 2021
Present
Aristeidis Tsaris, Geoffrey Fox, Juri Papay, Gregg Barrett, Murali Emani, Gregor von Laszewski, Arjun Shankar, David Kanter, Junqi Yin, Amit Ruhela,
Apologies Christine Kirkpatrick, Tony Hey, Jeyan Thiyagalingam,
Tentative Agenda
- Any new member introductions
- Updates on initial benchmarks
- General discussion of submission process including (NOT directly DISCUSSED)
- Summary/Scope of every benchmark
- The main github.io page
- Github link for each and every benchmark
- A clear description of how to run the benchmark with MLCube
- A clear description of how to run the benchmark with SciML-Bench
- We also need to collect all datasets in one place. We can mirror this in UCSD
- Discussion of metrics specific to science discovery
- Any other business
Welcome to new members
- No new members
Updates
- Juri Papay presented an update on the RAL work (MLCommons_30_06_2021.pptx), including the steps needed for IBM hardware like Summit, which uses a module whereas containers are used elsewhere
- Support for ThetaGPU at Argonne will be added
- He noted that Tony Hey had just presented on RAL benchmarking at the ISC 2nd International Workshop on Machine Learning Hardware: ML Benchmarks for Scientific Applications
- Gregor von Laszewski (IU) stressed the many different containers needed across different hardware and applications
- Murali is looking at SciML and has an allocation on ThetaGPU that we can use
- The following IU thesis discusses data engineering for UNO: Vibhatha Lakmal Abeykoon, "TOWARDS DATA ANALYTICS-AWARE HIGH PERFORMANCE DATA ENGINEERING AND BENCHMARKING", Indiana University Ph.D. Thesis Dissertation Defense, June 2, 2021. UNO could be a useful "end-to-end" benchmark
- Oak Ridge (Arjun, Aris) discussed their work on an initial landing page https://tsaris.github.io/Science-Benchmarks-Working-Group/ which was followed by a lively discussion.
- We need to add materials on all benchmarks including that on science outcomes
- We proposed a three-level hierarchy: Google Drive (internal working documents), the new website (for the general user, but not stable and official for MLCommons), and the MLCommons website
- What metadata should we record?
- Please review the diagram from Oak Ridge
Science and Machine Performance Benchmarks
- We continued the discussion of Science benchmarks, stressing a science benchmark as well as (or replacing) a performance benchmark.
- Compare use in procurement or science discovery
- David Kanter noted similar points in the medical working group where a given model can perform differently in different population demographics.
- Our approach with important datasets is synergistic with MLCommons stress on datasets
- For science metrics such as accuracy, the reference model is a baseline to be beaten.
- We would allow data augmentation to get better science
Any Other Business
- MLCommons (HPC working group) is preparing an SC21 Birds of a Feather. Murali and Geoffrey are involved and it should discuss the Science working group
- We discussed work needed on MLCube and Metadata
- Geoffrey gave an ISC talk where René Jäkel from Dresden TU described their data-intensive HPC center. This could be a good contact for the HPC group; we could usefully add more members from Europe.
Action Items
- Geoffrey/Gregg: Arrange for Juri's presentation at the Best Practices WG on Friday, July 2, as Juri cannot present on the planned July 9 date.
- Check website strategy with Peter and David
- Gregor communicates MLCommons website issues to David
- Circulate Science metrics idea to other chairs to get feedback -- perhaps based on ImageNet competitions. Geoffrey brought this up in July 1 benchmark chairs meeting without strong comment.
- Everybody review web page and diagram
JUNE 16 2021
Present
Tony Hey, Aristeidis Tsaris, Geoffrey Fox, Jeyan Thiyagalingam, Juri Papay, Christine Kirkpatrick, Gregg Barrett, Arjun Shankar, Feiyi Wang
Tentative Agenda
- Any new member introductions
- Updates on initial benchmarks
- Comments on Time Scale for HPC https://docs.google.com/presentation/d/1h9lc31fhIN3mM20WNRlo_yxwWFiqn75iUbm1Z33jYTg/edit#slide=id.gdfefe66d88_0_90
- General discussion of submission process
- Any other business
Welcome to new members
- No new members
Updates
- STFC Update: Juri is working on the Cloud benchmark, which will be tried at ORNL.
- Juri presented some sample results on Cloud benchmarking with 2.4TB on PEARL.
- Juri is experiencing some difficulties around building the containers for Summit.
- Jeyan is working on SciML framework update. New upcoming features are:
- Support for containers,
- Single configuration file,
- Inference/training workload,
- Additional benchmarks
- ORNL Update: Aris is trying STEMDL on SciMLBench (Jeyan updated them to wait for the new API)
- Indiana Update: Geoffrey Fox is packaging the benchmark into a container for a four-node DGX A100 workstation
- ANL Update: None
Comments on Time Scale for HPC
- Link is here
- Geoffrey Fox covered the time scale and practices from the HPC WG, including the details to settle before issuing a benchmark
- Benchmarks need to be frozen for four months before submission.
- We may not need an advisory board
- RCPs, or Reference Convergence Points (for instance, epochs should not change with hardware).
- Tony asked which ones to submit and about the associated time scales.
- Jeyan asked about RCPs, and Aris mentioned that they are only for HPC. Arjun stressed that our objective is to foster scientific advancements in ML, and as such, the actual evaluation scope is different. The guideline should say: "given a dataset and a problem, the scope or RCP is actually inviting the community to make the scientific results better using whatever technique is possible"
- Gregg and Geoffrey mentioned that MLCommons's focus is on datasets, covering data engineering, cleaning, pre-processing, etc. As such, this is a good avenue for us to explore.
- Christine raised a view on the validity of comparisons.
- A potential metric could be [Accuracy or Domain-Specific Measure] against Time for a given dataset (see the hedged sketch after this list).
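To make the "[Accuracy or Domain-Specific Measure] against Time" idea concrete, here is a hedged Python sketch; the record format, the target value, and the accuracy numbers are illustrative, not a WG-agreed reporting format.

```python
# Hedged sketch of the "[Accuracy or Domain-Specific Measure] against Time" idea.
# The record format, target value, and accuracy numbers are illustrative only.
import time

def time_to_target(history, target):
    """history: list of (elapsed_seconds, metric) pairs recorded during training.
    Returns the first elapsed time at which the metric reached the target, or None."""
    for elapsed, metric in history:
        if metric >= target:
            return elapsed
    return None

# Example: a fake training run recording (elapsed time, accuracy) after each epoch.
start = time.time()
history = []
for accuracy in [0.71, 0.83, 0.90, 0.92]:
    history.append((time.time() - start, accuracy))

print("Time to reach 0.90 accuracy:", time_to_target(history, 0.90), "seconds")
```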
General discussion of submission process
- Jeyan asked whether it is essential to bring all code to a common place (GitHub). Arjun suggested they put up a github.io page and link all the codes. ORNL will act on this.
- Christine brought up a point on the datasets, mainly licensing. Ideally the relevant datasets can live on the main page.
- All will contribute towards writing up the scope/rationale around the benchmarking exercise (within the next two weeks, i.e., before the end of June 2021).
- A standard reference for the submission process is to be explored.
- An example GitHub is at http://github.com/mlcommons/hpc
Any Other Business
- Christine will get back to the WG about potential schemas we can use for our submission.
Action Items
- Geoffrey/Gregg: Arrange for Juri presentation at Best practices WG on some Friday at noon
- ORNL: Create a github.io page for holding the links for the benchmarks of the WG.
- Benchmark contributors (STFC, ANL, and ORNL): Write the scope/challenge of their benchmarks.
- Christine: To report back to the WG on the potential schema we can use for the data/metadata.
MAY 19 2021
Present
Tony Hey, Aristeidis Tsaris, Grigori Fursin, Geoffrey Fox, Jeyan Thiyagalingam, Juri Papay, Christine Kirkpatrick, Junqi Yin, Gregg Barrett, Murali Emani, Bala Desinghu, Gregor von Laszewski, Feiyi Wang, Amit Ruhela,
Tentative Agenda
- Any new members
- Updates on initial benchmarks
- Presentation: "All about SciML Bench" by Jeyan Thiyagalingam SciMLBench_MLCommons_Science.pdf
- Presentation: "Overview of MLCube for Science WG" – Juri Papay MLCube_Initial_Experience_MLCommon_Science.pdf
- Response to NSF Dataset RFI https://docs.google.com/document/d/1SkcsfmSslpe-auRhtliFw-d9wbTW03smyzU324SmrsA/edit?usp=sharing -- Geoffrey Fox, Christine Kirkpatrick
- Any other business
Welcome to new members
- No new members
Updates
- No updates from Jeyan (STFC), Junqi (ORNL), Murali (ANL) or Geoffrey (Indiana)
All about SciMLBench SciMLBench_MLCommons_Science.pdf
- Jeyan presented SciMLBench and there were a lot of questions around / clarifications.
- Encouragement for benchmarks from Light Sources (such as Diamond), Astronomy, and Particle Physics
- Grigori mentioned the Collective Knowledge framework
- Bala clarified average dataset sizes for Cryo benchmarks (~10-20 TB)
- A suggestion was made by Gregg to make tutorials (which can drive adoption)
- The relation to Collective Knowledge https://github.com/ctuning/ck was discussed
- Murali suggested having separate containers for each benchmark.
Overview of MLCube for Science WG MLCube_Initial_Experience_MLCommon_Science.pdf
- Juri presented MLCube
- Questions included a comparison [better organisation of configurations can be done]
- Gregor can contribute around MLCube
- Grigori mentioned that MLCube has not been updated because they are collecting feedback
- Juri will provide the comments/suggestions (Geoffrey will send an introduction email to the Best Practices WG)
Response to NSF Dataset RFI
- Christine and Geoffrey prepared a response (https://ggle.io/435R)
- Geoffrey suggested that all take part; the deadline is this Friday, May 21, 2021
Any Other Business
- Juri now has access to Summit
- SciML-Bench datasets are now mirrored on the OSN
- Christine Kirkpatrick was instrumental in securing the OSN mirror. She is happy to help this be mirrored at other locations
- Murali submitted a request on system use
MAY 5 2021
Present
Geoffrey Fox, Jeyan Thiyagalingam, Tony Hey, Grigori Fursin, David Kanter (DK), Arjun Shankar, Junqi Yin, Bala Desinghu, Aristeidis Tsaris, Steven Farrell, Murali Emani, Juri Papay, Feiyi Wang, Gregg Barrett, Christine Kirkpatrick
Tentative Agenda
- Any new member
- introductions
- Updates on initial benchmarks
- Report from Community Meeting
- Steps for Submitters
- Status of Initiatives FAIR and Benchmarking Software
- Any other business
Welcome to new members
- Grigori Fursin, VP of MLOps at OctoML.ai introduced himself, noting his work on FAIR data.
- Feiyi Wang reintroduced himself with his Oak Ridge group.
Updates
- Juri - up to 100 GPUs with larger 2.5 TB dataset (Feiyi asked for resource requirements).
- Junqi - no further updates
- Murali submitted a resource allocation for benchmarking on ThetaGPU - will become available.
- Geoffrey - no further updates
- Bala Desinghu presented his potential benchmarking project on CryoEM.
Discussion - Community Meeting
David Kanter provided some updates on the community meeting, focusing on three things: Datasets, Best Practices, and Benchmarks
- Large datasets (FAIR; they are thinking of a new high-resolution image dataset (non-science) as well as a speech dataset with 86,000 hours of labeled data)
- Best Practices (FAIR and most importantly reproducibility)
- Benchmarks (best highlights are power measurements, controls)
- They are bringing Training reference models into MLCube
On the science side, our WG is different from the other WGs (unique characteristics such as multi-channel datasets). David Kanter is keen to see that the WG is well supported, and keen to support us on benchmark submissions. Exploration around hyperparameters is to be expected.
The full community meeting presentation is 2021.04.22 MLCommons Community Meeting and the Science part is Science part of MLCommons Community Meeting April 22 2021
Discussion - Submission Issues
- Stick to the schedule of benchmarks (and complete): Fewer complete methods are preferred over more half-finished methods
- We should coordinate with HPC working group
- We should clearly talk about deadlines, freezing benchmarks, etc., on a time scale that might allow us to report some results at SC21. Note that David said "you will NOT be able to keep to your initial schedule" – but we can certainly set what we think is a realistic schedule and try!
- Geoffrey thought his time series benchmark may not make the cut on readiness
- Peer-review window: the submission window is wide enough to accommodate changes; usually 5 weeks is enough.
- We should hear from the Training WG about their experience. David gave us an introduction to Victor Bittorf, chair of the Training WG and MLCube lead.
Discussion - FAIR data and Benchmark Software
How do we position our work on FAIR and best practices?
- Best practices around training
- MLCube is recommended for packaging mechanism
- Related initiatives from Grigori, who volunteered to discuss in later meeting:
- Workflows Community Summit: https://workflowsri.org/summits/community
- Workflows Community Summit: Bringing the Scientific Workflows Community Together: https://arxiv.org/abs/2103.09181
- Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs: https://arxiv.org/abs/2011.01149
- System logs and other outputs provided as part of the benchmark are being looked at by Christine (from a FAIR point of view)
- RDA had a session on ML + FAIR (OpenML) – and there are lessons to be learnt (University of Illinois, Chicago and Argonne)
- Best practice around storing datasets (storing them in data repositories as opposed to GitHub, which is problematic) – though no MLCommons datasets are stored in GitHub
- Geoffrey Fox suggested designing a prototype around benchmarks with the FAIR principles being respected. With this example, we should take it to the rest of MLCommons
- Jeyan will present at the next meeting about SciMLBench and MLCube.
Any Other Business
- Grigori shared a few links:
- Related workflows community summit (FAIR discussions): https://workflowsri.org/summits/community
- Report from this summit: https://arxiv.org/pdf/2103.09181.pdf
- Tony suggested Bala and STFC to work together in curating a benchmark around CryoEM
- Christine took part in a "Making the Leap to Large" meeting (where there was some interest in scientific datasets). She is keen for some content
- https://eventsvirtually.org/making-the-leap-to-large
- OSN and ECHO
- Jeyan and Juri to speak to Christine about using OSN for distributing SciML datasets
APRIL 7, 2021
Present:
Geoffrey Fox, Jeyan Thiyagalingam, Christine Kirkpatrick, Bala Desinghu, Gregg Barrett, Gregor von Laszewski, Junqi Yin, Juri Papay, Murali Emani, Venkat Viswanath, Vibhatha Abeyakoon
Regrets: Mallikarjun (Arjun) Shankar
Agenda
- Any new member introductions
- Benchmark and multi-host execution Updates
- Discussion of next steps in FAIR.
- See Christine's talk to Benchmark-Infra WG yesterday
- Any other business
Welcome to new members
No new members
Updates
- Juri (RAL): Presented the status of the SciML Benchmarks, including results. His talk has been uploaded
- Junqi (ORNL): presented an update on STEMDL/EDiff
- Murali (ANL) presented
- Gregor (IU) had no updates but will need some help from SciML accessing RAL computers
- Bala Desinghu (Rutgers) presented on cryo-EM datasets and the possibility of having a benchmark around them. This seemed promising and Bala will work with RAL on exploring this
- We need to define the scope of work across the benchmark itself, documentation, runs on systems, metadata, etc.
Discussion
- Christine discussed next steps on FAIR (as she presented to benchmark infra WG).
- Discussions around ontology with Schema.org, and notion of metadata
- MLPerf has interesting logging standards but some fields are not easily machine readable as they contain multiple types of information “joined as English”
- We should adopt the FAIR principles in the areas of logging, systems and around datasets.
- It is unclear how these are captured across the different benchmarks in their current form in our working group.
- Murali noted the practice of the HPC working group, whose benchmarks capture system metadata with the mlperf-logging library (see the hedged sketch after this list). Important links are
- The system metadata in each submission (per application) is available at https://github.com/mlcommons/hpc_results_v0.7
- For example, the system metadata for Cori (NERSC) is available at https://github.com/mlcommons/hpc_results_v0.7/tree/main/LBNL/systems
- GF suggested that this FAIR/metadata work is important but should be viewed as distinct from the main benchmarking activity with distinct time scales.
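As a small illustration for the FAIR/metadata discussion above, here is a hedged Python sketch that extracts the structured part of MLPerf log lines; it assumes the usual ":::MLLOG {json}" line format produced by the mlperf-logging library, and the sample line is invented for illustration.

```python
# Hedged sketch: pull the structured part of MLPerf log lines into dictionaries,
# as a possible starting point for the FAIR/metadata work discussed above. Assumes
# the usual ":::MLLOG {json}" line format; the sample line below is invented.
import json

def parse_mllog(lines):
    """Yield the JSON payload of every MLLOG line; other lines are ignored."""
    marker = ":::MLLOG "
    for line in lines:
        idx = line.find(marker)
        if idx != -1:
            yield json.loads(line[idx + len(marker):])

sample = [':::MLLOG {"key": "eval_accuracy", "value": 0.91, "metadata": {"epoch_num": 3}}']
for record in parse_mllog(sample):
    print(record["key"], record["value"], record.get("metadata"))
```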
Any Other Business
- Access to systems: Juri will work with Gregg and others to get the PEARL system used for benchmarking
- STFC / RAL will also seek some help from Murali and Venkat for access to ANL systems.