Call for Participation

First International Symposium on Checkpointing for Supercomputing (SuperCheck21)

NERSC is hosting the First International Symposium on Checkpointing for Supercomputing (SuperCheck21), to be held online February 4-5, 2021. This free event will feature the latest work in checkpoint/restart research, tools development, and production use.

About the Symposium

Checkpoint/restart (C/R) is critical for fault tolerance in high-performance computing (HPC). Although there has been much research and development on C/R and C/R tools, few HPC end users are able to use these tools in production workloads. Research codes often demonstrate promising C/R capabilities, but feasible C/R options for diverse production workloads are still lacking, especially on cutting-edge HPC systems. This symposium will bring together C/R researchers, practitioners, application developers, and end users to share both the latest research results and their experiences adopting C/R tools in production. The goal of the symposium is to showcase the latest research on C/R, motivate the development of usable C/R tools, and boost the adoption of C/R tools in HPC production workloads.

The symposium will feature up-to-date, original, high-quality work and will be presentation-only (no full papers). Authors are required to submit a two-page extended abstract for peer review. Accepted abstracts will be published on arxiv.org, and their authors will be invited to present their work at the symposium. Presentation slides, recordings, and presenter profiles will be posted on the symposium website. We encourage participation from researchers, end users, professionals, and students.

Topics of Interest

We welcome submissions on all aspects of checkpointing for science and engineering in HPC, including the latest research results as well as development, deployment, and application experiences. The symposium scope includes, but is not limited to:

C/R research and tools development:

  • C/R targeting the full range of supercomputing software, such as MPI, OpenMP, GPGPU, FPGA, cloud, and container applications

  • Both pure and hybrid approaches to transparent checkpointing (examples of hybrid approaches include application-specific plugins that aid checkpointing, and integrated modules for transparent checkpointing within larger scientific/engineering toolkits)

  • Frameworks for multi-level checkpointing

  • New methods for low-overhead checkpointing, new fundamental algorithms, software development methods, the impact of future supercomputer hardware, performance evaluation, reproducibility, and fault recovery

  • Research on C/R scheduling and checkpoint intervals

C/R use in production (including all levels of checkpointing: application, job, and system levels):

  • The adoption of transparent C/R tools in production workloads (C/R use cases)

  • Application-initiated use of C/R tools (as an alternative to built-in, application-internal checkpointing)

  • C/R applications and support on HPC systems (e.g., resource scheduling, system utilization, batch system integration, and best practices)

Submission

We invite authors to submit their original, high-quality work.

All submissions should be made electronically through the SuperCheck21 submissions website. Submissions must be double-blind: authors should remove their names, institutions, and any identifying hints in references to earlier work. When discussing their past work, authors should refer to themselves in the third person, as if discussing another researcher's work. Authors must also identify any conflicts of interest with the PC chair or PC members.

Authors are required to submit a short abstract of fewer than 150 words (to be used on the symposium website) and a two-page extended abstract. The page limit includes figures and tables but excludes references, for which there is no page limit. Extended abstracts should be submitted as a PDF in the IEEE conference format.

Upon Acceptance

The symposium will feature two keynotes, invited talks, a panel discussion, and technical talks, each with 25 minutes of presentation and 5 minutes of discussion. We will post the detailed schedule when it is available.

If your submission is accepted, you must address the reviewer comments and upload your updated extended abstract to the submission site, along with author bios (fewer than 300 words), by January 29, 2021. The updated extended abstract may be up to three pages, including figures and tables but excluding references, for which there is no page limit. Accepted abstracts will be published on arxiv.org. All presentations will be pre-recorded; presenters will receive instructions on recording and uploading their presentations, which are also due January 29, 2021. All presentation slides, recordings, and presenter bios will be included in the technical program archive on the SuperCheck21 website.

Participation

The symposium will be held February 4-5, 2021, from 8:00am to 12:30pm Pacific Time. All participants, including presenters, are required to register. Registration is free.

Important Dates

  • Call for Participation Release: September 24, 2020

  • Abstract Submission Due: December 14, 2020 (AoE)

  • Acceptance Notification: January 12, 2021

  • Presentation Submission Due: January 29, 2021 (AoE)

  • Symposium: February 4-5, 2021, 8:00am–12:30pm Pacific Time

Organizers

  • Zhengji Zhao, National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL)

Contact:

Zhengji Zhao, zzhao@lbl.gov