Headed to Supercomputing 2019? Register to join the Cylc user group featuring the Altair weather solution! Lunch included.
Weather modeling is a challenge. It requires high-performance computing (HPC) resources and powerful software that can orchestrate the most complex cycling workloads.
Predicting Australia’s Climate and Weather
The Bureau of Meteorology is Australia’s national climate, water, and weather agency, one of the most fundamental and widely used services of the Australian government. The Bureau provides the entire continent with expertise about its “often-harsh natural environment, things like drought, floods, storms, and tropical cyclones,” says Alan Riley, project manager at the Bureau. “All these things need expertise provided to the Australian public, and that’s done through regular forecasts, warnings, monitoring, and advice that spans not only Australia, but the Antarctic Territory as well.”
The Bureau runs its numerical weather prediction (NWP) suite on its petascale Cray supercomputer, but it has relied on an aging SMS workflow scheduler. The Bureau needs a modern, scalable service to support its growing volume of NWP modeling and post-processing output.
Orchestrating Cycling Workflows with Cylc
Cylc (pronounced “silk”) is an open-source Python workflow engine for cycling systems, capable of handling workflows that range from simple to highly complex. [See “Workflow Automation for Cycling Systems,” Computing in Science and Engineering, Vol. 21, Issue 4, July/Aug 2019.] It automatically executes tasks according to their schedules and dependencies, and it is especially useful in areas such as weather and climate modeling, NWP, physics simulation, and data processing. Cylc has become a popular choice at major weather and climate centers around the world, and it is now a key tool at the Bureau of Meteorology.
With Cylc, each task can run as soon as its dependencies are met, giving maximum concurrency and more efficient use of HPC resources. Cylc was originally developed for operational environmental forecasting at NIWA, the National Institute of Water and Atmospheric Research in New Zealand. It is now an open-source collaboration between NIWA, the Met Office (UK), the Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE), the Bureau of Meteorology, and other contributors.
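To make the idea of dependency-driven concurrency concrete, here is a minimal sketch, not Cylc's implementation. It assumes a hypothetical three-task cycle (`get_obs`, `model`, `post_process`) in which each `model` run also depends on the previous cycle's `model`; cycles are numbered 0, 1, 2 rather than using real date-time cycle points. The hypothetical `schedule_waves` function groups tasks into "waves" that could run at the same time. Because there is no barrier between cycles, a task from one cycle can share a wave with tasks from the next.

```python
# Hypothetical per-cycle task graph: task -> list of (upstream task, cycle offset).
# Offset 0 means "same cycle"; -1 means "the previous cycle".
GRAPH = {
    "get_obs": [],
    "model": [("get_obs", 0), ("model", -1)],
    "post_process": [("model", 0)],
}


def schedule_waves(graph, n_cycles):
    """Group (cycle, task) pairs into waves of concurrently runnable tasks.

    A task becomes ready as soon as all of its dependencies have succeeded.
    Dependencies that point before the first cycle are treated as satisfied.
    """
    pending = {(c, t) for c in range(n_cycles) for t in graph}
    done = set()
    waves = []
    while pending:
        # Every pending task whose upstream tasks are all complete can run now.
        ready = sorted(
            (c, t)
            for (c, t) in pending
            if all(c + off < 0 or (c + off, up) in done for up, off in graph[t])
        )
        if not ready:
            raise ValueError("dependency cycle: no task can run")
        waves.append(ready)
        done.update(ready)
        pending.difference_update(ready)
    return waves


for i, wave in enumerate(schedule_waves(GRAPH, 3), start=1):
    print(f"wave {i}: {wave}")
```

With three cycles, all `get_obs` tasks run in the first wave, and `post_process` for cycle 0 runs alongside `model` for cycle 1. In Cylc's own graph notation an intercycle dependency like this is written along the lines of `model[-PT6H] => model`.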
“Cycling is not the same as real-time scheduling,” says NIWA’s Dr. Hilary Oliver, the originator of Cylc. “Cylc seamlessly transitions between scheduling as quickly as possible when running off the clock and real-time scheduling once the system has caught up to the clock.”
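The transition Dr. Oliver describes can be illustrated with a wall-clock constraint. Cylc supports clock triggers for this purpose. The functions below are a hypothetical sketch, not Cylc's code, and assume each cycle's task may not start before its cycle point plus a fixed offset. When the suite is behind the clock (for example, when rerunning past cycles), that moment has already passed and the task is released immediately. Once the suite has caught up, tasks wait for the wall clock.

```python
from datetime import datetime, timedelta, timezone


def release_time(cycle_point, offset, now):
    """Earliest start for a clock-triggered task (hypothetical helper).

    The task may not start before cycle_point + offset. If that time has
    already passed, the suite is catching up and the task starts at `now`.
    """
    return max(now, cycle_point + offset)


def plan(cycle_points, offset, now):
    """Label each cycle point as catch-up (run now) or real-time (wait)."""
    return [
        (p, "as soon as possible" if p + offset <= now else "wait for clock")
        for p in cycle_points
    ]


start = datetime(2019, 11, 18, 0, 0, tzinfo=timezone.utc)
cycles = [start + timedelta(hours=6 * i) for i in range(4)]  # 00, 06, 12, 18 UTC
now = datetime(2019, 11, 18, 12, 30, tzinfo=timezone.utc)
for point, mode in plan(cycles, timedelta(hours=1), now):
    print(point.isoformat(), mode)
```

Here the 00 and 06 UTC cycles are released immediately, while the 12 and 18 UTC cycles are held until 13:00 and 19:00 UTC respectively.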
Bureau of Meteorology program director Tim Pugh told us that one reason his organization chose Cylc was its proven use in both research and operations. “The selection of Cylc as the workflow scheduler for our operations was very simple,” he said. “We’re using it in research. By using it in production it meant a very simple transition of our HPC applications and numerical prediction from research into operations.”
Cylc and Altair PBS Professional™
Converting complex, critical systems to a new open-source workflow engine can be a daunting prospect — so Altair, with NIWA and the Australian Bureau of Meteorology, developed a production environment for monitoring the performance of many Cylc workflows along with the PBS Professional workload manager.
PBS Professional is Altair’s industry-leading workload manager and job scheduler for HPC and cloud environments, part of the Altair PBS Works™ suite. It allows HPC users to simplify HPC infrastructure management and optimize system utilization, improving application performance and maximizing ROI on hardware and software investments.
Altair already provided PBS Professional to the Bureau as the scheduler managing the resources on its HPC system. With the SMS scheduler being deprecated and due for replacement, the Bureau was looking for a commercial vendor to support its Cylc workflows, because Cylc is open source and has no commercial support of its own.
Why the Bureau of Meteorology Chose Altair
“Altair was selected to assist the Bureau because of its knowledge of HPC systems and our need to integrate with their products,” said Tim Pugh. “PBS Professional provides the job resource management and scheduling on our HPC system. It’s critical to the system such that if that service were to go down, then we’d consider that a whole-system outage. Altair understands the criticality of that.” The partnership, he said, “allows us to achieve the outcomes every day that are needed by the Bureau’s forecast center.”
David Block, project manager at Altair, added, “Altair has a long history of listening to our customers and producing robust solutions on time and on target. Altair is collaborating with the Bureau to build a general-purpose solution to assist weather and climate staff with monitoring their environment, and then providing the diagnostics necessary to pinpoint any issue.”
Currently, staff members at many sites are required to constantly monitor the environment. The new solution will alert them via a push mechanism. It will also allow them to monitor the supercomputer hardware, Cylc suites, and PBS Professional jobs, and report status clearly and concisely. The solution is being designed to be modular and general-purpose so any site can deploy it out of the box, or substitute components they’re more familiar with. Altair is building several components as well as making contributions to Cylc.
“Due to the use of open-source software,” said Block, “sites around the world can implement the solution and work with the communities to make improvements. Altair is also available to support any site that installs the solution.”
Continuing to Improve Cylc
The Cylc + PBS Professional integration at Australia’s Bureau of Meteorology will unify a large production system and its many workflows using PBS Professional and the open-source Apache Kafka message broker. “Altair brings significant commercial expertise to the open-source project,” said NIWA’s Dr. Oliver, “and they’re supporting Cylc development to modernize the graphic user interface and improve security.”
The Bureau’s Alan Riley added, “Simply moving to Cylc solves a lot of problems we’ve spoken about, but the project is also moving on. It’s doing some extra things to improve the system even more.” Areas in which the collaboration continues to evolve include:
- Security features
- Cross-scheduler triggering via Apache Kafka
- Two-way communication between SMS and Cylc suites
- Enhanced reporting
- BI tool support for dashboarding and performance analysis
- Integration with the Bureau’s enterprise monitoring tool
Additional upcoming Cylc improvements include an all-new web-based control panel with tools that give support teams more control and better monitoring capabilities. Cylc version 8 will be a major rearchitecting of Cylc for modern web technologies. Altair will continue to be active in the Cylc community.