Parallel High-resolution Climate Data Analysis using Swift

Matthew Woitaszek, John M. Dennis, Taleena R. Sines. Parallel High-resolution Climate Data Analysis using Swift. In MTAGS 2011: Proceedings of the 4th Workshop on Many-Task Computing on Grids and Supercomputers, Seattle, Washington, USA, November 2011.

Advances in software parallelism and high-performance systems have resulted in an order-of-magnitude increase in the volume of output data produced by the Community Earth System Model (CESM). As the volume of data produced by CESM increases, the single-threaded script-based software packages traditionally used to post-process model output data have become a bottleneck in the analysis process. This paper presents a parallel version of the CESM atmosphere model data analysis workflow implemented using the Swift scripting language. Using the Swift implementation of the workflow, the time to analyze a 10-year atmosphere simulation on a typical cluster is reduced from 95 to 32 minutes on a single 8-core node and to 20 minutes on two nodes. The parallelized workflow is then used to evaluate several new data-intensive computational systems that feature RAM-based and flash-based storage. Even when constraining parallelism to limit the amount of file system space used by intermediate temporary data, our results show that the Swift-based implementation significantly reduces data analysis time.
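
The timings quoted in the abstract can be turned into speedup factors with a short calculation. The sketch below is illustrative only and is not taken from the paper: it treats the 95-minute figure as the baseline, and the names `SERIAL_MIN`, `ONE_NODE_MIN`, `TWO_NODE_MIN`, and `speedup` are hypothetical helpers introduced here.

```python
# Wall-clock times in minutes, as reported in the abstract.
SERIAL_MIN = 95.0     # baseline analysis time (assumed to be the original workflow)
ONE_NODE_MIN = 32.0   # Swift workflow on a single 8-core node
TWO_NODE_MIN = 20.0   # Swift workflow on two nodes


def speedup(baseline_min, parallel_min):
    """Return the ratio of the baseline time to the parallel time."""
    return baseline_min / parallel_min


one_node = speedup(SERIAL_MIN, ONE_NODE_MIN)    # baseline vs. one node
two_node = speedup(SERIAL_MIN, TWO_NODE_MIN)    # baseline vs. two nodes
scaling = speedup(ONE_NODE_MIN, TWO_NODE_MIN)   # one node vs. two nodes

print(f"one node:         {one_node:.2f}x")
print(f"two nodes:        {two_node:.2f}x")
print(f"one -> two nodes: {scaling:.2f}x")
```

Under these assumptions, the reported times correspond to roughly a 3x speedup on one node and a 4.75x speedup on two nodes, with the second node contributing a further 1.6x.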

@inproceedings{201111-mtags2011-amwgswift,
      Address = {Seattle, Washington, USA},
      Author = {Woitaszek, Matthew and Dennis, John M. and Sines, Taleena R.},
      Booktitle = {MTAGS 2011: Proceedings of the 4th Workshop on Many-Task Computing on Grids and Supercomputers},
      Doi = {10.1145/2132876.2132882},
      Month = {November},
      Title = {Parallel High-resolution Climate Data Analysis using {S}wift},
      Year = {2011}}