Ensemble dispatching on an IBM Blue Gene/L for a bioinformatics knowledge environment

Paul Marshall, Matthew Woitaszek, Henry M. Tufo, Rob Knight, Daniel McDonald, Julia Goodrich, Jeremy Widmann. Ensemble dispatching on an IBM Blue Gene/L for a bioinformatics knowledge environment. In MTAGS '09: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, Portland, Oregon, USA, November 2009.

This paper discusses our work supporting the processing of large numbers of short tasks, carried out as part of developing a collaborative bioinformatics knowledge environment for structural biologists, environmental microbiologists, and evolutionary biologists. We have designed and implemented a new ensemble-based task dispatching system and deployed it on a Blue Gene/L system, where it works with the Blue Gene's High Throughput Computing (HTC) capability. Our prior general-purpose, database-backed HTC task dispatching system could not do this efficiently. The new ensemble-based system can process and dispatch large numbers of very short tasks to over a thousand cores. We also investigate the scalability of HTC mode on the IBM Blue Gene/L in general. For specific applications, we identify and eliminate processor-reboot inefficiencies that affect very short tasks, which makes the Blue Gene/L a feasible processing system for this bioinformatics workload.

@inproceedings{200911-mtags2009-btc,
      Address = {Portland, Oregon, USA},
      Author = {Marshall, Paul and Woitaszek, Matthew and Tufo, Henry M. and Knight, Rob and McDonald, Daniel and Goodrich, Julia and Widmann, Jeremy},
      Booktitle = {MTAGS '09: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers},
      Doi = {10.1145/1646468.1646481},
      Isbn = {978-1-60558-714-1},
      Month = {November},
      Pages = {1--8},
      Title = {Ensemble dispatching on an {IBM} {B}lue {G}ene/{L} for a bioinformatics knowledge environment},
      Year = {2009}
}