Characterizing Grid experiments in Bioinformatics for an efficient scheduling
Ignacio Blanquer (GRyCAP, Universidad Politécnica de Valencia)
Abel Antonio Carrión (GRyCAP, Universidad Politécnica de Valencia)
Vicente Hernández (GRyCAP, Universidad Politécnica de Valencia)
Bioinformatics involves the execution of many compute-intensive applications. Owing to the development of new, complex techniques and the increasing size of databases, the field demands experiments that exceed the resources of most research groups. The use case presented in this work is one of the most representative applications in Bioinformatics: computing the alignment of genomic and proteomic samples against annotated databases with BLAST, developed at the NCBI in the USA. A homology search for a single sequence with BLAST takes only a few minutes, even against large databases such as GenBank. Processing millions of sequences sequentially, however, requires years of CPU time. The computing time of this massively parallel application can be greatly reduced by means of e-Science infrastructures such as EGEE, EELA and some NGIs, by splitting the work into thousands of loosely coupled tasks. A key factor in improving the performance of these experiments has proven to be the development of sophisticated automated mechanisms that predict the jobs' elapsed time. This article therefore describes in detail how these experiments are characterized in order to improve performance.
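The scale argument above can be illustrated with a back-of-the-envelope sketch. It is not the authors' scheduler or predictor: the function names, the two-minute cost per sequence and the worker count are hypothetical values chosen only to show how a sequential run of years shrinks once the input is split into independent tasks. The estimate assumes ideal speed-up and ignores queueing, data transfer and job failures.

```python
from typing import Iterator, List, Sequence


def split_into_tasks(sequences: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield independent chunks of sequences, one chunk per Grid task (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(sequences), chunk_size):
        yield list(sequences[start:start + chunk_size])


def sequential_cpu_years(n_sequences: int, minutes_per_sequence: float) -> float:
    """CPU time, in years, needed to process every sequence one after another."""
    minutes_per_year = 60 * 24 * 365
    return n_sequences * minutes_per_sequence / minutes_per_year


def ideal_wall_clock_days(n_sequences: int, minutes_per_sequence: float, n_workers: int) -> float:
    """Wall-clock time, in days, assuming perfect speed-up over n_workers (an idealization)."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    minutes_per_day = 60 * 24
    return n_sequences * minutes_per_sequence / n_workers / minutes_per_day


if __name__ == "__main__":
    # Assumed figures: one million sequences, two minutes each, 1000 concurrent workers.
    print(f"sequential: {sequential_cpu_years(1_000_000, 2.0):.2f} CPU years")
    print(f"ideal parallel: {ideal_wall_clock_days(1_000_000, 2.0, 1000):.2f} days")
```

In practice the chunk size is where elapsed-time prediction matters: chunks that are too small are dominated by per-job overhead, while chunks that are too large risk exceeding queue time limits.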
Performance Analysis