Comparison of Data Partitioning Schema of Parallel Pairwise Alignment on Shared Memory System

Auriza Rahmad Akbar, Heru Sukoco, Wisnu Ananta Kusuma

Abstract


The pairwise alignment (PA) algorithm is widely used in bioinformatics to analyze biological sequence. With the advance of sequencer technology, a massive amount of DNA fragments are sequenced much quicker and cheaper. The alignment algorithm needs to be parallelized to be able to align them in a shorter time. Many previous researches have parallelize PA algorithm using various data partitioning schema, but it is unclear which one is the best. The data partitioning schema is important for parallel PA performance, because this algorithm use dynamic programming technique that needs intense inter-thread communication. In this paper, we compared four partitioning schemas to find the best performing one on shared memory system. Those schemas are: blocked columnwise, rowwise, antidiagonal, and blocked columnwise with manual scheduling and loop unrolling. The last schema gave the best performance of 89% efficiency on 4 threads. This result provided fine-grain parallelism that can be used further to develop parallel multiple sequence alignment (MSA).

Keywords


data partition; pairwise alignment; parallel processing; shared memory

Full Text:

PDF

References


Cohen J. Bioinformatics: an introduction for computer scientists. ACM Computing Surveys (CSUR). 2004; 36: 122–158.

Waterman MS, Smith TF, Beyer WA. Some biological sequence metrics. Advances in Mathematics. 1976; 20: 367–387.

Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research. 1994; 22: 4673–4680.

Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology. 2000; 302: 205–217.

Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research. 2002; 30: 3059–3066.

Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004; 32: 1792–1797.

Rognes T, Seeberg E. Six-fold speed-up of Smith–Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics. 2000; 16: 699–706.

Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, Law M. Comparison of next-generation sequencing systems. BioMed Research International. 2012; 2012.

Kleinjung J, Douglas N, Heringa J. Parallelized multiple alignment. Bioinformatics. 2002; 18: 1270–1271.

Li KB. ClustalW-MPI: ClustalW analysis using distributed and parallel computing. Bioinformatics. 2003; 19: 1585–1586.

Chaichoompu K, Kittitornkun S, Tongsima S. MT-ClustalW: multithreading multiple sequence alignment. IEEE International Parallel and Distributed Processing Symposium (IPDPS). Rhodes Island. 2006: 8.

Katoh K, Toh H. Parallelization of the MAFFT Multiple Sequence Alignment Program. Bioinformatics. 2010; 26: 1899–1900.

Rognes T. ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches. Nucleic Acids Research. 2001; 29: 1647–1652.

Liu Y, Wirawan A, Schmidt A. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics. 2013; 14: 117.

Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology. 1970; 48: 443–453.

Smith TF, Waterman MS. Identification of common molecular subsequences. Journal of Molecular Biology. 1981; 147: 195–197.

Hughey R. Parallel hardware for sequence comparison and alignment. Computer Applications in the Biosciences: CABIOS. 1996; 12: 473–479.

Martins WS, del Cuvillo J, Cui W, Gao GR. Whole genome alignment using a multithreaded parallel implementation. Symposium on Computer Architecture and High Performance Computing. Vitória. 2001: 1–8.

Liu W, Schmidt B. Parallel design pattern for computational biology and scientific computing applications. IEEE International Conference on Cluster Computing. Hongkong. 2003: 456–459.

Li J, Ranka S, Sahni S. Pairwise sequence alignment for very long sequences on GPUs. IEEE International Conference on Computational Advances in Bio and Medical Sciences. Las Vegas. 2012: 1–6.

Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to Algorithms. Third Edition. Cambridge: MIT Press. 2009: 390–396.

Carroll H, Beckstead W, O’Connor T, Ebbert M, Clement M, Snell Q, McClellan D. DNA reference alignment benchmarks based on tertiary structure of encoded proteins. Bioinformatics. 2007; 23(19): 2648–2649.

Dagum L, Menon R. OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE. 1998; 5: 46–55.

Lamport L. How to make a multiprocessor computer that correctly executes multiprocess programs. Computers, IEEE Transactions on. 1979; 100: 690–691.

Loveman DB. Program improvement by source-to-source transformation. Journal of the ACM (JACM). 1977; 24: 121–145.

Sedgewick R. Implementing quicksort programs. Communications of the ACM. 1978; 21: 847–857.

Quinn MJ. Parallel Programming in C with MPI and OpenMP. New York: McGraw-Hill. 2003: 118–119.




DOI: http://doi.org/10.12928/telkomnika.v13i2.1415

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293
Universitas Ahmad Dahlan, 4th Campus
Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191
Phone: +62 (274) 563515, 511830, 379418, 371120
Fax: +62 274 564604

View TELKOMNIKA Stats