Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming

Lala Septem Riza, Farhan Dhiyaa Pratama, Erna Piantari, Mahmoud Fahsi

Abstract


Detecting genomic repeats, i.e., searching a Deoxyribonucleic Acid (DNA) sequence for repeated base-pair patterns, is a string-matching task that requires a long processing time. This research builds a big-data computational model that searches for patterns in strings by modifying and implementing the Boyer-Moore algorithm on Apache Spark Streaming, applied to human DNA sequences from the Ensembl site. We run experiments on cloud computing, varying the specifications of the computer clusters and using datasets of human DNA sequences. The results show that the proposed computational model on Apache Spark Streaming is faster than both standalone computing and multicore parallel computing. The main contribution of this research, a computational model that reduces computational cost, has therefore been achieved.
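For readers unfamiliar with the underlying search, the following is a minimal sketch of Boyer-Moore matching restricted to the bad-character rule, written in plain Python. It is not the modified algorithm or the Spark Streaming pipeline described in the abstract. The function names `bad_char_table` and `boyer_moore_search` are hypothetical, and the good-suffix rule is deliberately omitted to keep the example short.

```python
def bad_char_table(pattern):
    """Map each character to the index of its last occurrence in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def boyer_moore_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Simplified Boyer-Moore: only the bad-character rule is used
    (assumption made for brevity; the good-suffix rule is left out).
    Overlapping occurrences are reported.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    last = bad_char_table(pattern)
    hits = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += 1  # conservative shift so overlapping repeats are found
        else:
            # Align the mismatched text character with its last
            # occurrence in the pattern, moving forward by at least 1.
            s += max(1, j - last.get(text[s + j], -1))
    return hits


if __name__ == "__main__":
    print(boyer_moore_search("ACGTACGTAC", "ACG"))  # [0, 4]
```

In a distributed setting, a search like this would typically run independently on each chunk of the sequence, with chunks overlapping by one character less than the pattern length so that matches spanning a boundary are not lost. That chunking scheme is an assumption for illustration, not a detail taken from the abstract.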


Keywords


Apache Spark Streaming; DNA; genomic repeats; human genome; string matching



DOI: http://doi.org/10.12928/telkomnika.v18i2.14883



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293
Universitas Ahmad Dahlan, 4th Campus
Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191
Phone: +62 (274) 563515, 511830, 379418, 371120
Fax: +62 274 564604
