DNA sequence compression using adaptive particle swarm optimization-based memetic algorithm

Zexuan Zhu*, Jiarui Zhou, Zhen Ji, Yu Hui Shi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

106 Citations (Scopus)

Abstract

With the rapid development of high-throughput DNA sequencing technologies, the volume of DNA sequence data is growing exponentially. This influx of data creates new challenges for storage and transmission. This paper proposes a novel adaptive particle swarm optimization-based memetic algorithm (POMA) for DNA sequence compression. POMA combines comprehensive learning particle swarm optimization (CLPSO) with a local search based on an adaptive intelligent single particle optimizer (AdpISPO). It takes advantage of both CLPSO and AdpISPO to optimize the design of the approximate repeat vector (ARV) codebook for DNA sequence compression. The ARV, first introduced in this paper, represents the repeated fragments across multiple sequences in direct, mirror, pairing, and inverted patterns. In POMA, candidate ARV codebooks are encoded as particles, and the optimal solution, which covers the most approximate repeated fragments with the fewest base variations, is identified through the exploration and exploitation of POMA. In each iteration of POMA, the leader particles in the swarm are selected based on weighted fitness values, and each leader particle is fine-tuned with an AdpISPO-based local search, so that convergence of the search in the local region is accelerated. A detailed comparison between POMA and its counterpart algorithms is performed on 29 benchmark functions (23 basic and 6 composite) and 11 real DNA sequences. POMA is observed to obtain better or competitive performance with a limited number of function evaluations. POMA also attains lower bits-per-base than other state-of-the-art DNA-specific algorithms on DNA sequence data. The experimental results suggest that the cooperation of CLPSO and AdpISPO within a memetic algorithm framework is capable of searching the ARV codebook space efficiently.
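
To make the four repeat patterns named in the abstract concrete, the following is a minimal Python sketch. It is not the paper's ARV encoding. It assumes the definitions commonly used in DNA compression, which the abstract itself does not spell out: mirror means the reversed fragment, pairing means the base-wise complement, and inverted means the reverse complement. The function names `repeat_variants`, `base_variations`, and `best_pattern` are hypothetical, and "base variations" is approximated here by a plain Hamming distance between equal-length fragments.

```python
# Illustrative sketch only; pattern definitions are assumptions, not taken
# from the paper. Complement table uses standard Watson-Crick pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_variants(fragment):
    """Return the fragment under the four (assumed) repeat patterns."""
    return {
        "direct": fragment,                                # same orientation
        "mirror": fragment[::-1],                          # reversed
        "pairing": fragment.translate(COMPLEMENT),         # complemented
        "inverted": fragment[::-1].translate(COMPLEMENT),  # reverse complement
    }


def base_variations(a, b):
    """Count mismatching positions (Hamming distance) of two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_pattern(reference, candidate):
    """Pick the pattern under which `reference` matches `candidate` with the fewest mismatches."""
    scored = (
        (name, base_variations(variant, candidate))
        for name, variant in repeat_variants(reference).items()
    )
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(repeat_variants("AACG"))
    print(best_pattern("AACG", "CGTA"))
```

Under these assumptions, a candidate fragment counts as an approximate repeat of a reference when at least one pattern yields a small mismatch count. A codebook search such as the one the abstract describes would then favor entries that cover many such fragments with few base variations.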

Original language: English
Article number: 6031913
Pages (from-to): 643-658
Number of pages: 16
Journal: IEEE Transactions on Evolutionary Computation
Volume: 15
Issue number: 5
DOIs
Publication status: Published - Oct 2011

Keywords

  • Approximate repeat vector
  • DNA sequence compression
  • memetic algorithm
  • particle swarm optimization
