PRATIWI, Lustiana, CHOO, Yun-Huoy, MUDA, Azah Kamilah and PRATAMA, Satrya Fajri (2022). Swarm Intelligence-based Hierarchical Clustering for Identification of ncRNA using Covariance Search Model. International Journal of Advanced Computer Science and Applications (IJACSA), 13 (11): Paper 95, 822-831. [Article]
Documents
Pratama -SwarmIntelligenceBasedHierarchicalClustering(VoR).pdf - Published Version
Available under License Creative Commons Attribution.
Abstract
The Covariance Model (CM) has been quite effective at identifying potential members of existing non-coding ribonucleic acid (ncRNA) families and has provided excellent accuracy on genome sequence databases. However, it has significant drawbacks in family-specific search. An existing Hierarchical Agglomerative Clustering (HAC) technique merges overlapping sequences into what is known as a combined CM (CCM). However, structural information will be discarded, and the sequence features of each family will be significantly diluted as the number of original structures increases. Additionally, the technique can only find members of existing families and is not useful for finding potential members of novel ncRNA families. Furthermore, it is important to construct generic sequence models that can recognize new potential members of novel ncRNA families and assign unknown ncRNA sequences as potential members of known families. To achieve these objectives, this study proposes to implement Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) to ensure that the CCMs have the best quality at every level of the dendrogram hierarchy. The study will also apply a distance matrix as the criterion for measuring the compatibility between two CMs. The proposed techniques will use five gene families from the Rfam database, with fifty sequences from each family, divided into training and testing datasets to test the CM combination method. The proposed techniques will be compared with the existing HAC in terms of identification accuracy, sum of bit-scores, and processing time, and each of these performance measurements will be statistically validated.
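As a rough illustration of the clustering idea the abstract refers to, the sketch below runs a plain agglomerative merge over a pairwise distance matrix and records the merge order that a dendrogram would show. It is not the authors' method: the function name `agglomerate`, the average-linkage rule, the model labels, and the distance values are all assumptions made for the example, and the PSO/GA step that the study proposes for choosing the best CCM at each level is not shown.

```python
from itertools import combinations


def agglomerate(dist, labels):
    """Greedy average-linkage agglomerative clustering (illustrative only).

    dist   : symmetric list of lists with pairwise distances between models
    labels : one name per row of dist (hypothetical model names)
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    # Each cluster id maps to the tuple of original row indices it contains.
    clusters = {i: (i,) for i in range(len(labels))}
    merges = []
    next_id = len(labels)

    def linkage(a, b):
        # Average distance over all cross-cluster pairs (assumed linkage rule).
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two most compatible (closest) clusters and merge them.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: linkage(*p))
        d = linkage(a, b)
        merges.append((
            tuple(labels[i] for i in clusters[a]),
            tuple(labels[i] for i in clusters[b]),
            d,
        ))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


if __name__ == "__main__":
    # Made-up distances between three hypothetical covariance models.
    example_dist = [[0, 1, 4],
                    [1, 0, 3],
                    [4, 3, 0]]
    for step in agglomerate(example_dist, ["cm_A", "cm_B", "cm_C"]):
        print(step)
```

Each entry in the returned history corresponds to one level of the hierarchy, which is the point at which a search method such as PSO or GA could, under the study's proposal, be applied to select the combined model.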