Swarm Intelligence-based Hierarchical Clustering for Identification of ncRNA using Covariance Search Model.

PRATIWI, Lustiana, CHOO, Yun-Huoy, MUDA, Azah Kamilah and PRATAMA, Satrya Fajri (2022). Swarm Intelligence-based Hierarchical Clustering for Identification of ncRNA using Covariance Search Model. International Journal of Advanced Computer Science and Applications (IJACSA), 13 (11): Paper 95, 822-831.

Pratama -SwarmIntelligenceBasedHierarchicalClustering(VoR).pdf - Published Version
Creative Commons Attribution.

Official URL: https://thesai.org/Publications/ViewPaper?Volume=1...
Open Access URL: https://thesai.org/Downloads/Volume13No11/Paper_95... (Published)

Abstract

The Covariance Model (CM) has been quite effective in finding potential members of existing families in non-coding Ribonucleic Acid (ncRNA) identification and has provided excellent accuracy on genome sequence databases. However, it has significant drawbacks in family-specific search. An existing Hierarchical Agglomerative Clustering (HAC) technique merges overlapping sequences into what is known as a combined CM (CCM). However, the structural information will be discarded, and the sequence features of each family will be significantly diluted as the number of original structures increases. Additionally, it can only find members of the existing families and is not useful for finding potential members of novel ncRNA families. Furthermore, it is also important to construct generic sequence models that can be used to recognize potential members of novel ncRNA families and to assign unknown ncRNA sequences as potential members of known families. To achieve these objectives, this study proposes to implement Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) to ensure that the CCMs have the best quality at every level of the dendrogram hierarchy. This study will also apply a distance matrix as the criterion for measuring the compatibility between two CMs. The proposed techniques will use five gene families from the Rfam database, with fifty sequences from each family, divided into training and testing datasets to test the CM combination method. The proposed techniques will be compared with the existing HAC in terms of identification accuracy, sum of bit-scores, and processing time, and each of these performance measurements will be statistically validated.

Item Type: Article
Uncontrolled Keywords: Covariance model; ncRNA identification; swarm intelligence; hierarchical clustering; 0803 Computer Software; 1005 Communications Technologies; 46 Information and computing sciences
Identification Number: https://doi.org/10.14569/IJACSA.2022.0131195
Page Range: 822-831
SWORD Depositor: Symplectic Elements
Depositing User: Symplectic Elements
Date Deposited: 07 Mar 2023 12:50
Last Modified: 11 Oct 2023 16:45
URI: https://shura.shu.ac.uk/id/eprint/31631
