Classification of Biological Data using Deep Learning Technique
Abstract
A huge number of newly sequenced proteins is being discovered on a daily basis, and the
number of such sequences has grown exponentially over the past decades. The main concern is
how to extract useful characteristics of these sequences as input features for a neural
network. Characterizing protein functions through biological experiments is very expensive,
and it is also necessary to find associations within the information in these datasets in
order to create and improve medical tools. In recent years, machine learning algorithms, in
particular deep learning architectures and other data-driven models, have attracted
considerable attention and are now widely used. Previous work has not adequately addressed
issues related to the classification of biological sequences such as proteins, including the
efficient encoding of variable-length sequence data and the implementation of deep learning
based neural network models that improve the performance of classification/recognition
systems. To address these issues, we propose a deep learning based neural network
architecture that increases the classification performance of the system: a one-dimensional
convolutional neural network (1D-CNN) that classifies protein sequences into the 10 most
common classes. The model learns features directly from the labeled protein sequences in the
dataset. We trained and evaluated our model on protein sequences downloaded from the Protein
Data Bank (PDB). The model achieves an accuracy of up to 96%.
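The abstract mentions two ingredients: encoding variable-length protein sequences and classifying them with a 1D-CNN. The following is a minimal, dependency-free sketch of how those pieces can fit together. It is not the authors' implementation. The amino-acid alphabet mapping, the fixed length `max_len=64`, the one-hot encoding, the filter count and width, the global max pooling and all function names (`encode`, `one_hot`, `conv1d`, `global_max_pool`, `forward`, `init_params`) are hypothetical choices made for illustration. The weights are random and untrained, so the sketch shows only the shape of a forward pass from a sequence to 10 class probabilities. A practical model would be built and trained with a deep learning framework.

```python
import math
import random

# Assumption: the 20 standard amino acids map to ids 1..20; id 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, max_len):
    """Integer-encode a protein sequence, truncating or zero-padding it to max_len.
    Assumption: non-standard residues (e.g. 'X') are mapped to 0, the same as padding."""
    ids = [AA_TO_INDEX.get(aa, 0) for aa in sequence.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def one_hot(ids, depth=20):
    """One-hot encode ids into [length][depth]; padding (id 0) becomes an all-zero vector."""
    return [[1.0 if i == c + 1 else 0.0 for c in range(depth)] for i in ids]


def conv1d(x, kernels, biases):
    """Valid 1D convolution followed by ReLU.
    x: [length][in_ch], kernels: [out_ch][width][in_ch], returns [length - width + 1][out_ch]."""
    width, in_ch = len(kernels[0]), len(x[0])
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for k, b in zip(kernels, biases):
            s = b + sum(k[w][c] * x[t + w][c] for w in range(width) for c in range(in_ch))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(h):
    """Maximum of each channel over all positions, giving a fixed-size feature vector."""
    return [max(col) for col in zip(*h)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def init_params(num_filters=8, width=3, in_ch=20, num_classes=10, seed=0):
    """Random (untrained) weights for one conv layer and one dense output layer."""
    rng = random.Random(seed)
    return {
        "kernels": [[[rng.gauss(0, 0.1) for _ in range(in_ch)] for _ in range(width)]
                    for _ in range(num_filters)],
        "conv_bias": [0.0] * num_filters,
        "dense_w": [[rng.gauss(0, 0.1) for _ in range(num_filters)] for _ in range(num_classes)],
        "dense_b": [0.0] * num_classes,
    }


def forward(sequence, params, max_len=64):
    """Sequence -> encoding -> conv + ReLU -> global max pool -> dense -> class probabilities."""
    x = one_hot(encode(sequence, max_len))
    h = global_max_pool(conv1d(x, params["kernels"], params["conv_bias"]))
    logits = [b + sum(w_i * h_i for w_i, h_i in zip(w, h))
              for w, b in zip(params["dense_w"], params["dense_b"])]
    return softmax(logits)


if __name__ == "__main__":
    params = init_params()
    probs = forward("MKTAYIAKQRQISFVKSHFSRQ", params)
    print(len(probs), round(sum(probs), 6))  # 10 1.0
```

Padding to a fixed length and then applying global max pooling are two common ways of handling variable-length input; either alone lets sequences of different lengths produce a feature vector of the same size before the 10-way output layer.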