What is data mining?
Data mining is the extraction and analysis of information of interest from large amounts of raw data. It is a way to discover new patterns and meaningful connections in massive data volumes. Because it is difficult to retrieve useful or precise information from the vast amounts of data stored in databases, data mining tools and techniques are used for this purpose (Lan et al., 2018).
What is bioinformatics?
Bioinformatics combines two subjects, molecular biology and computer science. It has emerged as an interdisciplinary field that applies computational tools and methods to assess, interpret, review and understand biological mechanisms, evolutionary relationships and gene expression studies. Bioinformatics helps convert raw data into useful information that researchers use to discover drugs and treat diseases. It is also important because it helps integrate proteomics and genomics (Luscombe, Greenbaum, & Gerstein, 2001).
Types of data that bioinformatics analyzes:
Bioinformatics covers a wide range of topics. It organizes and analyzes data from various sources of information, such as:
Raw DNA sequence
Protein sequence
Protein structure
Genomics
Gene expression (Luscombe et al., 2001)
Data Mining in bioinformatics:
Data mining is a very important aspect of bioinformatics. In recent years, advances in protein science and gene mapping have produced a large pool of data stored in databases. Data mining extracts useful information from this data, which covers protein structures, the amino acid sequences of proteins, and the gene sequences of hundreds of organisms. It converts the raw data into meaningful information that supports new discoveries, the analysis of protein function, and the detection of defects and mutations in gene sequences. It also uncovers protein structures and sequence patterns that are hidden in the bulk of data stored in databases. Data mining is also described as knowledge discovery in databases (KDD) (Raza, 2012).
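As a small illustration of detecting mutations in a gene sequence, the sketch below compares a sample sequence with a reference of the same length and reports the positions that differ. The function name point_mutations and both sequences are made up for this example. Real pipelines work on aligned sequences and also handle insertions and deletions, which this sketch does not.

```python
def point_mutations(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences are assumed to be already
    aligned and of equal length (an assumption of this sketch).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical toy sequences, not taken from any real gene.
reference = "ATGGCCATTGTA"
sample = "ATGGCTATTGAA"

mutations = point_mutations(reference, sample)
for position, ref_base, sample_base in mutations:
    print(f"position {position}: {ref_base} -> {sample_base}")
```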
Data mining tools in bioinformatics:
Bioinformatics uses various tools to extract useful information from databases. Some of these tools are as follows:
WEKA stands for Waikato Environment for Knowledge Analysis. It is an extensive data mining tool used in bioinformatics. It provides algorithms to preprocess, classify, cluster, associate and visualize biological data.
Orange also performs bioinformatics data mining tasks by analyzing and drawing inferences from large-scale biological data (David, Saeb, & Al Rubeaan, 2013).
Data mining techniques in bioinformatics:
Data mining tasks are performed by numerous computational methods. Data mining techniques include:
Classification
Clustering
Association
Prediction of patterns
Classification
This technique assigns data to predefined classes. Classification techniques used in bioinformatics include decision trees and Bayesian network classifiers. A decision tree classifier is a machine learning algorithm that represents the classification of data as a tree-like structure, which makes the result easy to visualize.
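To make the idea of a decision tree concrete, here is a minimal sketch written in plain Python. It builds a small tree by repeatedly choosing the feature and threshold that give the lowest weighted Gini impurity. The expression values and the labels "tumor" and "normal" are invented for illustration, and the helper names are assumptions of this sketch, not part of WEKA or Orange.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers impurity most, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Build a tree as nested dicts; a leaf is simply the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    left = [(row, y) for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [(row, y) for row, y in zip(rows, labels) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def classify(tree, row):
    """Walk from the root to a leaf and return the label stored there."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical expression levels of two genes per sample (toy data).
samples = [(2.1, 0.4), (1.9, 0.5), (2.4, 0.3), (0.6, 1.8), (0.5, 2.0), (0.7, 1.7)]
classes = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

tree = build_tree(samples, classes)
prediction = classify(tree, (2.0, 0.6))
print(tree)
print(prediction)
```

On this toy data a single split on the first gene is enough to separate the two classes, so the tree has one decision node and two leaves.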
Clustering
Clustering is a technique for sorting data into groups based on the similarities and differences between data points, so that similar data end up in the same group. Distance-based clustering, dynamic clustering, density-based clustering and hierarchical clustering are clustering methodologies used in bioinformatics.
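As an illustration of hierarchical, distance-based clustering, the following sketch implements a simple agglomerative procedure with single linkage: every item starts in its own cluster, and the two closest clusters are merged until the requested number of clusters remains. The gene names and expression profiles are hypothetical, and single linkage is only one of several possible linkage choices.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters):
    """Agglomerative clustering; returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical expression profiles (three conditions per gene).
profiles = {
    "geneA": (1.0, 1.2, 0.9),
    "geneB": (1.1, 1.0, 1.0),
    "geneC": (5.0, 5.2, 4.9),
    "geneD": (5.1, 4.8, 5.0),
}

names = list(profiles)
groups = [
    [names[i] for i in cluster]
    for cluster in single_linkage(list(profiles.values()), n_clusters=2)
]
print(groups)
```

Genes with similar expression profiles end up in the same group, which is the behaviour the paragraph above describes.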
Association
Association is a technique used to find relationships between different features of the data in large data sets. It is helpful in relating diseases to genes and in exploring the relationship between gene expression and drug action. It also associates protein structure with protein function.
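Association analysis is commonly expressed through two measures, support and confidence. The sketch below computes both for a rule of the form "gene variant implies disease" over a handful of records. The record contents, the feature names GENE1_variant, GENE2_variant and diseaseA, and the function names are all hypothetical and serve only to show the arithmetic.

```python
def count_containing(records, items):
    """Number of records that contain every item in `items`."""
    items = set(items)
    return sum(1 for record in records if items <= record)


def support(records, items):
    """Fraction of all records that contain every item in `items`."""
    return count_containing(records, items) / len(records)


def confidence(records, antecedent, consequent):
    """Share of records with the antecedent that also contain the consequent."""
    with_antecedent = count_containing(records, antecedent)
    if with_antecedent == 0:
        return 0.0
    with_both = count_containing(records, set(antecedent) | set(consequent))
    return with_both / with_antecedent


# Hypothetical patient records; each set lists the features observed.
records = [
    {"GENE1_variant", "diseaseA"},
    {"GENE1_variant", "GENE2_variant", "diseaseA"},
    {"GENE2_variant"},
    {"GENE1_variant"},
    {"GENE1_variant", "diseaseA"},
]

rule_support = support(records, {"GENE1_variant", "diseaseA"})
rule_confidence = confidence(records, {"GENE1_variant"}, {"diseaseA"})
print(rule_support, rule_confidence)
```

Here three of the five records contain both the variant and the disease, and three of the four records with the variant also show the disease, so the rule has a support of 3/5 and a confidence of 3/4.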
Prediction of patterns
Prediction is a very important data mining strategy because it uses machine learning methods to predict future results and patterns from data. It is helpful in estimating the 3D structure of a protein from its amino acid sequence. It can also help predict the role of a gene from its nucleotide sequence and suggest treatments for various genetic diseases (Khan, Bala, Yesmin, & Abedin, 2022).
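Predicting protein structure or gene function requires far more sophisticated models than can be shown here, but the basic idea of predicting a property from a sequence can be sketched with a nearest-neighbour approach. The example below describes each sequence by its counts of overlapping 2-mers and assigns a new sequence the label of the most similar labelled sequence. The sequences, the labels "AT-rich" and "GC-rich", and the function names are assumptions made for this toy example.

```python
from collections import Counter


def kmer_profile(sequence, k=2):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def profile_distance(p, q):
    """Sum of absolute count differences (a Counter gives 0 for missing keys)."""
    return sum(abs(p[kmer] - q[kmer]) for kmer in set(p) | set(q))


def predict_label(query, labelled_sequences, k=2):
    """Return the label of the labelled sequence closest to the query."""
    query_profile = kmer_profile(query, k)
    _, label = min(
        labelled_sequences,
        key=lambda item: profile_distance(query_profile, kmer_profile(item[0], k)),
    )
    return label


# Hypothetical labelled sequences (toy data, not real genes).
training = [
    ("ATATATAT", "AT-rich"),
    ("AATTAATT", "AT-rich"),
    ("GCGCGCGC", "GC-rich"),
    ("GGCCGGCC", "GC-rich"),
]

first_prediction = predict_label("TATAAT", training)
second_prediction = predict_label("CGGCGC", training)
print(first_prediction, second_prediction)
```

The same pattern, learning from labelled examples and then scoring an unseen sequence, underlies the much larger models used for structure and function prediction.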
Conclusion
Through data mining, bioinformatics enables a better understanding of complex biological processes. By analyzing large data sets, it reveals structures and associations that are not noticeable with conventional techniques. It also speeds up the drug discovery process by forecasting the effectiveness and risks of candidate compounds.
References:
David, S. K., Saeb, A., & Al Rubeaan, K. (2013). Comparative analysis of data mining tools and classification techniques using WEKA in medical bioinformatics. Computer Engineering and Intelligent Systems, 4(13), 28-38.
Khan, M. N. R., Bala, S., Yesmin, S., & Abedin, M. Z. (2022). Bioinformatics: The importance of data mining techniques. Paper presented at the Sentimental Analysis and Deep Learning: Proceedings of ICSADL 2021.
Lan, K., Wang, D.-t., Fong, S., Liu, L.-s., Wong, K. K., & Dey, N. (2018). A survey of data mining and deep learning in bioinformatics. Journal of Medical Systems, 42, 1-20.
Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? An introduction and overview. Yearbook of Medical Informatics, 10(01), 83-100.
Raza, K. (2012). Application of data mining in bioinformatics. arXiv preprint arXiv:1205.1125.