HARD DATA ANALYTICS PROBLEMS MAKE FOR BETTER DATA ANALYSIS ALGORITHMS: Bioinformatics as an Example

Cited by: 7
Authors
Bacardit, Jaume [1]
Widera, Paweł [1]
Lazzarini, Nicola [1]
Krasnogor, Natalio [1]
Affiliations
[1] Newcastle Univ, Sch Comp Sci, Interdisciplinary Comp & Complex BioSyst Res Grp, Claremont Tower,Claremont Rd, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
PREDICTION; CLASSIFICATION; PROTEINS; CONTACTS
DOI
10.1089/big.2014.0023
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Data mining and knowledge discovery techniques have greatly progressed in the last decade. They are now able to handle larger and larger datasets, process heterogeneous information, integrate complex metadata, and extract and visualize new knowledge. Often these advances were driven by new challenges arising from real-world domains, with biology and biotechnology a prime source of diverse and hard (e.g., high volume, high throughput, high variety, and high noise) data analytics problems. The aim of this article is to show the broad spectrum of data mining tasks and challenges present in biological data, and how these challenges have driven us over the years to design new data mining and knowledge discovery procedures for biodata. This is illustrated with the help of two kinds of case studies. The first kind is focused on the field of protein structure prediction, where we have contributed in several areas: by designing, through regression, functions that can distinguish between good and bad models of a protein's predicted structure; by creating new measures to characterize aspects of a protein's structure associated with individual positions in a protein's sequence, measures containing information that might be useful for protein structure prediction; and by creating accurate estimators of these structural aspects. The second kind of case study is focused on omics data analytics, a class of biological data characterized by extremely high dimensionality. Our methods were able not only to generate very accurate classification models, but also to discover new biological knowledge that was later ratified by experimentalists.
Finally, we describe several strategies to tightly integrate knowledge extraction and data mining in order to create a new class of biodata mining algorithms that can natively embrace the complexity of biological data, efficiently generate accurate information in the form of classification/regression models, and extract valuable new knowledge. Thus, a complete data-to-information-to-knowledge pipeline is presented.
Pages: 164-176
Page count: 13