HARD DATA ANALYTICS PROBLEMS MAKE FOR BETTER DATA ANALYSIS ALGORITHMS: Bioinformatics as an Example

Cited by: 7
Authors
Bacardit, Jaume [1]
Widera, Paweł [1]
Lazzarini, Nicola [1]
Krasnogor, Natalio [1]
Affiliations
[1] Newcastle Univ, Sch Comp Sci, Interdisciplinary Comp & Complex BioSyst Res Grp, Claremont Tower,Claremont Rd, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
PREDICTION; CLASSIFICATION; PROTEINS; CONTACTS;
DOI
10.1089/big.2014.0023
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Data mining and knowledge discovery techniques have greatly progressed in the last decade. They are now able to handle larger and larger datasets, process heterogeneous information, integrate complex metadata, and extract and visualize new knowledge. Often these advances were driven by new challenges arising from real-world domains, with biology and biotechnology being a prime source of diverse and hard (e.g., high volume, high throughput, high variety, and high noise) data analytics problems. The aim of this article is to show the broad spectrum of data mining tasks and challenges present in biological data, and how these challenges have driven us over the years to design new data mining and knowledge discovery procedures for biodata. This is illustrated with the help of two kinds of case studies. The first kind is focused on the field of protein structure prediction, where we have contributed in several areas: by designing, through regression, functions that can distinguish between good and bad models of a protein's predicted structure; by creating new measures to characterize aspects of a protein's structure associated with individual positions in a protein's sequence, measures containing information that might be useful for protein structure prediction; and by creating accurate estimators of these structural aspects. The second kind of case study is focused on omics data analytics, a class of biological data characterized by extremely high dimensionality. Our methods were able not only to generate very accurate classification models, but also to discover new biological knowledge that was later ratified by experimentalists.
Finally, we describe several strategies to tightly integrate knowledge extraction and data mining in order to create a new class of biodata mining algorithms that can natively embrace the complexity of biological data, efficiently generate accurate information in the form of classification/regression models, and extract valuable new knowledge. Thus, a complete data-to-information-to-knowledge pipeline is presented.
Pages: 164-176
Page count: 13