Amalur: The Convergence of Data Integration and Machine Learning

Cited by: 0
Authors
Li, Ziyu [1]
Sun, Wenbo [1]
Zhan, Danning [1]
Kang, Yan [2]
Chen, Lydia [3, 4]
Bozzon, Alessandro [1]
Hai, Rihan [1]
Affiliations
[1] Delft Univ Technol, Dept Software Technol, NL-2628 CD Delft, Netherlands
[2] WeBank, Shenzhen 518052, Peoples R China
[3] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
[4] Delft Univ Technol, NL-2628 CD Delft, Netherlands
Funding
Dutch Research Council (NWO);
Keywords
Metadata; Data integration; Training; Federated learning; Data privacy; Soft sensors; Training data; Machine learning; data integration; federated learning;
DOI
10.1109/TKDE.2024.3357389
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demands substantial manual work and computational resources. Because of data privacy constraints, data often cannot leave the premises of data silos; hence, model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern machine learning systems. We explore the possibilities of utilizing metadata obtained from data integration processes to improve the effectiveness, efficiency, and privacy of ML models. To this end, we analyze ML training and inference over data silos. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning, and federated learning.
Pages: 7353-7367
Number of pages: 15