Machine Learning with Distributed Data Management and Process Architecture

Cited by: 0
Authors
Baysal, Engin [1]
Bayilmis, Cuneyt [2]
Affiliations
[1] Istanbul Gedik Univ, Gedik Vocat Sch, Istanbul, Turkey
[2] Sakarya University, Comp & Informat Engn, Sakarya, Turkey
Keywords
big data; big data analytics; machine learning; Apache Spark; PySpark; logistic regression; YARN
DOI
10.1109/ubmk.2019.8907073
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
As technology becomes more pervasive in daily life, the volume of data being produced makes it almost impossible to manage and process with conventional means, creating a need for new approaches to storage and analysis. Both the growing size and the increasing variety of data have made new methods necessary. In this study, distributed data management and analysis tools, designed for data that cannot be processed with traditional systems, were used. A machine learning application was developed using the Logistic Regression classification algorithm. The application was implemented with the PySpark libraries on a Spark cluster created with the Google Cloud service, using a data set obtained from sensors. The working environment, managed by YARN, was monitored while the application was running.
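The abstract describes training a Logistic Regression classifier on sensor data that Spark spreads across the workers of a cluster. The sketch below is a minimal, single-machine illustration of that data-parallel idea in plain Python; it is not the authors' code and does not use Spark. The toy data generator (`make_toy_sensor_data`), the partition count, the learning rate and the iteration count are all hypothetical. Each "partition" computes its share of the log-loss gradient (a map step), the partial results are summed (a reduce step), and the weights are updated with fixed-step gradient descent. Spark MLlib's own `LogisticRegression` uses a quasi-Newton optimizer (L-BFGS) rather than this fixed-step update, so treat the block only as an illustration of the pattern.

```python
import math
import random
from functools import reduce


def make_toy_sensor_data(n, seed=0):
    # Hypothetical stand-in for the study's sensor data set (not the real data).
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.uniform(-3.0, 3.0)
        x2 = rng.uniform(-3.0, 3.0)
        # Label is 1 when the hypothetical readings exceed a linear threshold.
        label = 1 if x1 + 0.5 * x2 > 0.0 else 0
        data.append(([x1, x2], label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition(data, num_partitions):
    # Split records round-robin, mimicking how a cluster spreads rows over workers.
    return [data[i::num_partitions] for i in range(num_partitions)]


def partition_gradient(part, weights, bias):
    # "Map" step: one worker computes the log-loss gradient over its own rows only.
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, label in part:
        z = bias + sum(w * x for w, x in zip(weights, features))
        err = sigmoid(z) - label
        for j, x in enumerate(features):
            grad_w[j] += err * x
        grad_b += err
    return grad_w, grad_b, len(part)


def combine(a, b):
    # "Reduce" step: sum partial gradients and row counts from two workers.
    return ([x + y for x, y in zip(a[0], b[0])], a[1] + b[1], a[2] + b[2])


def train(partitions, num_features, lr=0.5, iterations=200):
    weights = [0.0] * num_features
    bias = 0.0
    for _ in range(iterations):
        partials = [partition_gradient(p, weights, bias) for p in partitions]
        grad_w, grad_b, n = reduce(combine, partials)
        # Fixed-step gradient descent on the gradient averaged over all rows.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, features):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= 0.5 else 0


if __name__ == "__main__":
    data = make_toy_sensor_data(400)
    parts = partition(data, num_partitions=4)
    w, b = train(parts, num_features=2)
    acc = sum(predict(w, b, f) == y for f, y in data) / len(data)
    print(f"weights={w}, bias={b:.3f}, training accuracy={acc:.3f}")
```

Because the per-partition gradients are simply summed, training on four partitions yields the same model as training on one, up to floating-point rounding; this is what lets the work be distributed. On an actual cluster, a comparable PySpark route would assemble the sensor columns into a feature vector with `pyspark.ml.feature.VectorAssembler`, fit `pyspark.ml.classification.LogisticRegression`, and submit the script with `spark-submit --master yarn` so that YARN allocates the executors.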
Pages: 53-57
Page count: 5