Large-scale virtual screening on public cloud resources with Apache Spark

Cited by: 14
Authors
Capuccini, Marco [1 ,2 ]
Ahmed, Laeeq [3 ]
Schaal, Wesley [2 ]
Laure, Erwin [3 ]
Spjuth, Ola [2 ]
Affiliations
[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden
[2] Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden
[3] Royal Inst Technol KTH, Dept Computat Sci & Technol, Lindstedtsvagen 5, S-10044 Stockholm, Sweden
Source
Keywords
Virtual screening; Docking; Cloud computing; Apache Spark; MapReduce
DOI
10.1186/s13321-017-0204-4
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Background: Structure-based virtual screening is an in silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), relying on hardware with low failure rates and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark.
Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment.
Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve makes it possible to try out our method on relatively small libraries first and then scale to larger ones.
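The MapReduce pattern described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' Spark implementation: `dock` is a hypothetical placeholder for a call to external docking software (its "score" is just the negated string length), and `screen`, `merge_top_k`, and `parallel_efficiency` are names assumed here for illustration. The map step scores each partition of the library independently, and the reduce step merges the per-partition top hits. The efficiency helper uses the common definition, serial time divided by (workers × parallel time).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def dock(molecule):
    """Hypothetical stand-in for invoking real docking software on one molecule.

    Returns (molecule, score); lower scores are treated as better.
    The score here is a placeholder, NOT a real docking score.
    """
    return molecule, -float(len(molecule))


def merge_top_k(hits_a, hits_b, k):
    """Reduce step: merge two hit lists and keep the k best (lowest) scores."""
    return heapq.nsmallest(k, hits_a + hits_b, key=lambda hit: hit[1])


def screen(library, partitions=2, k=3):
    """Map each partition to its local top-k hits, then reduce to a global top-k."""
    # Split the library into roughly equal partitions (round-robin).
    chunks = [library[i::partitions] for i in range(partitions)]

    def map_partition(chunk):
        # Map step: dock every molecule in the partition, keep the local top-k.
        return heapq.nsmallest(k, (dock(m) for m in chunk), key=lambda hit: hit[1])

    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partial_hits = list(pool.map(map_partition, chunks))

    return reduce(lambda a, b: merge_top_k(a, b, k), partial_hits, [])


def parallel_efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency: speedup (t_serial / t_parallel) divided by worker count."""
    return t_serial / (workers * t_parallel)


if __name__ == "__main__":
    # Toy library of SMILES-like strings, purely illustrative.
    library = ["C", "CCO", "c1ccccc1", "CC(=O)O", "CCCCCCCCCC"]
    print(screen(library, partitions=2, k=3))
    # Illustrative timings only, not measurements from the paper.
    print(parallel_efficiency(t_serial=80.0, t_parallel=12.5, workers=8))
```

Because each partition is scored independently and only the small top-k lists are merged, a failed partition can be recomputed on its own, which is the property that frameworks such as Spark exploit for fault tolerance.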
Pages: 6
Related papers
50 items in total
  • [11] Particle Swarm Optimization for Large-Scale Clustering on Apache Spark
    Sherar, Matthew
    Zulkernine, Farhana
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 801 - 808
  • [12] GeoMatch: Efficient Large-scale Map Matching on Apache Spark
    Zeidan, Ayman
    Lagerspetz, Eemil
    Zhao, Kai
    Nurmi, Petteri
    Tarkoma, Sasu
    Vo, Huy T.
    ACM/IMS Transactions on Data Science, 2020, 1 (03):
  • [13] A Parallel Fast Fourier Transform Algorithm for Large-Scale Signal Data Using Apache Spark in Cloud
    Yang, Cheng
    Bao, Weidong
    Zhu, Xiaomin
    Wang, Ji
    Xiao, Wenhua
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT III, 2018, 11336 : 293 - 310
  • [14] Large-scale digital forensic investigation for Windows registry on Apache Spark
    Lee, Jun-Ha
    Kwon, Hyuk-Yoon
    PLOS ONE, 2022, 17 (12):
  • [15] Building a Large-Scale Microscopic Road Network Traffic Simulator in Apache Spark
    Fu, Zishan
    Yu, Jia
    Sarwat, Mohamed
    2019 20TH INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2019), 2019, : 320 - 328
  • [16] A Large-Scale Sentiment Data Classification for Online Reviews Under Apache Spark
    Al-Saqqa, Samar
    Al-Naymat, Ghazi
    Awajan, Arafat
    9TH INTERNATIONAL CONFERENCE ON EMERGING UBIQUITOUS SYSTEMS AND PERVASIVE NETWORKS (EUSPN-2018) / 8TH INTERNATIONAL CONFERENCE ON CURRENT AND FUTURE TRENDS OF INFORMATION AND COMMUNICATION TECHNOLOGIES IN HEALTHCARE (ICTH-2018), 2018, 141 : 183 - 189
  • [17] Virtual Slice Assignment in Large-Scale Cloud Interconnects
    Kim-Khoa Nguyen
    Cheriet, Mohamed
    Lemieux, Yves
    IEEE INTERNET COMPUTING, 2014, 18 (04) : 37 - 46
  • [18] A Strategy of Parallel SLIC Superpixels for Handling Large-Scale Images over Apache Spark
    Wang, Ning
    Chen, Fang
    Yu, Bo
    Wang, Lei
    REMOTE SENSING, 2022, 14 (07)
  • [19] Enhancing KBQA Performance in Large-Scale Chinese Knowledge Graphs Using Apache Spark
    Su, Yi-Jen
    Wu, Cheng-Wei
    Chen, Yi-Ju
    2024 6TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND THE INTERNET, ICCCI 2024, 2024, : 181 - 186
  • [20] Supervised Papers Classification on Large-Scale High-Dimensional Data with Apache Spark
    Akritidis, Leonidas
    Bozanis, Panayiotis
    Fevgas, Athanasios
    2018 16TH IEEE INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP, 16TH IEEE INT CONF ON PERVAS INTELLIGENCE AND COMP, 4TH IEEE INT CONF ON BIG DATA INTELLIGENCE AND COMP, 3RD IEEE CYBER SCI AND TECHNOL CONGRESS (DASC/PICOM/DATACOM/CYBERSCITECH), 2018, : 987 - 994