Large-scale virtual screening on public cloud resources with Apache Spark

Cited by: 14
Authors
Capuccini, Marco [1, 2]
Ahmed, Laeeq [3]
Schaal, Wesley [2]
Laure, Erwin [3]
Spjuth, Ola [2]
Affiliations
[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden
[2] Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden
[3] Royal Inst Technol KTH, Dept Computat Sci & Technol, Lindstedtsvagen 5, S-10044 Stockholm, Sweden
Keywords
Virtual screening; Docking; Cloud computing; Apache Spark; MapReduce
DOI
10.1186/s13321-017-0204-4
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark.
Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against ~2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment.
Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries.
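The MapReduce pattern described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation and does not use Apache Spark: the function `mock_dock`, the toy score, the example molecules, and the three-way partitioning are all hypothetical stand-ins chosen only to show the shape of the computation. The map step scores each molecule independently (which is why docking is trivially parallelizable), and the reduce step merges partial results while keeping only the best-scoring hits.

```python
from functools import reduce
import heapq


def mock_dock(smiles: str) -> float:
    """Hypothetical stand-in for an external docking program.

    Returns a toy score (lower is better) so the example is reproducible.
    It has no chemical meaning.
    """
    return -float(len(smiles))


def map_partition(partition):
    """Map step: score every molecule in one partition independently."""
    return [(mock_dock(smiles), smiles) for smiles in partition]


def keep_best(left, right, k=2):
    """Reduce step: merge two partial result lists, keeping the k best hits."""
    return heapq.nsmallest(k, left + right)


# Assumed toy library, split into three partitions as a cluster might do.
library = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "O=C=O", "CCCCCC"]
partitions = [library[i::3] for i in range(3)]

# Each partition could be processed on a different worker; here they run in sequence.
scored = [map_partition(p) for p in partitions]

# Starting from an empty list ensures the result is trimmed even for one partition.
top_hits = reduce(keep_best, scored, [])
print(top_hits)
```

Because the reduce step is associative and only ever holds k candidates per merge, the same logic would carry over to a distributed setting where partitions live on separate machines and a failed partition can simply be recomputed.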
Pages: 6