Large-scale virtual screening on public cloud resources with Apache Spark

Cited by: 14

Authors:

Capuccini, Marco [1,2]

Ahmed, Laeeq [3]

Schaal, Wesley [2]

Laure, Erwin [3]

Spjuth, Ola [2]

Affiliations:

[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden

[2] Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden

[3] Royal Inst Technol KTH, Dept Computat Sci & Technol, Lindstedtsvagen 5, S-10044 Stockholm, Sweden

Source:

Journal of Cheminformatics | 2017 / Vol. 9

Keywords:

Virtual screening; Docking; Cloud computing; Apache Spark; MapReduce

DOI:

10.1186/s13321-017-0204-4

Chinese Library Classification:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), relying on hardware with a low failure rate and a fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 million compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve makes it possible to try the method on relatively small libraries first and then scale to larger libraries.
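
The MapReduce pattern that the abstract describes can be illustrated with a minimal, single-machine sketch in plain Python. This is not the authors' Spark implementation: the `dock` function is a hypothetical stand-in for an external docking program and returns a toy score, and the names `map_partition`, `keep_top`, and `screen` are assumptions made for illustration. The sketch only shows the shape of the computation: the library is split into partitions, each partition is scored independently (map), and the partial results are merged by keeping the best-scoring compounds (reduce).

```python
import heapq
from functools import reduce


def dock(smiles: str) -> float:
    """Hypothetical stand-in for a docking program.

    Returns a deterministic toy score (lower is better) so the sketch is
    runnable; a real pipeline would invoke docking software here.
    """
    return -float(len(smiles))


def map_partition(compounds):
    """Map step: score every compound in one partition independently."""
    return [(dock(c), c) for c in compounds]


def keep_top(a, b, n):
    """Reduce step: merge two partial results, keeping the n best scores."""
    return heapq.nsmallest(n, a + b)


def screen(library, partitions=3, n=2):
    """Split the library, score each chunk, and merge the top-n hits."""
    chunks = [library[i::partitions] for i in range(partitions)]
    scored = map(map_partition, chunks)  # each chunk could run on its own worker
    return reduce(lambda a, b: keep_top(a, b, n), scored, [])


if __name__ == "__main__":
    library = ["C", "CCO", "c1ccccc1", "CC(=O)O"]
    for score, smiles in screen(library):
        print(smiles, score)
```

Because each partition is scored without reference to any other, the map step can be distributed across workers, and a failed partition can be recomputed on its own. That independence is the property that makes docking-based screening trivially parallelizable.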

Pages: 6
