Reliability-aware resource management for computational grid/cluster environments

Cited by: 0

Authors:

Limaye, K [1]

Leangsuksun, B [1]

Liu, YD [1]

Greenwood, Z [1]

Scott, SL [1]

Libby, R [1]

Chanchio, K [1]

Affiliation:

[1] Louisiana Tech Univ, Ruston, LA 71270 USA

Source:

2005 6TH INTERNATIONAL WORKSHOP ON GRID COMPUTING (GRID) | 2005

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP301 [Theory, methods];

Discipline classification code:

081202 ;

Abstract:

The collective resource utilization achieved through grid computing is critical to the overall computing capacity of the collaborative community and should be guaranteed. In particular, in an existing environment where job sites are Beowulf cluster systems, a service node failure may cause an outage of the whole system. Current grid fault tolerance techniques address these issues only in an opportunistic fashion. Thus, there is a need to complement these approaches by proactively handling failures at the job-site level, ensuring high availability of the system with no loss of user-submitted jobs. Our grid-aware cluster resource management effort was motivated by the fact that clusters have become popular job sites in the computational grid environment. We propose a solution that deals with fault tolerance at the service level, complementing the task-based solutions of some recent studies. We discuss various service availability issues related to the grid, and preliminary results obtained while implementing the smart failover and transparent job-queue replication mechanism and the automated grid installation package. Our report indicates that the benefits outweigh the overhead, which remains acceptable, after implementing our proof-of-concept framework.


Pages: 211 - 218

Page count: 8

50 related records in total

[21] RATE: Reliability-Aware Task Service in Fog-Enabled IoV Environments
Tiwari, Minu
Maity, Ilora
Misra, Sudip
IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2024, 10 (04) : 1525 - 1534
[22] Reliability-aware multi-objective approach for predictive asset management: A Danish distribution grid case study
Mirshekali, Hamid
Mortensen, Lasse Kappel
Shaker, Hamid Reza
APPLIED ENERGY, 2024, 358
[23] Reliability-aware performance model for optimal GPU-enabled cluster environment
Supada Laosooksathit
Raja Nassar
Chokchai Leangsuksun
Mihaela Paun
The Journal of Supercomputing, 2014, 68 : 1630 - 1651
[24] Reliability-Aware Ratioed Logic Operations for Energy-Efficient Computational ReRAM
Fernandez, Carlos
Vourkas, Ioannis
PROCEEDINGS OF THE 2022 IFIP/IEEE 30TH INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION (VLSI-SOC), 2022,
[25] Reliability-aware Fog Resource Provisioning for Deadline-driven IoT Services
Yao, Jingjing
Ansari, Nirwan
2018 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2018,
[26] Reliability-aware performance model for optimal GPU-enabled cluster environment
Laosooksathit, Supada
Nassar, Raja
Leangsuksun, Chokchai
Paun, Mihaela
JOURNAL OF SUPERCOMPUTING, 2014, 68 (03) : 1630 - 1651
[27] Latency and Reliability-Aware Task Offloading and Resource Allocation for Mobile Edge Computing
Liu, Chen-Feng
Bennis, Mehdi
Poor, H. Vincent
2017 IEEE GLOBECOM WORKSHOPS (GC WKSHPS), 2017,
[28] Reliability-Aware Design to Suppress Aging
Amrouch, Hussam
Khaleghi, Behnam
Gerstlauer, Andreas
Henkel, Joerg
2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2016,
[29] Robust Reliability-aware Buffer Management for DTN Multicast in Disaster Scenarios
Begerow, Peggy
Krug, Silvia
Schellenberg, Sebastian
Seitz, Jochen
2015 7TH INTERNATIONAL WORKSHOP ON RELIABLE NETWORKS DESIGN AND MODELING (RNDM) PROCEEDINGS, 2015, : 274 - 280
[30] Reliability-aware energy management for periodic real-time tasks
Zhu, Dakai
Aydin, Hakan
RTAS 2007: 13TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2007, : 225 - +
