LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors

Authors
Mao, Feng [1]
Shen, Xipeng [1]
Affiliation
[1] College of William & Mary, Department of Computer Science, Williamsburg, VA 23185, USA
Keywords
Software cache; Heterogeneous architecture; LU decomposition; Cell Broadband Engine
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
To meet the needs of high-performance computing, the Cell Broadband Engine has many features that differ from those of traditional processors, such as a large number of synergistic processor elements (SPEs), large register files, and the ability to hide main-storage latency by overlapping computation with DMA transfers. Exploiting these features requires the programmer to tailor programs carefully and to deal simultaneously with several performance factors, including locality, load balance, communication overhead, and multi-level parallelism. Unfortunately, these factors are interdependent: an optimization that improves one may degrade another. This paper presents our experience in optimizing LU decomposition, a commonly used linear algebra kernel in scientific computing, on the Cell Broadband Engine. The optimizations exploit task-level, data-level, and communication-level parallelism. We study the effects of different task distribution strategies, prefetching, and a software cache, and explore the tradeoffs among different performance factors, stressing the interactions between different optimizations. This work offers insights into optimization on heterogeneous multi-core processors, including the selection of programming models, considerations in task distribution, and the holistic perspective that optimization requires.
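The abstract does not reproduce the authors' kernel. For reference, the following is a minimal sketch of textbook right-looking LU factorization without pivoting (Doolittle form), written in plain Python rather than as Cell/SPE code. At each step k, the updates to the rows below the pivot read only the pivot row and are independent of one another, which is what makes the trailing update a natural unit of work to distribute. The function name `lu_row_cyclic`, the `num_workers` parameter, and the row-cyclic assignment (`i % num_workers`) are illustrative assumptions, not one of the paper's task distribution strategies, and the "workers" here run sequentially.

```python
def lu_row_cyclic(a, num_workers=8):
    """Right-looking LU factorization without pivoting (Doolittle form).

    Returns (L, U) with L unit lower triangular and U upper triangular.
    Assumes every pivot encountered is nonzero. At each step, row updates are
    grouped by a hypothetical row-cyclic owner (row index mod num_workers) and
    executed sequentially; this only illustrates how the work could be split.
    """
    n = len(a)
    m = [[float(x) for x in row] for row in a]  # work on a float copy
    for k in range(n):
        pivot = m[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}; this sketch does not pivot")
        # Rows k+1..n-1 depend only on row k, so they can be updated in any order.
        for w in range(num_workers):
            for i in range(k + 1, n):
                if i % num_workers != w:
                    continue  # row i belongs to another (hypothetical) worker
                m[i][k] /= pivot  # multiplier l_ik, stored in place
                for j in range(k + 1, n):
                    m[i][j] -= m[i][k] * m[k][j]  # trailing-submatrix update
    lower = [[m[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[m[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


L, U = lu_row_cyclic([[2, 1, 1], [4, -6, 0], [-2, 7, 2]], num_workers=2)
print(L)  # [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [-1.0, -1.0, 1.0]]
print(U)  # [[2.0, 1.0, 1.0], [0.0, -8.0, -2.0], [0.0, 0.0, 1.0]]
```

Changing `num_workers` changes only the order in which rows are processed within a step, not the result, because every update at step k reads only the unchanged pivot row k. On real hardware, the harder questions the abstract raises (how rows or blocks map to SPEs, and how DMA prefetching and a software cache keep the local store fed) sit on top of this dependency structure.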
Pages: 61-75
Number of pages: 15