Transitive Closure on the Cell Broadband Engine: A study on Self-Scheduling in a Multicore Processor

Cited by: 0
Authors
Vinjamuri, Sudhir [1]
Prasanna, Viktor K. [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90007 USA
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
In this paper, we present a mapping methodology and optimizations for solving transitive closure on the Cell multicore processor. Using our approach, it is possible to achieve near-peak performance for transitive closure on the Cell processor. We first parallelize the standard Floyd-Warshall algorithm and show through analysis and experimental results that data communication is a bottleneck for performance and scalability. We then parallelize a cache-optimized version of the Floyd-Warshall algorithm to remove the memory bottleneck. As is the case with several scientific computing and industrial applications on a multicore processor, synchronization and scheduling of the cores play a crucial role in determining the performance of this algorithm. We define a self-scheduling mechanism for the cores of a multicore processor and design a self-scheduler for the Blocked Floyd-Warshall algorithm on the Cell multicore processor to remove the scheduling bottleneck. We also present optimizations in scheduling order to remove synchronization points. Our implementations achieved up to 78 GFLOPS.
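For illustration only, the two algorithm variants named in the abstract can be sketched in Python as below. This is a minimal textbook-style sketch, not the authors' Cell implementation: the function names, the list-of-lists boolean matrix, and the tile size parameter `b` are all assumptions made for the example, and the sketch assumes the matrix size is a multiple of `b`.

```python
def transitive_closure(adj):
    """Standard Floyd-Warshall transitive closure on a boolean adjacency matrix."""
    n = len(adj)
    reach = [row[:] for row in adj]  # work on a copy
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def _update_tile(reach, bi, bj, bk, b):
    """Relax tile (bi, bj) through the intermediate vertices of block bk."""
    for k in range(bk * b, (bk + 1) * b):
        for i in range(bi * b, (bi + 1) * b):
            if reach[i][k]:
                for j in range(bj * b, (bj + 1) * b):
                    if reach[k][j]:
                        reach[i][j] = True


def blocked_transitive_closure(adj, b):
    """Blocked (tiled) Floyd-Warshall; hypothetical helper, b is the tile size."""
    n = len(adj)
    if n % b != 0:
        raise ValueError("this sketch assumes n is a multiple of b")
    reach = [row[:] for row in adj]
    nb = n // b
    for bk in range(nb):
        # Phase 1: the diagonal tile depends only on itself.
        _update_tile(reach, bk, bk, bk, b)
        # Phase 2: tiles in the same block row and block column.
        for t in range(nb):
            if t != bk:
                _update_tile(reach, bk, t, bk, b)
                _update_tile(reach, t, bk, bk, b)
        # Phase 3: all remaining tiles, using the phase 2 results.
        for bi in range(nb):
            for bj in range(nb):
                if bi != bk and bj != bk:
                    _update_tile(reach, bi, bj, bk, b)
    return reach
```

Within phase 2 and within phase 3, the tile updates do not depend on one another, so they are the natural units of work that a parallel schedule could hand out to cores; how the paper's self-scheduler actually assigns them is not described in this record.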
Pages: 999-1009
Page count: 11
Related papers
50 in total
  • [31] Performance of Static and Dynamic Task Scheduling for Real-Time Engine Control System on Embedded Multicore Processor
    Oki, Yoshitake
    Mikami, Hiroki
    Nishida, Hikaru
    Umeda, Dan
    Kimura, Keiji
    Kasahara, Hironori
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2019, 2021, 11998 : 1 - 14
  • [32] Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters
    Chao-Chin Wu
    Lien-Fu Lai
    Chao-Tung Yang
    Po-Hsun Chiu
    The Journal of Supercomputing, 2012, 60 : 31 - 61
  • [33] Performance-based parallel loop self-scheduling using hybrid OpenMP and MPI programming on multicore SMP clusters
    Yang, Chao-Tung
    Wu, Chao-Chin
    Chang, Jen-Hsiang
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (08): 721 - 744
  • [34] Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters
    Wu, Chao-Chin
    Lai, Lien-Fu
    Yang, Chao-Tung
    Chiu, Po-Hsun
    JOURNAL OF SUPERCOMPUTING, 2012, 60 (01): 31 - 61
  • [35] Implementation of a cone-beam backprojection algorithm on the Cell Broadband Engine processor
    Bockenbach, Olivier
    Knaup, Michael
    Kachelriess, Marc
    MEDICAL IMAGING 2007: PHYSICS OF MEDICAL IMAGING, PTS 1-3, 2007, 6510
  • [36] Circuit design techniques for a first-generation Cell Broadband Engine processor
    Warnock, James
    Wendel, Dieter
    Aipperspach, Tony
    Behnen, Erwin
    Cordes, Robert A.
    Dhong, Sang H.
    Hirairi, Koji
    Murakami, Hiroaki
    Onishi, Shohji
    Pham, Dac C.
    Pille, Jurgen
    Posluszny, Stephen D.
    Takahashi, Osamu
    Wen, Huajun
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2006, 41 (08) : 1692 - 1706
  • [37] Cell Broadband Engine processor performance optimization: Tracing tools implementation and use
    Biberstein, M.
    Dori-Hacohen, S.
    Harel, Y.
    Heilper, A.
    Mendelson, B.
    Shvadron, U.
    Treister, E.
    Turek, J.
    Chang, M. S.
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2009, 53 (05)
  • [38] Acceleration of Finite Difference Time Domain Method using Cell Broadband Engine Processor
    Watanabe, Shinya
    Hashimoto, Osamu
    2010 ASIA-PACIFIC MICROWAVE CONFERENCE, 2010: 2161 - 2163
  • [39] Accelerating mutual-information-based linear registration on the cell broadband engine processor
    Ohara, Moriyoshi
    Yeo, Hangu
    Savino, Frank
    Iyengar, Giridharan
    Gong, Leiguang
    Inoue, Hiroshi
    Komatsu, Hideaki
    Sheinin, Vadim
    Daijavad, Shahrokh
    2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, 2007, : 272 - +
  • [40] Accelerating 3D nonrigid registration using the Cell Broadband Engine processor
    Rohrer, J.
    Gong, L.
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2009, 53 (05)