An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs

Authors
Xu, Shixiong [1 ]
Gregg, David [2 ]
Affiliations
[1] Univ Dublin, Trinity Coll Dublin, Sch Comp Sci & Stat, Software Tools Grp, Dublin, Ireland
[2] Lero Irish Software Engn Res Ctr, Ireland
DOI
10.1109/PACT.2015.56
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Pages: 488-489
Page count: 2
Related papers
50 in total
  • [1] CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications
    Yi Yang
    Chao Li
    Huiyang Zhou
    Journal of Computer Science and Technology, 2015, 30 : 3 - 19
  • [2] CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications
    Yang, Yi
    Zhou, Huiyang
    ACM SIGPLAN NOTICES, 2014, 49 (08) : 93 - 105
  • [3] CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications
    Yang, Yi
    Li, Chao
    Zhou, Huiyang
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2015, 30 (01) : 3 - 19
  • [4] Enabling Coordinated Register Allocation and Thread-level Parallelism Optimization for GPUs
    Xie, Xiaolong
    Liang, Yun
    Li, Xiuhong
    Wu, Yudong
    Sun, Guangyu
    Wang, Tao
    Fan, Dongrui
    PROCEEDINGS OF THE 48TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO-48), 2015, : 395 - 406
  • [5] CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs
    Xie, Xiaolong
    Liang, Yun
    Li, Xiuhong
    Wu, Yudong
    Sun, Guangyu
    Wang, Tao
    Fan, Dongrui
    IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (06) : 890 - 897
  • [6] Exploitation of Nested Thread-Level Speculative Parallelism on Multi-Core Systems
    Kejariwal, Arun
    Girkar, Milind
    Tian, Xinmin
    Saito, Hideki
    Nicolau, Alexandru
    Veidenbaum, Alexander V.
    Banerjee, Utpal
    Polychronopoulos, Constantine D.
    PROCEEDINGS OF THE 2010 COMPUTING FRONTIERS CONFERENCE (CF 2010), 2010, : 99 - 100
  • [7] Improving Thread-level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory
    Yu, Chao
    Bai, Yuebin
    Sun, Qingxiao
    Yang, Hailong
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2019, 15 (04)
  • [8] Poise: Balancing Thread-Level Parallelism and Memory System Performance in GPUs using Machine Learning
    Dublish, Saumay
    Nagarajan, Vijay
    Topham, Nigel
    2019 25TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2019, : 492 - 505
  • [9] Evolution of Thread-Level Parallelism in Desktop Applications
    Blake, Geoffrey
    Dreslinski, Ronald G.
    Mudge, Trevor
    Flautner, Krisztian
    ISCA 2010: THE 37TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, 2010, : 302 - 313
  • [10] Thread partitioning and value prediction for exploiting speculative thread-level parallelism
    Marcuello, P
    González, A
    Tubella, J
    IEEE TRANSACTIONS ON COMPUTERS, 2004, 53 (02) : 114 - 125