000 06649nam a22006491i 4500
001 8363085
003 IEEE
005 20200413152930.0
006 m eo d
007 cr cn |||m|||a
008 180525s2018 caua foab 000 0 eng d
020 _a9781627056182
_qebook
020 _z9781627059237
_qpaperback
020 _z9781681733586
_qhardcover
024 7 _a10.2200/S00848ED1V01Y201804CAC044
_2doi
035 _a(CaBNVSL)swl408345
035 _a(OCoLC)1037800609
040 _aCaBNVSL
_beng
_erda
_cCaBNVSL
_dCaBNVSL
050 4 _aT385
_b.A243 2018
082 0 4 _a006.6869
_223
100 1 _aAamodt, Tor M.,
_eauthor.
245 1 0 _aGeneral-purpose graphics processor architectures /
_cTor M. Aamodt, Wilson Wai Lun Fung, Timothy G. Rogers.
264 1 _a[San Rafael, California] :
_bMorgan & Claypool,
_c2018.
300 _a1 PDF (xvii, 122 pages) :
_billustrations.
336 _atext
_2rdacontent
337 _aelectronic
_2isbdmedia
338 _aonline resource
_2rdacarrier
490 1 _aSynthesis lectures on computer architecture,
_x1935-3243 ;
_v# 44
538 _aMode of access: World Wide Web.
500 _aPart of: Synthesis digital library of engineering and computer science.
504 _aIncludes bibliographical references (pages 103-119).
505 0 _a1. Introduction -- 1.1 The landscape of computation accelerators -- 1.2 GPU hardware basics -- 1.3 A brief history of GPUs -- 1.4 Book outline --
505 8 _a2. Programming model -- 2.1 Execution model -- 2.2 GPU instruction set architectures -- 2.2.1 NVIDIA GPU instruction set architectures -- 2.2.2 AMD graphics core next instruction set architecture --
505 8 _a3. The SIMT core: instruction and register data flow -- 3.1 One-loop approximation -- 3.1.1 SIMT execution masking -- 3.1.2 SIMT deadlock and stackless SIMT architectures -- 3.1.3 Warp scheduling -- 3.2 Two-loop approximation -- 3.3 Three-loop approximation -- 3.3.1 Operand collector -- 3.3.2 Instruction replay: handling structural hazards -- 3.4 Research directions on branch divergence -- 3.4.1 Warp compaction -- 3.4.2 Intra-warp divergent path management -- 3.4.3 Adding MIMD capability -- 3.4.4 Complexity-effective divergence management -- 3.5 Research directions on scalarization and affine execution -- 3.5.1 Detection of uniform or affine variables -- 3.5.2 Exploiting uniform or affine variables in GPU -- 3.6 Research directions on register file architecture -- 3.6.1 Hierarchical register file -- 3.6.2 Drowsy state register file -- 3.6.3 Register file virtualization -- 3.6.4 Partitioned register file -- 3.6.5 RegLess --
505 8 _a4. Memory system -- 4.1 First-level memory structures -- 4.1.1 Scratchpad memory and L1 data cache -- 4.1.2 L1 texture cache -- 4.1.3 Unified texture and data cache -- 4.2 On-chip interconnection network -- 4.3 Memory partition unit -- 4.3.1 L2 cache -- 4.3.2 Atomic operations -- 4.3.3 Memory access scheduler -- 4.4 Research directions for GPU memory systems -- 4.4.1 Memory access scheduling and interconnection network design -- 4.4.2 Caching effectiveness -- 4.4.3 Memory request prioritization and cache bypassing -- 4.4.4 Exploiting inter-warp heterogeneity -- 4.4.5 Coordinated cache bypassing -- 4.4.6 Adaptive cache management -- 4.4.7 Cache prioritization -- 4.4.8 Virtual memory page placement -- 4.4.9 Data placement -- 4.4.10 Multi-chip-module GPUs --
505 8 _a5. Crosscutting research on GPU computing architectures -- 5.1 Thread scheduling -- 5.1.1 Research on assignment of threadblocks to cores -- 5.1.2 Research on cycle-by-cycle scheduling decisions -- 5.1.3 Research on scheduling multiple kernels -- 5.1.4 Fine-grain synchronization aware scheduling -- 5.2 Alternative ways of expressing parallelism -- 5.3 Support for transactional memory -- 5.3.1 Kilo TM -- 5.3.2 Warp TM and temporal conflict detection -- 5.4 Heterogeneous systems --
505 8 _aBibliography -- Authors' biographies.
506 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle scholar
510 0 _aGoogle book search
520 3 _aOriginally developed to support video games, graphics processor units (GPUs) are now increasingly used for general-purpose (non-graphics) applications ranging from machine learning to mining of cryptographic currencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs) by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose programmability makes contemporary GPUs appealing to software developers in comparison to domain-specific accelerators. This book provides an introduction to those interested in studying the architecture of GPUs that support general-purpose computing. It collects together information currently only found among a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator widely used in academic research on GPU architectures. The first chapter of this book describes the basic hardware structure of GPUs and provides a brief overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 provide an overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core and memory system. This book should provide a valuable resource for those wishing to understand the architecture of graphics processor units (GPUs) used for acceleration of general-purpose applications and to those who want to obtain an introduction to the rapidly growing body of research exploring how to improve the architecture of these GPUs.
530 _aAlso available in print.
588 _aTitle from PDF title page (viewed on May 25, 2018).
650 0 _aGraphics processing units.
650 0 _aComputer architecture.
653 _aGPGPU
653 _aComputer architecture
655 0 _aElectronic books.
700 1 _aFung, Wilson Wai Lun,
_eauthor.
700 1 _aRogers, Timothy G.,
_eauthor.
776 0 8 _iPrint version:
_z9781627059237
_z9781681733586
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures in computer architecture ;
_v# 44.
_x1935-3243
856 4 2 _3Abstract with links to resource
_uhttps://ieeexplore.ieee.org/servlet/opac?bknumber=8363085
999 _c562377
_d562377