MARC View

000			06649nam a22006491i 4500
001			8363085
003			IEEE
005			20200413152930.0
006			m eo d
007			cr cn |||m|||a
008			180525s2018 caua foab 000 0 eng d
020			_a9781627056182 _qebook
020			_z9781627059237 _qpaperback
020			_z9781681733586 _qhardcover
024	7		_a10.2200/S00848ED1V01Y201804CAC044 _2doi
035			_a(CaBNVSL)swl408345
035			_a(OCoLC)1037800609
040			_aCaBNVSL _beng _erda _cCaBNVSL _dCaBNVSL
050		4	_aT385 _b.A243 2018
082	0	4	_a006.6869 _223
100	1		_aAamodt, Tor M., _eauthor.
245	1	0	_aGeneral-purpose graphics processor architectures / _cTor M. Aamodt, Wilson Wai Lun Fung, Timothy G. Rogers.
264		1	_a[San Rafael, California] : _bMorgan & Claypool, _c2018.
300			_a1 PDF (xvii, 122 pages) : _billustrations.
336			_atext _2rdacontent
337			_aelectronic _2isbdmedia
338			_aonline resource _2rdacarrier
490	1		_aSynthesis lectures on computer architecture, _x1935-3243 ; _v# 44
538			_aMode of access: World Wide Web.
500			_aPart of: Synthesis digital library of engineering and computer science.
504			_aIncludes bibliographical references (pages 103-119).
505	0		_a1. Introduction -- 1.1 The landscape of computation accelerators -- 1.2 GPU hardware basics -- 1.3 A brief history of GPUs -- 1.4 Book outline --
505	8		_a2. Programming model -- 2.1 Execution model -- 2.2 GPU instruction set architectures -- 2.2.1 NVIDIA GPU instruction set architectures -- 2.2.2 AMD graphics core next instruction set architecture --
505	8		_a3. The SIMT core: instruction and register data flow -- 3.1 One-loop approximation -- 3.1.1 SIMT execution masking -- 3.1.2 SIMT deadlock and stackless SIMT architectures -- 3.1.3 Warp scheduling -- 3.2 Two-loop approximation -- 3.3 Three-loop approximation -- 3.3.1 Operand collector -- 3.3.2 Instruction replay: handling structural hazards -- 3.4 Research directions on branch divergence -- 3.4.1 Warp compaction -- 3.4.2 Intra-warp divergent path management -- 3.4.3 Adding MIMD capability -- 3.4.4 Complexity-effective divergence management -- 3.5 Research directions on scalarization and affine execution -- 3.5.1 Detection of uniform or affine variables -- 3.5.2 Exploiting uniform or affine variables in GPU -- 3.6 Research directions on register file architecture -- 3.6.1 Hierarchical register file -- 3.6.2 Drowsy state register file -- 3.6.3 Register file virtualization -- 3.6.4 Partitioned register file -- 3.6.5 RegLess --
505	8		_a4. Memory system -- 4.1 First-level memory structures -- 4.1.1 Scratchpad memory and L1 data cache -- 4.1.2 L1 texture cache -- 4.1.3 Unified texture and data cache -- 4.2 On-chip interconnection network -- 4.3 Memory partition unit -- 4.3.1 L2 cache -- 4.3.2 Atomic operations -- 4.3.3 Memory access scheduler -- 4.4 Research directions for GPU memory systems -- 4.4.1 Memory access scheduling and interconnection network design -- 4.4.2 Caching effectiveness -- 4.4.3 Memory request prioritization and cache bypassing -- 4.4.4 Exploiting inter-warp heterogeneity -- 4.4.5 Coordinated cache bypassing -- 4.4.6 Adaptive cache management -- 4.4.7 Cache prioritization -- 4.4.8 Virtual memory page placement -- 4.4.9 Data placement -- 4.4.10 Multi-chip-module GPUs --
505	8		_a5. Crosscutting research on GPU computing architectures -- 5.1 Thread scheduling -- 5.1.1 Research on assignment of threadblocks to cores -- 5.1.2 Research on cycle-by-cycle scheduling decisions -- 5.1.3 Research on scheduling multiple kernels -- 5.1.4 Fine-grain synchronization aware scheduling -- 5.2 Alternative ways of expressing parallelism -- 5.3 Support for transactional memory -- 5.3.1 Kilo TM -- 5.3.2 Warp TM and temporal conflict detection -- 5.4 Heterogeneous systems --
505	8		_aBibliography -- Authors' biographies.
506			_aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510	0		_aCompendex
510	0		_aINSPEC
510	0		_aGoogle scholar
510	0		_aGoogle book search
520	3		_aOriginally developed to support video games, graphics processor units (GPUs) are now increasingly used for general-purpose (non-graphics) applications ranging from machine learning to mining of cryptographic currencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs) by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose programmability makes contemporary GPUs appealing to software developers in comparison to domain-specific accelerators. This book provides an introduction to those interested in studying the architecture of GPUs that support general-purpose computing. It collects together information currently only found among a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator widely used in academic research on GPU architectures. The first chapter of this book describes the basic hardware structure of GPUs and provides a brief overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 provide an overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core and memory system. This book should provide a valuable resource for those wishing to understand the architecture of graphics processor units (GPUs) used for acceleration of general-purpose applications and to those who want to obtain an introduction to the rapidly growing body of research exploring how to improve the architecture of these GPUs.
530			_aAlso available in print.
588			_aTitle from PDF title page (viewed on May 25, 2018).
650		0	_aGraphics processing units.
650		0	_aComputer architecture.
653			_aGPGPU
653			_aComputer architecture
655		0	_aElectronic books.
700	1		_aFung, Wilson Wai Lun, _eauthor.
700	1		_aRogers, Timothy G., _eauthor.
776	0	8	_iPrint version: _z9781627059237 _z9781681733586
830		0	_aSynthesis digital library of engineering and computer science.
830		0	_aSynthesis lectures in computer architecture ; _v# 44. _x1935-3243
856	4	2	_3Abstract with links to resource _uhttps://ieeexplore.ieee.org/servlet/opac?bknumber=8363085
999			_c562377 _d562377