000 05833nam a2200757 i 4500
001 7123244
003 IEEE
005 20200413152918.0
006 m eo d
007 cr cn |||m|||a
008 150620s2015 caua foab 000 0 eng d
020 _a9781627057646
_qebook
020 _z9781627057639
_qprint
024 7 _a10.2200/S00647ED1V01Y201505CAC032
_2doi
035 _a(CaBNVSL)swl00405127
035 _a(OCoLC)911246004
040 _aCaBNVSL
_beng
_erda
_cCaBNVSL
_dCaBNVSL
050 4 _aQA76.9.A73
_bH847 2015
082 0 4 _a004.22
_223
100 1 _aHughes, Christopher J.,
_eauthor.
245 1 0 _aSingle-instruction multiple-data execution /
_cChristopher J. Hughes.
264 1 _aSan Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) :
_bMorgan & Claypool,
_c2015.
300 _a1 PDF (xv, 105 pages) :
_billustrations.
336 _atext
_2rdacontent
337 _aelectronic
_2isbdmedia
338 _aonline resource
_2rdacarrier
490 1 _aSynthesis lectures on computer architecture,
_x1935-3243 ;
_v# 32
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
500 _aPart of: Synthesis digital library of engineering and computer science.
504 _aIncludes bibliographical references (pages 95-103).
505 0 _a1. Data parallelism -- 1.1 Data parallelism -- 1.2 Data parallelism in applications -- 1.2.1 Physical simulation -- 1.2.2 Computer vision -- 1.2.3 Speech recognition -- 1.2.4 Database management systems -- 1.2.5 Financial analytics -- 1.2.6 Medical imaging --
505 8 _a2. Exploiting data parallelism with SIMD execution -- 2.1 Exploiting data parallelism -- 2.2 SIMD execution -- 2.3 SIMD performance and energy benefits -- 2.4 Limits to SIMD scaling -- 2.5 Programming and compilation -- 2.5.1 Programming for SIMD execution -- 2.5.2 Challenges of static analysis --
505 8 _a3. Computation and control flow -- 3.1 SIMD registers -- 3.2 SIMD computation -- 3.2.1 Basic arithmetic and logic -- 3.2.2 Data element size and overflow -- 3.2.3 Advanced arithmetic -- 3.3 Control flow -- 3.3.1 SIMD execution with control flow -- 3.3.2 Conditional SIMD execution -- 3.3.3 Efficiency implications of control divergence --
505 8 _a4. Memory operations -- 4.1 Contiguous patterns -- 4.1.1 Unaligned accesses -- 4.1.2 Throughput implications -- 4.2 Non-contiguous patterns -- 4.2.1 Programming model issues -- 4.2.2 Implementing gather and scatter instructions -- 4.2.3 Locality in gathers and scatters --
505 8 _a5. Horizontal operations -- 5.1 Limits to horizontal operations -- 5.2 Data movement -- 5.3 Reductions -- 5.4 Reducing control divergence -- 5.5 Potential dependences -- 5.5.1 Single-index case -- 5.5.2 Multi-index case --
505 8 _a6. Conclusions -- 6.1 Future directions -- Bibliography -- Author's biography.
506 1 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle Scholar
510 0 _aGoogle Book Search
520 3 _aHaving hit power limitations on even more aggressive out-of-order execution in processor cores, many architects in the past decade have turned to single-instruction-multiple-data (SIMD) execution to increase single-threaded performance. SIMD execution, or having a single instruction drive execution of an identical operation on multiple data items, was already well established as a technique to efficiently exploit data parallelism. Furthermore, support for it was already included in many commodity processors. However, in the past decade, SIMD execution has seen a dramatic increase in the set of applications using it, which has motivated big improvements in hardware support in mainstream microprocessors. The easiest way to provide a big performance boost to SIMD hardware is to make it wider, i.e., increase the number of data items hardware operates on simultaneously. Indeed, microprocessor vendors have done this. However, as we exploit more data parallelism in applications, certain challenges can negatively impact performance. In particular, conditional execution, noncontiguous memory accesses, and the presence of some dependences across data items are key roadblocks to achieving peak performance with SIMD execution. This book first describes data parallelism, and why it is so common in popular applications. We then describe SIMD execution, and explain where its performance and energy benefits come from compared to other techniques to exploit parallelism. Finally, we describe SIMD hardware support in current commodity microprocessors. This includes both expected design tradeoffs, as well as unexpected ones, as we work to overcome challenges encountered when trying to map real software to SIMD execution.
530 _aAlso available in print.
588 _aTitle from PDF title page (viewed on June 20, 2015).
650 0 _aSIMD (Computer architecture)
650 0 _aParallel file systems (Computer science)
653 _aSIMD
653 _avector processor
653 _adata parallelism
653 _aautovectorization
653 _acontrol divergence
653 _avector masks
653 _aunaligned accesses
653 _anon-contiguous accesses
653 _agather/scatter
653 _ahorizontal operations
653 _avector reductions
653 _ashuffle
653 _apermute
653 _aconflict detection
776 0 8 _iPrint version:
_z9781627057639
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures on computer architecture ;
_v# 32.
_x1935-3243
856 4 2 _3Abstract with links to resource
_uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=7123244
999 _c562140
_d562140