Tag | Ind1 | Ind2 | Content
---|---|---|---
000 | | | 05833nam a2200757 i 4500
001 | | | 7123244
003 | | | IEEE
005 | | | 20200413152918.0
006 | | | m eo d
007 | | | cr cn \|\|\|m\|\|\|a
008 | | | 150620s2015 caua foab 000 0 eng d
020 | | | _a9781627057646 _qebook
020 | | | _z9781627057639 _qprint
024 | 7 | | _a10.2200/S00647ED1V01Y201505CAC032 _2doi
035 | | | _a(CaBNVSL)swl00405127
035 | | | _a(OCoLC)911246004
040 | | | _aCaBNVSL _beng _erda _cCaBNVSL _dCaBNVSL
050 | | 4 | _aQA76.9.A73 _bH847 2015
082 | 0 | 4 | _a004.22 _223
100 | 1 | | _aHughes, Christopher J., _eauthor.
245 | 1 | 0 | _aSingle-instruction multiple-data execution / _cChristopher J. Hughes.
264 | | 1 | _aSan Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : _bMorgan & Claypool, _c2015.
300 | | | _a1 PDF (xv, 105 pages) : _billustrations.
336 | | | _atext _2rdacontent
337 | | | _aelectronic _2isbdmedia
338 | | | _aonline resource _2rdacarrier
490 | 1 | | _aSynthesis lectures on computer architecture, _x1935-3243 ; _v# 32
538 | | | _aMode of access: World Wide Web.
538 | | | _aSystem requirements: Adobe Acrobat Reader.
500 | | | _aPart of: Synthesis digital library of engineering and computer science.
504 | | | _aIncludes bibliographical references (pages 95-103).
505 | 0 | | _a1. Data parallelism -- 1.1 Data parallelism -- 1.2 Data parallelism in applications -- 1.2.1 Physical simulation -- 1.2.2 Computer vision -- 1.2.3 Speech recognition -- 1.2.4 Database management systems -- 1.2.5 Financial analytics -- 1.2.6 Medical imaging --
505 | 8 | | _a2. Exploiting data parallelism with SIMD execution -- 2.1 Exploiting data parallelism -- 2.2 SIMD execution -- 2.3 SIMD performance and energy benefits -- 2.4 Limits to SIMD scaling -- 2.5 Programming and compilation -- 2.5.1 Programming for SIMD execution -- 2.5.2 Challenges of static analysis --
505 | 8 | | _a3. Computation and control flow -- 3.1 SIMD registers -- 3.2 SIMD computation -- 3.2.1 Basic arithmetic and logic -- 3.2.2 Data element size and overflow -- 3.2.3 Advanced arithmetic -- 3.3 Control flow -- 3.3.1 SIMD execution with control flow -- 3.3.2 Conditional SIMD execution -- 3.3.3 Efficiency implications of control divergence --
505 | 8 | | _a4. Memory operations -- 4.1 Contiguous patterns -- 4.1.1 Unaligned accesses -- 4.1.2 Throughput implications -- 4.2 Non-contiguous patterns -- 4.2.1 Programming model issues -- 4.2.2 Implementing gather and scatter instructions -- 4.2.3 Locality in gathers and scatters --
505 | 8 | | _a5. Horizontal operations -- 5.1 Limits to horizontal operations -- 5.2 Data movement -- 5.3 Reductions -- 5.4 Reducing control divergence -- 5.5 Potential dependences -- 5.5.1 Single-index case -- 5.5.2 Multi-index case --
505 | 8 | | _a6. Conclusions -- 6.1 Future directions -- Bibliography -- Author's biography.
506 | 1 | | _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 | 0 | | _aCompendex
510 | 0 | | _aINSPEC
510 | 0 | | _aGoogle Scholar
510 | 0 | | _aGoogle Book Search
520 | 3 | | _aHaving hit power limitations to even more aggressive out-of-order execution in processor cores, many architects in the past decade have turned to single-instruction-multiple-data (SIMD) execution to increase single-threaded performance. SIMD execution, or having a single instruction drive execution of an identical operation on multiple data items, was already well established as a technique to efficiently exploit data parallelism. Furthermore, support for it was already included in many commodity processors. However, in the past decade, SIMD execution has seen a dramatic increase in the set of applications using it, which has motivated big improvements in hardware support in mainstream microprocessors. The easiest way to provide a big performance boost to SIMD hardware is to make it wider, i.e., to increase the number of data items the hardware operates on simultaneously. Indeed, microprocessor vendors have done this. However, as we exploit more data parallelism in applications, certain challenges can negatively impact performance. In particular, conditional execution, noncontiguous memory accesses, and the presence of some dependences across data items are key roadblocks to achieving peak performance with SIMD execution. This book first describes data parallelism, and why it is so common in popular applications. We then describe SIMD execution, and explain where its performance and energy benefits come from compared to other techniques to exploit parallelism. Finally, we describe SIMD hardware support in current commodity microprocessors. This includes both expected design tradeoffs, as well as unexpected ones, as we work to overcome challenges encountered when trying to map real software to SIMD execution.
530 | | | _aAlso available in print.
588 | | | _aTitle from PDF title page (viewed on June 20, 2015).
650 | | 0 | _aSIMD (Computer architecture)
650 | | 0 | _aParallel file systems (Computer science)
653 | | | _aSIMD
653 | | | _avector processor
653 | | | _adata parallelism
653 | | | _aautovectorization
653 | | | _acontrol divergence
653 | | | _avector masks
653 | | | _aunaligned accesses
653 | | | _anon-contiguous accesses
653 | | | _agather/scatter
653 | | | _ahorizontal operations
653 | | | _avector reductions
653 | | | _ashuffle
653 | | | _apermute
653 | | | _aconflict detection
776 | 0 | 8 | _iPrint version: _z9781627057639
830 | | 0 | _aSynthesis digital library of engineering and computer science.
830 | | 0 | _aSynthesis lectures on computer architecture, _x1935-3243 ; _v# 32.
856 | 4 | 2 | _3Abstract with links to resource _uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=7123244
999 | | | _c562140 _d562140