000 05833nam a2200757 i 4500
001 7123244
003 IEEE
005 20200413152918.0
006 m eo d
007 cr cn |||m|||a
008 150620s2015 caua foab 000 0 eng d
020 _a9781627057646
_qebook
020 _z9781627057639
_qprint
024 7 _a10.2200/S00647ED1V01Y201505CAC032
_2doi
035 _a(CaBNVSL)swl00405127
035 _a(OCoLC)911246004
040 _aCaBNVSL
_beng
_erda
_cCaBNVSL
_dCaBNVSL
050 4 _aQA76.9.A73
_bH847 2015
082 0 4 _a004.22
_223
100 1 _aHughes, Christopher J.,
_eauthor.
245 1 0 _aSingle-instruction multiple-data execution /
_cChristopher J. Hughes.
264 1 _aSan Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) :
_bMorgan & Claypool,
_c2015.
300 _a1 PDF (xv, 105 pages) :
_billustrations.
336 _atext
_2rdacontent
337 _aelectronic
_2isbdmedia
338 _aonline resource
_2rdacarrier
490 1 _aSynthesis lectures on computer architecture,
_x1935-3243 ;
_v# 32
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
500 _aPart of: Synthesis digital library of engineering and computer science.
504 _aIncludes bibliographical references (pages 95-103).
505 0 _a1. Data parallelism -- 1.1 Data parallelism -- 1.2 Data parallelism in applications -- 1.2.1 Physical simulation -- 1.2.2 Computer vision -- 1.2.3 Speech recognition -- 1.2.4 Database management systems -- 1.2.5 Financial analytics -- 1.2.6 Medical imaging --
505 8 _a2. Exploiting data parallelism with SIMD execution -- 2.1 Exploiting data parallelism -- 2.2 SIMD execution -- 2.3 SIMD performance and energy benefits -- 2.4 Limits to SIMD scaling -- 2.5 Programming and compilation -- 2.5.1 Programming for SIMD execution -- 2.5.2 Challenges of static analysis --
505 8 _a3. Computation and control flow -- 3.1 SIMD registers -- 3.2 SIMD computation -- 3.2.1 Basic arithmetic and logic -- 3.2.2 Data element size and overflow -- 3.2.3 Advanced arithmetic -- 3.3 Control flow -- 3.3.1 SIMD execution with control flow -- 3.3.2 Conditional SIMD execution -- 3.3.3 Efficiency implications of control divergence --
505 8 _a4. Memory operations -- 4.1 Contiguous patterns -- 4.1.1 Unaligned accesses -- 4.1.2 Throughput implications -- 4.2 Non-contiguous patterns -- 4.2.1 Programming model issues -- 4.2.2 Implementing gather and scatter instructions -- 4.2.3 Locality in gathers and scatters --
505 8 _a5. Horizontal operations -- 5.1 Limits to horizontal operations -- 5.2 Data movement -- 5.3 Reductions -- 5.4 Reducing control divergence -- 5.5 Potential dependences -- 5.5.1 Single-index case -- 5.5.2 Multi-index case --
505 8 _a6. Conclusions -- 6.1 Future directions -- Bibliography -- Author's biography.
506 1 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle Scholar
510 0 _aGoogle Book Search
520 3 _aHaving hit power limitations on even more aggressive out-of-order execution in processor cores, many architects in the past decade have turned to single-instruction-multiple-data (SIMD) execution to increase single-threaded performance. SIMD execution, or having a single instruction drive execution of an identical operation on multiple data items, was already well established as a technique to efficiently exploit data parallelism. Furthermore, support for it was already included in many commodity processors. However, in the past decade, SIMD execution has seen a dramatic increase in the set of applications using it, which has motivated big improvements in hardware support in mainstream microprocessors. The easiest way to provide a big performance boost to SIMD hardware is to make it wider, i.e., increase the number of data items hardware operates on simultaneously. Indeed, microprocessor vendors have done this. However, as we exploit more data parallelism in applications, certain challenges can negatively impact performance. In particular, conditional execution, noncontiguous memory accesses, and the presence of some dependences across data items are key roadblocks to achieving peak performance with SIMD execution. This book first describes data parallelism, and why it is so common in popular applications. We then describe SIMD execution, and explain where its performance and energy benefits come from compared to other techniques to exploit parallelism. Finally, we describe SIMD hardware support in current commodity microprocessors. This includes both expected design tradeoffs, as well as unexpected ones, as we work to overcome challenges encountered when trying to map real software to SIMD execution.
530 _aAlso available in print.
588 _aTitle from PDF title page (viewed on June 20, 2015).
650 0 _aSIMD (Computer architecture)
650 0 _aParallel file systems (Computer science)
653 _aSIMD
653 _avector processor
653 _adata parallelism
653 _aautovectorization
653 _acontrol divergence
653 _avector masks
653 _aunaligned accesses
653 _anon-contiguous accesses
653 _agather/scatter
653 _ahorizontal operations
653 _avector reductions
653 _ashuffle
653 _apermute
653 _aconflict detection
776 0 8 _iPrint version:
_z9781627057639
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures on computer architecture ;
_v# 32.
_x1935-3243
856 4 2 _3Abstract with links to resource
_uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=7123244
999 _c562140
_d562140