Welcome to P K Kelkar Library, Online Public Access Catalogue (OPAC)


Fault tolerant computer architecture

By: Sorin, Daniel J.
Material type: Book
Series: Synthesis lectures on computer architecture ; # 5
Publisher: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, c2009
Description: 1 electronic text (xii, 103 p. : ill.) : digital file
ISBN: 9781598299540 (electronic bk.)
Uniform titles: Synthesis digital library of engineering and computer science
Subject(s): Fault-tolerant computing | Self-stabilization (Computer science) | Computer architecture | Fault tolerance | Reliability | Dependability | Error detection | Error recovery | Fault diagnosis | Self-repair | Autonomous | Dynamic verification
DDC classification: 004.2
Online resources: Abstract with links to resource
Contents:
Introduction -- Goals of this book -- Faults, errors, and failures -- Masking -- Duration of faults and errors -- Underlying physical phenomena -- Trends leading to increased fault rates -- Smaller devices and hotter chips -- More devices per processor -- More complicated designs -- Error models -- Error type -- Error duration -- Number of simultaneous errors -- Fault tolerance metrics -- Availability -- Reliability -- Mean time to failure -- Mean time between failures -- Failures in time -- Architectural vulnerability factor -- The rest of this book -- References
Error detection -- General concepts -- Physical redundancy -- Temporal redundancy -- Information redundancy -- The end-to-end argument -- Microprocessor cores -- Functional units -- Register files -- Tightly lockstepped redundant cores -- Redundant multithreading without lockstepping -- Dynamic verification of invariants -- High-level anomaly detection -- Using software to detect hardware errors -- Error detection tailored to specific fault models -- Caches and memory -- Error code implementation -- Beyond EDCs -- Detecting errors in content addressable memories -- Detecting errors in addressing -- Multiprocessor memory systems -- Dynamic verification of cache coherence -- Dynamic verification of memory consistency -- Interconnection networks -- Conclusions -- References
Error recovery -- General concepts -- Forward error recovery -- Backward error recovery -- Comparing the performance of FER and BER -- Microprocessor cores -- FER for cores -- BER for cores -- Single-core memory systems -- FER for caches and memory -- BER for caches and memory -- Issues unique to multiprocessors -- What state to save for the recovery point -- Which algorithm to use for saving the recovery point -- Where to save the recovery point -- How to restore the recovery point state -- Software-implemented BER -- Conclusions -- References
Diagnosis -- General concepts -- The benefits of diagnosis -- System model implications -- Built-in self-test -- Microprocessor core -- Using periodic BIST -- Diagnosing during normal execution -- Caches and memory -- Multiprocessors -- Conclusions -- References
Self-repair -- General concepts -- Microprocessor cores -- Superscalar cores -- Simple cores -- Caches and memory -- Multiprocessors -- Core replacement -- Using the scheduler to hide faulty functional units -- Sharing resources across cores -- Self-repair of noncore components -- Conclusions -- References
The future -- Adoption by industry -- Future relationships between fault tolerance and other fields -- Power and temperature -- Security -- Static design verification -- Fault vulnerability reduction -- Tolerating software bugs -- References.
Abstract: For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state-of-the-art--over approximately the past 10 years--in academia and industry.
Item type: E books
Current location: PK Kelkar Library, IIT Kanpur
Status: Available
Barcode: EBKE185
Total holds: 0

Mode of access: World Wide Web.

System requirements: Adobe Acrobat reader.

Part of: Synthesis digital library of engineering and computer science.

Series from website.

Includes bibliographical references.


Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Indexed by: Compendex | INSPEC | Google Scholar | Google Book Search.


Also available in print.

Title from PDF t.p. (viewed on June 4, 2009).
