000 06362nam a2200673 i 4500
001 6812612
003 IEEE
005 20200413152908.0
006 m eo d
007 cr cn |||m|||a
008 130118s2013 caua foab 000 0 eng d
020 _a9781608454747 (electronic bk.)
020 _z9781608454730 (pbk.)
024 7 _a10.2200/S00462ED1V01Y201212SAP010
_2doi
035 _a(CaBNVSL)swl00402005
035 _a(OCoLC)824619566
040 _aCaBNVSL
_cCaBNVSL
_dCaBNVSL
050 4 _aTK7882.S65
_bH677 2013
082 0 4 _a006.454
_223
100 1 _aHori, Takaaki.
245 1 0 _aSpeech recognition algorithms using weighted finite-state transducers
_h[electronic resource] /
_cTakaaki Hori and Atsushi Nakamura.
260 _aSan Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) :
_bMorgan & Claypool,
_cc2013.
300 _a1 electronic text (xii, 150 p.) :
_bill., digital file.
490 1 _aSynthesis lectures on speech and audio processing,
_x1932-1678 ;
_v# 10
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
500 _aPart of: Synthesis digital library of engineering and computer science.
500 _aSeries from website.
504 _aIncludes bibliographical references (p. 137-148).
505 0 _aPreface -- 1. Introduction -- 1.1 Speech recognition and computation -- 1.2 Why WFST? -- 1.3 Purpose of this book -- 1.4 Book organization --
505 8 _a2. Brief overview of speech recognition -- 2.1 Statistical framework of speech recognition -- 2.2 Speech analysis -- 2.3 Acoustic model -- 2.3.1 Hidden Markov model -- 2.3.2 Computation of acoustic likelihood -- 2.3.3 Output probability distribution -- 2.4 Subword models and pronunciation lexicon -- 2.5 Context-dependent phone models -- 2.6 Language model -- 2.6.1 Finite-state grammar -- 2.6.2 N-gram model -- 2.6.3 Back-off smoothing -- 2.7 Decoder -- 2.7.1 Viterbi algorithm for continuous speech recognition -- 2.7.2 Time-synchronous Viterbi beam search -- 2.7.3 Practical techniques for LVCSR -- 2.7.4 Context-dependent phone search network -- 2.7.5 Lattice generation and N-best search --
505 8 _a3. Introduction to weighted finite-state transducers -- 3.1 Finite automata -- 3.2 Basic properties of finite automata -- 3.3 Semiring -- 3.4 Basic operations -- 3.5 Transducer composition -- 3.6 Optimization -- 3.6.1 Determinization -- 3.6.2 Weight pushing -- 3.6.3 Minimization -- 3.7 Epsilon removal --
505 8 _a4. Speech recognition by weighted finite-state transducers -- 4.1 Overview of WFST-based speech recognition -- 4.2 Construction of component WFSTs -- 4.2.1 Acoustic models -- 4.2.2 Phone context dependency -- 4.2.3 Pronunciation lexicon -- 4.2.4 Language models -- 4.3 Composition and optimization -- 4.4 Decoding algorithm using a single WFST -- 4.5 Decoding performance --
505 8 _a5. Dynamic decoders with on-the-fly WFST operations -- 5.1 Problems in the native WFST approach -- 5.2 On-the-fly composition and optimization -- 5.3 Known problems of on-the-fly composition approach -- 5.4 Look-ahead composition -- 5.4.1 How to obtain prospective output labels -- 5.4.2 Basic principle of look-ahead composition -- 5.4.3 Realization of look-ahead composition using a filter transducer -- 5.4.4 Look-ahead composition with weight pushing -- 5.4.5 Generalized composition -- 5.4.6 Interval representation of label sets -- 5.5 On-the-fly rescoring approach -- 5.5.1 Construction of component WFSTs for on-the-fly rescoring -- 5.5.2 Concept -- 5.5.3 Algorithm -- 5.5.4 Approximation in decoding -- 5.5.5 Comparison with look-ahead composition --
505 8 _a6. Summary and perspective -- 6.1 Realization of advanced speech recognition techniques using WFSTs -- 6.1.1 WFSTs for extended language models -- 6.1.2 Dynamic grammars based on WFSTs -- 6.1.3 Wide-context-dependent HMMs -- 6.1.4 Extension of WFSTs for multi-modal inputs -- 6.1.5 Use of WFSTs for learning -- 6.2 Integration of speech and language processing -- 6.3 Other speech applications using WFSTs -- 6.4 Conclusion --
505 8 _aBibliography -- Authors' biographies.
506 1 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle Scholar
510 0 _aGoogle Book Search
520 3 _aThis book introduces the theory, algorithms, and implementation techniques for efficient decoding in speech recognition, focusing mainly on the Weighted Finite-State Transducer (WFST) approach. The decoding process for speech recognition is viewed as a search problem whose goal is to find the sequence of words that best matches an input speech signal. Since this process becomes computationally more expensive as the system vocabulary size increases, research has long been devoted to reducing the computational cost. Recently, the WFST approach has become an important state-of-the-art speech recognition technology, because it offers improved decoding speed with fewer recognition errors compared with conventional methods. However, it is not easy to understand all the algorithms used in this framework, and they remain a black box for many people. In this book, we review the WFST approach and aim to provide comprehensive interpretations of WFST operations and decoding algorithms to help anyone who wants to understand, develop, and study WFST-based speech recognizers. We also mention recent advances in this framework and its applications to spoken language processing.
530 _aAlso available in print.
588 _aTitle from PDF t.p. (viewed on January 18, 2013).
650 0 _aSpeech processing systems.
650 0 _aAutomatic speech recognition.
650 0 _aTransducers.
653 _aspeech recognition
653 _aautomaton
653 _aweighted finite-state transducer
653 _aViterbi algorithm
653 _adecoder
653 _aoptimization
700 1 _aNakamura, Atsushi.
776 0 8 _iPrint version:
_z9781608454730
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures on speech and audio processing ;
_v# 10.
_x1932-1678
856 4 2 _3Abstract with links to resource
_uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=6812612
999 _c561959
_d561959