Syntax-based statistical machine translation /
By: Williams, Philip [author.].
Contributor(s): Sennrich, Rico [author.] | Post, Matt [author.] | Koehn, Philipp [author.].
Material type: Book
Series: Synthesis digital library of engineering and computer science; Synthesis lectures on human language technologies: # 33.
Publisher: [San Rafael, California] : Morgan & Claypool, 2016.
Description: 1 PDF (xvii, 190 pages) : illustrations.
Content type: text
Media type: electronic
Carrier type: online resource
ISBN: 9781627055024.
Subject(s): Machine translating | Translating and interpreting -- Data processing | statistical machine translation | syntax | synchronous grammar formalisms | natural language processing | computational linguistics | machine learning | statistical modeling
DDC classification: 418.020285
Online resources: Abstract with links to resource
Also available in print.

Item type | Current location | Call number | Status | Date due | Barcode | Item holds
---|---|---|---|---|---|---
E books | PK Kelkar Library, IIT Kanpur | | Available | | EBKE722 | |
Mode of access: World Wide Web.
System requirements: Adobe Acrobat Reader.
Part of: Synthesis digital library of engineering and computer science.
Includes bibliographical references (pages 159-175) and index.
1. Models -- 1.1 Syntactic translation units -- 1.1.1 Phrases -- 1.1.2 Phrases with gaps -- 1.1.3 Phrases with labels -- 1.1.4 Phrases with internal tree structure -- 1.2 Grammar formalisms -- 1.2.1 Context-free grammar -- 1.2.2 Synchronous context-free grammar -- 1.2.3 Synchronous tree-substitution grammar -- 1.2.4 Probabilistic and weighted grammars -- 1.3 Statistical models -- 1.3.1 Generative models -- 1.3.2 Discriminative models -- 1.4 A classification of syntax-based models -- 1.4.1 String-to-string -- 1.4.2 String-to-tree -- 1.4.3 Tree-to-string -- 1.4.4 Tree-to-tree -- 1.5 A brief history of syntax-based SMT --
2. Learning from parallel text -- 2.1 Preliminaries -- 2.2 Hierarchical phrase-based grammar -- 2.2.1 Rule extraction -- 2.2.2 Features -- 2.3 Syntax-augmented grammar -- 2.3.1 Rule extraction -- 2.3.2 Extraction heuristics -- 2.3.3 Features -- 2.4 GHKM -- 2.4.1 Identifying frontier nodes -- 2.4.2 Extracting minimal rules -- 2.4.3 Unaligned source words -- 2.4.4 Composed rules -- 2.4.5 Features -- 2.5 A comparison -- 2.6 Summary --
3. Decoding I: preliminaries -- 3.1 Hypergraphs, forests, and derivations -- 3.1.1 Basic definitions -- 3.1.2 Parse forests -- 3.1.3 Translation forests -- 3.1.4 Derivations -- 3.1.5 Weighted derivations -- 3.2 Algorithms on hypergraphs -- 3.2.1 The topological sort algorithm -- 3.2.2 The Viterbi max-derivation algorithm -- 3.2.3 The CYK max-derivation algorithm -- 3.2.4 The eager and lazy k-best algorithms -- 3.3 Historical notes and further reading --
4. Decoding II: tree decoding -- 4.1 Decoding with local features -- 4.1.1 A basic decoding algorithm -- 4.1.2 Hyperedge bundling -- 4.2 State splitting -- 4.2.1 Adding a bigram language model feature -- 4.2.2 The state-split hypergraph -- 4.2.3 Complexity -- 4.3 Beam search -- 4.3.1 The beam -- 4.3.2 Rest cost estimation -- 4.3.3 Monotonicity redux -- 4.3.4 Exhaustive beam filling -- 4.3.5 Cube pruning -- 4.3.6 Cube growing -- 4.3.7 State refinement -- 4.4 Efficient tree parsing -- 4.5 Tree-to-tree decoding -- 4.6 Historical notes and further reading --
5. Decoding III: string decoding -- 5.1 Basic beam search -- 5.1.1 Parse forest complexity -- 5.2 Faster beam search -- 5.2.1 Constrained width parsing -- 5.2.2 Per-subspan beam search -- 5.3 Handling non-binary grammars -- 5.3.1 Binarization -- 5.3.2 Alternatives to binarization -- 5.4 Interim summary -- 5.5 Parsing algorithms -- 5.5.1 The CYK+ algorithm -- 5.5.2 Trie-based grammar storage -- 5.5.3 The recursive CYK+ algorithm -- 5.6 STSG and distinct-category SCFG -- 5.6.1 STSG -- 5.6.2 Distinct-category SCFG -- 5.7 Historical notes and further reading --
6. Selected topics -- 6.1 Transformations on trees -- 6.1.1 Tree restructuring -- 6.1.2 Tree re-labeling -- 6.1.3 Fuzzy syntax -- 6.1.4 Forest-based approaches -- 6.1.5 Beyond context-free models -- 6.2 Dependency structure -- 6.2.1 Dependency treelet translation -- 6.2.2 String-to-dependency SMT -- 6.3 Improving grammaticality -- 6.3.1 Agreement -- 6.3.2 Subcategorization -- 6.3.3 Morphological structure in synchronous grammars -- 6.3.4 Syntactic language models -- 6.4 Evaluation metrics --
7. Closing remarks -- 7.1 Which approach is best? -- 7.2 What's next? --
A. Open-source tools -- Bibliography -- Authors' biographies -- Author index -- Index.
Abstract freely available; full-text restricted to subscribers or individual document purchasers.
This book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their role in structuring the search space.
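To illustrate the central formalism the abstract mentions, the sketch below shows a toy synchronous context-free grammar: each rule pairs a source and a target right-hand side, with shared indices marking linked nonterminals so a single derivation yields both strings, possibly reordered. This is a minimal illustration under assumed names (`Rule`, `derive`, the toy rules), not code from the book or any of the systems it covers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: str     # left-hand-side nonterminal, e.g. "X"
    src: tuple   # source side: terminals (str) and child indices (int)
    tgt: tuple   # target side: same indices, possibly in a different order

# Toy English -> German grammar with one reordering rule.
rules = {
    "R1": Rule("X", ("house",), ("Haus",)),
    "R2": Rule("X", ("the", 0), ("das", 0)),   # X -> <the X1, das X1>
    "R3": Rule("X", (0, "of", 1), (1, 0)),     # X -> <X1 of X2, X2 X1> (reorders)
}

def derive(node):
    """Expand a derivation tree (rule name, children) into a (source, target) pair."""
    name, children = node
    rule = rules[name]
    subs = [derive(child) for child in children]
    # Terminals are emitted as-is; an integer index splices in that child's yield.
    src = " ".join(t if isinstance(t, str) else subs[t][0] for t in rule.src)
    tgt = " ".join(t if isinstance(t, str) else subs[t][1] for t in rule.tgt)
    return src, tgt

# One derivation produces both sides of the sentence pair at once:
print(derive(("R2", [("R1", [])])))  # ('the house', 'das Haus')
```

The key property, exploited throughout the book's decoding chapters, is that translation reduces to finding a high-scoring derivation: parsing the source side of the rules simultaneously builds the target string.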
Title from PDF title page (viewed on August 16, 2016).