Welcome to P K Kelkar Library, Online Public Access Catalogue (OPAC)


Bayesian analysis in natural language processing /

By: Cohen, Shay [author.].
Material type: Book
Series: Synthesis digital library of engineering and computer science; Synthesis lectures on human language technologies, #35.
Publisher: [San Rafael, California] : Morgan & Claypool, 2016.
Description: 1 PDF (xxvii, 246 pages) : illustrations.
Content type: text
Media type: electronic
Carrier type: online resource
ISBN: 9781627054218.
Subject(s): Natural language processing (Computer science) | Bayesian statistical decision theory | natural language processing | computational linguistics | Bayesian statistics | Bayesian NLP | statistical learning | inference in NLP | grammar modeling in NLP
DDC classification: 006.35
Online resources: Abstract with links to resource
Also available in print.
Contents:
1. Preliminaries -- 1.1 Probability measures -- 1.2 Random variables -- 1.2.1 Continuous and discrete random variables -- 1.2.2 Joint distribution over multiple random variables -- 1.3 Conditional distributions -- 1.3.1 Bayes' rule -- 1.3.2 Independent and conditionally independent random variables -- 1.3.3 Exchangeable random variables -- 1.4 Expectations of random variables -- 1.5 Models -- 1.5.1 Parametric vs. nonparametric models -- 1.5.2 Inference with models -- 1.5.3 Generative models -- 1.5.4 Independence assumptions in models -- 1.5.5 Directed graphical models -- 1.6 Learning from data scenarios -- 1.7 Bayesian and frequentist philosophy (tip of the iceberg) -- 1.8 Summary -- 1.9 Exercises --
2. Introduction -- 2.1 Overview: where Bayesian statistics and NLP meet -- 2.2 First example: The latent Dirichlet allocation model -- 2.2.1 The Dirichlet distribution -- 2.2.2 Inference -- 2.2.3 Summary -- 2.3 Second example: Bayesian text regression -- 2.4 Conclusion and summary -- 2.5 Exercises --
3. Priors -- 3.1 Conjugate priors -- 3.1.1 Conjugate priors and normalization constants -- 3.1.2 The use of conjugate priors with latent variable models -- 3.1.3 Mixture of conjugate priors -- 3.1.4 Renormalized conjugate distributions -- 3.1.5 Discussion: to be or not to be conjugate? -- 3.1.6 Summary -- 3.2 Priors over multinomial and categorical distributions -- 3.2.1 The Dirichlet distribution re-visited -- 3.2.2 The logistic normal distribution -- 3.2.3 Discussion -- 3.2.4 Summary -- 3.3 Non-informative priors -- 3.3.1 Uniform and improper priors -- 3.3.2 Jeffreys prior -- 3.3.3 Discussion -- 3.4 Conjugacy and exponential models -- 3.5 Multiple parameter draws in models -- 3.6 Structural priors -- 3.7 Conclusion and summary -- 3.8 Exercises --
4. Bayesian estimation -- 4.1 Learning with latent variables: two views -- 4.2 Bayesian point estimation -- 4.2.1 Maximum a posteriori estimation -- 4.2.2 Posterior approximations based on the MAP solution -- 4.2.3 Decision-theoretic point estimation -- 4.2.4 Discussion and summary -- 4.3 Empirical Bayes -- 4.4 Asymptotic behavior of the posterior -- 4.5 Summary -- 4.6 Exercises --
5. Sampling methods -- 5.1 MCMC algorithms: overview -- 5.2 NLP model structure for MCMC inference -- 5.2.1 Partitioning the latent variables -- 5.3 Gibbs sampling -- 5.3.1 Collapsed Gibbs sampling -- 5.3.2 Operator view -- 5.3.3 Parallelizing the Gibbs sampler -- 5.3.4 Summary -- 5.4 The Metropolis-Hastings algorithm -- 5.4.1 Variants of Metropolis-Hastings -- 5.5 Slice sampling -- 5.5.1 Auxiliary variable sampling -- 5.5.2 The use of slice sampling and auxiliary variable sampling in NLP -- 5.6 Simulated annealing -- 5.7 Convergence of MCMC algorithms -- 5.8 Markov chain: basic theory -- 5.9 Sampling algorithms not in the MCMC realm -- 5.10 Monte Carlo integration -- 5.11 Discussion -- 5.11.1 Computability of distribution vs. sampling -- 5.11.2 Nested MCMC sampling -- 5.11.3 Runtime of MCMC samplers -- 5.11.4 Particle filtering -- 5.12 Conclusion and summary -- 5.13 Exercises --
6. Variational inference -- 6.1 Variational bound on marginal log-likelihood -- 6.2 Mean-field approximation -- 6.3 Mean-field variational inference algorithm -- 6.3.1 Dirichlet-multinomial variational inference -- 6.3.2 Connection to the expectation-maximization algorithm -- 6.4 Empirical Bayes with variational inference -- 6.5 Discussion -- 6.5.1 Initialization of the inference algorithms -- 6.5.2 Convergence diagnosis -- 6.5.3 The use of variational inference for decoding -- 6.5.4 Variational inference as KL divergence minimization -- 6.5.5 Online variational inference -- 6.6 Summary -- 6.7 Exercises --
7. Nonparametric priors -- 7.1 The Dirichlet process: three views -- 7.1.1 The stick-breaking process -- 7.1.2 The Chinese restaurant process -- 7.2 Dirichlet process mixtures -- 7.2.1 Inference with Dirichlet process mixtures -- 7.2.2 Dirichlet process mixture as a limit of mixture models -- 7.3 The hierarchical Dirichlet process -- 7.4 The Pitman-Yor process -- 7.4.1 Pitman-Yor process for language modeling -- 7.4.2 Power-law behavior of the Pitman-Yor process -- 7.5 Discussion -- 7.5.1 Gaussian processes -- 7.5.2 The Indian buffet process -- 7.5.3 Nested Chinese restaurant process -- 7.5.4 Distance-dependent Chinese restaurant process -- 7.5.5 Sequence memoizers -- 7.6 Summary -- 7.7 Exercises --
8. Bayesian grammar models -- 8.1 Bayesian hidden Markov models -- 8.1.1 Hidden Markov models with an infinite state space -- 8.2 Probabilistic context-free grammars -- 8.2.1 PCFGs as a collection of multinomials -- 8.2.2 Basic inference algorithms for PCFGs -- 8.2.3 Hidden Markov models as PCFGs -- 8.3 Bayesian probabilistic context-free grammars -- 8.3.1 Priors on PCFGs -- 8.3.2 Monte Carlo inference with Bayesian PCFGs -- 8.3.3 Variational inference with Bayesian PCFGs -- 8.4 Adaptor grammars -- 8.4.1 Pitman-Yor adaptor grammars -- 8.4.2 Stick-breaking view of PYAG -- 8.4.3 Inference with PYAG -- 8.5 Hierarchical Dirichlet process PCFGs (HDP-PCFGs) -- 8.5.1 Extensions to the HDP-PCFG model -- 8.6 Dependency grammars -- 8.6.1 State-split nonparametric dependency models -- 8.7 Synchronous grammars -- 8.8 Multilingual learning -- 8.8.1 Part-of-speech tagging -- 8.8.2 Grammar induction -- 8.9 Further reading -- 8.10 Summary -- 8.11 Exercises -- Closing remarks --
A. Basic concepts -- A1. Basic concepts in information theory -- Entropy and cross entropy -- Kullback-Leibler divergence -- A2. Other basic concepts -- Jensen's inequality -- Transformation of continuous random variables -- The expectation-maximization algorithm --
B. Distribution catalog -- The multinomial distribution -- The Dirichlet distribution -- The Poisson distribution -- The gamma distribution -- The multivariate normal distribution -- The Laplace distribution -- The logistic normal distribution -- The inverse Wishart distribution --
Bibliography -- Author's biography -- Index.
Abstract: Natural language processing (NLP) went through a profound transformation in the mid-1980s, when it shifted to make heavy use of corpora and data-driven techniques to analyze language. Since then, the use of statistical techniques in NLP has evolved in several ways. One such evolution took place in the late 1990s or early 2000s, when full-fledged Bayesian machinery was introduced to NLP. This Bayesian approach to NLP has come to address various shortcomings of the frequentist approach and to enrich it, especially in the unsupervised setting, where statistical learning is done without target prediction examples. We cover the methods and algorithms needed to fluently read Bayesian learning papers in NLP and to do research in the area. These methods and algorithms are partially borrowed from both machine learning and statistics and partially developed "in-house" in NLP. We cover inference techniques such as Markov chain Monte Carlo sampling and variational inference, Bayesian estimation, and nonparametric modeling. We also cover fundamental concepts in Bayesian statistics such as prior distributions, conjugacy, and generative modeling. Finally, we cover some of the fundamental modeling techniques in NLP, such as grammar modeling, and their use with Bayesian analysis.
Item type | Current location | Status | Barcode
E books | PK Kelkar Library, IIT Kanpur | Available | EBKE716

Mode of access: World Wide Web.

System requirements: Adobe Acrobat Reader.

Part of: Synthesis digital library of engineering and computer science.

Includes bibliographical references (pages 221-240) and index.

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Indexed in: Compendex; INSPEC; Google Scholar; Google Book Search.

Title from PDF title page (viewed on June 18, 2016).