
Multilingual Natural Language Processing Applications: From Theory to Practice

By Daniel Bikel, Imed Zitouni

Published by IBM Press

Published Date: Jan 11, 2012

Description

Multilingual Natural Language Processing Applications is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience.

 

Part I introduces the core concepts and theoretical foundations of modern multilingual natural language processing, presenting today’s best practices for understanding word and document structure, analyzing syntax, modeling language, recognizing entailment, and detecting redundancy.

 

Part II thoroughly addresses the practical considerations associated with building real-world applications, including information extraction, machine translation, information retrieval/search, summarization, question answering, distillation, processing pipelines, and more.

 

This book contains important new contributions from leading researchers at IBM, Google, Microsoft, Thomson Reuters, BBN, CMU, University of Edinburgh, University of Washington, University of North Texas, and others.

 

Coverage includes

  • Core NLP problems, and today’s best algorithms for attacking them

  • Processing the diverse morphologies present in the world’s languages
  • Uncovering syntactical structure, parsing semantics, using semantic role labeling, and scoring grammaticality
  • Recognizing inferences, subjectivity, and opinion polarity
  • Managing key algorithmic and design tradeoffs in real-world applications
  • Extracting information via mention detection, coreference resolution, and events
  • Building large-scale systems for machine translation, information retrieval, and summarization
  • Answering complex questions through distillation and other advanced techniques
  • Creating dialog systems that leverage advances in speech recognition, synthesis, and dialog management
  • Constructing common infrastructure for multiple multilingual text processing applications

 

This book will be invaluable for all engineers, software developers, researchers, and graduate students who want to process large quantities of text in multiple languages, in any environment: government, corporate, or academic.

Table of Contents

Preface         xxi

Acknowledgments         xxv

About the Authors         xxvii

 

Part I: In Theory         1

Chapter 1: Finding the Structure of Words         3

1.1 Words and Their Components   4

1.2 Issues and Challenges   8

1.3 Morphological Models   15

1.4 Summary   22

 

Chapter 2: Finding the Structure of Documents         29

2.1 Introduction   29

2.2 Methods   33

2.3 Complexity of the Approaches   40

2.4 Performances of the Approaches   41

2.5 Features   41

2.6 Processing Stages   48

2.7 Discussion   48

2.8 Summary   49

 

Chapter 3: Syntax         57

3.1 Parsing Natural Language   57

3.2 Treebanks: A Data-Driven Approach to Syntax   59

3.3 Representation of Syntactic Structure   63

3.4 Parsing Algorithms   70

3.5 Models for Ambiguity Resolution in Parsing   80

3.6 Multilingual Issues: What Is a Token?   87

3.7 Summary   92

 

Chapter 4: Semantic Parsing         97

4.1 Introduction   97

4.2 Semantic Interpretation   98

4.3 System Paradigms   101

4.4 Word Sense   102

4.5 Predicate-Argument Structure   118

4.6 Meaning Representation   147

4.7 Summary   152

 

Chapter 5: Language Modeling          169

5.1 Introduction   169

5.2 n-Gram Models   170

5.3 Language Model Evaluation   170

5.4 Parameter Estimation   171

5.5 Language Model Adaptation   176

5.6 Types of Language Models   178

5.7 Language-Specific Modeling Problems   188

5.8 Multilingual and Crosslingual Language Modeling   195

5.9 Summary   198

 

Chapter 6: Recognizing Textual Entailment         209

6.1 Introduction   209

6.2 The Recognizing Textual Entailment Task   210

6.3 A Framework for Recognizing Textual Entailment   219

6.4 Case Studies   238

6.5 Taking RTE Further   248

6.6 Useful Resources   252

6.7 Summary   253

 

Chapter 7: Multilingual Sentiment and Subjectivity Analysis         259

7.1 Introduction   259

7.2 Definitions   260

7.3 Sentiment and Subjectivity Analysis on English   262

7.4 Word- and Phrase-Level Annotations   264

7.5 Sentence-Level Annotations   270

7.6 Document-Level Annotations   272

7.7 What Works, What Doesn’t   274

7.8 Summary   277

 

Part II: In Practice         283

Chapter 8: Entity Detection and Tracking         285

8.1 Introduction   285

8.2 Mention Detection   287

8.3 Coreference Resolution   296

8.4 Summary   303

 

Chapter 9: Relations and Events         309

9.1 Introduction   309

9.2 Relations and Events   310

9.3 Types of Relations   311

9.4 Relation Extraction as Classification   312

9.5 Other Approaches to Relation Extraction   317

9.6 Events   320

9.7 Event Extraction Approaches   320

9.8 Moving Beyond the Sentence   323

9.9 Event Matching   323

9.10 Future Directions for Event Extraction   326

9.11 Summary   326

 

Chapter 10: Machine Translation         331

10.1 Machine Translation Today   331

10.2 Machine Translation Evaluation   332

10.3 Word Alignment   337

10.4 Phrase-Based Models   343

10.5 Tree-Based Models   350

10.6 Linguistic Challenges   354

10.7 Tools and Data Resources   356

10.8 Future Directions   358

10.9 Summary   359

 

Chapter 11: Multilingual Information Retrieval         365

11.1 Introduction   366

11.2 Document Preprocessing   366

11.3 Monolingual Information Retrieval   372

11.4 CLIR   378

11.5 MLIR   382

11.6 Evaluation in Information Retrieval   386

11.7 Tools, Software, and Resources   391

11.8 Summary   393

 

Chapter 12: Multilingual Automatic Summarization         397

12.1 Introduction   397

12.2 Approaches to Summarization   399

12.3 Evaluation   412

12.4 How to Build a Summarizer   420

12.5 Competitions and Datasets   424

12.6 Summary   426

 

Chapter 13: Question Answering         433

13.1 Introduction and History   433

13.2 Architectures   435

13.3 Source Acquisition and Preprocessing   437

13.4 Question Analysis   440

13.5 Search and Candidate Extraction   443

13.6 Answer Scoring   450

13.7 Crosslingual Question Answering   454

13.8 A Case Study   455

13.9 Evaluation   460

13.10 Current and Future Challenges   464

13.11 Summary and Further Reading   465

 

Chapter 14: Distillation         475

14.1 Introduction   475

14.2 An Example   476

14.3 Relevance and Redundancy   477

14.4 The Rosetta Consortium Distillation System   479

14.5 Other Distillation Approaches   488

14.6 Evaluation and Metrics   491

14.7 Summary   495

 

Chapter 15: Spoken Dialog Systems         499

15.1 Introduction   499

15.2 Spoken Dialog Systems   499

15.3 Forms of Dialog   509

15.4 Natural Language Call Routing   510

15.5 Three Generations of Dialog Applications   510

15.6 Continuous Improvement Cycle   512

15.7 Transcription and Annotation of Utterances   513

15.8 Localization of Spoken Dialog Systems   513

15.9 Summary   520

 

Chapter 16: Combining Natural Language Processing Engines         523

16.1 Introduction   523

16.2 Desired Attributes of Architectures for Aggregating Speech and NLP Engines   524

16.3 Architectures for Aggregation   527

16.4 Case Studies   531

16.5 Lessons Learned   540

16.6 Summary   542

16.7 Sample UIMA Code   542

 

Index         551

Purchase Info

ISBN-10: 0-13-704785-1

ISBN-13: 978-0-13-704785-7

Format: eBook (Watermarked)

This eBook includes the following formats, accessible from your Account page after purchase:

EPUB: The open industry format known for its reflowable content and usability on supported mobile devices.

MOBI: The eBook format compatible with the Amazon Kindle and Amazon Kindle applications.

PDF: The popular standard, used most often with the free Adobe® Reader® software.

This eBook requires no passwords or activation to read. We customize your eBook by discreetly watermarking it with your name, making it uniquely yours.

Includes EPUB, MOBI, and PDF

$103.99
