SFB 732 D5: Biased Learning for Syntactic Disambiguation

Blaubeuren, November 16, 2008

Research Areas: Biased Learning for Syntactic Disambiguation
- Learning from monolingual text (grammatical dependencies, n-gram language model)
- Learning from bilingual text:
  - Disambiguating ambiguous German subjects and objects using the English translations in German/English parallel text
  - A general approach to improving English syntactic parsing using the German translations in German/English parallel text

Figure: English parse with high attachment (incorrect): [[a baby and a woman] [who had gray hair]]

Figure: English parse with low attachment (correct): [[a baby] and [[a woman] [who had gray hair]]]

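Figure: German parse with low attachment: [[ein Baby] und [[eine Frau], [die graue Haare hatte]]]
The singular verb "hatte" (plural: "hatten") shows that the German relative clause modifies only "eine Frau"; English "had" has the same form in singular and plural, so it leaves the attachment open.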
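To make the projection idea concrete, here is a toy sketch under stated assumptions: the word alignment and the constituent spans are written by hand, and the names `ALIGN`, `project`, and `divergence` are hypothetical, not taken from the project. It projects two German constituents into English through the alignment and counts how many projections are not constituents of each English candidate; only the low-attachment candidate agrees with the German parse everywhere.

```python
# Toy divergence feature (hypothetical data and names, not the project's code).
# English: a baby and a woman who had gray hair
# German:  ein Baby und eine Frau , die graue Haare hatte

# Hand-written alignment: German token index -> English token index.
# The German comma (index 5) is unaligned.
ALIGN = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 6: 5, 7: 7, 8: 8, 9: 6}

# Constituents are half-open (start, end) token spans.
GERMAN_SPANS = [
    (3, 10),  # NP: eine Frau , die graue Haare hatte
    (6, 10),  # relative clause: die graue Haare hatte
]

ENGLISH_CANDIDATES = {
    # [[a baby and a woman] [who had gray hair]]
    "high": {(0, 9), (0, 5), (0, 2), (3, 5), (5, 9), (6, 9), (7, 9)},
    # [[a baby] and [[a woman] [who had gray hair]]]
    "low": {(0, 9), (0, 2), (3, 9), (3, 5), (5, 9), (6, 9), (7, 9)},
}


def project(span, align):
    """Smallest English span covering every token aligned to the German span."""
    targets = [align[i] for i in range(*span) if i in align]
    if not targets:
        return None  # nothing aligned: the projection never matches
    return (min(targets), max(targets) + 1)


def divergence(english_spans, german_spans, align):
    """Count German constituents whose projection is not an English constituent."""
    return sum(1 for g in german_spans if project(g, align) not in english_spans)


for name, spans in ENGLISH_CANDIDATES.items():
    print(name, divergence(spans, GERMAN_SPANS, ALIGN))
```

This prints a divergence of 1 for the high-attachment candidate and 0 for the low-attachment one. Real features would use automatic alignments, which are neither one-to-one nor complete, so a bounding-span projection like this is only a rough approximation.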

Reranking approach using rich bitext projection features
Goal: improve English parsing accuracy
- Work on parallel text, e.g., the proceedings of the European Parliament
- Begin by parsing the English sentence with BitPar (Schmid); keep the 100 most probable parses
- Find the most probable parse of the German sentence
- Using rich bitext projection features, calculate the syntactic divergence between each English parse candidate and (the projection of) the German parse
- Choose a high-probability English parse candidate with low syntactic divergence
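The final selection step can be sketched as a log-linear score over each n-best candidate. This is a minimal sketch, assuming a hypothetical `Candidate` record, a single divergence feature, and hand-picked weights; the project's actual feature set and trained weights are not shown on the slides.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One English parse from the n-best list (hypothetical structure)."""
    tree: str                   # label or bracketed parse, for display only
    log_prob: float             # log probability assigned by the base parser
    features: dict[str, float]  # bitext projection (divergence) feature values


def score(cand, weights):
    """Log-linear score: weighted sum of the parser log probability and the
    bitext features. Divergence features should get negative weights, so a
    candidate that diverges more from the German parse scores lower."""
    total = weights["log_prob"] * cand.log_prob
    for name, value in cand.features.items():
        total += weights.get(name, 0.0) * value
    return total


def rerank(nbest, weights):
    """Return the highest-scoring candidate from the n-best list."""
    return max(nbest, key=lambda c: score(c, weights))


# Toy 2-best list for "a baby and a woman who had gray hair": the base
# parser slightly prefers the incorrect high attachment.
nbest = [
    Candidate("high", math.log(0.6), {"divergence": 1.0}),
    Candidate("low", math.log(0.4), {"divergence": 0.0}),
]
weights = {"log_prob": 1.0, "divergence": -1.0}  # hypothetical, not trained
print(rerank(nbest, weights).tree)  # prints "low"
```

Keeping the parser's log probability as one feature among many is what lets the reranker prefer a high-probability parse while still overriding it when the divergence penalty is large enough.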

Rich bitext projection features
- Mix of probabilistic and heuristic features, combined in a log-linear model trained to maximize parsing accuracy
- General features: tag correspondence, span size difference, parse depth difference
- Specific features: coordination phenomena, structure
- Documented in an EACL submission
Current project: improve the parses of the Europarl corpus (1.4 million parallel sentences)
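The slide says only that the feature weights are trained to maximize parsing accuracy, not which procedure is used. As a plainly swapped-in stand-in, here is a simple perceptron reranker: on each training sentence it compares the currently top-ranked candidate with the oracle (the candidate with the highest F-score against the gold tree) and moves the weights toward the oracle. The data format, feature names, and function names are all hypothetical.

```python
def dot(weights, features):
    """Unnormalized log-linear score: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def train_perceptron(data, feature_names, epochs=10):
    """Learn reranker weights with a simple perceptron (a stand-in method).

    data: one n-best list per sentence; each candidate is a pair
    (feature dict, F-score of that parse against the gold tree).
    """
    weights = {name: 0.0 for name in feature_names}
    for _ in range(epochs):
        mistakes = 0
        for nbest in data:
            oracle = max(nbest, key=lambda c: c[1])
            predicted = max(nbest, key=lambda c: dot(weights, c[0]))
            if predicted is not oracle:
                mistakes += 1
                # Move the weights toward the oracle's features.
                for name in feature_names:
                    weights[name] += oracle[0].get(name, 0.0) - predicted[0].get(name, 0.0)
        if mistakes == 0:  # every oracle is already ranked first
            break
    return weights


# Hypothetical training data: two sentences, two candidates each.
DATA = [
    [({"log_prob": -0.5, "divergence": 2.0}, 0.80),
     ({"log_prob": -0.9, "divergence": 0.0}, 0.95)],
    [({"log_prob": -1.5, "divergence": 0.0}, 0.70),
     ({"log_prob": -0.2, "divergence": 0.0}, 0.90)],
]
print(train_perceptron(DATA, ["log_prob", "divergence"]))
```

On this toy data the learned weights end up positive for the parser log probability and negative for divergence. A perceptron only indirectly optimizes accuracy; a method that directly maximizes F-score over the n-best lists would match the slide's wording more closely.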

D5 contributes to three Area D goals and one long-term SFB goal
- Types of contextual information: D5 uses contextual information derived from bilingual and monolingual syntactic analyses, at varying levels of granularity (e.g., parse tree vs. n-gram)
- Learnability of contextual information: D5 uses statistical models of context learned from bilingual and monolingual data, which is often itself a product of syntactic analysis
- Use of contextual information: D5 uses statistical models of context to improve syntactic analysis
- Incorporating linguistic insights into statistical models: D5 uses insights into the complementarity of English and German ambiguity to improve statistical syntactic disambiguation