Parsing Morphologically Rich Languages


1 / 39 Parsing Morphologically Rich Languages
Sandra Kübler, Indiana University

2 / 39 Parsing Morphologically Rich Languages

joint work with Daniel Dakota, Wolfgang Maier, Joakim Nivre, Djamé Seddah, Reut Tsarfaty, Daniel Whyatt, and many more

Definition: a morphologically rich language expresses multiple levels of information at the word level, adding information about the grammatical function of a word, its grammatical relations to other words, pronominal clitics, inflectional affixes, etc.

Dan Bikel's classification (2010):
- morphologically clean: Chinese, English, ...
- morphologically dirty: German, Hungarian, ...
- morphologically filthy: Arabic, Hebrew, ...

3 / 39 Challenges

- segmentation: for some languages, words do not always correspond to ideal input tokens for parsing (multi-word expressions, syncretism)
- morphology: how do we integrate morphology into the parsing step?
- lexicon: how do we cope with the lower token/type ratio, i.e., each word form being observed less often?
- language vs. annotation scheme: are some languages harder to parse, or is it the annotation scheme?
- language-independent parsing?

4 / 39 Question: How do we integrate morphology into the parsing process?

5 / 39

- morphologically rich languages need morphology for parsing; e.g., in German, case is an indicator of the grammatical function of NPs
- morphological information can be attached to POS tags; in a pipeline (parsing on top of POS tags), this information can then be used by the parser (see the sketch below)
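As a sketch of what "attaching morphological information to POS tags" can look like in a pipeline (illustration only; the tagger interface and function names are mine, and the "%" separator follows the STTSmorph notation used on the later slides):

```python
# Illustration only (not code from the talk): attach morphological features
# to POS tags so a pipeline parser can condition on them (e.g., case).
# The tag_with_morph interface is a hypothetical stand-in.

def attach_morph(tokens, tag_with_morph):
    """Re-tag tokens as TAG%MORPH, e.g. NN -> NN%Acc.Sg.Neut."""
    enriched = []
    for word, (pos, morph) in zip(tokens, tag_with_morph(tokens)):
        enriched.append((word, f"{pos}%{morph}" if morph else pos))
    return enriched

# Toy run with gold analyses for "dieses Buch" from example (1) below:
gold = lambda toks: [("PDAT", "Acc.Sg.Neut"), ("NN", "Acc.Sg.Neut")]
print(attach_morph(["dieses", "Buch"], gold))
# [('dieses', 'PDAT%Acc.Sg.Neut'), ('Buch', 'NN%Acc.Sg.Neut')]
```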

6 / 39 Example: NP and Case

[constituent tree figure: a TiGer-style tree over sentence (1), with grammatical function edges (SB, OA, MO, NK, ...), POS tags, and case features such as Acc.Sg.Neut and Nom.Pl.*]

(1) dieses Buch finden vor allem diejenigen schwierig, die am meisten Bildung haben, vor allem psychoanalytische Bildung (...)
'this book is difficult, especially for those who have a higher education, especially a higher education in psychoanalysis (...)'

7 / 39 Question: What exactly is the effect of varying the morphological granularity of POS tags on both POS tagging and parsing?

- How well do the different POS taggers work with tagsets of varying morphological granularity?
- Do the differences in POS tagger performance translate into similar differences in parsing quality?

8 / 39 Treebanks and Parser

Treebanks:
- TiGer version 2.2: last 5000/5000 sentences for dev/test, rest for training
- TüBa-D/Z release 8: same number of sentences for training/dev/test, rest discarded

Parser: Berkeley Parser
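One plausible reading of that split as code (that dev precedes test at the end of the corpus is my assumption; the talk only gives the sizes):

```python
# Sketch of the split described above; the dev-before-test order at the end
# of the corpus is an assumption, not stated in the talk.
def split_treebank(sentences, n_dev=5000, n_test=5000):
    cut = len(sentences) - n_dev - n_test
    train = sentences[:cut]
    dev = sentences[cut : cut + n_dev]
    test = sentences[cut + n_dev :]
    return train, dev, test
```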

9 / 39 POS Taggers

- Morfette: averaged perceptron
- RF-Tagger: HMMs / decision trees for fine-grained tagsets
- Stanford Tagger: maximum entropy
- SVMTool: support vector machines
- TnT: cascaded HMMs
- Wapiti: CRFs

10 / 39 Tagset Variants

UTS: Universal Tagset, built for cross-language use; 12 tags
(2) Aber Bremerhavens AfB fordert jetzt Untersuchungsausschuß
    CONJ NOUN NOUN VERB ADJ NOUN
    'But the Bremerhaven AfB now demands a board of inquiry'
(3) Ausländische Investoren in Indien wieder willkommen
    ADJ NOUN ADP NOUN ADV ADJ
    'Foreign investors welcome again in India'

STTS: Stuttgart-Tübingen tagset, based on distributional regularities of German; 54 tags
(2) KON NE NE VVFIN ADV NN
(3) ADJA NN APPR NE ADV ADJD

STTSmorph: STTS with a morphological component; 585 tags available / 271 used in TiGer, 783 available / 761 used in TüBa-D/Z
(2) KON NE%gsn NE%nsf VVFIN%3sis ADV NN%asm
(3) ADJA%Pos.Nom.Pl.Masc NN%Nom.Pl.Masc APPR NE%Dat.Sg.Neut ADV ADJD%Pos

11 / 39 POS Tagging Evaluation (accuracy)

                           TiGer          TüBa-D/Z
Tagset       Tagger        dev    test    dev    test
UTS          Morfette      98.51  98.09   98.25  98.49
             RF-Tagger     97.89  97.41   97.69  97.96
             Stanford      97.88  96.83   97.11  97.26
             SVMTool       98.54  98.01   98.09  98.28
             TnT           97.94  97.48   97.72  97.92
             Wapiti        97.54  96.67   97.47  97.80
STTS         Morfette      94.12  93.23   92.95  93.41
             RF-Tagger     97.04  96.24   96.68  96.84
             Stanford      96.26  95.15   95.63  95.79
             SVMTool       97.06  96.22   96.46  96.69
             TnT           97.15  96.29   96.92  97.00
             Wapiti        92.93  91.62   90.99  91.81
STTSmorph    Morfette      82.71  80.10   81.19  82.26
             RF-Tagger     86.56  83.90   85.68  86.31
             Stanford      -      -       -      -
             SVMTool       82.47  79.53   80.33  81.31
             TnT           85.77  82.77   84.67  85.45
             Wapiti        79.83  75.92   77.27  78.29
STTSmorph→STTS TnT         97.08  96.15   96.78  96.82
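As a sketch of how these accuracies and the STTSmorph→STTS row can be computed (my code, not the talk's, using the %-notation from the previous slide):

```python
# Sketch: tagging accuracy, plus the back-mapping behind the table's last
# row, where STTSmorph predictions are scored as plain STTS tags by
# stripping the %-separated morphological suffix.

def accuracy(gold_tags, pred_tags):
    return 100.0 * sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)

def to_stts(tag):
    """'NN%Nom.Pl.Masc' -> 'NN'; plain STTS tags pass through unchanged."""
    return tag.split("%", 1)[0]

gold = ["KON", "NE", "NE", "VVFIN", "ADV", "NN"]                # example (2), STTS
pred = ["KON", "NE%gsn", "NE%nsf", "VVFIN%3sis", "ADV", "NN%asm"]
print(accuracy(gold, [to_stts(t) for t in pred]))                # 100.0
```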

12 / 39 Parsing Results

TiGer test:
Tags    Tagset      POS    LP     LR     LF1
gold    UTS         99.97  71.80  70.26  71.02
        STTS        99.97  71.90  71.11  71.50
        STTSmorph   88.70  67.68  67.99  67.83
parser  UTS         97.83  71.13  69.50  70.30
        STTS        96.18  71.16  69.84  70.49
        STTSmorph   79.05  67.67  67.02  67.34
TnT     UTS         96.01  68.37  66.78  67.57
        STTS        96.19  71.16  69.84  70.49
        STTSmorph   75.05  65.43  64.78  65.10

TüBa-D/Z test:
Tags    Tagset      POS    LP     LR     LF1
gold    UTS         99.98  82.24  81.94  82.09
        STTS        99.99  84.54  84.46  84.50
        STTSmorph   90.55  83.57  79.91  81.70
parser  UTS         98.58  81.07  80.66  80.87
        STTS        97.39  82.93  82.78  82.85
        STTSmorph   81.68  81.89  78.20  80.00
TnT     UTS         98.58  81.07  80.66  80.87
        STTS        97.39  82.93  82.78  82.85
        STTSmorph   81.68  81.89  78.20  80.00
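The LP/LR/LF1 figures above are labeled bracket scores. A minimal evalb-style sketch (my simplification, not the evaluation script behind these numbers; real evalb treats the root and preterminals specially):

```python
# Trees are (label, children) tuples with token strings as leaves.
from collections import Counter

def bracket_spans(tree, start=0):
    """Return (Counter of (label, start, end) brackets, end position)."""
    label, children = tree
    out, i = Counter(), start
    for child in children:
        if isinstance(child, str):
            i += 1
        else:
            sub, i = bracket_spans(child, i)
            out += sub
    out[(label, start, i)] += 1
    return out, i

def labeled_prf(gold_tree, pred_tree):
    g, _ = bracket_spans(gold_tree)
    p, _ = bracket_spans(pred_tree)
    match = sum((g & p).values())          # multiset intersection
    lp = 100.0 * match / sum(p.values())   # labeled precision
    lr = 100.0 * match / sum(g.values())   # labeled recall
    return lp, lr, 2 * lp * lr / (lp + lr)
```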

13 / 39 Further Evaluation

We manually checked the parser outputs. Morphology sometimes helps, sometimes not:
- it can lead to the correct grammatical function label (e.g., case)
- it can lead to over-differentiation of grammatical function labels (cf. PP attachment)

Interaction between morphology and tree depth; comparing STTS vs. STTSmorph: substructures are too flat in TüBa-D/Z and too hierarchical in TiGer. This is confirmed by the number of edges:
- TiGer: more edges with STTSmorph than with STTS
- TüBa-D/Z: more edges with STTS than with STTSmorph

14 / 39 What did We Learn?

- POS tagging is easier with a less granular tagset
- the amount of morphology used for parsing needs to be just right
- even gold case is not useful for parsing as-is; we need different mechanisms for integrating case into the parse
- morphology seems to influence the depth of the trees in the parser output

15 / 39 Further Work

- Versley & Rehbein (2009) integrate subcategorization into constituent parsing
- Seeker & Kuhn (2013) use case as a filter in dependency parsing
- they show that the solution needs to be language dependent

16 / 39 Hard Language or Hard Annotation?

Question: If a parser performs worse on language A than on language B, does that mean A is more difficult than B, or does the difference lie in the annotation schemes?

17 / 39 Comparing German Treebanks: NEGRA/TIGER vs. TüBa-D/Z

Differences:
- TüBa-D/Z: more structure in phrases, topological fields, no traces, no empty categories
- NEGRA: very flat phrases, more structure on the S level, crossing branches

Approach: make TüBa-D/Z more similar to NEGRA (sketched below):
- flatten the phrase structure
- delete unary nodes
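A minimal sketch of the two conversions, assuming simple (label, children) tuple trees; the choice of which phrase labels to flatten is mine, not from the talk, and the original conversion scripts were certainly more careful:

```python
# Trees are (label, children) tuples, leaves are token strings.

def is_preterminal(t):
    return len(t[1]) == 1 and isinstance(t[1][0], str)

def remove_unary(tree):
    """Collapse unary phrasal chains X -> Y, keeping the POS level intact."""
    if is_preterminal(tree):
        return tree
    label, children = tree
    children = [c if isinstance(c, str) else remove_unary(c) for c in children]
    if (len(children) == 1 and not isinstance(children[0], str)
            and not is_preterminal(children[0])):
        return (label, children[0][1])     # adopt the grandchildren
    return (label, children)

def flatten(tree, phrase_labels=frozenset({"NP", "PP", "ADJP"})):
    """Splice same-category phrases into their parent (flat, NEGRA-style).
    The set of affected phrase labels is an assumption."""
    label, children = tree
    flat = []
    for c in children:
        if isinstance(c, str):
            flat.append(c)
        else:
            c = flatten(c, phrase_labels)
            if c[0] == label and label in phrase_labels:
                flat.extend(c[1])          # raise the child's daughters
            else:
                flat.append(c)
    return (label, flat)
```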

18 / 39 A NEGRA Tree

[tree figure] 'In the foyer of the town hall, the history of research on the Hochheimer Spiegel is presented next to the trove.'

19 / 39 A TüBa-D/Z Tree

[tree figure] 'The car convoy of the visitors of the rehearsal goes along a street which is even today called Lagerstraße.'

20 / 39 Comparing NEGRA and TüBa-D/Z

                           NEGRA   NEG+tr.  TüBa-D/Z
crossing brackets          1.04    1.03     1.93
func. labeled recall       52.75   49.03    73.65
func. labeled precision    51.85   50.49    76.13
func. labeled F-score      52.30   49.75    74.87
nodes/words (treebank)     0.88    0.88     2.38
nodes/words (parse)        0.62    0.63     1.30

21 / 39 A Flattened TüBa-D/Z Tree

[tree figure]

22 / 39 Making NEGRA More Similar to TüBa-D/Z

           cr. br.  LR     LP     F-score  % not parsed
NEGRA      1.04     52.75  51.85  52.30    12.59
NE field   1.21     69.85  69.53  69.19    2.17
TüBa       1.93     73.65  76.13  74.87    1.03
Tü NU      2.17     62.11  65.43  63.73    9.98
Tü flat    1.07     73.80  74.66  74.23    3.55
Tü fl N    1.29     53.63  58.87  56.13    18.87

23 / 39 What did We Learn?

- considerable differences between treebanks; unclear whether they stem from the annotation scheme, the evaluation metric, or differences in the texts
- unary nodes, more structure in phrases, and topological fields improve results
- more structure provides more coverage, but also more chances to make mistakes

24 / 39 Further Work

Rehbein & van Genabith (2007): similar experiments, different results. Possible explanations: different data sets (shorter sentences, different split); the evaluation metric favors trees with a high number of nodes.

Kübler et al. (2008): extend the work, evaluate with leaf-ancestor (LA) and on a converted dependency representation: TIGER better than TüBa-D/Z. BUT: LA is artificially high. BUT: the conversion is lossy, and the loss on parsing structures is unknown.
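A minimal sketch of the leaf-ancestor idea (the metric is Sampson's; this simplified version with a sequence-similarity ratio is mine, not the implementation used in the cited work):

```python
# Compare, for every leaf, the path of labels from root to leaf in the
# gold tree vs. the parsed tree. Trees are (label, children) tuples.
from difflib import SequenceMatcher

def lineages(tree, path=()):
    """One root-to-leaf label path per leaf, in left-to-right order."""
    label, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(path + (label,))
        else:
            out.extend(lineages(child, path + (label,)))
    return out

def leaf_ancestor(gold_tree, pred_tree):
    g, p = lineages(gold_tree), lineages(pred_tree)
    assert len(g) == len(p), "identical tokenization required"
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in zip(g, p)]
    return 100.0 * sum(sims) / len(sims)
```

One intuition for the slide's remark that LA comes out artificially high: every leaf is scored by a whole path of labels, and even fairly wrong trees share long path prefixes with the gold tree.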

25 / 39 Question: What can we learn from a shared task scenario with 9 languages and aligned constituent and dependency representations?

26 / 39 Goals of the Shared Task

A clear view of the state of the art, regardless of the framework (constituency or dependency), in a realistic parsing scenario, with the most accurate evaluation protocol we could reasonably set up, on as many MRLs as possible.

Trying to assess the remaining challenges:
- in parsing MRLs
- in evaluating them

27 / 39 Data Sets

9 languages:
- Semitic: Arabic, Hebrew
- Romance: French
- Germanic: German, Swedish
- Isolates: Basque, Korean
- Uralic: Hungarian
- Slavic: Polish

Available in two syntactic representations, constituents (ptb) and dependency structures (conll):
- aligned at all levels (token, POS, sentence)
- containing at least the same morphological information
- available with gold and predicted morphology

Training sets: full and reduced (5k sentences) size

28 / 39 Evaluation Protocol: 3 Scenarios

- Gold: provided: unambiguous gold morphological segmentation, POS tags, and morphological features
- Predicted: provided: disambiguated morphological segmentation; unknown: POS tags and morphological features
- Raw: provided: morphologically ambiguous input; unknown: morphological segmentation, morphological features, and POS tags

Note: for all languages but Arabic and Hebrew, Raw = Predicted.

29 / 39 Evaluation Protocol (2)

Evaluation metrics operating in different dimensions.

Cross-parser evaluation in the Gold/Predicted scenarios:
- constituents: Evalb labeled F-score; LeafAncestor's macro-averaged accuracy
- dependencies: Eval07 labeled attachment scores
- also: MWE evaluation scores (for French, on dependency structures)

Cross-parser evaluation in the Raw scenario:
- standard metrics are not applicable with non-gold tokenization
- instead: TedEval's labeled accuracy (Tsarfaty et al., 2012) on sentences of up to 70 tokens

30 / 39 Evaluation Protocol (3)

Evaluation metrics operating in different dimensions (2).

Cross-framework evaluation:
- compare results of dependency and constituent parsers
- use the unlabeled TedEval metric: it internally converts all representation types into a normalized function tree

Cross-language evaluation:
- compare parsers for the same representation type across different languages
- the unlabeled TedEval metric is a reasonable approximation

31 / 39 7 Teams / 20 Systems

1. IMS-SZEGED-CIS: ensemble system (strong POS tagging, morphological lexicon, mate + turbo parser, (re)ranker, constituent features)
2. ALPAGE-DYALOG: transition-based parsing + beam + lattices
3. MALTOPTIMIZER: MaltOptimizer + automatic feature selection and splitting
4. AI-KU (multilingual): MaltOptimizer + unlabeled data (word clustering)
5. BASQUE-TEAM (multilingual): ensemble system (MaltBlender) + MaltOptimizer + efficient feature selection
6. IGM-ALPAGE (French): CRF MWE tagger + lexica + voting system (Mate, pipeline and joint)
7. CADIM (Arabic): easy-first parsing + rich lexicon and rich morphological features

(the voting idea behind the ensemble entries is sketched below)
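A minimal sketch of per-token voting (my simplification of MaltBlender-style blending, not any team's code; real blenders weight the votes and decode a well-formed tree, e.g., with the Chu-Liu/Edmonds maximum spanning tree algorithm):

```python
# Each member parser proposes a head for every token; the ensemble takes
# the per-token majority vote. Note this can yield an ill-formed tree.
from collections import Counter

def vote_heads(predictions):
    """predictions: one head array per parser; heads[i] is the head
    position of token i (0 = artificial root)."""
    n = len(predictions[0])
    return [Counter(heads[i] for heads in predictions).most_common(1)[0][0]
            for i in range(n)]

# Three hypothetical parsers, four-token sentence:
print(vote_heads([[2, 0, 2, 3], [2, 0, 2, 2], [0, 0, 2, 3]]))  # [2, 0, 2, 3]
```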

32 / 39 Results: Dependency (LAS), Predicted Scenario (full)

[bar chart, LAS roughly 69-89: scores per language (Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish, Swedish) and soft average for AI-KU, ALPAGE_DYALOG, BASELINE_MALT, BASQUE_TEAM, CADIM, IGM-ALPAGE, IMS-SZEGED-CIS, MALTOPTIMIZER]

33 / 39 Results: Constituents (F1), Predicted Scenario (full)

                     Arabic  Basque  French  German  Hebrew  Hungarian  Korean  Polish  Swedish
BASELINE_BKY_RAW     79.19   70.50   80.38   78.30   86.96   81.62      71.42   79.23   79.18
BASELINE_BKY_TAGGED  78.66   74.74   79.76   78.28   85.42   85.22      78.56   86.75   80.64
IMS_SZEGED           81.32   87.86   81.83   81.27   89.46   91.85      84.27   87.55   83.99

34 / 39 Results (TedEval Unlabeled Accuracy): Raw Scenario (full)

[bar chart, accuracy 84-93: Arabic (full), Arabic (5k), and Hebrew (5k) for IMS-SZEGED-CIS (Const), IMS-SZEGED-CIS (Dep), ALPAGE_DYALOG, MALTOPTIMIZER, CADIM, ALPAGE_DYALOG_RAWLAT, AI-KU]

35 / 39 Correlation: Label Set Size, Training Set Size, and LAS

[scatter plot: mean LAS (72-90) against label set size (log scale, 10-1000), one point per language in the gold (G) and predicted (P) settings, e.g., KoG = Korean/gold, FrP = French/predicted]

36 / 39 Cross-Language Evaluation (Dependency)

[bar chart, unlabeled TedEval accuracy 91-99: per-language scores (Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish, Swedish) for IMS_SZEGED_CIS-DEP, ALPAGE_DYALOG, BASELINE_MALT, AI-KU, MALTOPTIMIZER, CADIM]

37 / 39 Cross-Framework Evaluation (Dep. + Const.)

[bar chart, unlabeled TedEval accuracy 89-99: per-language scores plus average for IMS_SZEGED_CIS-DEP, IMS_SZEGED_CIS-CONST, BASELINE-CONST, BASELINE_MALT]

38 / 39 What did We Learn?

- parser (re-)ranking works best across languages
- there are clear differences between languages
- dependencies are not always better; BUT: the cross-language / cross-framework comparison is based on unlabeled evaluation
- shared task 2014: more, automatically labeled training data (out of domain) does not help; exception: languages with small data sets (mostly Swedish)

39 / 39 Where Do We Go from Here?

- segmentation: lattice parsing? more discriminative parsing? integrate multi-word expressions
- morphology: morphology needs to be integrated in a useful, language-specific manner; more discriminative parsing? better word clustering?
- lexicon: better clustering?
- language vs. annotation: adaptive parsers? universal annotation???
- language independence: better feature engineering? get away from reranking?