Information Status in Generation Ranking
Aoife Cahill, joint work with Arndt Riester
Heidelberg Computational Linguistics Colloquium, December 9, 2010
Outline
1. Introduction
2. Information Status
3. Approximating Information Status
4. Generation Ranking
5. Predicting Information Status
6. Generation Ranking Revisited
7. Conclusion
Outlining the problem
- German is considered a relatively free word order language (with a rich case system).
- This notion dates from a time when discourse information did not play much of a role in linguistics.
- Our task: generating German strings from LFG F-structures.
- The problem: how to choose the best string from the many grammatical strings output by the system?
Surface Realisation System
Lexical Functional Grammar F-structure: basic predicate-argument structure
"Die Nato werde nicht von der EU geführt."
F-structure [128] of the example sentence:
PRED 'führen<[249:von], [21:Nato]>'
SUBJ [21]: PRED 'Nato'; CHECK (_SPEC-TYPE: _COUNT +, _DEF +, _DET attr; _INFL strong-det); NTYPE NSYN proper; SPEC DET (PRED 'die', DET-TYPE def); CASE nom, GEND fem, NUM sg, PERS 3
OBL-AG [249]: PRED 'von<[283:eu]>'; PSEM dir, PTYPE sem; OBJ [283]: PRED 'EU'; CHECK (_SPEC-TYPE: _COUNT +, _DEF +, _DET attr; _INFL strong-det); NTYPE NSYN proper; SPEC DET (PRED 'die', DET-TYPE def); CASE dat, GEND fem, NUM sg
ADJUNCT [215]: PRED 'nicht', ADJUNCT-TYPE neg
CHECK: _AUX-FORM werden-pass; VLEX _VMORPH (_AUX-SELECT sein, _PARTICIPLE perfect)
TNS-ASP: MOOD subjunctive, PASS-SEM dynamic_, TENSE pres
TOPIC [21:Nato]
CLAUSE-TYPE decl, PASSIVE +, STMT-TYPE decl, VTYPE main
Surface Realisation System
A hand-crafted large-scale grammar (Rohrer and Forst, 2006) generates all possible (grammatical) strings. For "NATO is not led by the EU.":
- Die Nato werde von der EU nicht geführt.
- Nicht von der EU geführt werde die Nato.
- Nicht werde die Nato von der EU geführt.
- Nicht geführt werde die Nato von der EU.
- Von der EU werde die Nato nicht geführt.
- Von der EU geführt werde nicht die Nato.
- Geführt werde die Nato nicht von der EU.
- Geführt werde nicht von der EU die Nato.
- Geführt werde von der EU nicht die Nato.
- Die Nato werde nicht von der EU geführt.
- Nicht werde von der EU die Nato geführt.
- Nicht geführt werde von der EU die Nato.
- Von der EU nicht geführt werde die Nato.
- Von der EU werde nicht die Nato geführt.
- Von der EU geführt werde die Nato nicht.
- Geführt werde die Nato von der EU nicht.
- Geführt werde nicht die Nato von der EU.
- Geführt werde von der EU die Nato nicht.
Surface Realisation System (Cahill et al., 2007)
A log-linear ranking model chooses the most likely string.
Linguistically motivated feature types:
1. C-structure: number of NPs, number of children of PP
2. C- & F-structure: SUBJ precedes OBJ
3. Language model: tri-gram score
This outperforms a basic tri-gram language model, but can be further improved.
Idea: capturing the influence of discourse information can help choose the best string.
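The ranking step can be illustrated with a minimal Python sketch of a log-linear model over candidate strings. The feature names (`subj_precedes_obl`, `lm_logprob`), their values and the weights are hypothetical and chosen only for illustration; they are not the features or weights of the system in Cahill et al. (2007).

```python
import math


def rank_candidates(candidates, weights):
    """Score each (string, features) pair with a weighted feature sum and
    normalise the scores into probabilities with a softmax."""
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for _, feats in candidates
    ]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    probs = [e / total for e in exp_scores]
    best_index = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best_index][0], probs


# Hypothetical feature values for two of the generated strings.
candidates = [
    ("Die Nato werde nicht von der EU geführt.",
     {"subj_precedes_obl": 1.0, "lm_logprob": -20.0}),
    ("Geführt werde von der EU nicht die Nato.",
     {"subj_precedes_obl": 0.0, "lm_logprob": -27.0}),
]
# Hypothetical weights; in a real system they are learnt from training data.
weights = {"subj_precedes_obl": 0.8, "lm_logprob": 0.5}

best, probs = rank_candidates(candidates, weights)
```

Only the relative order of the scores matters for choosing the best string; the softmax is included to show where the "most likely string" reading comes from.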
Information Status (IS) (Prince 1981, 1992)
- Means of discourse analysis
- Classifies (NP/PP/DP) constituents according to their givenness
- IS is marked in prosody (Baumann, 2006; Schweitzer et al., 2009) as well as in syntax
- Corpus of German news texts manually annotated for IS
Advantages with regard to earlier IS work:
- proper treatment of embedded phrases
- higher inter-annotator agreement on difficult texts
- closer to insights from semantic theory (e.g. semantic presuppositions)
IS Labels: Riester, Lorenz, Seemann (2010)
Full: BRIDGING, BRIDGING-CONTAINED, CATAPHOR, EXPLETIVE, GIVEN-EPITHET, GIVEN-PRONOUN, GIVEN-REFLEXIVE, GIVEN-REPEATED, GIVEN-SHORT, INDEF-GENERIC, INDEF-NEW, INDEF-PARTITIVE, INDEF-PARTITIVE-CONTAINED, INDEF-RESUMPTIVE, NULL, RELATIVE, SITUATIVE, UNUSED-KNOWN, UNUSED-TYPE, UNUSED-UNKNOWN
Collapsed: BRIDGING, CATAPHOR, EXPLETIVE, GIVEN, INDEF, NULL, RELATIVE, SITUATIVE, UNUSED
Most Important Classes
- GIVEN: coreferential anaphor (Merkel ... sie)
- BRIDGING: non-coreferential but context-dependent expression (Stuttgart ... der Bahnhof)
- UNUSED-KNOWN: discourse new, familiar definite (der Mond)
- UNUSED-UNKNOWN: discourse new, unfamiliar definite (das neue Gesetz zur Gesundheitsreform)
- SITUATIVE: deictic expression (am Dienstag)
- INDEF: indefinite (einige hundert Menschen)
Grammaticality and markedness
Two grammatical sentences for "The army has even been able to recapture smaller territories.":
(1) Die Armee habe sogar kleinere Gebiete zurückerobern können. (ok)
(2) Kleinere Gebiete habe die Armee sogar zurückerobern können. (strongly marked)
A sentence is marked precisely if there are only few or very special contexts in which it is appropriate.
Capturing context
- Information status reflects context to a certain degree.
- IS labels are taken from the corpus:
(3) Die Armee [GIVEN-EPITHET] habe sogar kleinere Gebiete [INDEF-NEW] zurückerobern können.
"The army has even been able to recapture smaller territories."
- The givenness/novelty of an expression characterises the class of contexts in which the expression can occur.
- Compute the preferred order for each pair of IS labels.
Precedence of label pairs within a clause
X before Y (e.g. BRIDGING before UNUSED-UNKNOWN): less prominent order B
Die Gespräche [BRIDGING] sollen heute in Jerusalem [UNUSED-KNOWN] fortgesetzt werden.
"The talks shall be continued in Jerusalem today."
Occurrences in corpus: 49
Y before X (e.g. UNUSED-UNKNOWN before BRIDGING): dominant order A
So müsse dies die britische Regierung [UNUSED-KNOWN] den Bürgern [BRIDGING] klarmachen.
"Thus, the British Government should make this clear to the citizens."
Occurrences in corpus: 81
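Defining a measure: asymmetry ratio
For a pair of IS labels, A is the number of occurrences of the dominant order and B the number of occurrences of the less prominent order. The asymmetry ratio is B/A.
Table columns: A (dominant order), B, asymmetry ratio B/A, total.
Compute the asymmetry ratio for each pair of IS labels.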
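As a worked example of the measure, here is a minimal Python sketch. The counts 49 and 81 are the corpus occurrences given for the example label pair on the preceding slide; reading the "Total" column as A + B is an assumption, and the function name `asymmetry` is illustrative.

```python
def asymmetry(count_xy, count_yx):
    """Return (A, B, ratio, total) for one pair of IS labels.

    A is the count of the dominant order, B the count of the less
    prominent order, ratio is B / A, and total is assumed to be A + B.
    """
    a = max(count_xy, count_yx)
    b = min(count_xy, count_yx)
    if a == 0:
        raise ValueError("the label pair never co-occurs in a clause")
    return a, b, b / a, a + b


# X before Y occurs 49 times, Y before X occurs 81 times.
a, b, ratio, total = asymmetry(49, 81)
```

By construction the ratio lies between 0 and 1: a value close to 0 means one order is strongly preferred, and a value close to 1 means neither order is preferred.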
Asymmetry tables (top)
Table columns: dominant order, asymmetry ratio, frequency. Dominant orders listed:
- UNUSED-KNOWN before CATAPHOR
- GIVEN-REPEATED before UNUSED-TYPE
- GIVEN-PRONOUN before SITUATIVE
- GIVEN-REFLEXIVE before INDEF-NEW
- GIVEN-PRONOUN before CATAPHOR
- GIVEN-PRONOUN before INDEF-NEW
- BRIDGING before INDEF-GENERIC
- GIVEN-SHORT before GIVEN-REPEATED
- GIVEN-PRONOUN before UNUSED-TYPE
- GIVEN-REFLEXIVE before UNUSED-TYPE
- GIVEN-EPITHET before UNUSED-TYPE
- UNUSED-KNOWN before UNUSED-TYPE
- EXPLETIVE before INDEF-NEW
- ...
The crucial problem
IS is an indicator for constituent order, but there is no reliable automatic annotation system for IS.
First attempt (Cahill and Riester, 2009): use morphosyntactic features correlated with IS.
Syntactic Features
We define an inventory of syntactic features that can appear under all IS labels and automatically mark up the corpus with them. The features include:
- is simple definite
- is simple definite description with a possessive modifier
- is definite description with adjectival modifier
- is definite description with a genitive argument
- is definite description with an (obligatory/referentially restricting) PP adjunct
- is definite description including a relative clause
- is definite description including an embedded proper name and (perhaps) a title or job description
- is a combination of position/title and proper name (without article)
- is a bare proper name
- ...
Morphosyntactic correlates of IS
Some IS categories directly derive from syntactic classes (1:1 correspondence):
- GIVEN-REFLEXIVE: is a reflexive pronoun (all items)
- EXPLETIVE: is an expletive, e.g. es (all items)
Morphosyntactic correlates of IS
Some IS categories are represented by various features. UNUSED-KNOWN (feature: items, example):
- is a simple definite: 145, the moon
- is a name with a title: 55, President Obama
- is a bare noun: 54, Africa
- is definite with apposition: 36, the German Chancellor, Angela Merkel
- ...
Syntactic Features and IS phrases: extracting information from the corpus
We have a corpus that is:
- annotated with IS labels
- marked up with syntactic features
For each phrase annotated with an IS label, look at what syntactic features are present. Collect statistics for each IS label type.
Syntactic Features associated with IS labels
GIVEN-PRONOUN (syntactic feature, count): S_PERS_PRON 88; S_DA_PRON 56; S_DEMON_PRON 41; S_GENERIC_PRON 16
INDEF-NEW (syntactic feature, count): S_SIMPLE_INDEF 203; S_INDEF_ATTR 95; S_INDEF_NUM 85; S_INDEF_GENARG 20; S_INDEF_PPADJUNCT 19; ...
IS asymmetries with syntactic features
Table columns: label 1, label 2, ratio, frequency. Syntactic features (with counts) per label:
- Label 1 UNUSED-KNOWN: S_BAREPROPER 166; S_SIMPLE_DEF 102; S_PROPER 85. Label 2 CATAPHOR: S_SIMPLE_DEF 14; S_DA_PRON 13.
- Label 1 GIVEN-REPEATED: S_SIMPLE_DEF 28; S_BAREPROPER 23. Label 2 UNUSED-TYPE: S_SIMPLE_DEF 37; S_SIMPLE_INDEF 36.
- Label 1 GIVEN-PRONOUN: S_PERS_PRON 88; S_DA_PRON 56; S_DEMON_PRON 41; S_GENERIC_PRON 16. Label 2 SITUATIVE: S_TEMP_ADV 62; S_SIMPLE_DEF 44; S_DEF_ATTR_ADJUNCT 23; S_SIMPLE_INDEF 19.
- ...
New Features
From each IS asymmetry, extract precedence patterns of the corresponding syntactic features.
GIVEN-PRONOUN: S_PERS_PRON 88; S_DA_PRON 56; S_DEMON_PRON 41; S_GENERIC_PRON 16
SITUATIVE: S_TEMP_ADV 62; S_SIMPLE_DEF 44; S_DEF_ATTR_ADJUNCT 23; S_SIMPLE_INDEF 19
Extracted patterns:
- S_PERS_PRON precedes S_TEMP_ADV
- S_PERS_PRON precedes S_SIMPLE_DEF
- S_PERS_PRON precedes S_DEF_ATTR_ADJUNCT
- S_PERS_PRON precedes S_SIMPLE_INDEF
- S_DA_PRON precedes S_TEMP_ADV
- S_DA_PRON precedes S_SIMPLE_DEF
- S_DA_PRON precedes S_DEF_ATTR_ADJUNCT
- S_DA_PRON precedes S_SIMPLE_INDEF
- S_DEMON_PRON precedes S_TEMP_ADV
- ...
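The pattern extraction above amounts to pairing every syntactic feature of the first label with every syntactic feature of the second. A minimal Python sketch, assuming the patterns are plain strings of the form "X precedes Y" (the function name `precedence_patterns` is illustrative, not part of the original system):

```python
from itertools import product


def precedence_patterns(first_label_features, second_label_features):
    """Pair each feature of the label that comes first in the dominant
    order with each feature of the label that comes second."""
    return [
        f"{first} precedes {second}"
        for first, second in product(first_label_features, second_label_features)
    ]


# Features listed for the dominant order GIVEN-PRONOUN before SITUATIVE.
given_pronoun = ["S_PERS_PRON", "S_DA_PRON", "S_DEMON_PRON", "S_GENERIC_PRON"]
situative = ["S_TEMP_ADV", "S_SIMPLE_DEF", "S_DEF_ATTR_ADJUNCT", "S_SIMPLE_INDEF"]

patterns = precedence_patterns(given_pronoun, situative)
```

With four features on each side this yields sixteen patterns for this one label pair, in the same order as the list on the slide.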
Improved Generation Ranking Model
We include these new features in our SVM model for generation ranking.
Feature types:
1. C-structure: number of NPs, number of children of PP
2. C- & F-structure: SUBJ precedes OBJ
3. Language model: tri-gram score
4. IS asymmetric syntactic patterns: S_PERS_PRON precedes S_TEMP_ADV
System Overview
[Diagram: Machine Translation, Sentence Condensation and Summarisation supply an LFG F-structure; the grammar generates all strings; a ranking model, built from corpus sentences with language model features, linguistically motivated features and (as the open question) IS features, selects the best string.]
Experimental Setup
- Train the SVM ranking model on 7161 syntactically annotated sentences from TIGER
- Tune model parameters on a development set of 55 sentences
- Carry out the final evaluation on a test set of 260 sentences
Results
Evaluation on 260 sentences. BLEU measures string similarity using n-grams.
Slightly different to Cahill and Riester (2009):
- uses SVM rank instead of a log-linear model
- asymmetries calculated from more data
- ... but same features
The Baseline and IS Approx models are compared on BLEU and Exact Match (%).
Statistically significant improvement with the model including the new IS-inspired syntactic features.
Example Sentences
"We have learnt from the scandal."
Gold: Man hat aus der Affäre gelernt. (One has from the scandal learnt.)
Baseline: Aus der Affäre hat man gelernt. (From the scandal has one learnt.)
New: Man hat aus der Affäre gelernt. (One has from the scandal learnt.)
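The Exact Match metric used in the results can be sketched in a few lines of Python, applied here to the example sentence. Normalising whitespace before comparison is an assumption of this sketch, and the function names are illustrative; the original evaluation scripts are not shown in the slides.

```python
def normalise(sentence):
    """Collapse runs of whitespace so that spacing differences are ignored."""
    return " ".join(sentence.split())


def exact_match_rate(outputs, references):
    """Percentage of system outputs identical to their reference string."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalise(out) == normalise(ref) for out, ref in zip(outputs, references)
    )
    return 100.0 * matches / len(references)


gold = ["Man hat aus der Affäre gelernt."]
baseline_output = ["Aus der Affäre hat man gelernt."]
new_output = ["Man hat aus der Affäre gelernt."]

baseline_em = exact_match_rate(baseline_output, gold)
new_em = exact_match_rate(new_output, gold)
```

On this single example the baseline string does not match the gold string while the new model's string does, which is the contrast the slide illustrates.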
Predicting Information Status?
- We showed that, for realisation ranking, approximating the information status labels by their morpho-syntactic features helped.
- But what if we could automatically label raw text with information status labels?
Supervised Learning Task
- Given: a corpus of manually annotated radio news (3454 sentences)
- Remove duplicates
- Divide into 10% development (129 sentences) and 90% training/test (1169 sentences)
- Parse with the XLE German grammar
- Task: sequence labelling
- Model: Conditional Random Field
- Features designed to capture the basic geometry of the expressions
Capturing the Geometry of Expressions
[Figures: four examples of annotated expressions, labelled (1) SITUATIVE, SITUATIVE; (2) GIVEN-SHORT, GIVEN-PRONOUN; (3) BRIDGING-CONTAINED; (4) UNUSED-UNKNOWN.]
Model Features
Starting point: morpho-syntactic features from previous work
Things we count:
- Words
- Specific syntactic categories: DP, NP, DP-APPOSS, LABELP, NAMEP, YEAR, A-CARD
- Children of the top category
- Maximum path length from top node to POS tags
- N-ary branching nodes (n > 1)
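The counts listed above can be sketched over a toy constituency tree in Python. The nested-tuple tree encoding, the category labels in the example and the reading of "n-ary branching nodes (n > 1)" as nodes with more than one child are all assumptions of this sketch; the real features are computed over XLE parses, whose format is not shown in the slides.

```python
def is_preterminal(node):
    """A preterminal is (POS_tag, word): one child, and that child is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def geometry_counts(tree, categories=("DP", "NP")):
    """Count words, selected categories, children of the top node, the
    longest path from the top node to a POS tag, and branching nodes."""
    counts = {"words": 0, "max_path_length": 0, "branching_nodes": 0}
    counts["top_children"] = 0 if is_preterminal(tree) else len(tree) - 1
    for category in categories:
        counts[category] = 0

    def walk(node, depth):
        if node[0] in categories:
            counts[node[0]] += 1
        if is_preterminal(node):
            counts["words"] += 1
            counts["max_path_length"] = max(counts["max_path_length"], depth)
            return
        children = node[1:]
        if len(children) > 1:
            counts["branching_nodes"] += 1
        for child in children:
            walk(child, depth + 1)

    walk(tree, 0)
    return counts


# Toy tree for "der Mond" with hypothetical category labels.
tree = ("DP", ("D", "der"), ("NP", ("N", "Mond")))
features = geometry_counts(tree)
```

Each count becomes one numeric feature of the phrase, alongside the morpho-syntactic features carried over from the earlier experiments.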
Model Features
Binary features:
- Coordination
- Coreferent
- More than 1 DP and NP
- Pronoun
- First/last label in the sentence
Other features:
- Determiner type (definite, indefinite, unknown)
- Syntactic category of the top-most node dominating the string
- Syntactic function of the substring
- POS tag at left/right edge of the substring
Evaluation
- Carry out 10-fold cross-validation on our test/train data (1169 sentences, 3705 labels)
- Evaluate on both sets of labels: full (20) and collapsed (9)
Three baselines:
1. Randomly assign a label to each phrase
2. Always assign the most frequent label to each phrase
3. Informed: assign the most frequent label, given the morpho-syntactic features from previous experiments
Accuracy (%) is reported for the Random, Most Frequent and Informed baselines on the full and collapsed label sets.
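The most-frequent and informed baselines can be sketched as follows in Python. The toy training and test pairs are invented for illustration, and representing each phrase by a single syntactic feature is a simplifying assumption; the slides do not specify how multiple features per phrase are combined.

```python
from collections import Counter, defaultdict


def train_baselines(train):
    """train: list of (syntactic_feature, is_label) pairs.

    Returns the overall most frequent label and, per syntactic feature,
    the label most frequently seen with that feature."""
    overall = Counter(label for _, label in train)
    by_feature = defaultdict(Counter)
    for feature, label in train:
        by_feature[feature][label] += 1
    most_frequent = overall.most_common(1)[0][0]
    informed = {f: c.most_common(1)[0][0] for f, c in by_feature.items()}
    return most_frequent, informed


def accuracy(predictions, gold_labels):
    """Fraction of predictions equal to the gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Invented toy data, for illustration only.
train = (
    [("S_PERS_PRON", "GIVEN-PRONOUN")] * 3
    + [("S_TEMP_ADV", "SITUATIVE")] * 2
    + [("S_SIMPLE_DEF", "UNUSED-KNOWN")] * 2
    + [("S_SIMPLE_DEF", "BRIDGING")]
)
test = [
    ("S_SIMPLE_DEF", "BRIDGING"),
    ("S_TEMP_ADV", "SITUATIVE"),
    ("S_PERS_PRON", "GIVEN-PRONOUN"),
    ("S_BAREPROPER", "UNUSED-KNOWN"),
]

most_frequent, informed = train_baselines(train)
gold_labels = [label for _, label in test]

most_frequent_acc = accuracy([most_frequent] * len(test), gold_labels)
# Back off to the overall most frequent label for unseen features.
informed_acc = accuracy(
    [informed.get(feature, most_frequent) for feature, _ in test], gold_labels
)
```

In the 10-fold setting described above, the same procedure would be repeated once per fold, with the counts taken from the nine training folds only.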
CRF Model Prediction Results
Accuracy (%) is reported for Random, Most Frequent, Informed and CRF on the full and collapsed label sets.
[...]% increase in full label set accuracy, 16.39% increase in collapsed set accuracy.
Detailed CRF Prediction Results
Per-label total, precision, recall and F-score are reported for the collapsed labels: BRIDGING, CATAPHOR, EXPLETIVE, GIVEN, INDEF, NULL, RELATIVE, SITUATIVE, UNUSED.
High-level prediction could be used to suggest possible labels to annotators and possibly speed up the manual annotation process.
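For reference, per-label precision, recall and F-score can be computed from gold and predicted label sequences as in the following Python sketch. The toy sequences are invented, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the slides.

```python
from collections import Counter


def per_label_prf(gold, predicted):
    """Return {label: (precision, recall, f_score)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g should have been predicted but was missed
    results = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denominator = precision + recall
        f_score = 2 * precision * recall / denominator if denominator else 0.0
        results[label] = (precision, recall, f_score)
    return results


# Invented toy sequences over the collapsed label set.
gold = ["GIVEN", "GIVEN", "INDEF", "UNUSED", "BRIDGING"]
predicted = ["GIVEN", "GIVEN", "INDEF", "UNUSED", "UNUSED"]

scores = per_label_prf(gold, predicted)
```

In this toy case the BRIDGING phrase is mislabelled as UNUSED, so UNUSED loses precision and BRIDGING has zero recall.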
Detailed CRF Prediction Results
Per-label total, precision, recall and F-score are reported for the full label set: BRIDGING, BRIDGING-CONTAINED, CATAPHOR, EXPLETIVE, GIVEN-EPITHET, GIVEN-PRONOUN, GIVEN-REFLEXIVE, GIVEN-REPEATED, GIVEN-SHORT, INDEF-GENERIC, INDEF-NEW, INDEF-PARTITIVE, INDEF-PARTITIVE-CONTAINED, INDEF-RESUMPTIVE, NULL, RELATIVE, SITUATIVE, UNUSED-KNOWN, UNUSED-TYPE, UNUSED-UNKNOWN.
Confusion Matrix (Human Annotators), Riester, Lorenz, Seemann (2010)
[Figure: confusion matrix over the 20 labels of the full label set, with rows and columns lettered A to T.]
Confusion Matrix (Automatic System)
[Figure: confusion matrix over the 20 labels of the full label set, with rows and columns lettered A to T.]
Confusion Matrix: confusing BRIDGING with UNUSED-KNOWN
Human annotators have the same confusion 5/89 times.
(4) Die Behörden gaben eine Tsunami-Warnung für die Westküste heraus.
The authorities gave a Tsunami-warning for the west coast out.
"The authorities gave a Tsunami-warning for the west coast."
Confusion Matrix: confusing INDEF-NEW with INDEF-GENERIC
Human annotators have the same confusion 20/144 times.
(5) Nach Angaben japanischer Medien kam ein Mensch ums Leben, viele Einwohner wurden verletzt.
According to reports Japanese media came a person for life, many inhabitants were injured.
"According to Japanese media reports, one person died, many inhabitants were injured."
Confusion Matrix: confusing UNUSED-KNOWN with UNUSED-UNKNOWN
(6) Der Kölner Erzbischof Meisner kritisiert die Familienpolitik der Bundesregierung.
The Cologne Archbishop Meisner criticised the family politics of the federal government.
"The Archbishop of Cologne, Meisner, criticised the family policies of the federal government."
Human annotators have the same confusion 7/134 times.
Addressing our underlying assumptions
1. Gold-standard co-reference information (D-GIVEN)
2. Gold-standard markables
Real-world applications will not have access to this information.
Test two automatic co-reference systems on the data: accuracy (%) on the full and collapsed label sets is compared for the settings Gold, None, Simple and Unsupervised.
Summary of Automatic IS Label Prediction
- Trained a CRF on manually annotated text
- Results are high for the collapsed label set (81.65%) and well above baseline for the full label set (64.87%)
- Often the mistakes made by the automatic system are similar to the disagreements that human annotators have
Q: How useful is it in practice?
An application for IS Label Prediction
- Revisit our earlier realisation ranking experiments
- No need to use approximations of IS labels any more
- Train the CRF on 1169 sentences of the manually annotated corpus (test/train)
- Automatically assign an IS label to every DP/NP in our TIGER training data (21,341 phrases)
- Extract IS label order patterns directly
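Extracting label order patterns from automatically labelled text can be sketched as a count over ordered label pairs within each clause. This Python sketch assumes each clause is given as its IS labels in surface order and counts every ordered pair of distinct labels, not only adjacent ones; both choices, and the toy clauses, are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_label_orders(clauses):
    """clauses: list of label sequences, one per clause, in surface order.

    Returns a Counter mapping (earlier_label, later_label) to the number
    of times that order is observed."""
    counts = Counter()
    for labels in clauses:
        # combinations() keeps the input order, so `first` precedes `second`.
        for first, second in combinations(labels, 2):
            if first != second:
                counts[(first, second)] += 1
    return counts


# Invented toy clauses.
clauses = [
    ["BRIDGING", "UNUSED-KNOWN"],
    ["UNUSED-KNOWN", "BRIDGING"],
    ["UNUSED-KNOWN", "BRIDGING"],
    ["GIVEN-PRONOUN", "SITUATIVE", "INDEF-NEW"],
]

order_counts = count_label_orders(clauses)
```

The resulting counts for the two orders of each label pair are what the asymmetry ratio is computed from.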
Even Newer Generation Ranking Model
We now include the IS label asymmetric patterns directly in the SVM ranking model.
Feature types:
1. C-structure: number of NPs, number of children of PP
2. C- & F-structure: SUBJ precedes OBJ
3. Language model: tri-gram score
4. IS label asymmetric patterns: D-GIVEN-SHORT precedes INDEF-NEW (in place of the IS asymmetric syntactic patterns, e.g. S_PERS_PRON precedes S_TEMP_ADV)
Evaluation
Evaluate on 260 sentences. BLEU and Exact Match (%) are compared for four models: Baseline, IS Approx, IS Label (full) and IS Label (collapsed).
The difference between the IS Label (full) model and all other models is statistically significant.
Sample Improvement
(7) Im September forderten Demonstranten den Abzug der auf der Insel stationierten US-Soldaten.
in September demanded 85,000 demonstrators the withdrawal of the 29,000 on the island stationed US soldiers
"85,000 demonstrators demanded the withdrawal of the 29,000 US soldiers that were stationed on the island."
IS Approximations: Demonstranten forderten den Abzug der auf der Insel stationierten US-Soldaten im September.
IS Labels: Im September forderten Demonstranten den Abzug der auf der Insel stationierten US-Soldaten.
Conclusions
- We have shown that a realisation ranking system can benefit from information status.
- Approximating the information status markup using morpho-syntactic features works well.
- Using automatically assigned information status labels works better.
- We trained a CRF model to automatically predict an IS label for a phrase, given its parse.
- Prediction quality on a subset of more general labels is high (81.65%) and for the full label set is well above the informed baseline (64.87%).
Outstanding Issues and Future Directions
- Investigate the integration of lexical (and other) resources to improve the classification of certain phrases
- Currently we still only consider single sentences; future work will also look at preceding context
- Look into carrying out an experiment with human annotators, automatically suggesting labels for them
- Continue working with colleagues to improve the automatic co-reference detection for our purposes and also apply it to the TIGER training corpus
- Investigate other parsers during feature extraction for the IS label prediction model
Thank you!
This work was funded by the Collaborative Research Centre (SFB 732) at the University of Stuttgart.
More informationThe presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing.
Lecture 4: OT Syntax Sources: Kager 1999, Section 8; Legendre et al. 1998; Grimshaw 1997; Barbosa et al. 1998, Introduction; Bresnan 1998; Fanselow et al. 1999; Gibson & Broihier 1998. OT is not a theory
More informationChapter 4: Valence & Agreement CSLI Publications
Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).
More informationApproaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque
Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationIntra-talker Variation: Audience Design Factors Affecting Lexical Selections
Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationThe Role of the Head in the Interpretation of English Deverbal Compounds
The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt
More informationSEMAFOR: Frame Argument Resolution with Log-Linear Models
SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon
More informationInleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3
Inleiding Taalkunde Docent: Paola Monachesi Blok 4, 2001/2002 Contents 1 Syntax 2 2 Phrases and constituent structure 2 3 A minigrammar of Italian 3 4 Trees 3 5 Developing an Italian lexicon 4 6 S(emantic)-selection
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationPrediction of Maximal Projection for Semantic Role Labeling
Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba
More informationA Computational Evaluation of Case-Assignment Algorithms
A Computational Evaluation of Case-Assignment Algorithms Miles Calabresi Advisors: Bob Frank and Jim Wood Submitted to the faculty of the Department of Linguistics in partial fulfillment of the requirements
More informationImproving coverage and parsing quality of a large-scale LFG for German
Improving coverage and parsing quality of a large-scale LFG for German Christian Rohrer, Martin Forst Institute for Natural Language Processing (IMS) University of Stuttgart Azenbergstr. 12 70174 Stuttgart,
More informationTowards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la
Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationSyntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm
Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More information11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation
tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each
More informationAccurate Unlexicalized Parsing for Modern Hebrew
Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The
More informationCS 598 Natural Language Processing
CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@
More informationSusanne J. Jekat
IUED: Institute for Translation and Interpreting Respeaking: Loss, Addition and Change of Information during the Transfer Process Susanne J. Jekat susanne.jekat@zhaw.ch This work was funded by Swiss TxT
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationTHE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES
THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES PRO and Control in Lexical Functional Grammar: Lexical or Theory Motivated? Evidence from Kikuyu Njuguna Githitu Bernard Ph.D. Student, University
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationControl and Boundedness
Control and Boundedness Having eliminated rules, we would expect constructions to follow from the lexical categories (of heads and specifiers of syntactic constructions) alone. Combinatory syntax simply
More informationFeature-Based Grammar
8 Feature-Based Grammar James P. Blevins 8.1 Introduction This chapter considers some of the basic ideas about language and linguistic analysis that define the family of feature-based grammars. Underlying
More informationCan Human Verb Associations help identify Salient Features for Semantic Verb Classification?
Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung Universität Stuttgart Seminar für Sprachwissenschaft,
More informationLNGT0101 Introduction to Linguistics
LNGT0101 Introduction to Linguistics Lecture #11 Oct 15 th, 2014 Announcements HW3 is now posted. It s due Wed Oct 22 by 5pm. Today is a sociolinguistics talk by Toni Cook at 4:30 at Hillcrest 103. Extra
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationA relational approach to translation
A relational approach to translation Rémi Zajac Project POLYGLOSS* University of Stuttgart IMS-CL /IfI-AIS, KeplerstraBe 17 7000 Stuttgart 1, West-Germany zajac@is.informatik.uni-stuttgart.dbp.de Abstract.
More informationBasic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English.
Basic Syntax Doug Arnold doug@essex.ac.uk We review some basic grammatical ideas and terminology, and look at some common constructions in English. 1 Categories 1.1 Word level (lexical and functional)
More informationSome Principles of Automated Natural Language Information Extraction
Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationTHE VERB ARGUMENT BROWSER
THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationTHE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING
SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,
More informationA Graph Based Authorship Identification Approach
A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico
More informationFreitag 7. Januar = QUIZ = REFLEXIVE VERBEN = IM KLASSENZIMMER = JUDD 115
DEUTSCH 3 DIE DEBATTE: GEFÄHRLICHE HAUSTIERE Debatte: Freitag 14. JANUAR, 2011 Bewertung: zwei kleine Prüfungen. Bewertungssystem: (see attached) Thema:Wir haben schon die Geschichte Gefährliche Haustiere
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationExtracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models
Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),
More informationProof Theory for Syntacticians
Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationThe optimal placement of up and ab A comparison 1
The optimal placement of up and ab A comparison 1 Nicole Dehé Humboldt-University, Berlin December 2002 1 Introduction This paper presents an optimality theoretic approach to the transitive particle verb
More informationAN EXPERIMENTAL APPROACH TO NEW AND OLD INFORMATION IN TURKISH LOCATIVES AND EXISTENTIALS
AN EXPERIMENTAL APPROACH TO NEW AND OLD INFORMATION IN TURKISH LOCATIVES AND EXISTENTIALS Engin ARIK 1, Pınar ÖZTOP 2, and Esen BÜYÜKSÖKMEN 1 Doguş University, 2 Plymouth University enginarik@enginarik.com
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationAn Interactive Intelligent Language Tutor Over The Internet
An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationInteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN:
Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN: 1137-3601 revista@aepia.org Asociación Española para la Inteligencia Artificial España Lucena, Diego Jesus de; Bastos Pereira,
More informationUnderlying and Surface Grammatical Relations in Greek consider
0 Underlying and Surface Grammatical Relations in Greek consider Sentences Brian D. Joseph The Ohio State University Abbreviated Title Grammatical Relations in Greek consider Sentences Brian D. Joseph
More informationENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist
Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet
More informationThe College Board Redesigned SAT Grade 12
A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.
More informationEAGLE: an Error-Annotated Corpus of Beginning Learner German
EAGLE: an Error-Annotated Corpus of Beginning Learner German Adriane Boyd Department of Linguistics The Ohio State University adriane@ling.osu.edu Abstract This paper describes the Error-Annotated German
More informationLING 329 : MORPHOLOGY
LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,
More informationIntroduction to Causal Inference. Problem Set 1. Required Problems
Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not
More informationUsing Semantic Relations to Refine Coreference Decisions
Using Semantic Relations to Refine Coreference Decisions Heng Ji David Westbrook Ralph Grishman Department of Computer Science New York University New York, NY, 10003, USA hengji@cs.nyu.edu westbroo@cs.nyu.edu
More informationApplications of memory-based natural language processing
Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal
More informationWE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT
WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationHindi-Urdu Phrase Structure Annotation
Hindi-Urdu Phrase Structure Annotation Rajesh Bhatt and Owen Rambow January 12, 2009 1 Design Principle: Minimal Commitments Binary Branching Representations. Mostly lexical projections (P,, AP, AdvP)
More informationModeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures
Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,
More informationLoughton School s curriculum evening. 28 th February 2017
Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's
More informationWhat the National Curriculum requires in reading at Y5 and Y6
What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the
More informationIn Udmurt (Uralic, Russia) possessors bear genitive case except in accusative DPs where they receive ablative case.
Sören E. Worbs The University of Leipzig Modul 04-046-2015 soeren.e.worbs@gmail.de November 22, 2016 Case stacking below the surface: On the possessor case alternation in Udmurt (Assmann et al. 2014) 1
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationBasic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1
Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationcmp-lg/ Jul 1995
A CONSTRAINT-BASED CASE FRAME LEXICON ARCHITECTURE 1 Introduction Kemal Oazer and Okan Ylmaz Department of Computer Engineering and Information Science Bilkent University Bilkent, Ankara 0, Turkey fko,okang@cs.bilkent.edu.tr
More informationDisambiguation of Thai Personal Name from Online News Articles
Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationUNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen
UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja
More informationMETHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS
METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationSpecifying a shallow grammatical for parsing purposes
Specifying a shallow grammatical for parsing purposes representation Atro Voutilainen and Timo J~irvinen Research Unit for Multilingual Language Technology P.O. Box 4 FIN-0004 University of Helsinki Finland
More informationChunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.
NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and
More informationTHE SOME INDEFINITES
UCLA Working Papers in Linguistics, vol.3, October 1999 Syntax at Sunset 2 Gianluca Storto (ed.) THE SOME INDEFINITES MISHA BECKER mbecker@ucla.edu Important syntactic and semantic differences between
More informationMinimalism is the name of the predominant approach in generative linguistics today. It was first
Minimalism Minimalism is the name of the predominant approach in generative linguistics today. It was first introduced by Chomsky in his work The Minimalist Program (1995) and has seen several developments
More informationWords come in categories
Nouns Words come in categories D: A grammatical category is a class of expressions which share a common set of grammatical properties (a.k.a. word class or part of speech). Words come in categories Open
More informationSwitched Control and other 'uncontrolled' cases of obligatory control
Switched Control and other 'uncontrolled' cases of obligatory control Dorothee Beermann and Lars Hellan Norwegian University of Science and Technology, Trondheim, Norway dorothee.beermann@ntnu.no, lars.hellan@ntnu.no
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationInteractive Corpus Annotation of Anaphor Using NLP Algorithms
Interactive Corpus Annotation of Anaphor Using NLP Algorithms Catherine Smith 1 and Matthew Brook O Donnell 1 1. Introduction Pronouns occur with a relatively high frequency in all forms English discourse.
More informationMethods for the Qualitative Evaluation of Lexical Association Measures
Methods for the Qualitative Evaluation of Lexical Association Measures Stefan Evert IMS, University of Stuttgart Azenbergstr. 12 D-70174 Stuttgart, Germany evert@ims.uni-stuttgart.de Brigitte Krenn Austrian
More informationDreistadt: A language enabled MOO for language learning
Dreistadt: A language enabled MOO for language learning Till Christopher Lech 1 and Koenraad de Smedt 2 Abstract. Dreistadt is an educational MOO (Multi User Domain, Object Oriented) for language learning.
More informationInformatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy
Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference
More information