Information Status in Generation Ranking

Aoife Cahill (joint work with Arndt Riester)
Heidelberg Computational Linguistics Colloquium, December 9, 2010

Outline
1 Introduction
2 Information Status
3 Approximating Information Status
4 Generation Ranking
5 Predicting Information Status
6 Generation Ranking Revisited
7 Conclusion

Outlining the problem
German is considered a relatively free word order language (with a rich case system).
This notion dates from a time when discourse information did not play much of a role in linguistics.
Our task: generating German strings from LFG F-structures.
The problem: how to choose the best string from the many grammatical strings output by the system?

Surface Realisation System
Lexical Functional Grammar F-Structure: basic predicate-argument structure
Example: "Die Nato werde nicht von der EU geführt."
[F-structure figure: PRED 'führen<[249:von], [21:Nato]>'; SUBJ 'Nato' (CASE nom, GEND fem, NUM sg, PERS 3); OBL-AG 'von<[283:EU]>' with OBJ 'EU' (CASE dat, GEND fem, NUM sg); ADJUNCT 'nicht' (ADJUNCT-TYPE neg); TNS-ASP (MOOD subjunctive, PASS-SEM dynamic, TENSE pres); TOPIC [21:Nato]; CLAUSE-TYPE decl, PASSIVE +, STMT-TYPE decl, VTYPE main]

Surface Realisation System
A hand-crafted large-scale grammar (Rohrer and Forst, 2006) generates all possible (grammatical) strings.
Meaning: 'NATO is not led by the EU.'
Die Nato werde von der EU nicht geführt.
Nicht von der EU geführt werde die Nato.
Nicht werde die Nato von der EU geführt.
Nicht geführt werde die Nato von der EU.
Von der EU werde die Nato nicht geführt.
Von der EU geführt werde nicht die Nato.
Geführt werde die Nato nicht von der EU.
Geführt werde nicht von der EU die Nato.
Geführt werde von der EU nicht die Nato.
Die Nato werde nicht von der EU geführt.
Nicht werde von der EU die Nato geführt.
Nicht geführt werde von der EU die Nato.
Von der EU nicht geführt werde die Nato.
Von der EU werde nicht die Nato geführt.
Von der EU geführt werde die Nato nicht.
Geführt werde die Nato von der EU nicht.
Geführt werde nicht die Nato von der EU.
Geführt werde von der EU die Nato nicht.

Surface Realisation System (Cahill et al., 2007)
A log-linear ranking model chooses the most likely string.
Linguistically motivated feature types:
1. C-structure: number of NPs, number of children of PP
2. C- & F-structure: SUBJ precedes OBJ
3. Language model: tri-gram score
Outperforms a basic tri-gram language model, but can be further improved.
Idea: capturing the influence of discourse information can help choose the best string.

Information Status (IS) (Prince 1981, 1992)
Means of discourse analysis
Classifying (NP/PP/DP) constituents according to their givenness
IS is marked in prosody (Baumann, 2006; Schweitzer et al., 2009) as well as in syntax
Corpus of German news texts manually annotated for IS
Advantages with regard to earlier IS work:
  proper treatment of embedded phrases
  higher inter-annotator agreement on difficult texts
  closer to insights from semantic theory (e.g. semantic presuppositions)

IS Labels: Riester, Lorenz, Seemann (2010)
Full: BRIDGING, BRIDGING-CONTAINED, CATAPHOR, EXPLETIVE, GIVEN-EPITHET, GIVEN-PRONOUN, GIVEN-REFLEXIVE, GIVEN-REPEATED, GIVEN-SHORT, INDEF-GENERIC, INDEF-NEW, INDEF-PARTITIVE, INDEF-PARTITIVE-CONTAINED, INDEF-RESUMPTIVE, NULL, RELATIVE, SITUATIVE, UNUSED-KNOWN, UNUSED-TYPE, UNUSED-UNKNOWN
Collapsed: BRIDGING, CATAPHOR, EXPLETIVE, GIVEN, INDEF, NULL, RELATIVE, SITUATIVE, UNUSED

Most Important Classes
GIVEN: coreferential anaphor (Merkel ... sie)
BRIDGING: non-coreferential but context-dependent expression (Stuttgart ... der Bahnhof)
UNUSED-KNOWN: discourse new, familiar definite (der Mond)
UNUSED-UNKNOWN: discourse new, unfamiliar definite (das neue Gesetz zur Gesundheitsreform)
SITUATIVE: deictic expression (am Dienstag)
INDEF: indefinite (einige hundert Menschen)

Grammaticality and markedness
Two grammatical sentences: 'The army has even been able to recapture smaller territories.'
(1) Die Armee habe sogar kleinere Gebiete zurückerobern können. (ok)
(2) Kleinere Gebiete habe die Armee sogar zurückerobern können. (strongly marked)
A sentence is marked precisely if there are only few or very special contexts in which it is appropriate.

Capturing context
Information status reflects context to a certain degree.
IS labels are taken from the corpus.
'The army has even been able to recapture smaller territories.'
(3) Die Armee [GIVEN-EPITHET] habe sogar kleinere Gebiete [INDEF-NEW] zurückerobern können.
The givenness/novelty of an expression characterises the class of contexts in which the expression can occur.
Compute the preferred order for each pair of IS labels.

Precedence of label pairs within a clause
X before Y (e.g. BRIDGING before UNUSED-UNKNOWN): less prominent order B
  Die Gespräche [BRIDGING] sollen heute in Jerusalem [UNUSED-KNOWN] fortgesetzt werden.
  'The talks shall be continued in Jerusalem today.'
  Occurrences in corpus: 49
Y before X (e.g. UNUSED-UNKNOWN before BRIDGING): dominant order A
  So müsse dies die britische Regierung [UNUSED-KNOWN] den Bürgern [BRIDGING] klarmachen.
  'Thus, the British Government should make this clear to the citizens.'
  Occurrences in corpus: 81

Defining a measure: asymmetry ratio
A: frequency of the dominant order
B: frequency of the less prominent order
Asymmetry ratio: B/A
Total: A + B
Compute the asymmetry ratio for each pair of IS labels.
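
The measure can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the representation of a clause as a list of IS labels in surface order are hypothetical, and the toy data simply replays the 49 and 81 occurrences quoted for the BRIDGING example.

```python
from collections import Counter
from itertools import combinations

def count_pair_orders(clauses):
    """Count how often label x precedes label y within a clause.

    clauses: list of label sequences, one per clause, in surface order.
    """
    orders = Counter()
    for labels in clauses:
        for i, j in combinations(range(len(labels)), 2):
            if labels[i] != labels[j]:
                orders[(labels[i], labels[j])] += 1
    return orders

def asymmetry(orders, x, y):
    """Return (dominant order, ratio B/A, total A + B) for the pair x, y."""
    xy, yx = orders[(x, y)], orders[(y, x)]
    a, b = max(xy, yx), min(xy, yx)
    dominant = (x, y) if xy >= yx else (y, x)
    ratio = b / a if a else None  # undefined if the pair never co-occurs
    return dominant, ratio, a + b

# Toy data replaying the counts from the example: 49 vs. 81 occurrences.
clauses = [["BRIDGING", "UNUSED-KNOWN"]] * 49 + [["UNUSED-KNOWN", "BRIDGING"]] * 81
orders = count_pair_orders(clauses)
dominant, ratio, total = asymmetry(orders, "BRIDGING", "UNUSED-KNOWN")
print(dominant, round(ratio, 2), total)
```

A ratio close to 0 indicates a strongly preferred order, while a ratio close to 1 indicates that both orders are about equally common.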

Asymmetry tables (top)
Columns: dominant order, asymmetry ratio, frequency [numeric values not recoverable]
UNUSED-KNOWN before CATAPHOR
GIVEN-REPEATED before UNUSED-TYPE
GIVEN-PRONOUN before SITUATIVE
GIVEN-REFLEXIVE before INDEF-NEW
GIVEN-PRONOUN before CATAPHOR
GIVEN-PRONOUN before INDEF-NEW
BRIDGING before INDEF-GENERIC
GIVEN-SHORT before GIVEN-REPEATED
GIVEN-PRONOUN before UNUSED-TYPE
GIVEN-REFLEXIVE before UNUSED-TYPE
GIVEN-EPITHET before UNUSED-TYPE
UNUSED-KNOWN before UNUSED-TYPE
EXPLETIVE before INDEF-NEW
...

The crucial problem
IS is an indicator for constituent order, but there is no reliable automatic annotation system for IS.
First attempt (Cahill and Riester, 2009): use morphosyntactic features correlated with IS.

Syntactic Features
We define an inventory of syntactic features that can appear under all IS labels and automatically mark up the corpus with them. The features include:
  is simple definite
  is simple definite description with a possessive modifier
  is definite description with adjectival modifier
  is definite description with a genitive argument
  is definite description with an (obligatory/referentially restricting) PP adjunct
  is definite description including a relative clause
  is definite description including an embedded proper name and (perhaps) a title or job description
  is a combination of position/title and proper name (without article)
  is a bare proper name
  ...

Morphosyntactic correlates of IS
Some IS categories directly derive from syntactic classes (1:1 correspondence):
GIVEN-REFLEXIVE: is a reflexive pronoun (all items)
EXPLETIVE: is an expletive, e.g. es (all items)

Morphosyntactic correlates of IS
Some IS categories are represented by various features. For UNUSED-KNOWN:
Feature                       Items  Example
is a simple definite          145    the moon
is a name with a title        55     President Obama
is a bare noun                54     Africa
is definite with apposition   36     the German Chancellor, Angela Merkel
...

Syntactic Features and IS phrases: extracting information from the corpus
We have a corpus that is:
  annotated with IS labels
  marked up with syntactic features
For each phrase annotated with an IS label, look at what syntactic features are present.
Collect statistics for each IS label type.

Syntactic Features associated with IS labels
GIVEN-PRONOUN (syntactic feature, count):
  IS_PERS_PRON       88
  IS_DA_PRON         56
  IS_DEMON_PRON      41
  IS_GENERIC_PRON    16
INDEF-NEW (syntactic feature, count):
  IS_SIMPLE_INDEF    203
  IS_INDEF_ATTR      95
  IS_INDEF_NUM       85
  IS_INDEF_GENARG    20
  IS_INDEF_PPADJUNCT 19
  ...

IS asymmetries with syntactic features
Columns: label 1, label 2, ratio, frequency [ratio and frequency values not recoverable]
UNUSED-KNOWN before CATAPHOR
  UNUSED-KNOWN: IS_BAREPROPER 166, IS_SIMPLE_DEF 102, IS_PROPER 85
  CATAPHOR: IS_SIMPLE_DEF 14, IS_DA_PRON 13
GIVEN-REPEATED before UNUSED-TYPE
  GIVEN-REPEATED: IS_SIMPLE_DEF 28, IS_BAREPROPER 23
  UNUSED-TYPE: IS_SIMPLE_DEF 37, IS_SIMPLE_INDEF 36
GIVEN-PRONOUN before SITUATIVE
  GIVEN-PRONOUN: IS_PERS_PRON 88, IS_DA_PRON 56, IS_DEMON_PRON 41, IS_GENERIC_PRON 16
  SITUATIVE: IS_TEMP_ADV 62, IS_SIMPLE_DEF 44, IS_DEF_ATTR_ADJUNCT 23, IS_SIMPLE_INDEF 19
...

New Features
From each IS asymmetry, extract precedence patterns of the corresponding syntactic features.
GIVEN-PRONOUN: IS_PERS_PRON 88, IS_DA_PRON 56, IS_DEMON_PRON 41, IS_GENERIC_PRON 16
SITUATIVE: IS_TEMP_ADV 62, IS_SIMPLE_DEF 44, IS_DEF_ATTR_ADJUNCT 23, IS_SIMPLE_INDEF 19
Resulting patterns:
  IS_PERS_PRON precedes IS_TEMP_ADV
  IS_PERS_PRON precedes IS_SIMPLE_DEF
  IS_PERS_PRON precedes IS_DEF_ATTR_ADJUNCT
  IS_PERS_PRON precedes IS_SIMPLE_INDEF
  IS_DA_PRON precedes IS_TEMP_ADV
  IS_DA_PRON precedes IS_SIMPLE_DEF
  IS_DA_PRON precedes IS_DEF_ATTR_ADJUNCT
  IS_DA_PRON precedes IS_SIMPLE_INDEF
  IS_DEMON_PRON precedes IS_TEMP_ADV
  ...
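
The pattern list on this slide looks like a cross product of the most frequent syntactic features of the two labels. A minimal Python sketch of that reading follows; the function name is hypothetical, the feature names are written with the IS_ prefix, and whether the original system applies any further filtering (for example a frequency threshold) is not stated here.

```python
from itertools import product

def precedence_patterns(first_feats, second_feats):
    """Pair every feature of the preceding label with every feature of the following label."""
    return [f"{a} precedes {b}" for a, b in product(first_feats, second_feats)]

# Most frequent syntactic features of the two labels, as listed on the slide.
given_pronoun = ["IS_PERS_PRON", "IS_DA_PRON", "IS_DEMON_PRON", "IS_GENERIC_PRON"]
situative = ["IS_TEMP_ADV", "IS_SIMPLE_DEF", "IS_DEF_ATTR_ADJUNCT", "IS_SIMPLE_INDEF"]

patterns = precedence_patterns(given_pronoun, situative)
for pattern in patterns[:5]:
    print(pattern)
```

With four features on each side this yields 16 candidate patterns for the GIVEN-PRONOUN before SITUATIVE asymmetry.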

Improved Generation Ranking Model
We include these new features in our SVM model for generation ranking.
Feature types:
1. C-structure: number of NPs, number of children of PP
2. C- & F-structure: SUBJ precedes OBJ
3. Language model: tri-gram score
4. IS asymmetric syntactic patterns: IS_PERS_PRON precedes IS_TEMP_ADV
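
To make feature type 4 concrete, the sketch below counts how often each precedence pattern fires in a candidate string and scores candidates with a weighted sum. Everything here is an assumption for illustration: the candidate representation (one set of syntactic features per phrase, in surface order), the assignment of features to phrases, and the hand-set weight. The actual system learns its weights with an SVM ranker over all four feature types.

```python
from itertools import combinations

def pattern_counts(phrases, patterns):
    """Count pattern occurrences; phrases is a list of feature sets in surface order."""
    counts = {p: 0 for p in patterns}
    for earlier, later in combinations(phrases, 2):
        for a in earlier:
            for b in later:
                if (a, b) in counts:
                    counts[(a, b)] += 1
    return counts

def score(counts, weights):
    """Linear score: weighted sum of pattern counts (weights are hypothetical)."""
    return sum(weights.get(p, 0.0) * n for p, n in counts.items())

patterns = [("IS_GENERIC_PRON", "IS_SIMPLE_DEF")]
weights = {("IS_GENERIC_PRON", "IS_SIMPLE_DEF"): 1.0}  # hypothetical, not a learned value

# Two candidate realisations with assumed phrase-level features.
candidates = {
    "Man hat aus der Affäre gelernt.": [{"IS_GENERIC_PRON"}, {"IS_SIMPLE_DEF"}],
    "Aus der Affäre hat man gelernt.": [{"IS_SIMPLE_DEF"}, {"IS_GENERIC_PRON"}],
}
best = max(candidates, key=lambda s: score(pattern_counts(candidates[s], patterns), weights))
print(best)
```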

System Overview
[Diagram components: Machine Translation, Sentence Condensation, Summarisation; LFG F-Structure; Grammar; All Strings; Corpus Sentences; Language Model Features; Linguistically Motivated Features; IS features?; Ranking Model; Best String]

Experimental Setup
Train an SVM ranking model on 7161 syntactically annotated sentences from TIGER.
Tune model parameters on a development set of 55 sentences.
Carry out the final evaluation on a test set of 260 sentences.

Results
Evaluation on 260 sentences.
BLEU measures string similarity using n-grams.
Slightly different to Cahill and Riester (2009):
  uses SVM rank instead of a log-linear model
  asymmetries calculated from more data
  ... but the same features
Table: BLEU and Exact Match (%) for Baseline and IS Approx [numeric values not recoverable]
Statistically significant improvement with the model including the new IS-inspired syntactic features.
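
Of the two metrics, exact match is the simpler one: the percentage of test items for which the top-ranked string is identical to the original corpus sentence. A minimal sketch, assuming plain string equality with no normalisation (the function name and the two-item toy input are illustrative only):

```python
def exact_match(selected, gold):
    """Percentage of items whose selected string equals the gold string."""
    if len(selected) != len(gold):
        raise ValueError("selected and gold must have the same length")
    return 100.0 * sum(s == g for s, g in zip(selected, gold)) / len(gold)

gold = ["Man hat aus der Affäre gelernt.", "Man hat aus der Affäre gelernt."]
selected = ["Aus der Affäre hat man gelernt.", "Man hat aus der Affäre gelernt."]
print(exact_match(selected, gold))
```

BLEU additionally gives partial credit for shared n-grams, so a string that differs only in the position of one constituent still scores well above zero.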

Example Sentences
'We have learnt from the scandal'
Gold: Man hat aus der Affäre gelernt. (One has from the scandal learnt.)
Baseline: Aus der Affäre hat man gelernt. (From the scandal has one learnt.)
New: Man hat aus der Affäre gelernt. (One has from the scandal learnt.)

Predicting Information Status?
We showed that, for realisation ranking, approximating information status labels with morpho-syntactic features helped.
But what if we could automatically label raw text with information status labels?

Supervised Learning Task
Given a corpus of manually annotated radio news (3454 sentences):
  remove duplicates
  divide into 10% development (129 sentences) and 90% training/test (1169 sentences)
  parse with the XLE German grammar
Task: sequence labelling
Model: Conditional Random Field
Features designed to capture the basic geometry of the expressions

Capturing the Geometry of Expressions
[Figures: example expressions labelled SITUATIVE; GIVEN-SHORT and GIVEN-PRONOUN; BRIDGING-CONTAINED; UNUSED-UNKNOWN]

Model Features
Starting point: morpho-syntactic features from previous work
Things we count:
  words
  specific syntactic categories: DP, NP, DP-APPOSS, LABELP, NAMEP, YEAR, A-CARD
  children of the top category
  maximum path length from top node to POS tags
  n-ary branching nodes (n > 1)
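
The counted features can be read off a parse tree with one recursive walk. The sketch below is an assumption-laden illustration: trees are encoded as nested tuples `(label, child, ...)` with preterminals as `(POS, word)`, the function and key names are hypothetical, and only two of the listed categories are counted by default. It does not reproduce the XLE output format.

```python
def geometry_features(tree, categories=("DP", "NP")):
    """Count simple geometric properties of a phrase's parse tree."""
    counts = {"words": 0, "max_path": 0, "branching_nodes": 0}
    cat_counts = {c: 0 for c in categories}

    def walk(node, depth):
        label, children = node[0], node[1:]
        if len(children) == 1 and isinstance(children[0], str):
            # Preterminal (POS tag, word): one word, path ends here.
            counts["words"] += 1
            counts["max_path"] = max(counts["max_path"], depth)
            return
        if label in cat_counts:
            cat_counts[label] += 1
        if len(children) > 1:
            counts["branching_nodes"] += 1  # n-ary node with n > 1
        for child in children:
            walk(child, depth + 1)

    walk(tree, 0)
    top_is_preterminal = len(tree) == 2 and isinstance(tree[1], str)
    counts["top_children"] = 0 if top_is_preterminal else len(tree) - 1
    return {**counts, **{f"n_{c}": n for c, n in cat_counts.items()}}

# Toy tree for "der Bahnhof"; the analysis is illustrative only.
tree = ("DP", ("D", "der"), ("NP", ("N", "Bahnhof")))
features = geometry_features(tree)
print(features)
```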

Model Features
Binary features:
  coordination
  coreferent
  more than 1 DP and NP
  pronoun
  first/last label in the sentence
Other features:
  determiner type (definite, indefinite, unknown)
  syntactic category of the top-most node dominating the string
  syntactic function of the substring
  POS tag at left/right edge of the substring

Evaluation
Carry out 10-fold cross-validation on our test/train data (1169 sentences, 3705 labels).
Evaluate on both sets of labels: full (20) and collapsed (9).
Three baselines:
1. Randomly assign a label to each phrase
2. Always assign the most frequent label to each phrase
3. Informed: assign the most frequent label, given the morpho-syntactic features from previous experiments
Table: Accuracy (%) on the full and collapsed label sets for Random, Most Frequent, Informed [numeric values not recoverable]
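
Baselines 2 and 3 can be sketched as follows. This is a minimal illustration under assumptions: each phrase is represented as a pair of (set of morpho-syntactic features, label), the informed baseline backs off to the most frequent label for unseen feature sets, and the toy data is invented. The random baseline is omitted because its output is not deterministic.

```python
from collections import Counter, defaultdict

def most_frequent_baseline(train):
    """Always predict the label that is most frequent in the training data."""
    top = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda feats: top

def informed_baseline(train):
    """Predict the most frequent label seen with exactly this feature set."""
    by_feats = defaultdict(Counter)
    for feats, label in train:
        by_feats[frozenset(feats)][label] += 1
    fallback = most_frequent_baseline(train)

    def classify(feats):
        seen = by_feats.get(frozenset(feats))
        return seen.most_common(1)[0][0] if seen else fallback(feats)

    return classify

def accuracy(classify, test):
    """Accuracy in percent over (features, gold label) pairs."""
    return 100.0 * sum(classify(f) == y for f, y in test) / len(test)

# Invented toy data on the collapsed label set.
train = [
    ({"IS_PERS_PRON"}, "GIVEN"), ({"IS_PERS_PRON"}, "GIVEN"),
    ({"IS_SIMPLE_INDEF"}, "INDEF"), ({"IS_SIMPLE_DEF"}, "GIVEN"),
    ({"IS_SIMPLE_DEF"}, "UNUSED"), ({"IS_SIMPLE_DEF"}, "UNUSED"),
]
test = [
    ({"IS_SIMPLE_DEF"}, "UNUSED"), ({"IS_SIMPLE_INDEF"}, "INDEF"),
    ({"IS_PERS_PRON"}, "GIVEN"), ({"IS_BAREPROPER"}, "UNUSED"),
]
mf_acc = accuracy(most_frequent_baseline(train), test)
inf_acc = accuracy(informed_baseline(train), test)
print(mf_acc, inf_acc)
```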

CRF Model Prediction Results
Table: Accuracy (%) on the full and collapsed label sets for Random, Most Frequent, Informed, CRF [numeric values not recoverable]
[...]% increase in full label set accuracy, 16.39% increase in collapsed set accuracy

Detailed CRF Prediction Results (collapsed label set)
Columns: label, total, precision, recall, F-score [numeric values not recoverable]
Labels: BRIDGING, CATAPHOR, EXPLETIVE, GIVEN, INDEF, NULL, RELATIVE, SITUATIVE, UNUSED
High-level prediction could be used to suggest possible labels to annotators and possibly speed up the manual annotation process.
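
The per-label columns in this table follow the standard definitions of precision, recall and F-score. A small sketch, assuming gold and predicted labels are given as two parallel lists; the function name and toy labels are illustrative and unrelated to the actual results.

```python
def per_label_scores(gold, predicted):
    """Return {label: (total gold items, precision, recall, F-score)}."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        n_pred = sum(p == label for p in predicted)
        n_gold = sum(g == label for g in gold)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold if n_gold else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (n_gold, precision, recall, f_score)
    return scores

gold = ["GIVEN", "GIVEN", "INDEF", "UNUSED"]
predicted = ["GIVEN", "UNUSED", "INDEF", "UNUSED"]
scores = per_label_scores(gold, predicted)
for label, (total, p, r, f) in scores.items():
    print(f"{label:8s} {total:3d} {p:.2f} {r:.2f} {f:.2f}")
```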

Detailed CRF Prediction Results (full label set)
Columns: label, total, precision, recall, F-score [numeric values not recoverable]
Labels: BRIDGING, BRIDGING-CONTAINED, CATAPHOR, EXPLETIVE, GIVEN-EPITHET, GIVEN-PRONOUN, GIVEN-REFLEXIVE, GIVEN-REPEATED, GIVEN-SHORT, INDEF-GENERIC, INDEF-NEW, INDEF-PARTITIVE, INDEF-PARTITIVE-CONTAINED, INDEF-RESUMPTIVE, NULL, RELATIVE, SITUATIVE, UNUSED-KNOWN, UNUSED-TYPE, UNUSED-UNKNOWN

Confusion Matrix (Human Annotators), Riester, Lorenz, Seemann (2010)
[Confusion matrix over the 20 full labels, rows and columns A to T; cell values not recoverable]

Confusion Matrix (Automatic System)
[Confusion matrix over the 20 full labels, rows and columns A to T; cell values not recoverable]

Confusion Matrix
[Matrix excerpts highlighting three frequent confusions; cell values not recoverable]
Confusing BRIDGING with UNUSED-KNOWN (human annotators have the same confusion 5/89 times)
(4) Die Behörden gaben eine Tsunami-Warnung für die Westküste heraus.
    The authorities gave a Tsunami-warning for the west coast out.
    'The authorities gave a Tsunami-warning for the west coast.'
Confusing INDEF-NEW with INDEF-GENERIC (human annotators have the same confusion 20/144 times)
(5) Nach Angaben japanischer Medien kam ein Mensch ums Leben, viele Einwohner wurden verletzt.
    According to reports Japanese media came a person for life, many inhabitants were injured.
    'According to Japanese media reports, one person died, many inhabitants were injured.'
Confusing UNUSED-KNOWN with UNUSED-UNKNOWN (human annotators have the same confusion 7/134 times)
(6) Der Kölner Erzbischof Meisner kritisiert die Familienpolitik der Bundesregierung.
    The Cologne Archbishop Meisner criticised the family politics of the federal government.
    'The Archbishop of Cologne, Meisner, criticised the family policies of the federal government.'

Addressing our underlying assumptions
1. Gold-standard co-reference information (D-GIVEN)
2. Gold-standard markables
Real-world applications will not have access to this information.
Test two automatic co-reference systems on the data.
Table: Accuracy (%) on the full and collapsed label sets for Gold, None, Simple, Unsupervised [numeric values not recoverable]

Summary of Automatic IS Label Prediction
Trained a CRF on manually annotated text.
Results are high for the collapsed label set (81.65%) and well above baseline for the full label set (64.87%).
Often the mistakes made by the automatic system are similar to the disagreements that human annotators have.
Q: How useful is it in practice?

An application for IS Label Prediction
Revisit our earlier realisation ranking experiments.
No need to use approximations of IS labels any more.
Train the CRF on 1169 sentences of the manually annotated corpus (test/train).
Automatically assign an IS label to every DP/NP in our TIGER training data (21,341 phrases).
Extract IS label order patterns directly.

Even Newer Generation Ranking Model
We now include the IS label asymmetric patterns directly in the SVM ranking model.
Feature types:
1. C-structure: number of NPs, number of children of PP
2. C- & F-structure: SUBJ precedes OBJ
3. Language model: tri-gram score
4. (previous model) IS asymmetric syntactic patterns: IS_PERS_PRON precedes IS_TEMP_ADV
4. (new model) IS label asymmetric patterns: D-GIVEN-SHORT precedes INDEF-NEW

Evaluation
Evaluate on 260 sentences.
Table: BLEU and Exact Match (%) for Baseline, IS Approx, IS Label (full), IS Label (collapsed) [numeric values not recoverable]
The difference between the IS Label (full) model and all other models is statistically significant.

Sample Improvement
(7) Im September forderten Demonstranten den Abzug der auf der Insel stationierten US-Soldaten.
    in September demanded 85,000 demonstrators the withdrawal of the 29,000 on the island stationed US soldiers.
    '85,000 demonstrators demanded the withdrawal of the 29,000 US soldiers that were stationed on the island.'
IS Approximations: Demonstranten forderten den Abzug der auf der Insel stationierten US-Soldaten im September.
IS Labels: Im September forderten Demonstranten den Abzug der auf der Insel stationierten US-Soldaten.

Conclusions
We have shown that a realisation ranking system can benefit from information status.
Approximating the information status markup using morpho-syntactic features works well.
Using automatically assigned information status labels works better.
We trained a CRF model to automatically predict an IS label for a phrase, given its parse.
Prediction quality on a subset of more general labels is high (81.65%) and for the full label set is well above the informed baseline (64.87%).

Outstanding Issues and Future Directions
Investigate the integration of lexical (and other) resources to improve the classification of certain phrases.
Currently we still only consider single sentences. Future work will also look at preceding context.
Look into carrying out an experiment with human annotators, automatically suggesting labels for them.
Continue working with colleagues to improve the automatic co-reference detection for our purposes, and also apply it to the TIGER training corpus.
Investigate other parsers during feature extraction for the IS label prediction model.

Thank you!
This work was funded by the Collaborative Research Centre (SFB 732) at the University of Stuttgart.


More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY

More information

The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing.

The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing. Lecture 4: OT Syntax Sources: Kager 1999, Section 8; Legendre et al. 1998; Grimshaw 1997; Barbosa et al. 1998, Introduction; Bresnan 1998; Fanselow et al. 1999; Gibson & Broihier 1998. OT is not a theory

More information

Chapter 4: Valence & Agreement CSLI Publications

Chapter 4: Valence & Agreement CSLI Publications Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).

More information

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

The Role of the Head in the Interpretation of English Deverbal Compounds

The Role of the Head in the Interpretation of English Deverbal Compounds The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt

More information

SEMAFOR: Frame Argument Resolution with Log-Linear Models

SEMAFOR: Frame Argument Resolution with Log-Linear Models SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon

More information

Inleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3

Inleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3 Inleiding Taalkunde Docent: Paola Monachesi Blok 4, 2001/2002 Contents 1 Syntax 2 2 Phrases and constituent structure 2 3 A minigrammar of Italian 3 4 Trees 3 5 Developing an Italian lexicon 4 6 S(emantic)-selection

More information

Beyond the Pipeline: Discrete Optimization in NLP

Beyond the Pipeline: Discrete Optimization in NLP Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

A Computational Evaluation of Case-Assignment Algorithms

A Computational Evaluation of Case-Assignment Algorithms A Computational Evaluation of Case-Assignment Algorithms Miles Calabresi Advisors: Bob Frank and Jim Wood Submitted to the faculty of the Department of Linguistics in partial fulfillment of the requirements

More information

Improving coverage and parsing quality of a large-scale LFG for German

Improving coverage and parsing quality of a large-scale LFG for German Improving coverage and parsing quality of a large-scale LFG for German Christian Rohrer, Martin Forst Institute for Natural Language Processing (IMS) University of Stuttgart Azenbergstr. 12 70174 Stuttgart,

More information

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

Accurate Unlexicalized Parsing for Modern Hebrew

Accurate Unlexicalized Parsing for Modern Hebrew Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Susanne J. Jekat

Susanne J. Jekat IUED: Institute for Translation and Interpreting Respeaking: Loss, Addition and Change of Information during the Transfer Process Susanne J. Jekat susanne.jekat@zhaw.ch This work was funded by Swiss TxT

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES

THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES PRO and Control in Lexical Functional Grammar: Lexical or Theory Motivated? Evidence from Kikuyu Njuguna Githitu Bernard Ph.D. Student, University

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Control and Boundedness

Control and Boundedness Control and Boundedness Having eliminated rules, we would expect constructions to follow from the lexical categories (of heads and specifiers of syntactic constructions) alone. Combinatory syntax simply

More information

Feature-Based Grammar

Feature-Based Grammar 8 Feature-Based Grammar James P. Blevins 8.1 Introduction This chapter considers some of the basic ideas about language and linguistic analysis that define the family of feature-based grammars. Underlying

More information

Can Human Verb Associations help identify Salient Features for Semantic Verb Classification?

Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung Universität Stuttgart Seminar für Sprachwissenschaft,

More information

LNGT0101 Introduction to Linguistics

LNGT0101 Introduction to Linguistics LNGT0101 Introduction to Linguistics Lecture #11 Oct 15 th, 2014 Announcements HW3 is now posted. It s due Wed Oct 22 by 5pm. Today is a sociolinguistics talk by Toni Cook at 4:30 at Hillcrest 103. Extra

More information

Multi-Lingual Text Leveling

Multi-Lingual Text Leveling Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency

More information

A relational approach to translation

A relational approach to translation A relational approach to translation Rémi Zajac Project POLYGLOSS* University of Stuttgart IMS-CL /IfI-AIS, KeplerstraBe 17 7000 Stuttgart 1, West-Germany zajac@is.informatik.uni-stuttgart.dbp.de Abstract.

More information

Basic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English.

Basic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English. Basic Syntax Doug Arnold doug@essex.ac.uk We review some basic grammatical ideas and terminology, and look at some common constructions in English. 1 Categories 1.1 Word level (lexical and functional)

More information

Some Principles of Automated Natural Language Information Extraction

Some Principles of Automated Natural Language Information Extraction Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

THE VERB ARGUMENT BROWSER

THE VERB ARGUMENT BROWSER THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,

More information

A Graph Based Authorship Identification Approach

A Graph Based Authorship Identification Approach A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico

More information

Freitag 7. Januar = QUIZ = REFLEXIVE VERBEN = IM KLASSENZIMMER = JUDD 115

Freitag 7. Januar = QUIZ = REFLEXIVE VERBEN = IM KLASSENZIMMER = JUDD 115 DEUTSCH 3 DIE DEBATTE: GEFÄHRLICHE HAUSTIERE Debatte: Freitag 14. JANUAR, 2011 Bewertung: zwei kleine Prüfungen. Bewertungssystem: (see attached) Thema:Wir haben schon die Geschichte Gefährliche Haustiere

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),

More information

Proof Theory for Syntacticians

Proof Theory for Syntacticians Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax

More information

Developing a TT-MCTAG for German with an RCG-based Parser

Developing a TT-MCTAG for German with an RCG-based Parser Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,

More information

The optimal placement of up and ab A comparison 1

The optimal placement of up and ab A comparison 1 The optimal placement of up and ab A comparison 1 Nicole Dehé Humboldt-University, Berlin December 2002 1 Introduction This paper presents an optimality theoretic approach to the transitive particle verb

More information

AN EXPERIMENTAL APPROACH TO NEW AND OLD INFORMATION IN TURKISH LOCATIVES AND EXISTENTIALS

AN EXPERIMENTAL APPROACH TO NEW AND OLD INFORMATION IN TURKISH LOCATIVES AND EXISTENTIALS AN EXPERIMENTAL APPROACH TO NEW AND OLD INFORMATION IN TURKISH LOCATIVES AND EXISTENTIALS Engin ARIK 1, Pınar ÖZTOP 2, and Esen BÜYÜKSÖKMEN 1 Doguş University, 2 Plymouth University enginarik@enginarik.com

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

An Interactive Intelligent Language Tutor Over The Internet

An Interactive Intelligent Language Tutor Over The Internet An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN:

Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN: Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN: 1137-3601 revista@aepia.org Asociación Española para la Inteligencia Artificial España Lucena, Diego Jesus de; Bastos Pereira,

More information

Underlying and Surface Grammatical Relations in Greek consider

Underlying and Surface Grammatical Relations in Greek consider 0 Underlying and Surface Grammatical Relations in Greek consider Sentences Brian D. Joseph The Ohio State University Abbreviated Title Grammatical Relations in Greek consider Sentences Brian D. Joseph

More information

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet

More information

The College Board Redesigned SAT Grade 12

The College Board Redesigned SAT Grade 12 A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.

More information

EAGLE: an Error-Annotated Corpus of Beginning Learner German

EAGLE: an Error-Annotated Corpus of Beginning Learner German EAGLE: an Error-Annotated Corpus of Beginning Learner German Adriane Boyd Department of Linguistics The Ohio State University adriane@ling.osu.edu Abstract This paper describes the Error-Annotated German

More information

LING 329 : MORPHOLOGY

LING 329 : MORPHOLOGY LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,

More information

Introduction to Causal Inference. Problem Set 1. Required Problems

Introduction to Causal Inference. Problem Set 1. Required Problems Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not

More information

Using Semantic Relations to Refine Coreference Decisions

Using Semantic Relations to Refine Coreference Decisions Using Semantic Relations to Refine Coreference Decisions Heng Ji David Westbrook Ralph Grishman Department of Computer Science New York University New York, NY, 10003, USA hengji@cs.nyu.edu westbroo@cs.nyu.edu

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT

WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Hindi-Urdu Phrase Structure Annotation

Hindi-Urdu Phrase Structure Annotation Hindi-Urdu Phrase Structure Annotation Rajesh Bhatt and Owen Rambow January 12, 2009 1 Design Principle: Minimal Commitments Binary Branching Representations. Mostly lexical projections (P,, AP, AdvP)

More information

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,

More information

Loughton School s curriculum evening. 28 th February 2017

Loughton School s curriculum evening. 28 th February 2017 Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

In Udmurt (Uralic, Russia) possessors bear genitive case except in accusative DPs where they receive ablative case.

In Udmurt (Uralic, Russia) possessors bear genitive case except in accusative DPs where they receive ablative case. Sören E. Worbs The University of Leipzig Modul 04-046-2015 soeren.e.worbs@gmail.de November 22, 2016 Case stacking below the surface: On the possessor case alternation in Udmurt (Assmann et al. 2014) 1

More information

Indian Institute of Technology, Kanpur

Indian Institute of Technology, Kanpur Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

cmp-lg/ Jul 1995

cmp-lg/ Jul 1995 A CONSTRAINT-BASED CASE FRAME LEXICON ARCHITECTURE 1 Introduction Kemal Oazer and Okan Ylmaz Department of Computer Engineering and Information Science Bilkent University Bilkent, Ankara 0, Turkey fko,okang@cs.bilkent.edu.tr

More information

Disambiguation of Thai Personal Name from Online News Articles

Disambiguation of Thai Personal Name from Online News Articles Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online

More information

Multilingual Sentiment and Subjectivity Analysis

Multilingual Sentiment and Subjectivity Analysis Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja

More information
