Corpora and Statistical Methods Lecture 11. Albert Gatt
|
|
- Clementine Carroll
- 6 years ago
- Views:
Transcription
1 Corpora and Statistical Methods Lecture 11 Albert Gatt
2 Part 2 Statistical parsing
3 Preliminary issues How parsers are evaluated
4 Evaluation The issue: what objective criterion are we trying to maximise? i.e. under what objective function can I say that my parser does well (and how well?) need a gold standard Possibilities: strict match of candidate parse against gold standard match of components of candidate parse against gold standard components
5 Evaluation A classic evaluation metric is the PARSEVAL one initiative to compare parsers on the same data not initially concerned with stochastic parsers evaluate parser output piece by piece Main components: compares gold standard tree to parser tree typically, gold standard is the tree in a treebank computes: precision recall crossing brackets
6 PARSEVAL: labeled recall #correct nodes in candidate parse # nodes in treebank parse Correct node = node in candidate parse which: has same node label originally omitted from PARSEVAL to avoid theoretical conflict spans the same words
7 PARSEVAL: labeled precision #correct nodes in candidate parse # nodes in candidate parse The proportion of correctly labelled and correctly spanning nodes in the candidate.
8 Combining Precision and Recall As usual, Precision and recall can be combined into a single F-measure: 1 F 1 1 (1 ) P R
9 PARSEVAL: crossed brackets number of brackets in the candidate parse which cross brackets in the treebank parse e.g. treebank has ((X Y) Z) and candidate has (X (Y Z)) Unlike precision/recall, this is an objective function to minimise
10 Current performance Current parsers achieve: ca. 90% precision >90% recall 1% cross-bracketed constituents
11 Some issues with PARSEVAL 1. These measures evaluate parses at the level of individual decisions (nodes). ignore the difficulty of getting a globally correct solution by carrying out a correct sequence of decisions 2. Success on crossing brackets depends on the kind of parse trees used Penn Treebank has very flat trees (not much embedding), therefore likelihood of crossed brackets decreases. 3. In PARSEVAL, if a constituent is attached lower in a tree than the gold standard, all its daughters are counted wrong.
12 Probabilistic parsing with PCFGs The basic algorithm
13 The basic PCFG parsing algorithm Many statistical parsers use a version of the CYK algorithm. Assumptions: CFG productions are in Chomsky Normal Form. A BC A a Use indices between words: Book the flight through Houston (0) Book (1) the (2) flight (3) through (4) Houston (5) Procedure (bottom-up): Traverse input sentence left-to-right Use a chart to store constituents and their span + their probability.
14 Probabilistic CYK: example PCFG S NP VP [.80] NP Det N [.30] VP V NP [.20] V includes [.05] Det the [.4] Det a [.4] N meal [.01] N flight [.02]
15 Probabilistic CYK: initialisation The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} //syntactic lookup for i = j-2 to 0 do: chart ij := {} for k = i+1 to j-1 do: for each A -> BC do: if B in chart ik & C in chart kj : chart ij := chart ij U {A}
16 Probabilistic CYK: lexical step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} Det (.4)
17 Probabilistic CYK: lexical step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} Det (.4) 1 N
18 Probabilistic CYK: syntactic step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} //syntactic lookup for i = j-2 to 0 do: chart ij := {} for k = i+1 to j-1 do: for each A -> BC do: if B in chart ik & C in chart kj : chart ij := chart ij U {A} Det (.4) NP N
19 Probabilistic CYK: lexical step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} Det (.4) NP N.02 2 V
20 Probabilistic CYK: lexical step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} Det (.4) NP N.02 2 V.05 3 Det.4 4 5
21 Probabilistic CYK: syntactic step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} Det (.4) NP N.02 2 V.05 3 Det.4 4 N.01
22 Probabilistic CYK: syntactic step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} //syntactic lookup for i = j-2 to 0 do: chart ij := {} for k = i+1 to j-1 do: for each A -> BC do: if B in chart ik & C in chart kj : chart ij := chart ij U {A} Det (.4) NP N.02 2 V.05 3 Det.4 NP N.01
23 Probabilistic CYK: syntactic step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} //syntactic lookup for i = j-2 to 0 do: chart ij := {} for k = i+1 to j-1 do: for each A -> BC do: if B in chart ik & C in chart kj : chart ij := chart ij U {A} Det (.4) NP N.02 2 V.05 VP Det.4 NP N.01
24 Probabilistic CYK: syntactic step The flight includes a meal. //Lexical lookup: for j = 1 to length(string) do: chart j-1,j := {X : X->word in G} //syntactic lookup for i = j-2 to 0 do: chart ij := {} for k = i+1 to j-1 do: for each A -> BC do: if B in chart ik & C in chart kj : chart ij := chart ij U {A} 0 Det (.4) NP 1 N V.05 3 Det.4 4 N S VP NP
25 Probabilistic CYK: summary Cells in chart hold probabilities Bottom-up procedure computes probability of a parse incrementally. To obtain parse trees, cells need to be augmented with backpointers.
26 Probabilistic parsing with lexicalised PCFGs Main approaches (focus on Collins (1997,1999)) see also: Charniak (1997)
27 Unlexicalised PCFG Estimation Charniak (1996) used Penn Treebank POS and phrasal categories to induce a maximum likelihood PCFG only used relative frequency of local trees as the estimates for rule probabilities did not apply smoothing or any other techniques Works surprisingly well: 80.4% recall; 78.8% precision (crossed brackets not estimated) Suggests that most parsing decisions are mundane and can be handled well by unlexicalized PCFG
28 Probabilistic lexicalised PCFGs Standard format of lexicalised rules: associate head word with non-terminal e.g. dumped sacks into VP(dumped) VBD(dumped) NP(sacks) PP(into) associate head tag with non-terminal VP(dumped,VBD) VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN) Types of rules: lexical rules expand pre-terminals to words: e.g. NNS(sacks,NNS) sacks probability is always 1 internal rules expand non-terminals e.g. VP(dumped,VBD) VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN)
29 Estimating probabilities Non-generative model: take an MLE estimate of the probability of an entire rule Count( VP(dumped, VBD) VBD(dumped, VBD) NP(sacks, NNS) PP(into,IN)) Count( VP(dumped, VBD)) non-generative models suffer from serious data sparseness problems Generative model: estimate the probability of a rule by breaking it up into sub-rules.
30 Collins Model 1 Main idea: represent CFG rules as expansions into Head + left modifiers + right modifiers LHS STOP Ln Ln 1... L1 H R1... Rn 1Rn STOP L i /R i is of the form L/R(word,tag); e.g. NP(sacks,NNS) STOP: special symbol indicating left/right boundary. Parsing: Given the LHS, generate the head of the rule, then the left modifiers (until STOP) and right modifiers (until STOP) inside-out. Each step has a probability.
31 Collins Model 1: example VP(dumped,VBD) VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN) 1. Head H(hw,ht): ( H( hw, ht) Parent, hw, ht) P H P( VBD(dumped, VBD) VP(dumped, VBD))
32 Collins Model 1: example VP(dumped,VBD) VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN) 1. Head H(hw,ht): P H ( H( hw, ht) Parent, hw, ht) 2. Left modifiers: n 1 i 1 P ( L L i ( lw i, lw t ) Parent, H, hw, ht) P( STOP VP(dumped, VBD), VBD(dumped, VBD))
33 Collins Model 1: example VP(dumped,VBD) VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN) 1. Head H(hw,ht): P H ( H( hw, ht) Parent, hw, ht) 2. Left modifiers: 3. Right modifiers: P P P R R R n 1 i 1 n 1 i 1 P R P ( L L ( R i i ( rw ( lw i i, lw, rw t t ) Parent, H, hw, ht) ) Parent, H, hw, ht) ( NP(sacks, NNS) VP(dumped, VBD), VBD(dumped, VBD)) ( PP(into,IN) VP(dumped, VBD), VBD(dumped, VBD)) (STOP VP(dumped, VBD), VBD(dumped, VBD))
34 Collins Model 1: example VP(dumped,VBD) VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN) 1. Head H(hw,ht): 2. Left modifiers: 3. Right modifiers: n 1 i 1 P H ( H( hw, ht) Parent, hw, ht) n 1 i 1 P ( R R P ( L i L i ( rw i ( lw i, rw, lw t t ) Parent, H, hw, ht) ) Parent, H, hw, ht) 4. Total probability: multiplication of (1) (3)
35 Variations on Model 1: distance Collins proposed to extend rules by conditioning on distance of modifiers from the head: P L ( L ( lw, lw ) P, H, hw, ht,distance ( i 1)) i i t PR ( Ri ( rwi, rwt ) P, H, hw, ht,distance R( i 1)) a function of the yield of modifiers seen. L Distance for R 2 probability = words under R 1
36 Using a distance function Simplest kind of distance function is a tuple of binary features: Is the string of length 0? Does the string contain a verb? Example uses: if the string has length 0, P R should be higher: English is right-branching & most right modifiers are adjacent to the head verb if string contains a verb, P R should be higher: accounts for preference to attach dependencies to main verb
37 Further additions Collins Model 2: subcategorisation preferences distinction between complements and adjuncts. Model 3 augmented to deal with long-distance (WH) dependencies.
38 Smoothing and backoff Rules may condition on words that never occur in training data. Collins used 3-level backoff model. Combined using linear interpolation. 1. use head word P ( R ( rw, rw ) Parent, H, hw, ht) R 2. use head tag i i t P ( R ( rw, rw ) Parent, H, ht) R 3. parent only i i t P R ( R ( rw, rw ) Parent ) i i t
39 Other parsing approaches
40 Data-oriented parsing Alternative to grammar-based models does not attempt to derive a grammar from a treebank treebank data is stored as fragments of trees parser uses whichever trees seem to be useful
41 Data-oriented parsing Suppose we want to parse Sue heard Jim. Corpus contains the following potentially useful fragments: Parser can combine these to give a parse
42 Data-oriented Parsing Multiple fundamentally distinct derivations of a single tree. Parse using Monte Carlo simulation methods: randomly produce a large sample of derivations use these to find the most probable parse disadvantage: needs very large samples to make parses accurate, therefore potentially slow
43 Data-oriented parsing vs. PCFGs Possible advantages: using partial trees directly accounts for lexical dependencies also accounts for multi-word expressions and idioms (e.g. take advantage of) while PCFG rules only represent trees of depth 1, DOP fragments can represent trees of arbitrary length Similarities to PCFG: tree fragments could be equivalent to PCFG rules probabilities estimated for grammar rules are exactly the same as for tree fragments
44 History Based Grammars (HBG) General idea: any derivational step can be influenced by any earlier derivational step (Black et al. 1993) the probability of expansion of the current node conditioned on all previous nodes along the path from the root
45 History Based Grammars (HBG) Black et al lexicalise their grammar. every phrasal node inherits 2 words: its lexical head H 1 a secondary head H 2, deemed to be useful e.g. the PP in the bank might have H1=in and H2=bank Every non-terminal is also assigned: a syntactic category (Syn) e.g. PP a semantic category (Sem) e.g with-data Use the index I that indicates what number child of the parent node is being expanded
46 HBG Example (Black et al 1993)
47 History Based Grammars (HBG) Estimation of the probability of a rule R: P( Syn, Sem, R, H, H2 Syn p, Sem p, Rp, I, H1, H2 1 p p ) probability of: the current rule R to be applied its Syn and Sem category its heads H1 and H2 conditioned on: Syn and Sem of parent node the rule that gave rise to the parent the index of this child relative to the parent the heads H1 and H2 of the parent
48 Summary This concludes our overview of statistical parsing We ve looked at three important models Also considered basic search techniques and algorithms
11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation
tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each
More informationBasic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1
Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up
More informationGrammars & Parsing, Part 1:
Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationModeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures
Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,
More informationSyntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm
Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together
More informationPrediction of Maximal Projection for Semantic Role Labeling
Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba
More informationAn Efficient Implementation of a New POP Model
An Efficient Implementation of a New POP Model Rens Bod ILLC, University of Amsterdam School of Computing, University of Leeds Nieuwe Achtergracht 166, NL-1018 WV Amsterdam rens@science.uva.n1 Abstract
More informationChunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.
NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and
More informationNatural Language Processing. George Konidaris
Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans
More informationContext Free Grammars. Many slides from Michael Collins
Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationIntroduction to Simulation
Introduction to Simulation Spring 2010 Dr. Louis Luangkesorn University of Pittsburgh January 19, 2010 Dr. Louis Luangkesorn ( University of Pittsburgh ) Introduction to Simulation January 19, 2010 1 /
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationInformatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy
Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationTowards a MWE-driven A* parsing with LTAGs [WG2,WG3]
Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general
More informationAccurate Unlexicalized Parsing for Modern Hebrew
Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The
More informationParsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank
Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank Dan Klein and Christopher D. Manning Computer Science Department Stanford University Stanford,
More informationAnalysis of Probabilistic Parsing in NLP
Analysis of Probabilistic Parsing in NLP Krishna Karoo, Dr.Girish Katkar Research Scholar, Department of Electronics & Computer Science, R.T.M. Nagpur University, Nagpur, India Head of Department, Department
More informationUNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen
UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja
More informationProof Theory for Syntacticians
Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax
More informationSEMAFOR: Frame Argument Resolution with Log-Linear Models
SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon
More informationLTAG-spinal and the Treebank
LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)
More informationAdapting Stochastic Output for Rule-Based Semantics
Adapting Stochastic Output for Rule-Based Semantics Wissenschaftliche Arbeit zur Erlangung des Grades eines Diplom-Handelslehrers im Fachbereich Wirtschaftswissenschaften der Universität Konstanz Februar
More informationChapter 4: Valence & Agreement CSLI Publications
Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).
More informationTowards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la
Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)
More informationThree New Probabilistic Models. Jason M. Eisner. CIS Department, University of Pennsylvania. 200 S. 33rd St., Philadelphia, PA , USA
Three New Probabilistic Models for Dependency Parsing: An Exploration Jason M. Eisner CIS Department, University of Pennsylvania 200 S. 33rd St., Philadelphia, PA 19104-6389, USA jeisner@linc.cis.upenn.edu
More informationSome Principles of Automated Natural Language Information Extraction
Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationCS 598 Natural Language Processing
CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@
More informationRANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S
N S ER E P S I M TA S UN A I S I T VER RANKING AND UNRANKING LEFT SZILARD LANGUAGES Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A-1997-2 UNIVERSITY OF TAMPERE DEPARTMENT OF
More informationThe presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing.
Lecture 4: OT Syntax Sources: Kager 1999, Section 8; Legendre et al. 1998; Grimshaw 1997; Barbosa et al. 1998, Introduction; Bresnan 1998; Fanselow et al. 1999; Gibson & Broihier 1998. OT is not a theory
More informationDomain Adaptation for Parsing
Domain Adaptation for Parsing Barbara Plank CLCG The work presented here was carried out under the auspices of the Center for Language and Cognition Groningen (CLCG) at the Faculty of Arts of the University
More informationESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly
ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly Inflected Languages Classical Approaches to Tagging The slides are posted on the web. The url is http://chss.montclair.edu/~feldmana/esslli10/.
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationThe Discourse Anaphoric Properties of Connectives
The Discourse Anaphoric Properties of Connectives Cassandre Creswell, Kate Forbes, Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi Λ, Bonnie Webber y Λ University of Pennsylvania 3401 Walnut Street Philadelphia,
More informationCompositional Semantics
Compositional Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Words, bag of words Sequences Trees Meaning Representing Meaning An important goal of NLP/AI: convert natural language
More informationA Graph Based Authorship Identification Approach
A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico
More informationBANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS
Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.
More informationLearning Computational Grammars
Learning Computational Grammars John Nerbonne, Anja Belz, Nicola Cancedda, Hervé Déjean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard and Erik Tjong Kim Sang Abstract
More informationThe Interface between Phrasal and Functional Constraints
The Interface between Phrasal and Functional Constraints John T. Maxwell III* Xerox Palo Alto Research Center Ronald M. Kaplan t Xerox Palo Alto Research Center Many modern grammatical formalisms divide
More information"f TOPIC =T COMP COMP... OBJ
TREATMENT OF LONG DISTANCE DEPENDENCIES IN LFG AND TAG: FUNCTIONAL UNCERTAINTY IN LFG IS A COROLLARY IN TAG" Aravind K. Joshi Dept. of Computer & Information Science University of Pennsylvania Philadelphia,
More informationA Computational Evaluation of Case-Assignment Algorithms
A Computational Evaluation of Case-Assignment Algorithms Miles Calabresi Advisors: Bob Frank and Jim Wood Submitted to the faculty of the Department of Linguistics in partial fulfillment of the requirements
More informationCan Human Verb Associations help identify Salient Features for Semantic Verb Classification?
Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung Universität Stuttgart Seminar für Sprachwissenschaft,
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationLNGT0101 Introduction to Linguistics
LNGT0101 Introduction to Linguistics Lecture #11 Oct 15 th, 2014 Announcements HW3 is now posted. It s due Wed Oct 22 by 5pm. Today is a sociolinguistics talk by Toni Cook at 4:30 at Hillcrest 103. Extra
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationA Version Space Approach to Learning Context-free Grammars
Machine Learning 2: 39~74, 1987 1987 Kluwer Academic Publishers, Boston - Manufactured in The Netherlands A Version Space Approach to Learning Context-free Grammars KURT VANLEHN (VANLEHN@A.PSY.CMU.EDU)
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationInleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3
Inleiding Taalkunde Docent: Paola Monachesi Blok 4, 2001/2002 Contents 1 Syntax 2 2 Phrases and constituent structure 2 3 A minigrammar of Italian 3 4 Trees 3 5 Developing an Italian lexicon 4 6 S(emantic)-selection
More informationControl and Boundedness
Control and Boundedness Having eliminated rules, we would expect constructions to follow from the lexical categories (of heads and specifiers of syntactic constructions) alone. Combinatory syntax simply
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationCOMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR
COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationAn Interactive Intelligent Language Tutor Over The Internet
An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This
More informationConstraining X-Bar: Theta Theory
Constraining X-Bar: Theta Theory Carnie, 2013, chapter 8 Kofi K. Saah 1 Learning objectives Distinguish between thematic relation and theta role. Identify the thematic relations agent, theme, goal, source,
More informationArgument structure and theta roles
Argument structure and theta roles Introduction to Syntax, EGG Summer School 2017 András Bárány ab155@soas.ac.uk 26 July 2017 Overview Where we left off Arguments and theta roles Some consequences of theta
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationRefining the Design of a Contracting Finite-State Dependency Parser
Refining the Design of a Contracting Finite-State Dependency Parser Anssi Yli-Jyrä and Jussi Piitulainen and Atro Voutilainen The Department of Modern Languages PO Box 3 00014 University of Helsinki {anssi.yli-jyra,jussi.piitulainen,atro.voutilainen}@helsinki.fi
More informationRadius STEM Readiness TM
Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and
More informationPlanning with External Events
94 Planning with External Events Jim Blythe School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 blythe@cs.cmu.edu Abstract I describe a planning methodology for domains with uncertainty
More informationIterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages
Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer
More informationBasic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English.
Basic Syntax Doug Arnold doug@essex.ac.uk We review some basic grammatical ideas and terminology, and look at some common constructions in English. 1 Categories 1.1 Word level (lexical and functional)
More informationFragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing
Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing D. Indhumathi Research Scholar Department of Information Technology
More informationHyperedge Replacement and Nonprojective Dependency Structures
Hyperedge Replacement and Nonprojective Dependency Structures Daniel Bauer and Owen Rambow Columbia University New York, NY 10027, USA {bauer,rambow}@cs.columbia.edu Abstract Synchronous Hyperedge Replacement
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationApproaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque
Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationarxiv: v1 [math.at] 10 Jan 2016
THE ALGEBRAIC ATIYAH-HIRZEBRUCH SPECTRAL SEQUENCE OF REAL PROJECTIVE SPECTRA arxiv:1601.02185v1 [math.at] 10 Jan 2016 GUOZHEN WANG AND ZHOULI XU Abstract. In this note, we use Curtis s algorithm and the
More information(Sub)Gradient Descent
(Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include
More informationConstruction Grammar. University of Jena.
Construction Grammar Holger Diessel University of Jena holger.diessel@uni-jena.de http://www.holger-diessel.de/ Words seem to have a prototype structure; but language does not only consist of words. What
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationWhat Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017
What Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017 Supervised Training of Neural Networks for Language Training Data Training Model this is an example the cat went to
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationAnnotation Projection for Discourse Connectives
SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationCase government vs Case agreement: modelling Modern Greek case attraction phenomena in LFG
Case government vs Case agreement: modelling Modern Greek case attraction phenomena in LFG Dr. Kakia Chatsiou, University of Essex achats at essex.ac.uk Explorations in Syntactic Government and Subcategorisation,
More informationPOS tagging of Chinese Buddhist texts using Recurrent Neural Networks
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important
More informationOrganizational Knowledge Distribution: An Experimental Evaluation
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 24 Proceedings Americas Conference on Information Systems (AMCIS) 12-31-24 : An Experimental Evaluation Surendra Sarnikar University
More informationWE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT
WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationSecond Exam: Natural Language Parsing with Neural Networks
Second Exam: Natural Language Parsing with Neural Networks James Cross May 21, 2015 Abstract With the advent of deep learning, there has been a recent resurgence of interest in the use of artificial neural
More informationA deep architecture for non-projective dependency parsing
Universidade de São Paulo Biblioteca Digital da Produção Intelectual - BDPI Departamento de Ciências de Computação - ICMC/SCC Comunicações em Eventos - ICMC/SCC 2015-06 A deep architecture for non-projective
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationFormulaic Language and Fluency: ESL Teaching Applications
Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language Terminology Formulaic sequence One such item Formulaic language Non-count noun referring to these items Phraseology The study
More informationCharacter Stream Parsing of Mixed-lingual Text
Character Stream Parsing of Mixed-lingual Text Harald Romsdorfer and Beat Pfister Speech Processing Group Computer Engineering and Networks Laboratory ETH Zurich {romsdorfer,pfister}@tik.ee.ethz.ch Abstract
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationDiscriminative Learning of Beam-Search Heuristics for Planning
Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University
More informationUniversiteit Leiden ICT in Business
Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:
More informationCS 1103 Computer Science I Honors. Fall Instructor Muller. Syllabus
CS 1103 Computer Science I Honors Fall 2016 Instructor Muller Syllabus Welcome to CS1103. This course is an introduction to the art and science of computer programming and to some of the fundamental concepts
More informationPurdue Data Summit Communication of Big Data Analytics. New SAT Predictive Validity Case Study
Purdue Data Summit 2017 Communication of Big Data Analytics New SAT Predictive Validity Case Study Paul M. Johnson, Ed.D. Associate Vice President for Enrollment Management, Research & Enrollment Information
More informationIntroduction to Causal Inference. Problem Set 1. Required Problems
Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not
More information