UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS

INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES

Monday 9th December 2013, 09:30 to 11:30

INSTRUCTIONS TO CANDIDATES

1. Answer all five questions in Part A, and two out of three questions in Part B. Each question in Part A is worth 10% of the total exam mark; each question in Part B is worth 25%.

2. Use a single script book for all questions.

3. Calculators may be used in this exam.

Convener: J. Bradfield
External Examiner: C. Johnson

THIS EXAMINATION WILL BE MARKED ANONYMOUSLY

PART A

ANSWER ALL QUESTIONS IN PART A.

1. (a) Explain briefly what is meant by a language processing pipeline. List, in order, the stages in a typical such pipeline (i) for the Java programming language, and (ii) for spoken English. You need not mention all possible stages in the pipeline, but should include at least four important stages in each list.

   (b) Mention two important differences in nature between formal and natural languages. Briefly explain what impact these differences have on the way such languages must be processed. [6 marks]

2. For each n ≥ 1, consider the language L_n, over the alphabet Σ = {a, b, c}, defined below.

      L_n = { x ∈ Σ* : |x| ≥ n and the nth symbol from the end of x is a }

   For example, the strings aa, ab and bac are in L_2, but a and aca are not. And the string cabcb is in L_4, but caba is not.

   (a) Draw an NFA that recognises the language L_4.

   (b) Draw a minimal DFA recognising the language L_2. [5 marks]

   (c) For n ≥ 1, how many states does a minimal DFA for L_n have? [1 mark]

3. Consider the following simple context-free grammar for commands in English. The start symbol is Com.

      Com → give NP NP
      NP  → DT Nom
      Nom → N | A N
      DT  → the | a
      N   → dog | bone
      A   → large | small

   (a) Convert this grammar to Chomsky Normal Form (CNF). You need not write out the productions with left hand side DT, N or A, which will remain unchanged.

   (b) With respect to your CNF grammar, draw a CYK parse chart for the command:

      give the dog a bone

   Include pointers to show how each compound phrase is built from smaller phrases. [6 marks]
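A CYK chart like the one asked for in Question 3(b) can be sanity-checked by running the algorithm mechanically. Below is a minimal sketch, assuming one possible CNF conversion of the grammar: the helper nonterminals G and X are introduced here for illustration and are not fixed by the question, and the unit rule Nom → N is eliminated by letting the nouns also be Noms.

from itertools import product

# One possible CNF conversion (G and X are helper nonterminals
# introduced for this sketch; other correct conversions exist).
UNARY = {                          # terminal productions A -> w
    "give": {"G"}, "the": {"DT"}, "a": {"DT"},
    "dog": {"N", "Nom"}, "bone": {"N", "Nom"},
    "large": {"A"}, "small": {"A"},
}
BINARY = {                         # binary productions A -> B C
    ("G", "X"): {"Com"},           # Com -> G X   (from Com -> give NP NP)
    ("NP", "NP"): {"X"},
    ("DT", "Nom"): {"NP"},
    ("A", "N"): {"Nom"},
}

def cyk(words, start="Com"):
    """CYK recognition: table[i][j] holds the nonterminals deriving words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(UNARY.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]

print(cyk("give the dog a bone".split()))   # True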

4. In the following context-free grammar, each rule is equipped with a semantic attachment showing how the corresponding phrases may be translated to lambda-expressions involving logical symbols and connectives. Note that Rains and Snows are propositional constants (i.e. predicate symbols with no arguments). The start symbol is BigS. In the first rule, S1 denotes the first S on the right-hand side and S2 the second.

      BigS     → S if S          { S2.Sem ⇒ S1.Sem }
      S        → it WeatherV     { WeatherV.Sem }
      WeatherV → rains           { Rains }
      WeatherV → snows           { Snows }
      S        → there is NP     { ∃x. NP.Sem(x) }
      NP       → a Nom           { Nom.Sem }
      Nom      → A N             { λy. A.Sem(y) ∧ N.Sem(y) }
      A        → serious         { λz. Serious(z) }
      N        → problem         { λz. Problem(z) }

   (a) Draw the parse tree for the following sentence, allowing plenty of room for annotations:

      there is a serious problem if it snows

   Starting from the leaves of the tree and working upwards, annotate each node with the raw lambda-expression assigned to it by the above semantics.

   (b) Show how the lambda-expression associated with the complete sentence above β-reduces in several steps to a logical expression in normal form. You should show each β-reduction step explicitly. [6 marks]

5. (a) Consider the following noncontracting grammar, with terminals Σ = {a, b, !}; nonterminals S (the start symbol), T, A and B; and productions:

      S  → !T
      T  → Aa | Bb | AaT | BbT
      aA → Aa      aB → Ba
      bA → Ab      bB → Bb
      !A → a!      !B → b!

   Give a full derivation of: a b ! a b

   (b) Give a concise mathematical description of the language generated by the grammar in part (a). [7 marks]

   (c) At which level does this language reside in the Chomsky hierarchy? [1 mark]
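The compositional step in Question 4 can be mimicked directly with Python lambdas, with function application doing the work of β-reduction. This is a sketch, assuming the conditional (⇒) reading of the if-attachment given above; the strings stand for logical formulas.

# Semantic attachments as Python functions; applying a function
# corresponds to one beta-reduction step.
A_sem    = lambda z: f"Serious({z})"                 # A -> serious
N_sem    = lambda z: f"Problem({z})"                 # N -> problem
Nom_sem  = lambda y: f"({A_sem(y)} ∧ {N_sem(y)})"    # Nom -> A N
NP_sem   = Nom_sem                                   # NP -> a Nom
S1_sem   = f"∃x. {NP_sem('x')}"                      # S -> there is NP
S2_sem   = "Snows"                                   # S -> it WeatherV -> snows
BigS_sem = f"{S2_sem} ⇒ {S1_sem}"                    # BigS -> S if S

print(BigS_sem)   # Snows ⇒ ∃x. (Serious(x) ∧ Problem(x))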

PART B

ANSWER TWO QUESTIONS FROM PART B.

6. (a) Consider the following list of different types of machine.

      i. Deterministic finite automata
      ii. Nondeterministic pushdown automata
      iii. Nondeterministic linear bounded automata
      iv. Turing machines

   For each machine type in the above list, name the class of languages recognised by machines of that type, and give an example of a language in the class that cannot be recognised by any machine type higher up the list. [8 marks]

   (b) Consider a pushdown automaton (PDA) with two control states Q = {q1, q2}, start state q1, input alphabet Σ = {a, b}, stack alphabet Γ = {a, b, ⊥} (where ⊥ is the stack start symbol), and transition relation:

      q1 a, ⊥ : a⊥ q1
      q1 a, a : aa q1
      q1 a, b : ε q1
      q1 b, ⊥ : bb⊥ q2
      q1 b, a : ε q2
      q2 b, b : bbb q1
      q2 ε, ⊥ : b⊥ q1
      q2 ε, a : ε q1
      q1 ε, ⊥ : ε q1

   The automaton accepts on empty stack. (In the above description, we use the general notation q s, x : α q′ to mean that when the automaton is in control state q ∈ Q and x ∈ Γ is popped from the top of the stack, the input symbol or empty string s ∈ Σ ∪ {ε} can be read to reach control state q′ ∈ Q with α ∈ Γ* pushed onto the stack.)

   Describe in detail an execution of the above PDA that accepts the string aaabba. [8 marks]

   (c) Give a concise mathematical definition of the language L recognised by the PDA above.

   (d) Prove that the language L, defined in your answer to (c) above, cannot be recognised by any deterministic finite automaton. [7 marks]
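A nondeterministic PDA like the one in Question 6 can be explored by a breadth-first search over configurations. The transition table below copies the reconstruction printed above; since the original typesetting dropped the q1 and ⊥ symbols, both the table and this code are a best-effort reading rather than the definitive machine.

from collections import deque

EPS = ""
DELTA = {   # (state, input symbol or EPS, popped top) -> [(pushed string, next state)]
    ("q1", "a", "⊥"): [("a⊥", "q1")],
    ("q1", "a", "a"): [("aa", "q1")],
    ("q1", "a", "b"): [("", "q1")],
    ("q1", "b", "⊥"): [("bb⊥", "q2")],
    ("q1", "b", "a"): [("", "q2")],
    ("q2", "b", "b"): [("bbb", "q1")],
    ("q2", EPS, "⊥"): [("b⊥", "q1")],
    ("q2", EPS, "a"): [("", "q1")],
    ("q1", EPS, "⊥"): [("", "q1")],
}

def accepts(word, max_steps=100000):
    """Breadth-first search over configurations (state, remaining input, stack).

    The stack is a string whose first character is the top; acceptance is
    on empty stack with all input consumed."""
    seen, queue = set(), deque([("q1", word, "⊥")])
    for _ in range(max_steps):
        if not queue:
            return False
        state, rest, stack = queue.popleft()
        if not rest and not stack:
            return True                       # accept on empty stack
        if not stack or (state, rest, stack) in seen:
            continue
        seen.add((state, rest, stack))
        top, below = stack[0], stack[1:]
        for push, nxt in DELTA.get((state, EPS, top), []):      # epsilon-moves
            queue.append((nxt, rest, push + below))
        if rest:
            for push, nxt in DELTA.get((state, rest[0], top), []):
                queue.append((nxt, rest[1:], push + below))
    return False

print(accepts("aaabba"))   # True under this reconstruction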

7. The following is part of a grammar for generating simple cookery instructions:

      S  → place NP PP | remove NP from NP
      NP → Noun | NP PP
      PP → Prep NP

   The start symbol is S. The terminal symbols Noun and Prep stand for word classes as follows:

      Noun: { chips, burgers, tray, plate, oven }
      Prep: { in, on, with }

   (a) Draw all possible parse trees for the phrase:

      place chips on tray in oven

   (b) Treating the symbols Noun and Prep as terminals, use the Earley algorithm to parse the sequence

      place Noun Prep Noun

   You should show the execution of the algorithm as a table with a row for each step. Each row should include the start and end position of the portion of input processed, and a letter P, S or C to indicate whether the step is due to the predictor, scanner or completer. To get you started, the first row of your table should be

      S → • place NP PP   [0,0]   P

   You need not include all possible steps, but you should include all steps that contribute to a successful parse, and at least two that do not. You should explicitly mark the latter steps as spurious. [9 marks]

   (c) One possible method for resolving ambiguities in context-free grammars is to attach a probability weighting to each production rule, and then select the parse tree with the highest overall probability. Could this approach be used to select a preferred parsing for the above phrase? Briefly justify your answer.

   (d) Another possible way to resolve ambiguities is to rewrite the grammar. Design an LL(1) grammar that is equivalent to the one above (again treating Noun and Prep as terminals). You may introduce new non-terminal symbols if they are helpful. [5 marks]

   (e) Draw up the LL(1) parse table for your grammar. (You need not exhibit the First and Follow sets as part of your solution.) [7 marks]
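A compact recognizer version of the Earley algorithm from Question 7(b) is sketched below, written against the cookery grammar. Items are tuples (lhs, rhs, dot, origin); this only decides acceptance and does not print the P/S/C table the question asks for.

GRAMMAR = {
    "S":  [["place", "NP", "PP"], ["remove", "NP", "from", "NP"]],
    "NP": [["Noun"], ["NP", "PP"]],
    "PP": [["Prep", "NP"]],
}

def earley_recognise(tokens, start="S"):
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]      # chart[i]: items ending at position i
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:           # predictor
                for alt in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], tuple(alt), 0, i)
                    if new not in chart[i]:
                        chart[i].add(new); agenda.append(new)
            elif dot < len(rhs):                                 # scanner
                if i < n and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                                # completer
                for olhs, orhs, odot, oorig in list(chart[origin]):
                    if odot < len(orhs) and orhs[odot] == lhs:
                        new = (olhs, orhs, odot + 1, oorig)
                        if new not in chart[i]:
                            chart[i].add(new); agenda.append(new)
    return any((start, tuple(rhs), len(rhs), 0) in chart[n] for rhs in GRAMMAR[start])

print(earley_recognise(["place", "Noun", "Prep", "Noun"]))   # True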

8. In this question, we will consider an example of POS tagging using the following six tags, representing singular and plural nouns, third person singular and plural present-tense verbs, adjectives and adverbs:

      Ns   Np   Vs   Vp   Adj   Adv

   We will also consider six words, to each of which is associated a stem and two or more parts of speech as follows:

      Word    Stem    Parts of speech
      fast    fast    Ns, Vp, Adj, Adv
      fasts   fast    Np, Vs
      hold    hold    Ns, Vp
      holds   hold    Np, Vs
      hook    hook    Ns, Vp
      hooks   hook    Np, Vs

   (a) Draw the state diagram for a non-deterministic finite state transducer which accepts as input any of the above six words (as a sequence of letters) followed by the word boundary marker #, and which can produce as output the corresponding stem followed by any of the possible POS tags for the word. For example, given the input string hold#, the transducer should be capable of producing either hold Ns or hold Vp as output (and nothing else).

   The input alphabet for the transducer should be {a, ..., z, #}, while the output alphabet should be {a, ..., z} together with the set of six POS tags (we regard each POS tag as a single output symbol). Finally, your transducer should have as few states as possible for the task it performs. However, you will not be penalized heavily if your transducer has just a few more states than necessary. [11 marks]

   (b) We will now work towards tagging the sequence:

      hook holds fast

   In this part of the question, we shall consider the simpler problem of tagging the corresponding sequence of stems:

      hook hold fast

   using just the reduced tagset N, V, Adj, Adv.

   Use the Viterbi algorithm to tag this sequence, using the following transition and emission probabilities:

      Transitions:
                   to N    to V    to Adj   to Adv
      from start   0.6     0.2     0.1      0.1
      from N       0.4     0.3     0.1      0.2
      from V       0.3     0.2     0.2      0.3
      from Adj     0.5     0.1     0.3      0.1
      from Adv     0.2     0.5     0.1      0.2

      Emissions:
             fast    hold    hook
      N      0.2     0.3     0.5
      V      0.2     0.5     0.3
      Adj    1.0     0       0
      Adv    1.0     0       0

   Show your working, and include explicit backtrace pointers in your Viterbi matrix.

   (c) Explain how the techniques from parts (a) and (b) can be combined to produce taggings for word sequences using our original set of six tags. Illustrate the method by tagging the sequence:

      hook holds fast

   [10 marks]
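The Viterbi computation in part (b) is small enough to check by machine. Here is a minimal sketch with the probabilities copied from the tables above; the printed result is what the hand computation should reproduce.

TAGS = ["N", "V", "Adj", "Adv"]
TRANS = {   # transition probabilities: TRANS[from][to]
    "start": {"N": 0.6, "V": 0.2, "Adj": 0.1, "Adv": 0.1},
    "N":     {"N": 0.4, "V": 0.3, "Adj": 0.1, "Adv": 0.2},
    "V":     {"N": 0.3, "V": 0.2, "Adj": 0.2, "Adv": 0.3},
    "Adj":   {"N": 0.5, "V": 0.1, "Adj": 0.3, "Adv": 0.1},
    "Adv":   {"N": 0.2, "V": 0.5, "Adj": 0.1, "Adv": 0.2},
}
EMIT = {    # emission probabilities: EMIT[tag][stem]
    "N":   {"fast": 0.2, "hold": 0.3, "hook": 0.5},
    "V":   {"fast": 0.2, "hold": 0.5, "hook": 0.3},
    "Adj": {"fast": 1.0, "hold": 0.0, "hook": 0.0},
    "Adv": {"fast": 1.0, "hold": 0.0, "hook": 0.0},
}

def viterbi(words):
    """Return the best tag sequence and its probability for the stem sequence."""
    v = [{t: TRANS["start"][t] * EMIT[t][words[0]] for t in TAGS}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: v[-1][p] * TRANS[p][t])
            col[t] = v[-1][prev] * TRANS[prev][t] * EMIT[t][w]
            ptr[t] = prev                    # backtrace pointer
        v.append(col); back.append(ptr)
    tag = max(TAGS, key=lambda t: v[-1][t])  # best final tag
    path = [tag]
    for ptr in reversed(back):               # follow backtrace pointers
        path.append(ptr[path[-1]])
    return list(reversed(path)), v[-1][tag]

print(viterbi(["hook", "hold", "fast"]))  # (['N', 'V', 'Adv'], 0.0135) up to rounding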