Exam Speech and Language Processing 1 (216631) 24 January 2006


Introduction

This exam, Speech and Language Processing 1, consists of 20 multiple choice questions. You may use the book Speech and Language Processing, the slides, and your notes. You can earn 100 points for this exam: 5 points per question. The numbered grammar referred to in two of the multiple choice questions can be found in the final section of this document.

When the time is up, or when you are finished, hand in the answer form for the multiple choice questions.

Tip: first fill in your answers on this question form; check the answers when you have completed all the questions; then fill in your answers on the answer form.

Good luck!

Multiple choice questions

1. Which of the following regular languages is accepted by the automaton shown here? (q0 is the start state)

(a) a(ba)* ∪ a(bba)*
(b) {aba, abba}
(c) a(bb?a)*
(d) ab(b?a)?
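The automaton figure itself did not survive this transcription, but the four candidate languages can still be compared. Below is a minimal Python sketch that probes each option with the re module; reading option (a) as the union of a(ba)* and a(bba)* is an assumption, since the connective between the two patterns was lost.

    import re

    # Candidate regular expressions from question 1. Option (a) is read as a
    # union of two patterns (an assumption: the connective did not survive).
    candidates = {
        "(a)": r"a(ba)*|a(bba)*",
        "(b)": r"aba|abba",
        "(c)": r"a(bb?a)*",
        "(d)": r"ab(b?a)?",
    }

    # A few probe strings; re.fullmatch only succeeds if the whole string matches.
    tests = ["a", "ab", "aba", "abba", "ababa", "abbabba", "ababba"]

    for label, pattern in candidates.items():
        accepted = [s for s in tests if re.fullmatch(pattern, s)]
        print(label, accepted)

Comparing the accepted sets against the transitions of the automaton, once drawn, shows which option describes exactly the language it accepts.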

2. Consider the following two statements about inflection and derivation.

i) In English, adding the suffix -s to the end of an infinitive verb (for example, sing → sings) is a form of inflection.
ii) In English, adding the suffix -ism to an adjective (for example, national → nationalism) is a form of derivation.

Which of these statements are true?

(a) Both are true
(b) None of them is true
(c) Only i) is true
(d) Only ii) is true

3. Consider the following two statements about morphemes and syllables.

i) A morpheme can consist of several syllables.
ii) A syllable can consist of several morphemes.

Which of these statements are true?

(a) Both are true
(b) None of them is true
(c) Only i) is true
(d) Only ii) is true

4. In Dutch, the past tense of a verb ends in -de if the verb stem ends in a voiced sound (for example, voedde 'fed' and oliede 'oiled') and in -te if the verb stem ends in an unvoiced sound (for example, zakte 'failed' and pestte 'bullied'). We assume that the basic past tense suffix is -de, and that in the step from intermediate to surface level -de is changed to -te after an unvoiced sound. Below you see the state-transition table for a transducer that can correctly generate the past tenses mentioned above. We use PC-Kimmo notation, where 0 is the empty symbol, + is the morpheme boundary symbol, and @ is the "other" symbol. The symbol CU stands for unvoiced consonants. Final states are indicated with a colon (:) and non-final states with a dot after the state number. State numbers start with 1; the fail state has number zero.

    RULE "DE/TE Replacement" 7 7

         CU  +  #  d  d  e  @
         CU  0  #  t  d  e  @
    1:    2  1  1  0  1  1  1
    2:    2  3  1  0  1  1  1
    3:    2  0  1  6  4  1  1
    4.    2  1  1  0  1  5  1
    5.    2  1  0  0  1  1  1
    6.    0  0  0  0  0  7  0
    7:    0  0  1  0  0  0  0

Assume we make the following changes to the transducer:

- We replace the d:d transition from state 3 to state 4 with a d:d transition from state 3 to state 0, and
- We replace the CU:CU transition from state 2 to state 2 with a CU:CU transition from state 2 to state 1.

What will happen now?

(a) The transducer will now accept (and generate) the incorrect past tense form zakde
(b) The transducer will now accept (and generate) the incorrect past tense form pestde
(c) The transducer will now accept (and generate) both zakde and pestde
(d) The transducer will still not accept (nor generate) zakde and pestde
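To see how such a table is read, here is a rough Python simulation. It is an illustration only: real two-level systems run several rules in parallel over an alphabet of feasible pairs, which this sketch ignores, and the function name accepts and the pair encoding are choices made here, not part of PC-Kimmo.

    # Columns are lexical:surface pairs; rows are states. State 0 is the fail
    # state; states listed in FINAL are the ones marked ':' in the table.
    COLS = ["CU:CU", "+:0", "#:#", "d:t", "d:d", "e:e", "@:@"]
    TABLE = {
        1: [2, 1, 1, 0, 1, 1, 1],
        2: [2, 3, 1, 0, 1, 1, 1],
        3: [2, 0, 1, 6, 4, 1, 1],
        4: [2, 1, 1, 0, 1, 5, 1],
        5: [2, 1, 0, 0, 1, 1, 1],
        6: [0, 0, 0, 0, 0, 7, 0],
        7: [0, 0, 1, 0, 0, 0, 0],
    }
    FINAL = {1, 2, 3, 7}

    def accepts(pairs):
        """Walk a sequence of lexical:surface pairs through the table."""
        state = 1
        for pair in pairs:
            col = COLS.index(pair) if pair in COLS else COLS.index("@:@")
            state = TABLE[state][col]
            if state == 0:
                return False
        return state in FINAL

    # zak+de# realized as zakte (correct): k is a CU, the d surfaces as t.
    zakte = ["@:@", "@:@", "CU:CU", "+:0", "d:t", "e:e", "#:#"]
    # zak+de# realized as zakde (should be rejected): d stays d after a CU.
    zakde = ["@:@", "@:@", "CU:CU", "+:0", "d:d", "e:e", "#:#"]
    print(accepts(zakte), accepts(zakde))   # True False

Re-running accepts after editing TABLE as the question describes shows whether zakde or pestde slips through.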

5. A finite state automaton (FSA) accepts a regular language. A finite state transducer (FST) is an extension of a finite state automaton; it defines a translation from sequences of input symbols (a regular language) to sequences of output symbols. Finite state automata as well as finite state transducers can be non-deterministic. A finite state transducer is non-deterministic if the underlying finite state automaton (obtained by ignoring the output symbols on the transitions) is non-deterministic. Consider the following two statements.

i) For every non-deterministic FSA there is a deterministic FSA that accepts the same regular language.
ii) For every non-deterministic FST there is a deterministic FST that defines the same translation.

(a) Only i) is true.
(b) Only ii) is true.
(c) Both i) and ii) are true.
(d) Both i) and ii) are false.

6. A natural language stemming algorithm has the following two properties: 1) the words adhere and adhesion remain distinct after stemming; 2) the words experiment and experience are reduced to the same stem. Which of the following statements is true?

(a) 1) is an example of overstemming, 2) is an example of understemming
(b) 1) is an example of understemming, 2) is an example of overstemming
(c) Both 1) and 2) are examples of overstemming
(d) Both 1) and 2) are examples of understemming
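If NLTK is installed, the over/understemming distinction can be made concrete with real stemmers. The snippet below makes no claim about what the question's unnamed algorithm does; it simply prints what the Porter and Lancaster stemmers produce for the four words, so related words that keep distinct stems (understemming) and unrelated words that collapse (overstemming) can be spotted in the output.

    from nltk.stem import LancasterStemmer, PorterStemmer

    # Overstemming: distinct words collapse onto one stem.
    # Understemming: morphologically related words keep different stems.
    words = ["adhere", "adhesion", "experiment", "experience"]
    for stemmer in (PorterStemmer(), LancasterStemmer()):
        print(type(stemmer).__name__, {w: stemmer.stem(w) for w in words})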

7. The field of phonology is about:

(a) How speech sounds are actually made, transmitted and received
(b) Studying all the sounds that both human and artificial voices are capable of creating
(c) Studying subsets of the sounds that constitute language and meaning
(d) How sounds can be organized into one system for all languages

8. Which of the following sound classifications should not be part of this group?

(a) Nasal
(b) Dental
(c) Velar
(d) Glottal

9. Which English speech sounds does the following feature bundle refer to?

    [ +consonant
      -sonorant
      +/-voice
      +back     ]

(a) /m/ (man) and /n/ (name)
(b) /k/ (cat) and /g/ (goal)
(c) /p/ (pack) and /b/ (ball)
(d) /f/ (foot) and /v/ (verb)

10. Which of the following phonetic transcriptions of Dutch words can be regarded as representative of the way the word would normally be pronounced in Dutch?

(a) dichtdoen: d I x t d u n
(b) lenen: l e: n @
(c) politie: p l i s i
(d) herfststraal: h E r f s t s t r a: l

11. Dobby the house-elf, one of the characters in the Harry Potter books and films, has a rather typical way of speaking. For example, Dobby says things like this to Harry Potter:

    Dobby has to punish himself, sir
    Dobby has come to warn Harry Potter
    Harry Potter asks if he can help Dobby...

These utterances differ from normal English at which linguistic level?

(a) Phonology
(b) Morphology
(c) Syntax
(d) None of the above

12. Consider the grammar below:

Rules:
S → NP VP
VP → Verb NP (PP)
NP → (Det) Nom (PP)
PP → Prep NP
Nom → Noun
Nom → ProperNoun

Lexicon:
Prep → with, in
Noun → woods, bike
Det → the
Verb → saw
ProperNoun → John, Peter

How many parse trees does this grammar produce for the sentence John saw Peter with the bike in the woods?

(a) 1
(b) 2
(c) 3
(d) More than 3
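The attachment possibilities of the two PPs can be enumerated mechanically. Here is a minimal sketch using NLTK's chart parser (assuming NLTK is available), with the optional (Det) and (PP) elements written out as separate rules:

    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> Verb NP | Verb NP PP
    NP -> Nom | Det Nom | Nom PP | Det Nom PP
    PP -> Prep NP
    Nom -> Noun | ProperNoun
    Prep -> 'with' | 'in'
    Noun -> 'woods' | 'bike'
    Det -> 'the'
    Verb -> 'saw'
    ProperNoun -> 'John' | 'Peter'
    """)

    parser = nltk.ChartParser(grammar)
    trees = list(parser.parse("John saw Peter with the bike in the woods".split()))
    print(len(trees))
    for tree in trees:
        print(tree)

Each printed tree corresponds to one way of attaching with the bike and in the woods to the verb phrase or to one of the noun phrases.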

13. Consider the sentence She bought a potato and some carrots when she went to the corner store. Which of the following lists of word sequences contains only constituents of this sentence?

(a) "She bought", "a potato and some carrots"
(b) "She", "to the corner store"
(c) "the corner store", "bought a potato"
(d) "potato", "she went to the corner store"

14. Which of the following feature structures does Grammar 1 (given at the end of this document) assign to the sentence A student works?

(a) S [ subj  NP student [ pers 3, num sg ]
        head  VP works [ subcat NP student [ pers 3, num sg ] ] ]

(b) S [ subj  (1) NP student [ pers 3, num sg ]
        head  VP works [ subcat (1) [ pers 3, num sg ] ] ]

(c) S [ subj  (1) NP student [ pers 3, num sg ]
        head  VP works [ subcat [ subj (1) ] ] ]

(d) None of the above
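The rendering of these answer options is a tentative reconstruction (the attribute-value matrices were damaged in transcription), but the mechanism they hinge on, reentrancy, can be illustrated with NLTK's FeatStruct, where the tag (1) marks token-identical values. The particular features used below are illustrative, not Grammar 1's exact ones.

    import nltk

    # An S-level structure in the spirit of options (b)/(c): the tag (1) makes
    # the subject and the verb's subcategorized subject one shared value.
    s = nltk.FeatStruct("[subj=(1)[pers=3, num='sg'], head=[subcat=[subj->(1)]]]")

    # Unifying in compatible information succeeds...
    print(s.unify(nltk.FeatStruct("[subj=[num='sg']]")))

    # ...but a clashing value fails (unify returns None), which is how a
    # unification grammar rules out strings like "a student work".
    print(s.unify(nltk.FeatStruct("[subj=[num='pl']]")))   # None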

15. We want to extend Grammar 1 so that we can parse the sentence two students work but not two student work or two students works. To achieve this, which of the following lexical items should we add to the lexicon?

(a) Det two [ subcat [ Noun (1) ] ]

(b) Det two [ subcat [ Noun (1) [ num (2) ] ] ]

(c) Det two [ subcat [ Noun [ num pl ] ] ]

(d) Det two [ num pl
              subcat [ Noun [ pers 3, num pl ] ] ]

16. A language has 100 words. Every word w has equal probability of occurring in a sentence. For every word w_i, every word w_j also has equal probability of occurring after w_i. What are the values of P(w_i w_j), the probability that the bigram w_i w_j occurs, and P(w_j | w_i), the probability of word w_j if the preceding word is w_i?

(a) P(w_i w_j) = 0.01 and P(w_j | w_i) = 0.01
(b) P(w_i w_j) = 0.0001 and P(w_j | w_i) = 0.01
(c) P(w_i w_j) = 0.01 and P(w_j | w_i) = 0.0001
(d) P(w_i w_j) = 0.0001 and P(w_j | w_i) = 0.0001

17. Language XL is modelled as a random sequence of letters with the following probabilities of occurrence:

     a     b    c     d    e    f
    1/16  1/4  1/16  1/4  1/4  1/8

What is the per-letter entropy of this language model?

(a) 2.0
(b) 2.375
(c) 3.0
(d) None of 2.0, 2.375 and 3.0
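Both calculations can be checked in a few lines of Python; the entropy formula is the standard H = -Σ p(x) log2 p(x).

    import math

    # Question 16: a uniform 100-word language. Both words of a bigram are
    # drawn uniformly, so joint and conditional probability differ by a factor V.
    V = 100
    print((1 / V) * (1 / V))   # P(w_i w_j), the joint bigram probability
    print(1 / V)               # P(w_j | w_i), the conditional probability

    # Question 17: per-letter entropy of the given distribution.
    probs = {"a": 1/16, "b": 1/4, "c": 1/16, "d": 1/4, "e": 1/4, "f": 1/8}
    H = -sum(p * math.log2(p) for p in probs.values())
    print(H)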

18. Good-Turing estimators use this equation to calculate the probability of seeing word X, having seen a corpus:

    P(X | corpus) = r* / N    with    r* = (r + 1) · E(N_{r+1}) / E(N_r)

where r is the number of times you've seen word X, N_r is the number of different words that were seen exactly r times, and E() means you're trying to estimate what N_r would normally be, for an infinite corpus of an infinite language. N is the total number of counts, and r* is the adjusted number of observations: how many times you should have seen that word (which is often a fraction). A Very Simple Form of Good-Turing Estimation takes as function E() the identity function: E(n) = n.

A corpus has 30000 words. The word unusualness occurs once. There are 10000 words that occur exactly once. There are 3000 words that occur exactly twice in the corpus. What is the estimated probability of the word unusualness if we use the Very Simple Good-Turing Estimation method?

(a) 1/30000
(b) 1/10000
(c) 2/100000
(d) other value
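Plugging the question's counts into the Very Simple Good-Turing formula above:

    # Very Simple Good-Turing, with E(n) = n as in the question.
    N = 30_000                     # total word tokens in the corpus
    r = 1                          # "unusualness" was seen once
    N_r = {1: 10_000, 2: 3_000}    # words seen exactly once / exactly twice

    r_star = (r + 1) * N_r[r + 1] / N_r[r]   # adjusted count 2 * 3000/10000
    print(r_star, r_star / N)                # 0.6 and its probability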

19. Consider the following two statements.

i) The sum of the re-estimated Simple Good-Turing probabilities of all the words in the corpus is exactly one.
ii) The Very Simple Good-Turing Estimation method (see the previous exercise) has a major drawback: it may assign probability zero to some words, namely if by chance for some value of r there are no word types that occur exactly r times.

(a) Only i) is true.
(b) Only ii) is true.
(c) Both i) and ii) are true.
(d) Both i) and ii) are false.

20. Consider the following context-free grammar, with Noun, Det and Verb as Part of Speech symbols. The words John and Mary have only Part of Speech PropNoun, the words walks and sleeps have only Part of Speech Verb, and the word and has only Part of Speech Conj.

S → S Conj S
S → NP VP
NP → Det Nom
NP → PropNoun
VP → V NP
VP → V

We use Earley's Recognizer (see J&M Figure 10.16, page 381) to check whether the sentence Mary walks and John sleeps is correct according to this grammar. Constructing Chart[0] we start with the initial item [γ → · S, 0, 0] and we add as many different items to the chart as possible. Then we construct Chart[1], again adding as many different items as possible. And so on. What is the number of items Chart[1] will eventually have according to this algorithm?

(a) 5
(b) 6
(c) 7
(d) 8
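For reference, here is a self-contained sketch of an Earley recognizer with the usual predictor, scanner, and completer. One caveat: this sketch treats lexical entries as ordinary rules that the predictor expands, whereas J&M's Figure 10.16 lets the scanner handle parts of speech directly, so the per-chart item counts it prints need not equal the count the question asks about.

    from collections import namedtuple

    # The grammar of question 20, with the lexical entries written as
    # preterminal rules (lower-case right-hand sides are terminals).
    GRAMMAR = {
        "GAMMA": [("S",)],
        "S": [("S", "Conj", "S"), ("NP", "VP")],
        "NP": [("Det", "Nom"), ("PropNoun",)],
        "VP": [("V", "NP"), ("V",)],
        "PropNoun": [("mary",), ("john",)],
        "V": [("walks",), ("sleeps",)],
        "Conj": [("and",)],
    }

    Item = namedtuple("Item", "lhs rhs dot start")

    def earley(words):
        charts = [set() for _ in range(len(words) + 1)]
        charts[0].add(Item("GAMMA", ("S",), 0, 0))
        for i in range(len(words) + 1):
            agenda = list(charts[i])
            while agenda:
                item = agenda.pop()
                if item.dot < len(item.rhs):
                    nxt = item.rhs[item.dot]
                    if nxt in GRAMMAR:                        # predictor
                        for rhs in GRAMMAR[nxt]:
                            new = Item(nxt, rhs, 0, i)
                            if new not in charts[i]:
                                charts[i].add(new)
                                agenda.append(new)
                    elif i < len(words) and nxt == words[i]:  # scanner
                        charts[i + 1].add(
                            Item(item.lhs, item.rhs, item.dot + 1, item.start))
                else:                                         # completer
                    for old in list(charts[item.start]):
                        if old.dot < len(old.rhs) and old.rhs[old.dot] == item.lhs:
                            new = Item(old.lhs, old.rhs, old.dot + 1, old.start)
                            if new not in charts[i]:
                                charts[i].add(new)
                                agenda.append(new)
        return charts

    charts = earley("mary walks and john sleeps".split())
    for i, chart in enumerate(charts):
        print(f"Chart[{i}]: {len(chart)} items")
    print("accepted:", Item("GAMMA", ("S",), 1, 0) in charts[-1])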

Grammar 1

Rules:

S → NP VP
    <S subj> = <NP>
    <S head> = <VP>
    <VP subcat subj> = <NP>

VP → Verb
    <VP > = <Verb >
    <VP > = <Verb >
    <VP subcat> = <Verb subcat>

NP → Det Noun
    <NP > = <Noun >
    <NP > = <Det >
    <Det subcat> = <Noun>

Lexicon:

Noun student [ pers 3, num sg ]

Noun students [ pers 3, num pl ]

Verb work [ subcat [ subj NP [ num pl ] ] ]

Verb works [ subcat [ subj NP [ pers 3, num sg ] ] ]

Det a [ subcat [ Noun (1) [ num sg ] ] ]

Det the [ subcat [ Noun (1) ] ]
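Grammar 1's exact feature equations are partly lost in this transcription (some feature names were dropped during extraction), so the following is only a loose analogue in NLTK's feature-grammar notation, assuming NLTK is installed: agreement is enforced through a single NUM feature shared between determiner, noun, and verb.

    import nltk

    # A toy feature grammar modelled loosely on Grammar 1 plus the "two" of
    # question 15; the ?n variable forces determiner, noun, and verb to agree.
    fg = nltk.grammar.FeatureGrammar.fromstring("""
    S -> NP[NUM=?n] VP[NUM=?n]
    NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
    VP[NUM=?n] -> V[NUM=?n]
    Det[NUM=sg] -> 'a'
    Det[NUM=pl] -> 'two'
    N[NUM=sg] -> 'student'
    N[NUM=pl] -> 'students'
    V[NUM=sg] -> 'works'
    V[NUM=pl] -> 'work'
    """)
    parser = nltk.parse.FeatureChartParser(fg)

    for sentence in ("a student works", "two students work", "two students works"):
        trees = list(parser.parse(sentence.split()))
        print(sentence, "->", len(trees), "parse(s)")

The first two sentences each receive one parse; the third fails because NUM=pl on the subject clashes with NUM=sg on the verb, mirroring how the unification grammar above blocks two students works.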