CS474 Introduction to Natural Language Processing Final Exam December 15, 2005


Name: _______________ Netid: _______________

Instructions: You have 2 hours and 30 minutes to complete this exam. The exam is a closed-book exam.

#   description                             score   max score
1   Parsing with PCFGs                        / 25
2   Bottom-up Chart Parsing                   / 10
3   Partial Parsing / Question Answering      / 30
4   Inference                                 / 15
5   The Grab Bag                              / 20
    Total score:                              / 100

1 Parsing with PCFGs (25 pts)

(a) (3 pts) A sentence can easily have more than one parse tree that is consistent with a given CFG. How do PCFGs and non-probability-based CFGs differ in terms of handling parsing ambiguity?

PCFG parsers resolve ambiguity by preferring constituents (and parse trees) with the highest probability.

Consider the following PCFG for problems (b)-(e).

production rule                                                  probability
S → VP                                                           1.0
VP → Verb NP                                                     0.7
VP → Verb NP PP                                                  0.3
NP → NP PP                                                       0.3
NP → Det Noun                                                    0.7
PP → Prep Noun                                                   1.0
Det → the                                                        0.1
Verb → Cut | Ask | Find | ...                                    0.1
Prep → with | in | ...                                           0.1
Noun → envelope | grandma | scissors | men | suits | summer | ...  0.1

(b) (3 pts) Draw the top-ranked parse tree for the sentence below by applying the given PCFG. Does the result seem reasonable to you? Why or why not?

Cut the envelope with scissors.

The top-ranked sentence structure is shown in figure 1. (The leaf nodes representing words are omitted.) The probability of the resulting parse tree is 1.0 × 0.3 × 0.7 × 1.0 × (0.1)^5, which is larger than 1.0 × 0.7 × 0.3 × 0.7 × 1.0 × (0.1)^5, the probability of the alternative parse tree (with the [VP → Verb NP] rule expansion). Semantically, "with scissors" should attach to the verb, hence the resulting parse tree is a reasonable one.

(c) (3 pts) Draw the top-ranked parse tree for the sentence below by applying the given PCFG. Does the result seem reasonable to you? Why or why not?

Ask the grandma with scissors.

The top-ranked sentence structure is the same as for part (b). Semantically, "with scissors" should attach to the noun phrase, hence the resulting parse tree is not a reasonable one.

(d) (5 pts) Describe how you would lexicalize the given PCFG in order to address the problem you hopefully noticed in (b) and/or (c). Then show specifically how the production rules below should be modified according to your lexicalization scheme.

production rule     probability
VP → Verb NP        0.7
VP → Verb NP PP     0.3

Lexicalization of production rules can capture lexically specific preferences for certain rule expansions. In order to mitigate the sparse-data problem, we will lexicalize with respect to the head word of the left-hand side of each production rule, instead of all nonterminals in each production rule. In particular, the rules expanding from VP should be modified as

production rule       probability
VP(x) → Verb NP       p_x
VP(x) → Verb NP PP    q_x

where x ∈ {Cut, Ask, Find, ...}, p_x := P(VP(x) → Verb NP | VP, x), q_x := P(VP(x) → Verb NP PP | VP, x), and p_x + q_x = 1.
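The tree-probability comparison in part (b) can be checked numerically. A minimal sketch (the `tree_prob` helper and the rule-string keys are invented for illustration; the probabilities are the ones in the grammar above):

```python
# Rule probabilities from the grammar above; every lexical rule
# (Det/Verb/Prep/Noun -> word) has probability 0.1.
RULE_P = {
    "S -> VP": 1.0,
    "VP -> Verb NP": 0.7,
    "VP -> Verb NP PP": 0.3,
    "NP -> NP PP": 0.3,
    "NP -> Det Noun": 0.7,
    "PP -> Prep Noun": 1.0,
}
LEX_P = 0.1

def tree_prob(rules, n_words):
    """Probability of a parse tree: the product of its rule probabilities."""
    p = LEX_P ** n_words  # one lexical rule per word
    for r in rules:
        p *= RULE_P[r]
    return p

# "Cut the envelope with scissors" (5 words): PP attached to the VP ...
p_vp_attach = tree_prob(
    ["S -> VP", "VP -> Verb NP PP", "NP -> Det Noun", "PP -> Prep Noun"], 5)
# ... versus PP attached to the NP.
p_np_attach = tree_prob(
    ["S -> VP", "VP -> Verb NP", "NP -> NP PP", "NP -> Det Noun",
     "PP -> Prep Noun"], 5)
# p_vp_attach = 2.1e-6 > p_np_attach = 1.47e-6, so the VP-attachment tree wins.
```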

Comment: Because we didn't restrict lexicalization to head words of the right-hand side of rules, it is okay to propose lexicalized PCFGs in many different ways; in particular, you don't have to condition on the head word. You can condition on the entire combination of words for all nonterminals, as long as you make it clear what you are conditioning on, although that would be much less practical.

(e) (5 pts) The following two sentences exhibit parsing ambiguities. How would your lexicalized PCFG from (d) handle these ambiguities?

Find the men in suits.
Find the men in summer.

Notice that the head word of every node in the parse tree except the node for the last word is identical in both sentences. Therefore, the conditional probability of each node for a particular rule expansion is identical in both sentences, except for the node covering the last word. However, the last word in each sentence does not influence which rule expansion is used at its ancestor nodes. Hence the exact same parse tree will be chosen by the PCFG, even though the prepositional phrase in the first sentence should attach to the noun phrase, and the prepositional phrase in the second sentence should attach to the verb phrase. (It is not impossible to do it the other way, but it would sound less sensible.) Which attachment is chosen will depend on the actual values of P(VP(Find) → Verb NP | VP, Find) and P(VP(Find) → Verb NP PP | VP, Find). In summary, head-word lexicalization does not resolve all ambiguities, as these sentences show.

Comment: If your proposal in (d) didn't condition on head words of the right-hand side of rules, you might reach a different conclusion here, depending on exactly which set of words you chose to condition on. However, unless you somehow invented a clever way to condition on all the words under the PP nonterminal, or unless you changed the definition of head word, you probably end up encountering the same problem as above.
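To make the limitation in (e) concrete, here is a toy sketch of a head-lexicalized rule choice. The probability values are made up for illustration (the exam does not supply them); conditioned only on the head word, "Find" gets the same expansion whether the sentence ends in "suits" or "summer":

```python
# Hypothetical head-lexicalized VP rule probabilities. The numbers are
# invented for illustration only.
LEX_RULE_P = {
    ("Find", "VP -> Verb NP"):    0.6,
    ("Find", "VP -> Verb NP PP"): 0.4,
    ("Cut",  "VP -> Verb NP"):    0.2,
    ("Cut",  "VP -> Verb NP PP"): 0.8,  # "cut ... with scissors" favors VP attachment
}

def preferred_expansion(head):
    """Pick the most probable VP expansion given only the head word."""
    options = [(rule, p) for (h, rule), p in LEX_RULE_P.items() if h == head]
    return max(options, key=lambda rp: rp[1])[0]

# Conditioning only on the head "Find" yields one fixed choice, regardless
# of whether the PP contains "suits" or "summer".
```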
Read for problems (f)-(g): One problem with a lexicalized PCFG is that some (perfectly reasonable) words might never show up in the training data for certain production rules. This results in rules with a probability of 0.

(f) (3 pts) Describe why production rules with zero probability are problematic.

If a production rule has a zero probability, then any parse tree derived using that production rule must also have a zero probability. However, a production rule may have a zero probability not because it is invalid, but because that particular production rule has not been observed in the training data. PCFGs in this case will not be able to return the correct parse tree involving an unseen rule.

(g) (3 pts) Describe one method to avoid zero probabilities for lexicalized PCFGs.

Smoothing techniques from language modeling can be applied here in the same way. One simple method is to assign a minimum count of 1 to every possible lexicalized rule. (In order to make the result a proper probability distribution, the probability values collected from the training data must be renormalized.)
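The add-one smoothing idea from (g) can be sketched as follows (function and variable names are invented for illustration):

```python
from collections import Counter

def smoothed_rule_probs(observed, candidates):
    """Add-one (Laplace) smoothing over head-lexicalized rule counts.

    observed:   list of (head_word, rule) pairs collected from training trees.
    candidates: dict head_word -> list of possible rules for that head.
    Every candidate rule gets count + 1, so no probability is ever zero.
    """
    counts = Counter(observed)
    probs = {}
    for head, rules in candidates.items():
        total = sum(counts[(head, r)] for r in rules) + len(rules)
        for r in rules:
            probs[(head, r)] = (counts[(head, r)] + 1) / total
    return probs

# Toy example: "Cut" was seen three times, always with the PP rule.
observed = [("Cut", "VP -> Verb NP PP")] * 3
candidates = {"Cut": ["VP -> Verb NP", "VP -> Verb NP PP"]}
probs = smoothed_rule_probs(observed, candidates)
# The unseen rule gets (0 + 1) / (3 + 2) = 0.2 instead of zero.
```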

2 Bottom-up Chart Parsing (10 pts)

Given the grammar and lexicon below, show the final chart for the following sentence after applying the bottom-up chart parser. Remember that the final chart contains all edges added during the parsing process. You may use either the notation from class (i.e. nodes/links) or the notation from the book to depict the chart.

S → VP
VP → Verb NP
VP → Verb NP PP
NP → NP PP
NP → Det Noun
PP → Prep Noun

Find the men in suits.

Det → the
Verb → Find
Prep → in
Noun → men | suits

Final chart (all edges, with [start, end] spans over the positions in the sentence):

Find the men in suits
0    1   2   3  4    5

Complete edges:
Verb [0,1], Det [1,2], Noun [2,3], Prep [3,4], Noun [4,5]
NP → Det Noun • [1,3]
PP → Prep Noun • [3,5]
NP → NP PP • [1,5]
VP → Verb NP • [0,3]
VP → Verb NP • [0,5]
VP → Verb NP PP • [0,5]
S → VP • [0,3]
S → VP • [0,5]

Active edges:
VP → Verb • NP [0,1]
VP → Verb • NP PP [0,1]
NP → Det • Noun [1,2]
PP → Prep • Noun [3,4]
NP → NP • PP [1,3]
NP → NP • PP [1,5]
VP → Verb NP • PP [0,3]
VP → Verb NP • PP [0,5]
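For reference, a chart of this kind can be produced by a minimal agenda-driven bottom-up chart parser. This is a sketch, not the notation from class or the book; the `Edge` representation and the set-based deduplication are implementation choices:

```python
from collections import namedtuple

# Grammar and lexicon from this question.
RULES = [
    ("S",  ("VP",)),
    ("VP", ("Verb", "NP")),
    ("VP", ("Verb", "NP", "PP")),
    ("NP", ("NP", "PP")),
    ("NP", ("Det", "Noun")),
    ("PP", ("Prep", "Noun")),
]
LEXICON = {"find": "Verb", "the": "Det", "men": "Noun",
           "in": "Prep", "suits": "Noun"}

# An edge lhs -> rhs[:dot] . rhs[dot:] spanning [start, end).
Edge = namedtuple("Edge", "lhs rhs dot start end")

def chart_parse(words):
    chart = set()
    # Seed the agenda with one complete lexical edge per word.
    agenda = [Edge(LEXICON[w.lower()], (w,), 1, i, i + 1)
              for i, w in enumerate(words)]
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        if edge.dot == len(edge.rhs):  # complete edge for category edge.lhs
            # Bottom-up edge introduction: any rule whose rhs starts with lhs.
            for lhs, rhs in RULES:
                if rhs[0] == edge.lhs:
                    agenda.append(Edge(lhs, rhs, 1, edge.start, edge.end))
            # Fundamental rule: extend active edges ending where this starts.
            for e in list(chart):
                if (e.dot < len(e.rhs) and e.rhs[e.dot] == edge.lhs
                        and e.end == edge.start):
                    agenda.append(e._replace(dot=e.dot + 1, end=edge.end))
        else:  # active edge: look for complete edges starting at its end
            for e in list(chart):
                if (e.dot == len(e.rhs) and e.lhs == edge.rhs[edge.dot]
                        and e.start == edge.end):
                    agenda.append(edge._replace(dot=edge.dot + 1, end=e.end))
    return chart

chart = chart_parse("Find the men in suits".split())
```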

3 Partial Parsing / Question Answering (30 pts)

Consider the following article for problems (a) - (e). [From product reviews for various computer peripherals.]

I bought my wireless keyboard/mouse set several months ago, and, like a lot of new products, it has some unanticipated issues. On the plus side, obviously, is the styling. The design is fresh, clean, and interesting. The keyboard can tilt at different angles, which was important because I had some difficulty typing with it flat. The bluetooth receiver in the charger was functional, and I appreciated having a bluetooth hub for my cellphone. The mouse and the keyboard have both proved durable and reliable despite a number of mishaps. In regards to the software, there are some real issues. When the mouse powers down to save battery life there is a second or two of lag before it reconnects with the receiver. I found this really annoying to deal with every time I stepped away from my desk for ten or fifteen minutes. Also, during system startup when the bluetooth software has yet to initialize, both the keyboard and the mouse are useless. This made it impossible to do any kind of pre-Windows-startup tasks such as F8 for Windows configuration. I suspect this is a result of how bluetooth interacts with the OS and BIOS, but whatever the cause, it was, for me, a deal-breaker.

(a) (5 pts) Mark or draw the output of a partial parser for the following sentence, stating any necessary assumptions.

The bluetooth receiver in the charger was functional, and I appreciated having a bluetooth hub for my cellphone.

[The bluetooth receiver]_NP in [the charger]_NP was functional, and I appreciated having [a bluetooth hub]_NP for [my cellphone]_NP.

Comment: There can be different correct answers depending on the definition of constituents.

(b) (5 pts) State two advantages of partial parsers over parsers that provide in-depth syntactic information.

First, partial parsers can be more robust than regular parsers, because partial parsing is an easier task. Second, for some NLP applications such as information extraction, the information derived from partial parsers can be more relevant than that from regular parsers.

(c) (5 pts) Consider a closed-domain QA system for the domain of the above text, i.e. product reviews of computer peripherals. Assume that the QA system uses a simple TFIDF-based information retrieval method to identify documents and sentences that contain the answer to the input question. Assume also that the QA system only has access to the above document, i.e. the above document is the only document in the collection. (Yes, we know that this is not a reasonable assumption.) Devise one reasonable wh-question (i.e. who, what, where, when, why) that has an answer in the document but that the QA system would not be able to answer sensibly. Explain why the question is difficult for the system.

Fall 2006 students: we did not cover TFIDF-based IR methods. They represent each document and query as a vector indicating the presence or absence of each word in the language (minus stopwords), and then compute the similarity between a document and a query as the cosine of the angle between the two vectors. In addition, words that appear frequently across the entire corpus receive small weights; words that appear frequently in a document receive high weights. This isn't the whole story, but it is enough to let you think about answering the question.

No answer yet...

(d) (7 pts) Now consider a closed-domain QA system that has access to a large number of product reviews for various computer peripherals. Assume the possible questions for the QA system are limited to the following two types of questions.

What features of product X are buyers satisfied with?
What features of product X are buyers dissatisfied with?

Since the types of questions are restricted, we can design predictive annotations to assist the question answering system. Describe a set of useful predictive annotation types for this restricted question answering task. Then annotate one sentence from the article according to your annotation scheme.

Fall 2006 students: we did not cover predictive annotation.

No answer yet.
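The simple TFIDF-based retrieval described in part (c) can be sketched as follows. This is a toy illustration: the function names are invented, and the mini-"documents" are sentences lifted from the review above:

```python
import math
from collections import Counter

def tfidf_retrieve(query, docs):
    """Rank docs against a query by cosine similarity of TF-IDF vectors."""
    N = len(docs)
    toks = [d.lower().split() for d in docs]
    df = Counter()  # document frequency of each word
    for t in toks:
        df.update(set(t))

    def vec(tokens):
        # tf * idf, with idf = log(N / df); words unseen in the corpus are dropped
        return {w: tf * math.log(N / df[w])
                for w, tf in Counter(tokens).items() if df[w]}

    def cos(u, v):
        dot = sum(u[w] * v.get(w, 0.0) for w in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    qv = vec(query.lower().split())
    scores = [cos(qv, vec(t)) for t in toks]
    return max(range(N), key=lambda i: scores[i]), scores

docs = [
    "the bluetooth receiver in the charger was functional",
    "the mouse and the keyboard have both proved durable",
    "the design is fresh clean and interesting",
]
best, scores = tfidf_retrieve("bluetooth receiver", docs)
```

Note that "the", which appears in every mini-document, gets idf = log(N/N) = 0 and so contributes nothing, which is the downweighting of corpus-frequent words described above.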

(e) (8 pts) Suppose that you have convinced your friends to annotate 500 documents per your definition of predictive annotations given in (d). Once the 500 documents are annotated, one can use them to train a supervised machine learning algorithm to automatically annotate many more documents (and thereby avoid losing one's friends, who have become increasingly unwilling to help with the manual annotations). Select one of your predictive annotation types from (d). Explain step-by-step how you would go about the task of training a learning algorithm to automate this type of annotation. Be sure to define your learning task and to describe a reasonable set of features.

Fall 2006 students: we did not cover predictive annotation.

No answer yet.

4 Inference (15 pts)

Consider the following article for this problem. [This is just the first paragraph from the previous question's text.]

I bought my wireless keyboard/mouse set several months ago, and, like a lot of new products, it has some unanticipated issues. On the plus side, obviously, is the styling. The design is fresh, clean, and interesting. The keyboard can tilt at different angles, which was important because I had some difficulty typing with it flat. The bluetooth receiver in the charger was functional, and I appreciated having a bluetooth hub for my cellphone. The mouse and the keyboard have both proved durable and reliable despite a number of mishaps.

For each of inferences (a) through (d) below,

1. state whether the inference depends on the discourse context, knowledge about actions, and/or general world knowledge; and
2. describe what natural language processing techniques, if any, might enable a system to make the inference automatically.

(a) The reviewer owns the keyboard.
(b) The charger is part of the keyboard.
(c) The reviewer had difficulty typing with the keyboard.
(d) The reviewer likes the keyboard.

5 Grab Bag (20 pts)

(a) (4 pts) (True or False. Explain your answer.) Information extraction is harder than text categorization.

Fall 2006 students: we did not cover information extraction.

(b) (6 pts) Briefly describe the key differences between Autoslog-TS and Autoslog.

Fall 2006 students: we did not cover this.

Autoslog-TS is largely unsupervised. It does not require annotations; instead, it requires two sets of documents: relevant and not relevant. After extracting every NP from the texts, it selects patterns by relevance rate and frequency.

(c) (4 pts) (True or False. Explain your answer.) 4-grams are better than trigrams for part-of-speech tagging.

False. There is generally not enough data for 4-grams to outperform trigrams.

(d) (6 pts) Noun phrase coreference resolution includes pronoun resolution, proper noun resolution, and common noun resolution. Which of the three would you expect to be the most difficult to handle computationally? Explain why.

Common noun resolution is the hardest, because there is a vastly broader range of ways to refer to the same entity with common nouns. The variety of proper noun and pronoun coreference patterns is much narrower.