Disambiguating Levin Verbs Using Untagged Data


Jianguo Li
Department of Linguistics
The Ohio State University
Columbus, Ohio, USA
jianguo@ling.ohio-state.edu

Chris Brew
Department of Linguistics
The Ohio State University
Columbus, Ohio, USA
cbrew@ling.ohio-state.edu

Abstract

Lapata and Brew [8] (hereafter LB04) obtain from untagged texts a statistical prior model that is able to generate class preferences for ambiguous Levin [9] verbs. They also show that their informative priors, incorporated into a Naive Bayesian classifier deduced from hand-tagged data, can aid in verb class disambiguation. We re-examine the parameter estimation of LB04's prior model and identify the only parameter in that model that determines the predominant class for a particular verb in a particular frame. In addition, we propose a method for training our classifier without using hand-tagged data. Our experiments suggest that although our verb class disambiguator does not match the performance of those that make use of hand-tagged data, it consistently outperforms the random baseline model. Our experiments also demonstrate that the informative priors derived from untagged texts help improve the performance of the classifier trained on untagged data.

Keywords: lexical semantics, verb class disambiguation, Levin verb class, informative priors, untagged corpus, Naive Bayesian classifier

1 Introduction

Much research in lexical acquisition has concentrated on verb classification [18, 11, 14, 7]. Many scholars hypothesize that the meaning of a verb determines to a large extent its syntactic behavior, particularly the realization and interpretation of its arguments, and therefore base their verb classification on the relation between verbs and their arguments [5, 4, 9, 16]. Such classifications can capture generalizations over a range of linguistic properties, and can therefore be used as a means of reducing redundancy in the lexicon and of filling gaps in lexical knowledge.

Much of the work on verb classification in NLP has adopted the classification proposed by Levin [9], who argues that verbs exhibiting the same diathesis alternations can be assumed to share certain semantic components and to form a semantically coherent class. Applying this observation inductively, one can use surface syntactic cues to infer a verb's semantic class.

In this paper, we focus on the task of verb classification for ambiguous Levin verbs (verbs belonging to two or more Levin classes). More precisely, given a verb in a particular frame, we want to assign it to one of its possible Levin classes. As noted by Lapata and Brew [8], this is a widespread and important disambiguation problem. Levin's verb inventory covers 3,024 verbs. Although only 784 of these verbs are polysemous, the total frequency of polysemous verbs in the British National Corpus (BNC) is comparable to the total frequency of monosemous verbs (48.4% vs. 51.6%). Consider the verb call in the following two sentences:

1. He called me a fool.
2. He called me a taxi.

The verb call is ambiguous between the classes DUB and GET when occurring in the double-object frame. We want to automatically identify call as a DUB verb in sentence (1) and as a GET verb in sentence (2). The verb class of a particular verb token provides a significant amount of information about the verb.
At the semantic level, for example, knowing a token's verb class helps determine the thematic roles of its arguments [14, 19]; at the syntactic level, it indicates which subcategorization frames and alternations are allowed [9, 6].

Word sense disambiguation (WSD) is usually cast as a problem in supervised learning, where a word class disambiguator is induced from hand-tagged data. The context within which the ambiguous word occurs is typically represented by a set of more or less linguistically motivated features, from which a learning algorithm induces a representative model that performs the disambiguation task. One classifier that has been used extensively is the Naive Bayesian classifier, which usually consists of two parts: the prior and the posterior.

Lapata and Brew [8] estimate an informative prior over Levin verb classes for a given verb in a given frame, training on untagged texts. Their prior model is able to generate a class preference for an ambiguous verb. Consider the verb call again: it is ambiguous between the classes DUB and GET when occurring in the double-object frame. Their prior model predicts DUB to be the predominant class. The model's outcome is considered correct since hand-tagged corpus tokens also reveal a preference for the class DUB. To compute the posterior probability, LB04 uses contextual features (e.g., word collocations) extracted from a small hand-tagged corpus. Their experiments demonstrate that the informative priors obtained from untagged texts help achieve improved disambiguation performance.

The major contribution of LB04 is that it highlights the importance for WSD of a suitable prior derived from untagged text. A prior model derived from untagged texts can help find, with reasonable accuracy, the predominant sense of a given ambiguous target word. Knowing the predominant sense of an ambiguous target word is valuable, as the first-sense heuristic, which usually serves as a baseline for supervised WSD systems, outperforms many of the systems that take the surrounding context into account. McCarthy et al. [12] have recently also demonstrated the usefulness of a prior in WSD. They use parsed data to find words distributionally similar to the ambiguous target word and then use the associated similarity scores to discover the predominant sense for that target word. One benefit of both LB04's and McCarthy et al.'s methods is that the predominant senses can be derived without relying on hand-tagged data, which may not be available for every domain and text type. This is important because the frequency of the senses of words depends on the genre and domain of the text under consideration.

A prior model derived from untagged texts can also help a classifier improve over a uniform prior. This is exactly what is shown in LB04. However, although the informative priors in LB04 are derived from untagged texts, the posteriors are deduced from hand-tagged data. Using hand-tagged data to derive the posterior probability assumes the existence of such a corpus. But if a hand-tagged corpus exists, then an empirical prior can be derived from it as well. We would expect a prior obtained from a hand-tagged corpus to be more accurate, and therefore, when combined with contextual features, to yield better performance. In this paper, we want to evaluate the usefulness of priors derived from a large unlabelled corpus versus a small hand-labelled corpus.

Two experiments are conducted in this paper. First, we examined the estimation of LB04's prior model because we suspected that some of its parameters are irrelevant to the ultimate outcome of the decision process. This examination confirmed our suspicion. We identified the only parameter that determines the predominant class and reformulated LB04's prior model accordingly. Our reformulation shows that LB04's prior model ignores the identity of individual verbs in determining the predominant class for a particular verb. We implemented LB04's prior model using data parsed by two different full parsers [2, 1]. Second, we proposed a new way to train the verb disambiguator without relying on a hand-tagged corpus. More precisely, we used examples containing unambiguous verbs in a particular verb class as the training data for the ambiguous ones in that class. In doing so, both our informative priors and our posteriors are obtained without using hand-tagged data. This method is available even when we are dealing with an unusual text type. We also tested the usefulness of our informative priors in aiding verb class disambiguation.

2 Experiment 1: The Prior Model

2.1 LB04's Prior Model

LB04's prior model views the choice of a class c for a polysemous verb v in a given frame f as a maximization of the joint probability P(c, f, v), where v is a verb subcategorizing for the frame f with Levin class c:

    P(c, f, v) = P(v) P(f|v) P(c|v, f)    (1)

The estimation of P(c|v, f) relies on the frequency F(c, v, f), which could be obtained if a parsed corpus annotated with semantic class information were available.
Without such a corpus, LB04 assumes that the semantic class determines the subcategorization patterns of its members independently of their identity:

    P(c|v, f) ≈ P(c|f)    (2)

By applying Bayes' rule, P(c|f) is rewritten as

    P(c|f) = P(f|c) P(c) / P(f)    (3)

Substituting (3) into (1), LB04 expresses P(c, f, v) as

    P(c, f, v) = P(v) P(f|v) P(f|c) P(c) / P(f)    (4)

2.2 Examination of LB04's Parameter Estimation

To estimate P(c, f, v), LB04 has to estimate five parameters: P(v), P(f|v), P(f), P(f|c) and P(c), as shown in (4). However, for a given verb v in a given frame f, the values of P(v), P(f|v) and P(f) do not vary with the choice of the class c. If we are only interested in knowing which class c is the predominant class for a given verb in a given frame, we can simply ignore them. Therefore, it is the values of P(f|c) and P(c) that determine the predominant class for the verb. According to LB04, P(f|c) and P(c) are estimated as

    P(f|c) = F(f, c) / F(c)    (5)

    P(c) = F(c) / Σ_i F(c_i)    (6)

With (5) and (6), the value that determines the predominant class for a given verb in a given frame is calculated as

    [F(f, c) / F(c)] × [F(c) / Σ_i F(c_i)] = F(f, c) / Σ_i F(c_i)    (7)

The denominator Σ_i F(c_i) is only a normalizing constant that ensures we have a probability function. Again, if we are simply interested in which class is the predominant class for a given verb in a given frame, we can ignore it.

It turns out that F(f, c) is the only value that determines the predominant class. According to LB04, F(f, c) is obtained by summing over all occurrences of verbs that are members of class c and attested in the corpus with frame f:

    F(f, c) = Σ_i F(c, f, v_i)    (8)

For monosemous verbs, F(c, f, v) reduces to the number of times these verbs have been attested in the corpus with the given frame. For polysemous verbs, F(c, f, v) is obtained by dividing the frequency of a verb with the given frame by the number of classes that the verb belongs to when occurring in that frame. Note that our reformulation of LB04 does not result in a different model. All we did was remove the parameters of LB04's model that are irrelevant to the decision regarding the predominant class of a verb v in a frame f.

Two facts about the model of LB04 are worth noting. First, due to the independence assumption, the only parameter that matters for the prior model is F(c, f); the identity of a given verb is totally irrelevant. In other words, for a verb v that is ambiguous between classes c_1 and c_2 in a given frame f, the predominant class for v is c_1 if F(c_1, f) is greater than F(c_2, f), and c_2 otherwise. Table 1 ranks six verb classes according to their frequency of occurrence with the transitive frame.

    Rank | Class          | F(Class, V-NP)
    -----|----------------|---------------
    1    | CONT. LOCATION | 70,471
    2    | ADMIRE         | 66,352
    3    | HURT           | 12,730
    4    | WIPE MANNER    | 10,294
    5    | ASSESS         |  9,872
    6    | PUSH-PULL      |  9,828

    Table 1: Frequency of six classes with V-NP

For a verb that is ambiguous between any two of the classes listed in Table 1 when occurring in the transitive frame, the preferred class is determined by the rank of the class in the table. For example, both miss and support are ambiguous between the classes ADMIRE and CONT. LOCATION when occurring in the transitive frame. Since F(CONT. LOCATION, V-NP) is greater than F(ADMIRE, V-NP), the model selects CONT. LOCATION as the predominant class for both verbs. However, the prevalence in the manually annotated corpus data (BNC) suggests that CONT. LOCATION is the preferred class for miss while ADMIRE is the preferred class for support. The independence assumption makes it impossible for the model to select the right preferred class for both miss and support.

The second fact about LB04's prior model is that, without our reformulation, LB04 has to estimate F(c):

    F(c) = Σ_i F(c, v_i)    (9)

    F(v, c) = F(v) P(c|v)    (10)

LB04 proposes two ways to estimate the value of P(c|v):

1. Equal Distribution: dividing the overall frequency of a verb equally among the classes it belongs to:

    P(c|v) = 1 / |classes(v)|    (11)

2. Unequal Distribution: distributing a verb's frequency unequally according to class size:

    P(c|amb_class) = |c| / Σ_{c' ∈ amb_class} |c'|    (12)

where amb_class is the set of classes the ambiguous verb belongs to and |c| is the size of class c. LB04 shows that, in selecting the predominant class for a verb in a given frame, for the 34 ambiguous verbs with the transitive frame, its prior model is about 6% better using the equal distribution for the estimation of F(c) than using the unequal distribution. According to our reformulation of the prior model, however, the value of F(c) is totally irrelevant in choosing the predominant class for a verb, so there should be no difference in the performance of the prior model between using the equal and the unequal distribution to estimate F(c).
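To make the reformulated decision rule concrete, here is a minimal sketch (our illustration, not LB04's code) of how the predominant class can be computed from frame-class counts alone. The counts, verb inventory and helper names are assumptions for the example; only the logic of equations (7), (8) and (11) is taken from the text.

```python
from collections import defaultdict

# Hypothetical input: (verb, frame, classes-of-verb-in-frame, count) tuples.
# Real counts would come from a parsed corpus such as the BNC.
corpus_counts = [
    ("miss",    "V-NP", ("CONT. LOCATION", "ADMIRE"), 500),  # polysemous
    ("support", "V-NP", ("CONT. LOCATION", "ADMIRE"), 300),  # polysemous
    ("admire",  "V-NP", ("ADMIRE",),                  800),  # monosemous
    ("carry",   "V-NP", ("CONT. LOCATION",),          900),  # monosemous
]

# Accumulate F(f, c) as in equation (8): monosemous verbs contribute their
# full count; polysemous verbs split their count equally across their
# classes, i.e. the equal distribution of equation (11).
F = defaultdict(float)
for verb, frame, classes, count in corpus_counts:
    for c in classes:
        F[(frame, c)] += count / len(classes)

def predominant_class(verb, frame, candidate_classes):
    """Pick argmax_c F(f, c) -- by equation (7) this is all that matters;
    the identity of the verb itself never enters the decision."""
    return max(candidate_classes, key=lambda c: F[(frame, c)])

# Both verbs receive the same answer, illustrating the independence assumption.
print(predominant_class("miss",    "V-NP", ["CONT. LOCATION", "ADMIRE"]))
print(predominant_class("support", "V-NP", ["CONT. LOCATION", "ADMIRE"]))
```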
2.3 Experiments on the Prior Model

2.3.1 Methodology

LB04 used a parsed version of the whole BNC made with GSearch [3], a tool that facilitates the search of arbitrary part-of-speech-tagged corpora for shallow syntactic patterns. LB04 used a chunk grammar for recognizing the verbal complex, NPs and PPs, and applied GSearch to extract tokens matching the frames specified in Levin. A set of linguistic heuristics was applied to the parser's output in order to filter out unreliable cues.

Our implementation used two sets of frames acquired from the whole BNC using two different statistical parsers. (1) We parsed the whole BNC with Charniak's parser. (2) In addition, we obtained a frame set from Schulte im Walde [18]; this frame set was acquired from the whole BNC using the head-entity parser described in Carroll and Rooth (1998) (hereafter CR). We implemented LB04's prior model (based on our reformulation) using these two separate sets of frames.

We obtained test data from LB04. This test data, summarized in Table 2, consists of 5,078 ambiguous verb tokens involving 64 verb types and 3 frame types.[1] It includes verbs with the double-object frame (V-NP-NP) (3.27 average class ambiguity), verbs with the dative frame (V-NP-PP(to)) (2.94 average class ambiguity) and verbs with the transitive frame (V-NP) (2.77 average class ambiguity).

    Frame       | Number of Verb Types
    ------------|---------------------
    V-NP-NP     | 12
    V-NP-PP(to) | 16
    V-NP        | 34

    Table 2: Test data of the prior model

[1] The test data we used here is not identical to that used in LB04. It has undergone both additional corrections and systematic adjustments before being released to us.

2.3.2 Results of the Prior Model's Performance

We report the results of our implementation of LB04 using accuracy by verb type. This accuracy is the percentage of verb types for which the prior model correctly selects the predominant class.

The outcome is considered correct if the class selected by the prior model agrees with the most frequent class found in the hand-tagged corpus. Table 3 provides a summary of the results for our implementation of LB04's prior model. We also computed a baseline by randomly selecting a class out of all the possible classes for a given verb in a particular frame.[2]

    Parser   | Charniak | CR
    ---------|----------|------
    LB04     | 53.2%    | 56.4%
    Baseline | 39.7% ± 0.01 (parser-independent)

    Table 3: Type accuracy for the prior model

[2] We replicated this random selection 100 times; the result reported in Table 3 was obtained by averaging the results over the 100 selections.

Table 3 shows that our reformulation of LB04's prior model achieves a better performance (using either set of frame frequencies) than the baseline. However, our results are lower than those reported in LB04, whose prior model achieves an accuracy of 74.6%. This may be due to the different test data and the different parsers we used to obtain frame frequencies.

3 Experiment 2: Verb Class Disambiguation Using Untagged Texts

3.1 Motivation

Recall that LB04 derives the informative priors from untagged texts, but the posteriors from a hand-tagged corpus. In this experiment, we attempt to address two weaknesses of LB04's method:

- LB04 does not compare the performance of the Naive Bayesian classifier between using the informative priors derived from untagged texts (IPrior) and the empirical priors derived from a hand-tagged corpus (EPrior). It would be helpful to know whether an IPrior outperforms an EPrior estimated from a very small hand-tagged corpus.

- LB04 derives the IPrior from untagged texts. If the posteriors can also be deduced without using hand-tagged data, it will free us from our dependence on hand-tagged data for disambiguating Levin verbs.

As noted above, Levin [9] classifies verbs according to their syntactic behavior: verbs that show similar diathesis alternations are assumed to share certain semantic components and to form a coherent semantic class. Neighboring words are not taken into consideration in her verb classification. On the other hand, many scholars have shown that words with similar contextual features, typically neighboring words, are also semantically similar [17, 10]. Faced with these two different approaches to identifying semantically similar words, we may ask the following two questions:

- Are the semantic components shared by verbs in a Levin class correlated with their context words?

- Can we use the context words of the unambiguous verbs in a particular Levin class to disambiguate the ambiguous verbs in that class?

To perform verb class disambiguation without relying on a hand-tagged corpus, we decided to train our verb class disambiguator using only data containing unambiguous verbs. Consider the verb call again: it is ambiguous between the classes DUB and GET when occurring in the double-object frame. However, most verbs in these two classes are not ambiguous, as shown in Table 4.

    class | ambiguous verbs  | unambiguous verbs
    ------|------------------|-----------------------------------------------
    DUB   | call, make, vote | anoint, baptize, brand, christen, consecrate,
          |                  | crown, decree, dub, name, nickname, pronounce,
          |                  | rule, stamp, style, term
    GET   | call, find,      | book, buy, cash, catch, charter, choose, earn,
          | leave, vote      | fetch, gain, gather, hire, keep, order, phone,
          |                  | pick, pluck, procure, pull, reach, rent,
          |                  | reserve, save, secure, shoot, slaughter,
          |                  | steal, win

    Table 4: The DUB and GET classes
For an unambiguous verb, we know for certain the class it belongs to without even examining the sentences in which it occurs. To disambiguate call in the double-object frame, we therefore used all sentences that are identified as double-object frames and contain an unambiguous verb of the class DUB as the training data for the class DUB, and did the same for the class GET.

3.2 Constructing Training Data

We picked all the example sentences from the Charniak-parsed BNC that contain the relevant unambiguous verbs and are identified as double-object, transitive or dative frames. We understand that the training data constructed this way is noisy in that some false instances of the target frames are included. For example, a sentence like "I fed the boy myself" is incorrectly recognized as a double-object frame. Thus the training data we used may potentially have a negative effect on the verb class disambiguator. A sketch of this construction step follows.
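The following is a minimal sketch, under our own simplifying assumptions, of how training examples for an ambiguous verb can be harvested from parsed sentences containing only unambiguous class members. The frame detector, input format and class inventory shown here are illustrative stand-ins for the Charniak-parsed BNC and Levin's classification.

```python
# Hypothetical class inventory: some unambiguous members of the two candidate
# classes for "call" in the double-object frame (cf. Table 4).
UNAMBIGUOUS = {
    "DUB": {"anoint", "baptize", "brand", "christen", "dub", "name", "nickname"},
    "GET": {"book", "buy", "catch", "earn", "fetch", "hire", "order", "rent"},
}

def build_training_data(parsed_sentences, frame="V-NP-NP"):
    """Label each sentence whose (parser-identified) frame matches and whose
    head verb is an unambiguous member of some class. No hand-tagging is
    needed, but the parser output is noisy, so some false frame instances
    slip through."""
    training = []
    for sent in parsed_sentences:
        if sent["frame"] != frame:
            continue
        for cls, verbs in UNAMBIGUOUS.items():
            if sent["verb"] in verbs:
                training.append((sent["words"], cls))
    return training

# Toy parsed input: dicts with the lemmatized head verb, the detected frame
# and the context words.
sentences = [
    {"verb": "name", "frame": "V-NP-NP", "words": ["they", "name", "him", "chairman"]},
    {"verb": "buy",  "frame": "V-NP-NP", "words": ["she", "buy", "him", "a", "car"]},
    {"verb": "name", "frame": "V-NP",    "words": ["they", "name", "a", "date"]},
]
print(build_training_data(sentences))
# -> [(['they', 'name', 'him', 'chairman'], 'DUB'), (['she', 'buy', 'him', 'a', 'car'], 'GET')]
```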

3.3 Classifier and Feature Space

3.3.1 A Naive Bayesian Classifier

We employed a Naive Bayesian classifier for our disambiguation task. Although the Naive Bayesian classifier is simple, it is quite efficient and has shown good performance on WSD. Another reason for using a Naive Bayesian classifier is that it is easy to incorporate the prior information. Within a Naive Bayesian approach, the choice of the predominant class for an ambiguous verb v occurring in a frame f, given its context, can be expressed as

    C(v, f) = argmax_{c_i} [ P(c_i, f, v) × Π_{j=1..n} P(a_j | c_i, f, v) ]    (13)

where C(v, f) represents the predominant class for an ambiguous verb v occurring in a frame f, P(c_i, f, v) is the prior probability of the ambiguous verb v belonging to class c_i when occurring in frame f, and Π_{j=1..n} P(a_j | c_i, f, v) is the posterior probability of the context features a_1, ..., a_n.

3.3.2 Feature Space

As is common in WSD, we used as features the neighboring words of a target ambiguous verb. We considered 8 different window sizes: L1R1, L1R2, L1R3, L1R4, L2R1, L2R2, L2R3 and L2R4. A window size such as L1R2 represents one word to the left and two words to the right of an ambiguous verb. Neighboring words are lemmatized using the English lemmatizer described in [15].
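A minimal sketch of such a Naive Bayesian disambiguator, combining a class prior with context-word likelihoods over a fixed window, is given below under our own assumptions: the training interface matches the hypothetical build_training_data above, and the add-one smoothing is our choice, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesVerbDisambiguator:
    """Equation (13): pick argmax_c prior(c) * prod_j P(a_j | c).
    Feature likelihoods use add-one smoothing (our choice)."""

    def __init__(self, priors):
        self.priors = priors                     # e.g. the IPrior from Experiment 1
        self.word_counts = defaultdict(Counter)  # class -> context-word counts
        self.totals = Counter()
        self.vocab = set()

    def train(self, examples):
        # examples: (context_words, class) pairs, e.g. drawn from sentences
        # containing only unambiguous verbs (the NHTD setting).
        for words, cls in examples:
            self.word_counts[cls].update(words)
            self.totals[cls] += len(words)
            self.vocab.update(words)

    def classify(self, context_words, candidate_classes):
        V = len(self.vocab)
        def log_score(cls):
            s = math.log(self.priors[cls])
            for w in context_words:
                s += math.log((self.word_counts[cls][w] + 1) / (self.totals[cls] + V))
            return s
        return max(candidate_classes, key=log_score)

# Toy usage with an informative prior favoring DUB in the double-object frame.
clf = NaiveBayesVerbDisambiguator(priors={"DUB": 0.7, "GET": 0.3})
clf.train([(["name", "him", "chairman"], "DUB"), (["buy", "him", "car"], "GET")])
print(clf.classify(["call", "him", "chairman"], ["DUB", "GET"]))  # -> DUB
```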
3.4 Results and Discussion

We used the same test data as in the first experiment. We compare the performance of six different models, which differ from each other in whether the priors are derived from hand-tagged data and whether the classifier is trained on hand-tagged data.

Prior:

- IPrior: The informative priors derived from untagged texts, as described in Experiment 1.

- EPrior: The empirical priors derived from hand-tagged data. In our experiment, the empirical priors are derived from the test examples.

- UPrior: The uniform priors.

Classifier:

- NHTD: The classifier is trained without using hand-tagged data. In our experiment, the training data consists of all the examples containing only unambiguous verbs. The classifier is tested on all test examples.

- HTD: The classifier is trained on hand-tagged data. In our experiment, the classifier is trained and tested using 10-fold cross-validation on the test examples.

The six models we experimented with are as follows: UPrior+NHTD, IPrior+NHTD, EPrior+NHTD, UPrior+HTD, IPrior+HTD and EPrior+HTD.[3] For the purpose of comparison, we also report the performance of three different baseline models:

- Random Baseline (RB): We randomly selected a class from all those that are compatible with the given verb and frame. Selection was based on a uniform distribution.

- IPrior Baseline (IPB): We selected the class whose IPrior was the largest of the available possibilities.

- EPrior Baseline (EPB): We selected the class whose EPrior was the largest of the available possibilities.

[3] We also estimated a prior from the unambiguous examples only, but its performance is about the same as the IPrior.

    model       | average accuracy | highest accuracy
    ------------|------------------|------------------
    UPrior+NHTD | 58.1%            | 64.8% (L1R3)
    IPrior+NHTD | 62.3%            | 68.8% (L1R4)
    EPrior+NHTD | 64.1%            | 71.0% (L2R4)
    UPrior+HTD  | 64.2%            | 72.3% (L1R4)
    IPrior+HTD  | 64.9%            | 73.9% (L2R4)
    EPrior+HTD  | 68.9%            | 77.4% (L2R4)
    RB          | 37.9%            |
    IPB         | 57.9%            |
    EPB         | 74.2%            |

    Table 5: Results for verb class disambiguation

The results are summarized in Table 5. The average accuracy was obtained by averaging the accuracy over all 8 window sizes. We also report the highest accuracy and the window size at which it was achieved. For example, using the window size L1R3 (see Table 5), the model UPrior+NHTD achieves its best performance of 64.8%. Several things are worth noting in Table 5:

- When the classifier is trained on hand-tagged data (HTD), using the IPrior (IPrior+HTD) outperforms using the UPrior (UPrior+HTD). This agrees with what is shown in LB04. However, using the IPrior (IPrior+HTD) does not match the performance of using the EPrior (EPrior+HTD): both the average accuracy and the highest accuracy of EPrior+HTD are higher than those of IPrior+HTD. A pairwise t-test indicates that the difference is statistically significant (p-value = 0.021). This suggests that it is not always best to incorporate a prior derived from untagged texts into a classifier trained on hand-tagged data; it is better to derive the prior from a hand-tagged corpus if such a corpus is available.

- When the classifier is trained without using hand-tagged data (NHTD), neither using the UPrior (UPrior+NHTD) nor using the IPrior (IPrior+NHTD) performs better than any of the supervised models (HTD). However, both consistently outperform the random baseline, suggesting that verbs in the same Levin class do tend to share their context words. Our verb class disambiguator using untagged data can therefore be used in the absence of a hand-tagged corpus. In addition, using the EPrior (EPrior+NHTD) achieves a performance comparable to that of the supervised model with a UPrior (UPrior+HTD), suggesting that a tagged corpus, if available, helps derive more accurate priors.

- When the classifier is trained without using hand-tagged data (NHTD), using the IPrior (IPrior+NHTD) outperforms using the UPrior (UPrior+NHTD). A pairwise t-test indicates that the improvement achieved by using the IPrior is statistically significant (p-value = 0.026), suggesting that the IPrior derived from untagged data, though not as accurate as the EPrior, can still help improve the performance of the classifier.

- All five models we experimented with outperform the IPB but fail to achieve the performance of the EPB, with the exception of the model EPrior+HTD, whose highest accuracy is about 3% better than the EPB. Again, annotation, if available, helps.
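For reference, a pairwise (paired) t-test of this kind can be sketched as follows. We assume, since the paper does not spell it out, that the pairing is over the eight window sizes; the accuracy vectors shown are illustrative, not the experimental values.

```python
# Hypothetical per-window accuracies for two models (8 window sizes each);
# the paper does not report per-window values, so these are made up.
from scipy import stats

iprior_htd = [63.2, 64.1, 64.7, 65.5, 64.0, 65.2, 66.3, 66.2]
eprior_htd = [66.9, 67.8, 68.5, 69.4, 68.1, 69.3, 70.6, 70.6]

# Paired t-test: each window size yields one (IPrior+HTD, EPrior+HTD) pair.
t_stat, p_value = stats.ttest_rel(iprior_htd, eprior_htd)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```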

4 Conclusions and Future Work

The main conclusions of this paper are the following. Our experiments confirm the importance of syntactic frame information in verb class disambiguation. In addition, we have re-confirmed the importance of a good prior derived from untagged texts in WSD. However, instead of deriving the classifier from hand-tagged data as LB04 did, we trained our classifier using examples containing unambiguous verbs. This offers a way to disambiguate Levin verbs without relying on hand-tagged data.

4.1 About the Prior Model

A contribution of our paper is a clearer reformulation of LB04's prior model. This reveals that LB04's prior model cannot distinguish between different verbs of the same class, a direct result of the independence assumption built into the model. To improve the performance of the prior model, we believe it is worthwhile to find new ways to bring the identity of each individual verb into the prior model [13].

4.2 Disambiguation without a Hand-tagged Corpus

We proposed a method for disambiguating Levin verbs that completely avoids the need for a hand-tagged corpus and analysed how it compares to various alternatives. Our experiments show that our verb class disambiguator is not as accurate as the supervised ones that make use of a hand-tagged corpus. One reason is that we relied on a statistical parser to identify the target frames (double-object, transitive and dative) when constructing the training data; the training data obtained this way is noisy in that some false instances of the target frames are included. The training data used to train the supervised models (in this case the 5,078 test examples), on the other hand, has been examined by human annotators and is free of any false instances of the target frames. Nevertheless, our method of disambiguating Levin verbs without using hand-tagged data consistently outperforms the random baseline, suggesting that it is feasible to use examples containing unambiguous verbs to disambiguate ambiguous ones.

Levin's verb classification covers about 79 frames, many of which involve some ambiguity. In this paper, we tested our verb class disambiguator on only three of Levin's frames. It remains to be shown that it works equally well for the other frames. We also plan to test our disambiguation method, namely using unambiguous words to disambiguate ambiguous ones, on different WSD data sets.

5 Acknowledgments

This study was supported by NSF grant 0347799. We are grateful to Mirella Lapata for providing the test data. Our thanks also go to Sabine Schulte im Walde for making available to us the frame set she acquired from the BNC.

References

[1] G. Carroll and M. Rooth. Valence induction with a head-lexicalized PCFG. In Proceedings of the 3rd Conference on Empirical Methods in Natural Language Processing, pages 58-63, 1998.

[2] E. Charniak. A maximum-entropy-inspired parser. In Proceedings of the 2000 Conference of the North American Chapter of the Association for Computational Linguistics, pages 132-139, 2000.

[3] S. Corley, M. Corley, F. Keller, M. Crocker, and S. Trewin. Finding syntactic structure in unparsed corpora. Computers and the Humanities, 35(2):81-94, 2000.

[4] A. Goldberg. Constructions. University of Chicago Press, Chicago, 1st edition, 1995.

[5] R. Jackendoff. Semantics and Cognition. MIT Press, Cambridge, MA, 1983.

[6] A. Korhonen.
Subcategorization Acquisition. PhD thesis, Cambridge University, 2002.

[7] A. Korhonen, Y. Krymolowski, and Z. Marx. Clustering polysemic subcategorization frame distributions semantically. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 48-55, 2003.

[8] M. Lapata and C. Brew. Verb class disambiguation using informative priors. Computational Linguistics, 30(1):45-73, 2004.

[9] B. Levin. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, 1st edition, 1993.

[10] D. Lin. Automatic retrieval and clustering of similar words. In COLING-ACL '98, 1998.

[11] D. McCarthy. Using semantic preference for identifying verbal participation in role switching alternations. In Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics, pages 58-63, 2000.

[12] D. McCarthy, R. Koeling, J. Weeds, and J. Carroll. Finding predominant senses in untagged text. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pages 280-287, 2004.

[13] P. Merlo, E. Joanis, and J. Henderson. Unsupervised verb class disambiguation based on diathesis alternations. Manuscript, 2005.

[14] P. Merlo and S. Stevenson. Automatic verb classification based on statistical distribution of argument structure. Computational Linguistics, 27(3):373-408, 2001.

[15] G. Minnen, J. Carroll, and D. Pearce. Applied morphological processing of English. Natural Language Engineering, 7(3):207-223, 2000.

[16] S. Pinker. Learnability and Cognition: The Acquisition of Argument Structure. MIT Press, Cambridge, MA, 1989.

[17] D. Rohde, L. Gonnerman, and D. Plaut. An improved method for deriving word meaning from lexical co-occurrence. Cognitive Science, 2004. Submitted.

[18] S. Schulte im Walde. Clustering verbs semantically according to alternation behavior. In Proceedings of the 18th International Conference on Computational Linguistics, pages 747-753, 2000.

[19] R. Swier and S. Stevenson. Unsupervised semantic role labelling. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 95-102, 2004.