Latent Semantic Kernels for WordNet: Transforming a Tree-like Structure into a Matrix

Young-Bum Kim
Department of Computer Engineering, Hallym University
Chuncheon, Gangwon, 200-702, Korea
stylemove@hallym.ac.kr

Yu-Seop Kim
Department of Computer Engineering, Hallym University
Chuncheon, Gangwon, 200-702, Korea
yskim01@hallym.ac.kr

Abstract

WordNet is one of the most widely used linguistic resources in the computational linguistics community. However, many applications that rely on the WordNet hierarchical structure suffer from the word sense disambiguation (WSD) problem caused by its polysemy. To address this problem, we propose a matrix representation of the WordNet hierarchical structure. First, we represent each term as a vector whose elements correspond to the synsets of WordNet. Then, with singular value decomposition (SVD), we reduce the dimensionality of the vectors to capture the latent semantic structure. For evaluation, we implemented an automatic assessment system for short essays and obtained reliable accuracy. The scores assigned by the automatic assessment system are significantly correlated with those of human assessors. The new WordNet representation is expected to be easily combined with other matrix-based approaches.

1 Introduction

For a decade, WordNet [4] has been one of the most widely used linguistic resources; it is a broad-coverage lexical network of English words [2]. [1] made use of WordNet to measure the semantic distance between concepts, and [13] used WordNet to disambiguate word senses. [9], [5], and [11] measured the relatedness of concepts rather than of words themselves, and the relatedness of words was estimated by using an assumption proposed by [11]: the relatedness between two words is the relatedness between their most related pair of concepts. This assumption is adequate for measuring the relatedness of words themselves. However, it is less suited to measuring the similarity between sentences, paraphrases, or even documents, where a more elaborate method is needed to disambiguate the sense of each word in its context.

The latent semantic kernel, proposed by [3], offers a simpler way to measure the similarity between documents and has been applied in systems such as the automatic assessment system of [7]. Unlike the WordNet-based methods, the kernel method has no need to consider the polysemy problem during the measurement process. This is why we transform the WordNet structure into the kernel form, that is, a matrix. The matrix derived from WordNet has the further advantage that it can later be integrated with other matrix-based information in other applications.

First, we built a term-synset matrix, analogous to the term-document matrix of traditional information retrieval, from a Korean WordNet called KorLEX [10]. Each element of the matrix is given a score calculated from the distance between the term and the corresponding synset. We then constructed a new semantic kernel from this matrix using the SVD algorithm. To evaluate the new kernel matrix, we implemented an automatic assessment system for short essay questions. The correlation coefficient between the automatic system and a human assessor was 0.922.

Section 2 describes the representation of the transformed WordNet. Section 3 explains the latent semantic kernel method integrated with WordNet. The experimental results and concluding remarks are given in Sections 4 and 5, respectively.

2 Transformed WordNet Structure

Synsets (synonym sets) represent specific underlying lexical concepts in WordNet. Although WordNet has been used to solve many semantic problems in the computational linguistics and information retrieval communities, its inherent polysemy, that is, the fact that one term can appear in multiple synsets, has caused problems of its own. To deal with this, we adapted the latent semantic kernel to WordNet.

First, we built a term-synset matrix by transforming the WordNet hierarchical structure. Each row vector of the matrix is associated with a term listed in WordNet; the terms also appear more than once in a corpus of about 38,000 documents. The row vector is represented as

$t_j = \langle s_1, s_2, \ldots, s_i, \ldots, s_N \rangle$   (1)

where $t_j$ denotes the row vector for the j-th term and N is the total number of synsets in WordNet. Each element $s_i$, initially set to zero, is calculated as

$s_i = \alpha / 2^k$   (2)

where $\alpha$ is a constant. The value of $s_i$ decreases with the number of edges k, $0 \le k \le k_{max}$, on the path from a synset containing the term $t_j$ to the i-th synset. The parameter $k_{max}$ determines the range of synsets related to the term: the larger $k_{max}$, the more synsets are regarded as related to the synsets of the term. In this paper we set $k_{max} = 2$ and $\alpha = 2$.

Figure 1, taken from [12], shows part of the WordNet extract for the term car. The term appears in multiple synsets: {car, gondola}, {car, railway car}, and {car, automobile}. The value of $s_i$ for each of these synsets is therefore $\alpha / 2^0 = 2/1 = 2$. The synsets adjacent to {car, automobile}, which are {motor vehicle}, {coupe}, {sedan}, and {taxi} in Figure 1, are all given $s_i = \alpha / 2^1 = 2/2 = 1$. This procedure continues until the $k_{max}$-th adjacent synsets are reached.

3 Latent Semantic Kernel for Similarity Measurement

With the initial term-synset matrix A created above, we build the latent semantic kernel [3]. The similarity between documents $d_1$ and $d_2$ is estimated as

$sim(d_1, d_2) = \cos(P^T d_1, P^T d_2) = \frac{d_1^T P P^T d_2}{\lVert P^T d_1 \rVert \, \lVert P^T d_2 \rVert}$   (3)

where P is a matrix transforming documents from the input space to a feature space. A kernel function $k(d_1, d_2) = \langle \phi(d_1), \phi(d_2) \rangle$ uses the matrix P to replace $\phi(d_1)$ with $P^T d_1$.

To find P, the term-synset matrix A is decomposed by SVD as

$A = U \Sigma V^T$   (4)

where $\Sigma$ is a diagonal matrix containing the nonzero singular values of A (the square roots of the nonzero eigenvalues of $A A^T$ or $A^T A$), and the columns of U and V are the orthonormal eigenvectors of $A A^T$ and $A^T A$, respectively, associated with the r nonzero eigenvalues. The original term-synset matrix A has size m x n. One component matrix, U (m x r), describes the original row entities as vectors of derived orthogonal factor values; another, V (n x r), describes the original column entities in the same way; and the third, $\Sigma$ (r x r), is a diagonal matrix of scaling values such that, when the three components are matrix-multiplied, the original matrix is reconstructed. The singular vectors corresponding to the k ($k \le r$) largest singular values are then used to define a k-dimensional synset space. Using these vectors, the m x k and n x k matrices $U_k$ and $V_k$ can be defined along with the k x k singular value matrix $\Sigma_k$. It is known that $A_k = U_k \Sigma_k V_k^T$ is the closest rank-k matrix to the original matrix A, and $U_k$ is used as P. [8] explains the details of these SVD-based methods, known as latent semantic analysis (LSA).
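
To make the construction concrete, the sketch below puts Sections 2 and 3 together in a small piece of Python. It is an illustrative sketch under our own assumptions, not the authors' implementation: the WordNet hierarchy is given as a plain adjacency list over synset indices, a dictionary maps each term to the synsets containing it, and the names build_term_synset_matrix, latent_semantic_projection, and kernel_similarity are ours.

```python
import numpy as np
from collections import deque

def build_term_synset_matrix(term_synsets, synset_graph, n_synsets,
                             alpha=2.0, k_max=2):
    """Term-synset matrix A of Section 2.

    term_synsets : dict mapping a term to the synset indices containing it.
    synset_graph : dict mapping a synset index to its neighbouring indices.
    Each element s_i is weighted alpha / 2**k, where k (0 <= k <= k_max)
    is the edge distance from a synset containing the term (equation 2).
    """
    terms = sorted(term_synsets)
    A = np.zeros((len(terms), n_synsets))
    for row, term in enumerate(terms):
        for start in term_synsets[term]:
            seen = {start: 0}          # breadth-first search out to k_max edges
            queue = deque([start])
            while queue:
                s = queue.popleft()
                k = seen[s]
                A[row, s] = max(A[row, s], alpha / 2 ** k)
                if k < k_max:
                    for nb in synset_graph.get(s, ()):
                        if nb not in seen:
                            seen[nb] = k + 1
                            queue.append(nb)
    return A, terms

def latent_semantic_projection(A, k):
    """P = U_k, left singular vectors of the k largest singular values (equation 4)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

def kernel_similarity(d1, d2, P):
    """Cosine similarity of two term vectors in the reduced synset space (equation 3)."""
    v1, v2 = P.T @ d1, P.T @ d2
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```

With alpha = 2 and k_max = 2 this reproduces the car example above: the synsets containing the term receive the value 2, and their immediate neighbours receive 1.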

4 Evaluation

In this section, we first explain the automatic assessment procedure developed to evaluate the usefulness of the new WordNet-based kernel, and then demonstrate that usefulness by reporting the assessment accuracy.

4.1 Automatic Assessment for Short Essays

Figure 2 shows the whole process of the automatic assessment system developed to evaluate the new kernel. The process starts with a sentence written by a student. First, the Korean Morphological Analyzer (KMA) [6] extracts the main index terms from the input sentence. From the term list, an initial vector is constructed over a vocabulary generated from both a large document collection and WordNet; the vocabulary contains 16,000 words. The dimensionality is then reduced by computing $P^T d$, where P is the kernel derived from WordNet and d is the initial vector. The model sentences, which were created by instructors and transformed in the same way as the student sentence, are compared with the student sentence using equation (3), where the student sentence is mapped to $d_1$ and each model sentence to $d_2$. In this research, we stored five model sentences for each question. Finally, the highest similarity value is taken as the final score of the student sentence.
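
The scoring step of Figure 2 can be sketched in a few lines. This is only a rough illustration under our own assumptions, not the system itself: KMA and the 16,000-word vocabulary are not reproduced, answers are assumed to be already tokenized, vectorize and score_answer are hypothetical helper names, and kernel_similarity is the function from the sketch in Section 3.

```python
import numpy as np

def vectorize(tokens, vocabulary):
    """Term-frequency vector over `vocabulary` (a dict term -> row index,
    aligned with the rows of the kernel matrix P)."""
    d = np.zeros(len(vocabulary))
    for tok in tokens:
        idx = vocabulary.get(tok)
        if idx is not None:
            d[idx] += 1.0
    return d

def score_answer(student_tokens, model_token_lists, vocabulary, P):
    """Score a student answer as the highest kernel similarity against the
    stored model sentences (five per question in this paper).
    Uses kernel_similarity from the Section 3 sketch."""
    d1 = vectorize(student_tokens, vocabulary)
    return max(kernel_similarity(d1, vectorize(m, vocabulary), P)
               for m in model_token_lists)
```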

Figure 1. WordNet Extract for the term car

Figure 2. Automatic Assessment Process

4.2 Comparison to Human Assessors

We gave 30 questions about Chinese proverbs to 100 students, asking them for the proper meaning of each proverb. First, a human assessor decided whether each answer was correct or not and assigned one of two scores, 1 or 0, respectively. The score of each student was then computed. Our assessment system does the same, except that it gives each answer a partial score ranging from 0 to 1. Figure 3 shows the correlation between the scores from the human assessor and those of the automatic assessor. The correlation coefficient is 0.922. As shown in Figure 3, the assessment system tends to give slightly lower scores than the human assessor. This is caused by the amount of information available for scoring: the less information an assessor has, the lower the score it gives.

Table 1. Error rate for each threshold

threshold   error rate
0.1         0.184
0.2         0.229
0.4         0.228
0.6         0.248

We also evaluated how often the two assessors agree on whether an answer is correct. First, we chose a threshold value that serves as the boundary score between correct and wrong answers: if a similarity value is larger than the threshold, the answer is judged correct. We then counted the number of answers judged the same way by both assessors. Table 1 shows the resulting error rates for several threshold values. With threshold values below 0.1, the error rates rose again.
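
The agreement check described above amounts to thresholding the similarity scores and comparing the resulting decisions with the human 0/1 labels. The snippet below is a minimal sketch of that computation; error_rate is our own name, and the score lists are assumed to come from the scoring procedure of Section 4.1.

```python
def error_rate(similarities, human_labels, threshold):
    """Fraction of answers on which the thresholded automatic decision
    (similarity > threshold means 'correct') disagrees with the human 0/1 label."""
    disagree = sum(int(sim > threshold) != label
                   for sim, label in zip(similarities, human_labels))
    return disagree / len(similarities)

# Sweeping the thresholds of Table 1 would look like:
# for t in (0.1, 0.2, 0.4, 0.6):
#     print(t, error_rate(auto_scores, human_labels, t))
```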

5 Concluding Remarks

We proposed a latent semantic kernel for WordNet. First, we transformed the hierarchical structure into a matrix, representing each term as a vector. Then, like the corpus-driven latent semantic kernel, we applied the SVD algorithm to capture the latent semantic structure of WordNet. We evaluated the usefulness of the new kernel by integrating it into an automatic assessment system, which showed high correlation with a human assessor. In this paper, terms in WordNet are represented as vectors, much like the terms of other vector space models in many related applications. It will therefore be possible for other researchers to integrate WordNet with other approaches based on the bag-of-words concept. In future work, we will look for new ways to combine WordNet with data-driven methods so as to integrate term similarity and term relatedness.

References

[1] A. Budanitsky and G. Hirst. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics, 2001.
[2] A. Budanitsky and G. Hirst. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13-47, 2006.
[3] N. Cristianini, J. Shawe-Taylor, and H. Lodhi. Latent semantic kernels. Journal of Intelligent Information Systems, 18(2-3):127-152, 2002.
[4] C. Fellbaum. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA, 1998.
[5] J. J. Jiang and D. W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X), pages 19-33, 1997.
[6] S.-S. Kang. General Purpose Morphological Analyzer HAM Ver. 6.0.0. http://nlp.kookmin.ac.kr, 2004.
[7] Y.-S. Kim, W.-J. Cho, J.-Y. Lee, and Y.-J. Oh. An intelligent grading system using heterogeneous linguistic resources. Proceedings of the 6th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL-2005), pages 102-108, 2005.
[8] T. K. Landauer, P. W. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse Processes, 25:259-284, 1998.
[9] C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, chapter 11, pages 265-283, 1998.
[10] E. R. Lee and S. S. Lim. Korean WordNet ver. 2.0. Korean Language Processing Laboratory, Pusan National University, 2004.
[11] P. Resnik. Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1995.
[12] R. Richardson, A. F. Smeaton, and J. Murphy. Using WordNet as a Knowledge Base for Measuring Semantic Similarity between Words. CA-1294, Dublin, Ireland, 1994.
[13] E. M. Voorhees. Using WordNet to disambiguate word senses for text retrieval. Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1993.

Acknowledgments

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (KRF-2006-331-D00534).

Figure 3. Scores graded by Human Assessor and Automatic Assessor