Tolerant and Adaptive Information Retrieval with Neural Networks


Tolerant and Adaptive Information Retrieval with Neural Networks

Thomas Mandl
Information Science, University of Hildesheim
Marienburger Platz 22, 31141 Hildesheim, Germany
e-mail: mandl@rz.uni-hildesheim.de

Abstract: The COSIMIR (Cognitive Similarity Learning in Information Retrieval) model applies the powerful backpropagation algorithm to Information Retrieval, integrating human-centered, soft and tolerant computing into the core of the retrieval process. An overview of neural networks in Information Retrieval shows that current systems do not fully exploit the potential of neural networks. An empirical evaluation of COSIMIR has led to positive and promising results.

Keywords: Backpropagation, Information Retrieval, Neural Networks

1. Information Retrieval

The amount of knowledge available in the world is increasing at a fast pace. Most of it is still conveyed through text documents. Information seekers need to be directed toward the relevant piece of information in an ocean of knowledge to be able to solve their problems. Information Retrieval will therefore be a key technology in the near future. Large-scale experiments have shown that current retrieval engines find only a fraction of the relevant documents in a collection (Voorhees and Harman 1998). To deal with the ever-growing amount of text, better suited models are necessary. The main weaknesses of today's Information Retrieval systems are:

Cognitive processes are modeled mathematically: Query and document are matched by similarity functions which are not based on the human judgement of similarity. Therefore, the inherent vagueness of the Information Retrieval process is not appropriately modeled in current systems.

Lack of adaptivity: The mathematical models adapt imperfectly to the situation within a certain domain. The different importance of terms and of their combinations is neglected.
Treatment of heterogeneity: Traditional Information Retrieval systems assume a homogeneous and monolithic data source, although users want to retrieve documents of different types with one query. The semantic problems of linking multimedia, multilingual sources remain unsolved.

A sketch of the state of the art in section 3 shows that the current Information Retrieval models based on neural networks have considerable weaknesses. This analysis has led to the development of COSIMIR (Cognitive Similarity Learning in Information Retrieval), an innovative model integrating human knowledge into the core of the retrieval process.
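As a point of reference for the critique above, the purely mathematical matching of query and document can be illustrated with the classic cosine measure over term-weight vectors. This is a minimal sketch with invented toy weights; note that the function is necessarily symmetric, a property human similarity judgements need not have (cf. Tversky 1977, discussed in section 4.1).

```python
import numpy as np

def cosine(query_vec, doc_vec):
    """Traditional mathematical matching: cosine of the angle between
    term-weight vectors, with no learned or human notion of similarity."""
    denom = np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
    return float(query_vec @ doc_vec / denom) if denom else 0.0

# Toy term space, e.g. ["neural", "network", "retrieval"] (illustrative weights).
query = np.array([1.0, 1.0, 0.0])
docs = {"d1": np.array([0.8, 0.9, 0.1]),
        "d2": np.array([0.0, 0.1, 0.9])}

# Rank documents by their cosine similarity to the query.
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

Here "d1" outranks "d2" because its term weights point in nearly the same direction as the query vector, regardless of any human judgement about the documents.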

2. Neural Networks

Neural networks are an information processing technology within the framework of soft computing and computational intelligence. They are based on the parallel and distributed processing of information, which leads to highly tolerant systems. Neural networks can learn from existing data even when humans find it difficult to identify rules. The backpropagation network in particular has been applied to a large number of problems. It learns a mapping between pattern spaces from examples. Input and output are located in layers of neurons, and backpropagation networks introduce a hidden layer, which increases the computing capabilities.

Neural networks are also appropriate from the perspective of cognitive science. Smolensky 1988 claims that the introduction of hidden neurons without symbolic equivalence leads toward an intuitive processor capable of implementing human expert knowledge.

3. Neural Networks in Information Retrieval

The soft computing paradigm of neural networks seems to be well suited for Information Retrieval tasks. The field has attracted considerable research; however, the search for an appropriate architecture has proved to be difficult. Current systems can be grouped into four categories.

3.1 Kohonen Self-Organizing Maps

Several researchers have implemented Information Retrieval systems based on the Kohonen Self-Organizing Map (SOM), a neural network model for unsupervised classification. Implementations for large collections can be tested on the Internet (Chen et al. 1996, Kohonen 1998). The SOM consists of a usually two-dimensional grid of neurons, each associated with a weight vector. Input documents are classified into the most similar class, and in the next step the algorithm adapts the weights of the winning class and its neighbors. As a result, the most similar classes are always neighboring classes. The Information Retrieval paradigm for the SOM is browsing.
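The SOM step just described (classify a document into the most similar class, then adapt the winner and its neighbors) can be sketched as follows. This is an illustrative toy, not any of the cited implementations; grid size, dimensionality, learning rate, and neighborhood radius are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 10x10 grid of classes over 300-dimensional document vectors.
GRID, DIM = 10, 300
weights = rng.normal(size=(GRID, GRID, DIM))

def som_step(doc, weights, lr=0.1, radius=1):
    """One SOM update: find the winning class (closest weight vector),
    then adapt the winner and its grid neighbors toward the document."""
    dists = np.linalg.norm(weights - doc, axis=2)
    wi, wj = np.unravel_index(np.argmin(dists), dists.shape)
    # Adapt the winner and all cells within `radius` grid steps of it.
    for i in range(max(0, wi - radius), min(GRID, wi + radius + 1)):
        for j in range(max(0, wj - radius), min(GRID, wj + radius + 1)):
            weights[i, j] += lr * (doc - weights[i, j])
    return (wi, wj)

doc = rng.normal(size=DIM)
winner = som_step(doc, weights)
```

Because neighbors are pulled along with the winner, documents classified into adjacent grid cells end up similar, which is what makes the trained map browsable.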
However, users of large text collections need search mechanisms, and the SOM does not adapt to search.

3.2 Associative Memories

Associative memories like the Hopfield network are powerful error-tolerant retrieval tools. Documents can be stored as energy minima. The query is considered a distorted pattern and serves as input. The network minimizes its energy and moves toward the closest minimum, which represents the result document. As a consequence, only one pattern or document is retrieved, which is not sufficient for most users. However, associative memories can be used, e.g., for spelling correction within Information Retrieval systems.

3.3 Information Retrieval in Spreading Activation Networks

The most common and so far most successful Information Retrieval model based on neural networks is the so-called spreading activation network (Belew 1989, Kwok 1989). These are simple neural networks, normally with two layers representing index terms and documents. The weights between the layers are initially set according to the results of traditional indexing and weighting schemes. The terms of the user's query are activated, and activation spreads along the weights into the document layer and back. The most highly activated documents are presented to the user as the result.

Fig. 1: Spreading Activation Model
Fig. 2: Kohonen SOM (cf. Kohonen 1998)

Spreading activation networks have performed well in experiments within the TREC conference (e.g. Boughanem and Soulé-Dupuy 1998). However, a closer look at the models reveals that they very much resemble the traditional vector space model of Information Retrieval. Mothe 1994 presents a theoretical and empirical proof that after the first step of activation propagation the models are indeed identical. Thus, spreading activation networks are not a new model for Information Retrieval. Furthermore, they do not fully exploit the potential of neural networks. Their learning capabilities are rather limited, and they do not include hidden neurons, a feature which increases the computing abilities.

3.4 Experimental Backpropagation Models

Although the backpropagation algorithm is one of the most powerful and most often used neural network models, it has not been applied to Information Retrieval very often so far. The model of Mori et al. 1990 is an extension of the spreading activation systems and includes several hidden layers. It learns to map from sets of query terms to sets of documents. The document layer of such a system seems to be too large to collect sufficient training data. In addition, it is unclear how generalization can be guaranteed when the features of the objects are not integrated into the model. The transformation network (Crestani 1994) maps between two different representation schemes of documents. It does not implement the central process in Information Retrieval, although it can be used for pre-processing in an environment of heterogeneous documents.

4. The COSIMIR Model

The COSIMIR model (COgnitive SIMilarity Learning in Information Retrieval) implements the central process in Information Retrieval in a backpropagation network and avoids the weaknesses of the models discussed above. An Information Retrieval system calculates the similarity between a query and a document representation. COSIMIR learns to calculate this similarity by making use of many examples given by humans. The input for COSIMIR consists of one query representation and one document representation. Both are fed in parallel into the input layer. The activation spreads through one hidden layer into the output layer, which consists of one unit representing the similarity between both objects. The similarity calculated can be interpreted as the relevance of the document for the query. This step needs to be repeated for each document in the collection. By using the backpropagation algorithm, COSIMIR can form sub-symbolic representations in the hidden layer and can implement a complex function.

Fig. 3: The COSIMIR Model

4.1 The Suitability of COSIMIR for Information Retrieval

The COSIMIR model uses the traditional knowledge source in Information Retrieval: the weights of the terms for the documents are derived by an indexing method. In addition, it integrates a large number of relevance judgements. That way, it makes use of more knowledge than a traditional Information Retrieval system. Neural networks are a very tolerant processing method. As one result of this tolerance, different evaluations by different users will not dramatically decrease the performance of COSIMIR.

Traditional Information Retrieval systems use a mathematical similarity function like the cosine to calculate the relevance of a document for a query; however, these formulas do not account for the complexity of human similarity judgements. Tversky 1977 showed that similarity is often perceived as neither transitive nor symmetrical; however, most mathematical functions have these properties. COSIMIR does not need to make these assumptions and does not need to model them explicitly. Depending on the set of relevance judgements, the similarity function implemented in the neural network may or may not be transitive. The choice of one similarity function based on heuristics can be avoided as well.

Common Information Retrieval models assume that terms are independent and that all have the same importance for the similarity. Neither of these assumptions is true. COSIMIR does not rely on such assumptions and instead learns the complex relationships between terms. As a result, a cognitive rather than a mathematical similarity function is implemented.

COSIMIR is also very flexible and can even process heterogeneous representations as long as enough human similarity judgements are available. Hence, it can be applied to a heterogeneous system where the user can form the query in a representation scheme or thesaurus different from the document representation scheme. This scenario will become more and more common as information sources are connected and users are able to query a number of them with one action.
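The COSIMIR architecture can be sketched as a small backpropagation network: query and document representations concatenated at the input, one hidden layer, and a single output unit giving the similarity. This is a minimal illustrative sketch, not Mandl's original implementation; layer sizes, learning rate, activation functions, and the toy relevance judgements are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: query/document vectors assumed already reduced to DIM dimensions.
DIM, HIDDEN = 8, 6
W1 = rng.normal(scale=0.5, size=(2 * DIM, HIDDEN))  # input -> hidden
w2 = rng.normal(scale=0.5, size=HIDDEN)             # hidden -> single output unit

def forward(q, d):
    """Query and document representation are fed in parallel into the input
    layer; one output unit yields the similarity (interpretable as relevance)."""
    x = np.concatenate([q, d])
    h = np.tanh(x @ W1)
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))  # similarity in (0, 1)
    return x, h, s

def train_step(q, d, target, lr=0.5):
    """One backpropagation step on a human relevance judgement (target 0 or 1)."""
    global W1, w2
    x, h, s = forward(q, d)
    delta_out = (s - target) * s * (1.0 - s)      # squared error through sigmoid output
    delta_hid = delta_out * w2 * (1.0 - h ** 2)   # error through tanh hidden layer
    w2 -= lr * delta_out * h
    W1 -= lr * np.outer(x, delta_hid)

# Toy training set: (query representation, document representation, judgement).
judgements = [(rng.normal(size=DIM), rng.normal(size=DIM), t)
              for t in (1.0, 0.0, 1.0, 0.0)]
for _ in range(2000):
    for q, d, t in judgements:
        train_step(q, d, t)
```

At retrieval time, the trained `forward` pass would be repeated for every document in the collection and the documents ranked by the output unit's activation.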
4.2 Empirical Evaluation

COSIMIR was first evaluated with a data set on materials used in the construction of airplane engines. These materials were characterized by two vectors, one representing their features and the other representing the parts for which the materials can be used. According to experts, the similarity of materials in this area is primarily based on the usage of a particular material. Thus, a COSIMIR model was implemented which took two feature vectors as input and was trained to calculate the similarity based on the usage vectors. The performance was measured by comparing the original similarity ranking with the one obtained by COSIMIR on a test set. The correlation reached 79%, a result which can be considered very satisfying.

COSIMIR networks for text retrieval tend to become very large, as the number of terms is usually higher than 5000 even for a controlled vocabulary. This results in a large number of connections, which need to be trained using a sufficient number of training examples. Therefore, a statistical dimensionality reduction based on Singular Value Decomposition was used for the experiments (Deerwester et al. 1990). Using this method, the term space can be reduced to some 300 dimensions, a size which can be handled by COSIMIR. Details of COSIMIR and the experiments carried out can be found in Mandl 1998.
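The SVD-based reduction of the term space can be sketched as a truncated SVD of the term-document matrix in the style of Deerwester et al. 1990. The matrix, its sizes, and the number of kept dimensions below are illustrative (the paper's experiments use some 300 dimensions); the fold-in of a query follows the common LSI convention and is an assumption about the setup, not a description of Mandl's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical term-document matrix: 1000 terms x 200 documents (term weights).
A = rng.random((1000, 200))

# Truncated SVD: keep only the k largest singular values and vectors.
k = 50
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Documents in the reduced k-dimensional space (one row per document).
docs_reduced = Vtk.T                                # shape: (200, k)

# A query (a term-weight vector) is folded into the same space
# via the usual LSI mapping S_k^{-1} U_k^T q.
query = rng.random(1000)
query_reduced = (Uk.T @ query) / sk                 # shape: (k,)
```

The reduced document and query vectors, rather than raw term vectors, would then serve as the input representations for a network such as COSIMIR, keeping the number of trainable connections manageable.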

5. Conclusion

The COSIMIR model for Information Retrieval has the potential to improve current systems and lead to better results for information seekers. It integrates human knowledge and experience, in the form of relevance judgements on documents and queries, into the core of the system, and thus eliminates some heuristic choices in the implementation phase of an Information Retrieval system. A cognitive similarity function is implemented by learning the regularities of human similarity judgement with a neural network.

References

Belew R (1989): Adaptive Information Retrieval: Using a Connectionist Representation to Retrieve and Learn about Documents. In: Belkin and Rijsbergen 1989. pp. 11-20.
Boughanem M; Soulé-Dupuy C (1998): Mercure at TREC6. In: Voorhees and Harman 1998.
Chen H; Schuffels C; Orwig R (1996): Internet Categorization and Search: A Self-Organizing Approach. In: Journal of Visual Communication and Image Representation 7(1). pp. 88-101.
Crestani F (1994): Domain Knowledge Acquisition for Information Retrieval Using Neural Networks. In: International Journal of Applied Expert Systems 2(2). pp. 100-115.
Deerwester S; Dumais ST; Furnas GW; Landauer TK; Harshman R (1990): Indexing by Latent Semantic Analysis. In: Journal of the American Society for Information Science 41(6). pp. 391-407.
Kohonen T (1998): Self-Organization of Very Large Document Collections: State of the Art. In: Niklasson L; Bodén M; Ziemke T (eds.): Proc. ICANN98, 8th Int. Conf. on Artificial Neural Networks. Springer, London. vol. 1, pp. 65-74.
Kwok KL (1989): A Neural Network for Probabilistic Information Retrieval. In: Belkin and Rijsbergen 1989. pp. 21-30.
Mandl T (1998): Das COSIMIR-Modell: Information Retrieval mit dem Backpropagation-Algorithmus [The COSIMIR model: Information Retrieval with the backpropagation algorithm]. ELVIRA-Arbeitsbericht 10, IZ Sozialwissenschaften, Bonn.
Mori H; Chung CL; Kinoe Y; Hayashi Y (1990): An Adaptive Document Retrieval System Using a Neural Network. In: International Journal of Human-Computer Interaction 2(3). pp. 267-280.
Mothe J (1994): Search Mechanisms Using a Neural Network Model. In: Intelligent Multimedia Information Retrieval Systems and Management. Proc. of RIAO 94. New York. pp. 275-294.
Smolensky P (1988): On the Proper Treatment of Connectionism. In: Behavioral and Brain Sciences 11. pp. 1-74.
Tversky A (1977): Features of Similarity. In: Psychological Review 84(4). pp. 327.
Voorhees E; Harman D (eds.) (1998): The Sixth Text REtrieval Conference (TREC-6). NIST Special Publication 500-240. National Institute of Standards and Technology, Gaithersburg. Nov. 19-21, 1997.