Extending Sparse Classification Knowledge via NLP Analysis of Classification Descriptions

Attila Ondi, Jacob Staples, and Tony Stirtzinger
Securboration, Inc., W. NASA Blvd, Melbourne, FL, USA

Abstract - Supervised machine learning algorithms, particularly those operating on free text, depend upon the quality of their training datasets to correctly classify unlabeled text instances. In many cases where the classification task is nontrivial, it is difficult to obtain a large enough set of training data to achieve good classification accuracy. In this work we examine one such case in the context of a system designed to ground free text to an organizational hierarchy which is ontologically modeled. We explore the impact of utilizing information garnered from a highly customized Natural Language Processing (NLP) analysis of this ontology to augment a very sparse initial training dataset, and compare this to a more labor-intensive extraction of a small set of key words and phrases associated with each concept. We demonstrate an approach with significant improvement in classifier performance for concepts having little or no initial training data coverage.

Keywords: Hierarchical document classification, Automatic document classification, Machine learning, NLP, Ontology

1 Introduction

In this work we describe a software classification system which employs NLP analysis and machine learning algorithms to automatically determine whether a human-produced text applies to one or more of a set of concepts. The machine learning portion of this system is essentially a supervised learner whose accuracy relies heavily upon the quality of the training instances it has encountered. Unfortunately, the concept space over which this solution is deployed is sufficiently large that it was not feasible to obtain a large quantity of high-quality training instances. In fact, many concepts were observed to never be explicitly referenced by training instances. Although the classification portion of the system evolves dynamically to improve its classifications over time, the initial classifications produced by the system were of poor quality due to this deficiency of initial training data. We discuss several approaches taken to improve these initial classifications and construct knowledge in the face of limited or absent training data. We compare two mechanisms of capturing expert user knowledge in terms of their impact on classifier accuracy and recall, and how well they interact with traditional training instances.

2 Related Work

From a machine learning perspective, this work deals with the well-studied field of supervised learning algorithms [1]. There is a wide body of research related to applying such algorithms to human-produced text; for an overview see [2]. Of particular relevance is the multi-topic document classification problem [3], in which a document is analyzed by a machine classifier and determined to be applicable or not applicable to a list of topics. The sparseness of labeled training instances is by no means unique to this problem domain. Significant research has been conducted in the area of semi-supervised learners, which attempt to generalize a small amount of labeled training data to a larger amount of unlabeled data which can subsequently be used for training. For a thorough overview of semi-supervised techniques see [4].
In this work we examine a somewhat different position than that in which most semi-supervised systems are deployed, however, because we assume access to a limited number of labeled training instances but do not assume the existence of unlabeled training instances. Instead, we assume access to expert-generated knowledge, either in the form of an ontology [5] modeling the structure of concepts in the concept space or in the form of a key-word document. From either of these we attempt to derive additional training instances, whose instance labels are trivial to reverse engineer from the ontology structure.

3 Approach

We explore two unique options to expand the initial classifier knowledge. The first is to exploit the knowledge of subject matter experts by hand-populating a database of key phrases they have determined likely to be associated with each concept in the concept space. The second approach is to utilize an expert-created ontology describing the nature of and relationship between the concepts in the concept space to determine which lexical features are associated with which concepts, by making the assumption that the definition of a concept will contain language similar to the documents associated with that concept. Though their angles of attack on this problem are quite different, it is important to keep in mind that these approaches are attempting to do fundamentally the same thing: derive new training instances which can be provided to the classifier. Before describing the mechanics behind generating these instances, we will give a brief overview of the classifier and its operation.

3.1 Classifier

The minutiae of the supervised learning classification algorithm utilized in this work are not germane to this discussion and are described elsewhere [10]. For the purposes of this paper, it is sufficient to understand the grounding engine mechanics for two key operations the classifier performs: classification and training.

Classification is an operation performed on a text document to determine which concepts are present in it. In order to classify a given document as exhibiting or not exhibiting a set of concepts, the document is first decomposed into features using Natural Language Processing (NLP) techniques described later. These features are provided as input to the grounding engine. The output of the grounding engine is a label for each concept indicating how confident the classifier is that the concept exists in the document whose features were provided. Note that the features passed to the classifier are either sense-ambiguous (for example, a stemmed form of a word with some notion of its part of speech) or sense-specific (for example, a WordNet synset identifier). In this work we explore the implications of each feature type.

Training is an operation performed on a text document and a set of labeled concepts indicating which are correctly or incorrectly associated with the text. When updated with a training instance, the classifier internally adjusts its structure such that, in the future, the features provided will be more likely to be associated with the concepts labeled correct and less likely to be associated with the concepts labeled incorrect.

For clarity, we now introduce a logically distinct type of training, called bootstrap training, which differs slightly from the training described above. Bootstrap training is unique in that it is always performed before any standard training instances are encountered and is used to construct the initial body of knowledge needed to very roughly associate features with concepts. Because the bootstrap training labels in this work were implicitly derived, we could not make any assumptions about false negative concepts using a bootstrapping instance. When a bootstrap training instance was encountered, the classifier therefore only considered the concept arguments labeled as correct for the given text. Bootstrapping is useful because, as mentioned earlier, for many of the concept types on which our classification system operated, there were a significant number of concepts having a trivial number of training instances (or none at all). Without some form of correction, these holes in the initial knowledge of the classifier meant that the classification algorithm would always assign some concepts zero confidence. Bootstrapping is less crude than the typical approach of assigning each concept a small virtual probability.
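The grounding engine itself is described in [10]; purely to picture the three operations above, the following is a deliberately simplified sketch. All names here (GroundingClassifier, classify, train, bootstrap_train) are hypothetical illustrations of the interface, not the authors' implementation, and the confidence values are naive normalized feature-weight sums. The one behavioral point the sketch is meant to capture is that bootstrap training contributes positive evidence only.

```python
from collections import defaultdict


class GroundingClassifier:
    """Toy stand-in for the grounding engine: per-concept feature weights.

    Hypothetical sketch for illustration only; the actual engine is
    described in [10] and is considerably more sophisticated.
    """

    def __init__(self, concepts):
        self.concepts = list(concepts)
        # weights[concept][feature] -> accumulated evidence for that concept
        self.weights = {c: defaultdict(float) for c in self.concepts}

    def classify(self, features):
        """Return a confidence score for every concept given document features."""
        scores = {}
        for concept in self.concepts:
            w = self.weights[concept]
            total = sum(w.values()) or 1.0
            scores[concept] = sum(w.get(f, 0.0) for f in features) / total
        return scores

    def train(self, features, correct, incorrect=()):
        """Standard training: reinforce correct concepts, penalize incorrect ones."""
        for concept in correct:
            for f in features:
                self.weights[concept][f] += 1.0
        for concept in incorrect:
            for f in features:
                self.weights[concept][f] = max(0.0, self.weights[concept][f] - 1.0)

    def bootstrap_train(self, features, correct):
        """Bootstrap training: only positively labeled concepts are updated,
        because false negatives cannot be inferred from implicitly derived labels."""
        self.train(features, correct, incorrect=())
```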
3.2 Feature extraction using Natural Language Processing

The features used as input to the classifier were extracted from document text via an NLP pipeline using the UIMA [6] framework and OpenNlp [7] components. The results of the OpenNlp components were augmented by a heuristic lemmatizer based on an extended WordNet [8] dictionary.

Custom WordNet dictionary

WordNet is a machine-usable dictionary of words organized into synonym sets (synsets). Each word in the dictionary is associated with a sense, which is captured by a synset; each synset is associated with a textual description, along with the words belonging to it. In this sense the WordNet dictionary assumes the role of a thesaurus as well. We modified WordNet in two ways to better fit the needs of the effort described here.

The first modification was at the content level: the dictionary lookup logic was modified to support customized synsets stored in a file. The addition of this custom read logic enables us to fully modify the WordNet dictionary by removing, altering, or creating new synsets in the dictionary. Nearly 1200 field-specific jargon and acronym terms were added to the dictionary. These 1200 terms were selected because they were important words previously discarded by the lemmatizer using a standard WordNet dictionary. After implementing the changes to WordNet, the lemmatizer was able to recognize these terms as words.

The second modification was at the software level: the custom dictionary read logic was modified such that it loads all dictionary entries into memory (requiring around 350MB in our implementation). This change resulted in a significant speedup over the file-based MIT implementation packaged with the dictionary for random word lookups. We observed a speedup of roughly 11x for random synset retrievals and 1.5x for repeated synset retrievals compared to MIT's implementation. The lower speedup for repeated lookups is due to the use of caching in the MIT implementation, which mitigates costly file access times for dictionary entries which are frequently accessed.
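As a rough illustration of the second modification, the in-memory dictionary is essentially a hash map from (lemma, PoS) pairs to synset records, built once at startup so that random lookups become dictionary accesses rather than file seeks. The file format, field layout, and load_custom_dictionary function below are hypothetical stand-ins for the actual WordNet data files, the custom synset file, and the MIT-provided reader that were modified.

```python
import csv
from collections import defaultdict


def load_custom_dictionary(path):
    """Load every synset entry into memory, keyed by (lemma, pos).

    Assumes a hypothetical tab-separated file with the columns
    synset_id, pos, gloss, lemma1|lemma2|...; the real WordNet data files
    and the ~1200 added jargon/acronym entries use their own format.
    """
    index = defaultdict(list)  # (lemma, pos) -> list of synset records
    with open(path, newline="", encoding="utf-8") as handle:
        for synset_id, pos, gloss, lemmas in csv.reader(handle, delimiter="\t"):
            synset = {"id": synset_id, "pos": pos, "gloss": gloss,
                      "lemmas": lemmas.split("|")}
            for lemma in synset["lemmas"]:
                index[(lemma, pos)].append(synset)
    return index


# Example usage (hypothetical file name):
# dictionary = load_custom_dictionary("custom_wordnet.tsv")
# senses = dictionary.get(("mouse", "noun"), [])
```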

Lemmatizer

Although the standard OpenNlp components provide reasonable accuracy for determining the correct part-of-speech (PoS) tags for words appearing in the document, the process is not perfect. To attempt to correct the erroneous PoS tags, we developed a custom component that uses simple heuristic rules to guess the potential textual form of the lemma of the word along with the correct PoS tag. The lemma of a word is simply the base form of the word that forms the head of an entry in a dictionary (e.g. the lemmas of the words "ate" and "mice" are "eat" and "mouse", respectively).

Our heuristic for finding the actual PoS tag for a word observed in a document uses the OpenNlp PoS tag guess for the word and the PoS tag of the previous non-filler word. If the custom dictionary contains a lemma/PoS pair matching the known lemma and the OpenNlp PoS tag guess, the OpenNlp tag guess is assumed to be correct. Otherwise, we consult the dictionary using the following PoS tags (in the given order): verb, noun, adjective and adverb. The first PoS from this list that matches the dictionary when paired with the lemma is returned as correct. Note that the heuristic rules for calculating the lemma from the textual form of the word change based on the current PoS tag candidate. If no lemma/PoS pair matches an entry in the custom dictionary, the OpenNlp tag guess is assumed to be correct and the lemma is assigned the textual form of the word.

Sense disambiguation

The sense disambiguation scheme utilized is based on the Lesk algorithm described in [9]. The algorithm outlined below operates on sentences extracted from the document, performing the steps for each unprocessed word in the sentence. If, at any step in the algorithm, only one viable sense of the word/PoS remains, that sense is selected and the following steps are not performed.

1. Exclude senses that are hyponyms of non-applicable senses (e.g. sport terms).
2. Check whether senses from the current and another word are part of the same synset hierarchy; if yes, the corresponding senses are selected for both words.
3. Check whether the current and the neighboring word are related to each other via lexical parallelism. Two words exhibit lexical parallelism if there exists a hypernym/hyponym (for nouns), similar-to (for adjectives), or pertains-to or morphological-similarity (for adverbs) relationship between any of their possible senses. If lexical parallelism is observed, the senses in relation are selected for the corresponding words.
4. If the current word is a verb, check whether it is collocated with any of the neighboring word lemmas. A verb lemma is collocated with another lemma if the verb has a lemma in any of its hypernym synsets that is morphologically similar to the other lemma.
5. Perform the Lesk algorithm: check for maximum lemma overlap between the current sentence and the textual description of the possible senses.
6. Extract example usage from the synset textual definition and check for patterns with the neighboring words.

If all steps were performed and there are still multiple candidate synsets associated with the word, all the remaining synsets are accepted as equally likely correct senses.
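Step 5 is the classic gloss-overlap test from Lesk [9]. A minimal, self-contained version of just that step is sketched below; the synset IDs and glosses are made-up examples, and the surrounding filters (steps 1-4 and 6), stop-word handling, and the equal-likelihood treatment of ties are omitted.

```python
def lesk_overlap(sentence_lemmas, candidate_senses):
    """Gloss-overlap step of the Lesk algorithm: pick the candidate sense whose
    textual description shares the most lemmas with the sentence context.

    candidate_senses: iterable of (synset_id, gloss_text) pairs.
    Returns the synset_id with maximal overlap (ties keep the first seen).
    """
    context = set(sentence_lemmas)
    best_id, best_overlap = None, -1
    for synset_id, gloss in candidate_senses:
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_id, best_overlap = synset_id, overlap
    return best_id


# Hypothetical synset IDs and glosses:
senses = [
    ("bank%1", "sloping land beside a body of water"),
    ("bank%2", "a financial institution that accepts deposits"),
]
print(lesk_overlap(["money", "institution", "account"], senses))  # -> bank%2
```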
3.3 Bootstrapping mechanisms

We explored two options for generating the bootstrapping training set. For both options, we attempted to use both sense-ambiguous (lemma/PoS pair) and sense-specific (synset ID) features. In the sense-specific case we resolved the senses of the bootstrapping training set manually and employed the automatic sense disambiguation mechanism described above during training.

3.3.1 Bootstrapping based on keyword mapping

The first bootstrapping approach we took was to utilize a keyword mapping document which describes the key words and phrases associated with each concept in the concept space. To generate the bootstrapping training set, we created a set of single-feature training documents derived from the keyword mapping document, as sketched below. The label for the training instance contained in each of these documents was simply a list of the concepts correctly associated with the keyword.

This approach scales poorly because a domain expert must distill each concept into a list of the most important phrases and terms associated with that concept. Furthermore, it is important that these phrases and words not overlap, to provide maximum differentiation between the concepts. Avoiding overlap becomes increasingly difficult as the size of the concept space grows. Additionally, it is not clear whether the few key words or phrases selected by the expert will have sufficient lexical overlap with the contents of an arbitrary document to produce meaningful classification results.
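Concretely, generating this bootstrap set amounts to emitting one single-feature instance per key phrase, labeled with the concepts the expert associated with it. The mapping format, the generic concept labels, and the function name below are hypothetical; the commented usage reuses the bootstrap_train interface from the classifier sketch in Section 3.1.

```python
def keyword_bootstrap_instances(keyword_map):
    """Turn an expert keyword mapping into single-feature bootstrap instances.

    keyword_map: key phrase -> list of concept labels (hypothetical format).
    Yields (features, correct_concepts) pairs; no negative labels are produced,
    so only positive evidence reaches the classifier during bootstrapping.
    """
    for phrase, concepts in keyword_map.items():
        yield [phrase], list(concepts)


# Hypothetical mapping with generic concept labels:
keyword_map = {
    "key phrase one": ["ConceptA"],
    "key phrase two": ["ConceptB", "ConceptC"],
}
# for features, correct in keyword_bootstrap_instances(keyword_map):
#     classifier.bootstrap_train(features, correct)
```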

3.3.2 Bootstrapping based on ontology lexicalization

The second bootstrapping approach we explored was to utilize an ontology created by a domain expert that described the concepts in the concept space. To generate the training set, we created a single virtual training document for each concept. The training document consisted of the definition, description, and other descriptive textual information extracted from the ontology for the associated concept. The classification label associated with that document was the label of the concept from which the information was extracted.

This approach scales better than manually generating the keyword mapping document because the ontology is already required to contain textual information (e.g. a description) associated with each concept. The downside is that the training instances generated in this fashion may contain misleading knowledge, since non-keywords may be included in the ontology text fields. As an example: the definition of a concept might always begin with the word "Definition", which will lead classified documents containing the word "Definition" to be erroneously grounded to all concepts.

4 Results

To gauge the performance of the various proposed mechanisms for training the classifier, we conducted 50 runs with randomly generated training (T) and validation (V) sets for each run. Each validation set consisted of V randomly selected classification instances from a pool of C correctly classified documents (C = 63). The remaining T classification instances were used for training the classifier. The training and validation sets were selected such that they were guaranteed to be disjoint if possible (i.e., no document was validated using a classification instance previously used for training in the same run unless C < T + V). The training set was augmented with instances derived using various combinations of the following approaches:

1. No bootstrapping
2. Human-created concept keywords
3. Lexicalized ontology

For each validation instance, the top 10 scoring concepts after classification were selected and the remainder were rejected. These 10 concepts were compared to the correct concept labels, and the recall, precision, and fscore (the harmonic mean of the former two) values were computed as arithmetic averages over all runs (a sketch of the per-instance computation appears below).

To more clearly demonstrate the problem we faced with limited training data, we first examine the impact of decreasing the size of the training set with no training augmentation.

Figure 1: Impact of training set size on classifier performance with no training augmentation

As expected, decreasing the size of the training set had a deleterious effect on precision, recall and fscore. Note that with the 1T/10V configuration the classifier's performance was actually slightly worse than if 10 concepts had been selected at random from the roughly 600 in the hierarchy. This worse-than-random behavior was due primarily to incomplete coverage of the concept space by the single training instance. Because our classification system frequently encountered instances where there were few or no initial training instances, as in the 1T/10V configuration, we clearly needed a mechanism for bootstrapping classifier knowledge.
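For reference, the per-instance evaluation described above (keep the top 10 scoring concepts, reject the rest, then score against the gold labels) can be written in a few lines. This is a generic sketch of the metric, not the authors' evaluation harness; run-level numbers in the figures are arithmetic averages of these per-instance values over all validation instances and runs.

```python
def evaluate_instance(scores, gold_concepts, top_n=10):
    """Per-instance recall, precision, and fscore for a top-N concept selection.

    scores: concept -> classifier confidence for one validation document.
    gold_concepts: concepts labeled correct for that document.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = set(ranked[:top_n])
    gold = set(gold_concepts)
    true_pos = len(selected & gold)
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    fscore = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
    return recall, precision, fscore
```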
To understand how effective bootstrapping can be on its own and with standard training, we tested the following training configurations:

B: bootstrap training only
B+T: bootstrap training followed by standard training
T: standard training only

We also explored the impact of sense disambiguation during parsing using the following bootstrapping configurations:

Dk: hand disambiguation of keyword senses, automatic disambiguation of training instances
Dd: hand disambiguation of concept ontology descriptions, automatic disambiguation of training instances
Ak: ambiguous keywords (no disambiguation of training instances)
Ad: ambiguous concept ontology descriptions (no disambiguation of training instances)

Note that the Dk and Dd configurations above employed sense disambiguation. To maximize the benefit of sense disambiguation, we manually disambiguated the bootstrapping data.

Figure 2 shows the performance of the various training configurations in terms of their fscores normalized to the unbootstrapped 63T/10V case from Figure 1. In Figure 2, we observe that bootstrapping alone (B) performs poorly relative to the training-alone (T) and combined (B+T) configurations. This is expected because the quality of the bootstrapping training instances is significantly lower than the quality of real-world generated training instances.

Figure 2: Impact of training configuration on classifier performance (63T/10V)

It is clear from Figure 2 that bootstrapping alone is a poor substitute for quality training data, but it does provide significantly better recall than simply guessing (which would produce a recall of 3.3% on average). On its own, the best performing bootstrapping configuration was (Dk), which utilized the hand-generated keyword document. We also observe that sense disambiguation (used in the Dk and Dd cases) only provided a benefit over using ambiguous features when bootstrapping alone (B) was used. The benefit of sense disambiguation in this case is 46% greater for keyword (Dk) than for lexicalized ontology (Dd) bootstrapping. This significant improvement over lexicalized ontology bootstrapping is a direct result of the highly targeted and minimally overlapping language used in the keyword document, whereas the lexicalized ontology bootstrap instances were observed to contain many overlapping terms and phrases between unique concepts.

The next interesting observation from Figure 2 is that the bootstrapping-and-training configuration (B+T) performed better than training alone (T). This indicates that bootstrapping was beneficial even for concepts covered by labeled training instances. To quantify the interaction between bootstrapping and normal training, we next varied the training set size and examined the performance of the (Ad, B+T) case above. Note that no sense disambiguation was used during bootstrapping or training, and that lexicalized ontology bootstrapping was used. These results are shown in Figure 3 as improvement factors for recall, precision, and fscore relative to the corresponding values for the training-only (T) case from Figure 1.

Figure 3: Relative performance of lexicalized ontology bootstrapping + training compared to training alone

For all of the tested training set sizes, bootstrapping resulted in a significant improvement in fscore relative to the training-only case. Intuitively, the benefit of bootstrapping should decrease as the training set increases in size, and this was also observed. It is interesting to note that bootstrapping improved precision appreciably more than recall in all cases. This is because, during the post-bootstrapping training process, exact concept matches were fed back to the system with higher weights than close concept matches.
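For clarity, the normalized values in Figure 2 and the improvement factors in Figure 3 are simple ratios against the corresponding training-only baseline; the sketch below, with made-up numbers, just restates that arithmetic.

```python
def improvement_factor(with_bootstrap, training_only):
    """Relative improvement of a metric (recall, precision, or fscore) over
    the corresponding training-only value, as plotted in Figure 3."""
    return with_bootstrap / training_only


# Illustrative call with made-up values: a factor of 2.0 would mean the
# bootstrapped configuration doubled the metric relative to training alone.
print(improvement_factor(0.40, 0.20))  # -> 2.0
```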

It should be noted that while the relative improvements in recall, precision, and fscore for the 1T/10V case were large, classifier recall, precision, and fscore at that point were roughly 32%, 74% and 45%, respectively. Although still far from perfect, this result was significantly better than the results obtained with few or no initial training instances.

5 Conclusion and Future Work

We have explored several approaches which can be used to generate initial training data using domain expert knowledge. The most effective of these was building a bootstrapping training set using a lexicalization of an expert-generated ontology describing the concept space. Our lexicalized bootstrapping mechanism was able to achieve an fscore of 40% even with no initial training instances present. Additionally, we observed a 15% improvement in fscore when using bootstrapping and training together, even for the most trained configuration tested.

One possible improvement to the techniques described in this work would be to experiment with more sophisticated sense disambiguation schemes. Our efforts to leverage WordNet to exploit synset relationships such as hypernyms and hyponyms were unsuccessful, largely because we were unable to accurately map words in text to WordNet synsets automatically. In such a system it would be possible to derive a large number of words similar to those in the bootstrapping training set using the synset hypernym and hyponym relational links, and we anticipate improved bootstrapping performance from these approaches.

Acknowledgement

This work was supported in part by the Air Force Research Laboratory, Contract Nos. FA D, FA D and FA C.

References

[1] T. M. Mitchell. "Machine Learning"; McGraw-Hill, 1997.

[2] M. Berry. "Survey of Text Mining: Clustering, Classification, and Retrieval", First Edition; Springer.

[3] H. Borko and M. Bernick. "Automatic document classification"; Journal of the ACM, Vol. 10, Issue 2, 1963.

[4] N. Chawla and G. Karakoulas. "Learning from labeled and unlabeled data: an empirical study across techniques and domains"; Journal of Artificial Intelligence Research, Vol. 23, 2005.

[5] N. Guarino. "Formal Ontology in Information Systems"; Proceedings of FOIS '98, Vol. 46, 3-15, June 1998.

[6] D. Ferrucci and A. Lally. "UIMA: An architectural approach to unstructured information processing in the corporate research environment"; Natural Language Engineering, Vol. 10, 2004.

[7] J. Hockenmaier, G. Bierner and J. Baldridge. "Extending the Coverage of a CCG System"; Research in Language and Computation, Special Issue on Linguistic Theory and Grammar Implementation, Vol. 2, Issue 2.

[8] G. A. Miller. "WordNet: a lexical database for English"; Communications of the ACM, Vol. 38, Issue 11, 39-41, 1995.

[9] M. Lesk. "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone"; Proceedings of the 5th annual international conference on systems documentation (SIGDOC '86), 24-26, 1986.

[10] J. Staples, A. Ondi and T. Stirtzinger. "Semi-autonomous hierarchical document classification using an interactive grounding framework"; Proceedings of ICAI '12 (to appear).
