Using Synonyms for Author Recognition

Abstract. An approach for identifying authors using synonym sets is presented. Drawing on modern psycholinguistic research, we justify the basis of our theory. Having formally defined the operations needed to algorithmically determine authorship, we present the results of applying our method to a corpus of classic literature. We argue that this technique is both accurate as an author identification tool and applicable to other domains in computer science, such as speaker recognition.

1 Introduction and Motivation

Current research in the area of Stylometry focuses on identifying idiosyncrasies in written literature to identify the author. We present a novel approach for identifying authors in written text that is also applicable to identifying speakers from automatically transcribed discourse. In this paper, we argue that the word an author or speaker chooses when many synonyms are available is idiosyncratic to the point of providing identification. Modern psycholinguistic evidence indicates that this is a well-founded approach.

Though previous methods of author attribution have focused on linguistic elements peculiar to written text, the use of synonyms does not require such elements as punctuation to be effective and so may be applied to automatically transcribed speech as well. Such a flexible approach is ideal for situations such as a Smart Home environment, where requests made in natural language could originate from sources as varied as a personal computer and a mounted microphone. Traditional methods might require both a vocal analyzer and a text-specific analyzer to identify the source. When using synonyms, no such dependence on disparate technologies is incurred.

The ability to discern the source of a document or statement also has value in the area of knowledge acquisition. One goal of knowledge acquisition is to gain the most accurate knowledge possible. However, not all sources of information are equally reliable, and so such a system could be designed not to trust them equally. The first step toward this type of learning is the ability to correctly identify the source of a piece of knowledge (i.e. a document). The presented method of author recognition aims to be adaptable to all of these applications.

2 Related Work

Author attribution is a well-studied area of artificial intelligence. Formal methods for determining authorship have even older roots. The field of Stylometry was thriving well before the turn of the twentieth century, with several documented methods of analyzing texts to settle disputed authorship. Linguistic idiosyncrasies that have been identified as characteristic of an author include everything from counting keywords to analyzing punctuation.

2.1 Related Efforts in Stylometry

The classification of documents by content (as opposed to by author) has been one area in which similar techniques have been employed. Keywords have been shown to be very effective in this area by many studies. This idea has been taken further by those such as Paek [10], who used keywords in image descriptions to classify the content of the images.

In the field of Stylometry, several algorithmic approaches have been applied to author attribution. For instance, Brinegar [3] and Glover and Hirst [5] used distributions of word length statistics to determine authorship, while others, including Morton [9], used sentence length for identification. The number of distinct words in a set of documents was studied by Holmes [7] and Thisted [14].

2.2 Psycholinguistic Foundations

In the development of this method, consideration was given not only to the empirical correctness of its results, but also to the theoretical foundations on which it is built. Current psycholinguistic research suggests that synonyms are a part of language that is affected by one's environment.

Developed by the Cognitive Science Laboratory at Princeton University, the WordNet lexical database is itself rooted in modern psycholinguistic theory [13]. It is organized based largely upon how humans are believed to understand words. It is no coincidence that synonym sets are at the heart of WordNet.
In Holland's 1991 study, when subjects were asked to think of as many synonyms as possible for a set of words, a small but significant priming effect was found [6]. That is to say, subjects were more likely to produce as synonyms words that they had previously seen. This suggests that the synonyms produced by an individual are a product of experience, something that is unique to each individual.

Let us digress a moment and further pursue the concept of priming. Associative priming, the process by which one concept leads to another (e.g. a dog might cause one to think about a bone), is a result of spreading activation [1]. The way in which activation potentials spread through a neural network is dependent upon what connections have been made and how often they have been asserted. According to Hebbian brain theory, connections that fire together are more likely to fire together in the future. Thus, from the perspective of cognitive psychology, it makes sense that though many synonyms may be activated by a concept, the word at the forefront of one's mind will be based on experience. The uniqueness of this experience and the corresponding uniqueness of synonym choice may be exploited to determine authorship.

3 Theory

3.1 Definitions

As a simple formalization of this theory, let us begin by defining a set of authors $\alpha$ that have been encountered by our system before any identification is processed. We then define $\lambda_i$ as the lexicon corresponding to author $\alpha_i$, so that $|\lambda_i|$ is the vocabulary richness. Next, consider two functions that may be applied to a word $w$ in $\lambda$: $\mathrm{occ}(w)$, the number of times word $w$ was encountered, and $\mathrm{syn}(w)$, the number of synonyms for $w$.

Consider a threshold $\theta$, which is the minimum number of synonyms that a word must have to be considered an idiosyncratic identifier of an author. Note that it is the use of a sufficiently large value of $\theta$ that provides a reasonable running time for our algorithm. Now we define the filtered lexicon $\sigma_i$ as author $\alpha_i$'s lexicon restricted to words having a sufficiently large set of synonyms, such that $\sigma_i \subseteq \lambda_i$, where each word $\sigma_{ij}$ has $\mathrm{syn}(\sigma_{ij}) \geq \theta$.

Having considered the task of learning each author's style, we can examine the case of identifying the work of an unknown author $\alpha_u$, where $\alpha_u \in \alpha$. The heart of the algorithm is in the intersection of the filtered lexicon of the unknown author with that of each known author:

$$\rho_i = \sigma_i \cap \sigma_u \qquad (1)$$

Here, there are some special considerations to be made as to exactly how the intersection of these two lexicons is to be calculated. Since we have also associated a number of occurrences $\mathrm{occ}(w)$ with each element of $\sigma$, we need to determine how to evaluate this function for an intersection. For our purposes, let the number of occurrences for a word used by both authors $i$ and $j$ be

$$\mathrm{occ}(\sigma_i \cap \sigma_j) = \min\bigl(\mathrm{occ}(\sigma_i),\ \mathrm{occ}(\sigma_j)\bigr) \qquad (2)$$

Finally, we calculate the match factor for each author:

$$\mu(i,u) = \sum_{j=1}^{|\rho_i|} \mathrm{occ}(\rho_{ij}) \cdot \mathrm{syn}(\rho_{ij}) \qquad (3)$$

The hypothetical author $\alpha_i$ of the text corresponds to the maximum value of the match factor, where $n = |\alpha|$, such that

$$\mu(i,u) = \max\bigl(\mu(0,u),\ \mu(1,u),\ \ldots,\ \mu(i,u),\ \ldots,\ \mu(n,u)\bigr) \qquad (4)$$
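The definitions in Section 3.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy synonym counts in `SYN_COUNTS`, and the two toy "authors" are all hypothetical, and the match factor is read as the sum of occ times syn over the intersection, which is one interpretation of Eq. (3).

```python
from collections import Counter

# Hypothetical synonym counts; the paper obtains syn(w) from WordNet instead.
SYN_COUNTS = {"glad": 5, "happy": 6, "big": 8, "large": 7, "the": 0}


def syn(word):
    """Number of synonyms for a word (0 if the toy table does not list it)."""
    return SYN_COUNTS.get(word, 0)


def filtered_lexicon(words, theta):
    """Build sigma: occurrence counts of words with at least theta synonyms."""
    occ = Counter(words)
    return {w: n for w, n in occ.items() if syn(w) >= theta}


def match_factor(sigma_i, sigma_u):
    """Eq. (3): sum occ * syn over rho = sigma_i & sigma_u, with occ from Eq. (2)."""
    rho = sigma_i.keys() & sigma_u.keys()              # Eq. (1)
    return sum(min(sigma_i[w], sigma_u[w]) * syn(w)    # Eq. (2) gives the min
               for w in rho)


def identify(known, sigma_u):
    """Eq. (4): return the known author with the largest match factor."""
    return max(known, key=lambda author: match_factor(known[author], sigma_u))


THETA = 5  # assumed threshold for this toy example
known = {
    "A": filtered_lexicon("the glad glad big the".split(), THETA),
    "B": filtered_lexicon("the happy large large the".split(), THETA),
}
unknown = filtered_lexicon("glad big big the".split(), THETA)

scores = {author: match_factor(sigma, unknown) for author, sigma in known.items()}
best = identify(known, unknown)
print(scores, best)
```

In this toy run the word "the" is dropped by the threshold, author A shares "glad" and "big" with the unknown text, and author B shares nothing, so A is selected.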

3.2 Tractability

Though at first glance calculating $\sigma_i \cap \sigma_u$ for every author in $\alpha$ may spark complexity concerns, we have already addressed the problem of tractability by means of $\theta$. Recall that $\sigma$ is a subset of $\lambda$ and, as we will see, can be a very small subset indeed if the threshold is set high enough. Even though it seems we are cutting many possible matches, our theory holds that for reasonably small $\theta$ we are cutting less important words, since the author was constrained by the lexicon of the language when choosing a word with $\mathrm{syn}(w) < \theta$, due to a lack of synonyms.

3.3 Parallelization

Because the various stages of the training and, later, the discernment process are independent, this algorithm is an excellent candidate for parallelization (see Fig. 1). Each author and, in fact, each document that is added to the training set $\alpha$ can be evaluated separately, leaving the merging of the sets to a central dispatcher. Since the string operations necessary to calculate the statistics are far more processor-intensive than calculating the union of said sets, a very high rate of speedup is possible. However, true to Amdahl's law, the stage of identifying the work of an unknown author is dependent on having all training data prepared. Still, each intersection $\sigma_i \cap \sigma_u$ and the subsequent calculation of $\mu(i,u)$ are independent of one another, giving yet another opportunity for parallelization. In the end, it is reasonable to distribute the calculations associated with each individual author to a separate parallel process.

Fig. 1. An illustration of the major parts of the algorithm that may be run in parallel
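The per-author independence described in Section 3.3 can be sketched as follows. This is only an illustration under stated assumptions: the helper names and toy data are hypothetical, and a thread pool from Python's standard library stands in for the separate parallel processes the text proposes.

```python
from concurrent.futures import ThreadPoolExecutor


def match_factor(sigma_i, sigma_u, syn_counts):
    """Sum of min(occurrences) * synonym count over the shared filtered words."""
    shared = sigma_i.keys() & sigma_u.keys()
    return sum(min(sigma_i[w], sigma_u[w]) * syn_counts[w] for w in shared)


def score_all(known, sigma_u, syn_counts, workers=4):
    """Compute every author's match factor as an independent task."""
    authors = list(known)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task reads only its own author's lexicon, so no task waits on another.
        values = pool.map(
            lambda author: match_factor(known[author], sigma_u, syn_counts),
            authors,
        )
        return dict(zip(authors, values))


# Hypothetical pre-filtered lexicons (word -> occurrences) and synonym counts.
syn_counts = {"glad": 5, "big": 8, "large": 7}
known = {"A": {"glad": 2, "big": 1}, "B": {"large": 2}}
sigma_u = {"glad": 1, "big": 2}

scores = score_all(known, sigma_u, syn_counts)
print(scores)
```

Threads keep the sketch short and portable. For CPU-bound string processing in CPython, a process pool or a distributed dispatcher would be the more likely choice for real speedup, at the cost of passing the lexicons between processes.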

4 Implementation

4.1 WordNet

As our method requires the ability to determine the number of synonyms for a word, we chose to use WordNet¹ to accomplish this task. In development since 1985, WordNet is now one of the foremost lexical resources in computational linguistics. With over 118,000 word forms, it encompasses a substantial portion of the English language [13].

In WordNet, each word is linked to one or more senses. These senses, in turn, can reside in synonym sets. Thus, we can find whether a word shares at least one of its senses with another word, making it, by definition, a synonym. Furthermore, this gives us the number of synonyms for a word by summing the number of unique members of a word's synonym sets.

Though unimportant from a theoretical standpoint, it should be noted that the actual version of WordNet used in this research was not the traditional C library, but rather a normalized database format for PostgreSQL.² This allowed the number of synonyms for a word to be determined in the execution of a single SQL statement.

4.2 Corpus

The texts used to train the system consisted of works of classic literature by Charles Dickens and William Shakespeare. As these texts are in the public domain, they are freely available for review by the curious reader.³

Table 1. Texts included in our experimental corpus. The corpus comprised 286,898 words in total

Set                  Total Words   Included Texts
Dickens Train        65,157        Battle of Life, Chimes, To be Read at Dusk
Dickens Test         80,832        Cricket on the Hearth, Three Ghost Stories, A Christmas Carol
Shakespeare Train    73,863        Comedy of Errors, Hamlet, Romeo and Juliet
Shakespeare Test     67,046        Julius Caesar, Henry V, Macbeth

¹ WordNet 2.1 is available for download from the Princeton Cognitive Science Laboratory at
² WordNet SQL Builder for WordNet 2.1, the application used to generate a PostgreSQL database of WordNet, is available for download at
³ The texts listed here are available from Project Gutenberg at
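The synonym count described in Section 4.1 can be sketched in Python without a WordNet installation. The toy synonym sets below are hypothetical stand-ins for WordNet data, and excluding the query word itself from the count is an assumption, since the text does not say whether a word counts as its own synonym.

```python
def syn_count(word, synsets):
    """Count the unique members of all synonym sets that contain the word."""
    members = set()
    for synset in synsets:
        if word in synset:
            members |= synset      # a word shared across senses is counted once
    members.discard(word)          # assumption: a word is not its own synonym
    return len(members)


# Hypothetical synonym sets, one per sense; real data would come from WordNet.
TOY_SYNSETS = [
    {"big", "large"},
    {"big", "grown", "adult"},
    {"large", "great"},
    {"cat"},
]

counts = {w: syn_count(w, TOY_SYNSETS) for w in ("big", "large", "cat", "dog")}
print(counts)
```

Here "big" belongs to two synonym sets and collects three distinct synonyms, while a word with a single-member set, or no set at all, has none and would fall below any positive threshold.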

The selected works of Dickens include the so-called Christmas Stories, which are A Christmas Carol, The Chimes, The Cricket on the Hearth, and The Battle of Life, all written within the same decade. Dickens' other texts are the short stories To be Read at Dusk and Three Ghost Stories. The writings of Shakespeare that were analyzed are some of his more famous comedies and tragedies, such as The Comedy of Errors, Hamlet, Romeo and Juliet, Julius Caesar, Henry V, and Macbeth. These works of Shakespeare span a 16-year period of his writing career.

In total, the corpus contained 286,898 words. The works of Shakespeare contributed 140,909 of the words, while Dickens' texts constituted 145,989 words of the total. Each author used a relatively large vocabulary, though still a very small portion of the English language as a whole. Shakespeare's lexicon had over 13,000 words, while Dickens' had in excess of 12,000.

The corpus was divided into four sections. Most obviously, the texts were grouped by author. Secondly, each author's texts were divided in half, resulting in groups of about 72,000±9% words. These sets were designated train and test: the train set was used to acquire $\lambda$, the characteristic lexicon of an author, and a match factor $\mu(i,u)$ was then calculated for each train-test pair $\sigma_{\text{Train}} \cap \sigma_{\text{Test}}$.

5 Results

5.1 Author Identification Results

For our test data, the results showed that there is a well-defined difference between the match factor of a correct pair and that of an incorrect pair (see Fig. 2 and Fig. 3). This trend is consistent through all tested values of the threshold $\theta$. In the cases of both Dickens and Shakespeare, the correct test set was matched with its corresponding training set.

Moving beyond the simple fact that the system managed to produce the correct answers, let us take note of the margin by which the correct answer was ranked above the others. The strongest case in the set was that of the Shakespeare training set versus the two test sets. With the difference between the matching and non-matching set being roughly 17%, there is little question that this test run was a success. In the case of the Dickens training set run against the test sets, the answer is less confident, with a 10% difference between the matching and non-matching values. However, upon further consideration, it makes sense that the authors should have a good deal of their vocabulary in common. Though the unique qualities of an author's style may be subtle, we still have the ability to detect these subtleties and exploit them to determine authorship.

[Figure: Match Factor versus Threshold Value for each train-test pair]

Fig. 2. The match factor $\mu(i,u)$ correctly correlates each test set with the training set written by the same author. Thus, given a set known to be by a certain author, the system can discern one author from another

[Figure: Percent Difference in Match Factors versus Threshold Value. Series: Shakespeare Train vs Two Test Sets; Dickens Train vs Two Test Sets]

Fig. 3. By applying sufficiently large values of the synonym threshold $\theta$, the number of synonym matches, and therefore the computational overhead, is greatly reduced

[Figure: Number of Synonym Matches versus Threshold Value for each train-test pair]

Fig. 4. By applying sufficiently large values of the synonym threshold $\theta$, the number of synonym matches, and therefore the computational overhead, is greatly reduced. Due to our particular interest in words with large numbers of synonyms, the cutting of words with small synonym sets does not negatively affect the algorithm's ability to distinguish authors from one another

[Figure: Reduction in Synonym Matches versus Threshold Value for each train-test pair]

Fig. 5. Even moderate values of $\theta$ yield in excess of 90% reduction in the synonym matches between authors

5.2 Threshold Performance Results

As expected, the cutting of words having few synonyms was successful in reducing the size of $\rho$ for each author. In the case of Shakespeare's works, there were over 11,900 unique words. Yet, after filtering out words via a threshold of 30, only 539 words had to be examined, giving us a 94% overall reduction in the size of $\rho$ (see Fig. 4 and Fig. 5). The fact that the threshold was so successful in identifying those words that were useful and discarding the others is key in making this algorithm tractable for larger sets of authors. Even higher values of $\theta$ do not negatively impact the accuracy of discerning authorship.

6 Future Work

6.1 Multilingual Implementation

The next step in our work is to determine whether the method will be successful in other languages. Since the only language-dependent part of our algorithm is WordNet, we simply need to query a lexical database for our target language. Currently, there is much work being done on the creation and automatic generation of WordNet databases for various languages. One such implementation is MEANING, a project using the web as a huge corpus in an effort to build a multilingual WordNet. Already, the project has produced a web-based interface to view its progress.⁴

6.2 Integration with an Automatic Speech Transcriber

One key point of this algorithm is that it is not dependent upon the peculiarities of the written word. That is, the system does not depend on the punctuation, or lack of it, in the output of an automatic speech transcription system. Provided the transcription system can accurately generate text from speech, the results of our method of author recognition should carry over to speaker recognition. One possible implementation of this involves Carnegie Mellon's Sphinx Live Decoder. Using Hidden Markov Models, this program has the ability to transcribe voice input to text on the fly [8]. This type of system would be ideal for situations in which input could come from disparate sources such as keyboards and microphones.

⁴ The demo of the MEANING project can be found at The web-based demo can be accessed from this page.

6.3 Integration into a Smart Home Environment

In a Smart Home environment, it is key to provide a truly natural experience for the home's inhabitants. Thus, it follows that the Smart Home should be able to take vocal requests and that it should be able to discern who is making these requests. This will allow the system's responses to be more appropriate. For example, a truly intelligent Smart Home would not be likely to comply with a three-year-old child's demand for ten dozen cookies. An author-speaker recognition system using the methods proposed here is the first step in making this scenario a reality.

7 Conclusions

Research using synonyms to recognize authors shows much promise for the future. Because the method is grounded in psycholinguistic theory, we can be confident that it has a solid foundation on which future work can be built. Furthermore, the fact that this method can be optimized using higher threshold values and distributed processing allows it to be used in situations where running time is a consideration. Having displayed its ability to accurately identify authors for this domain, we look forward to applying our new theory to more areas of usage.

References

1. Anderson, J.R. Cognitive Psychology and its Implications. New York: W.H. Freeman and Company. (1995)
2. Baayen, H., van Halteren, H., Neijt, A., Tweedie, F. An experiment in authorship attribution. JADT 2002: 6es Journées internationales d'Analyse statistique des Données Textuelles. (2002)
3. Brinegar, C. Mark Twain and the Quintus Curtius Snodgrass Letters: A Statistical Test of Authorship. Journal of the American Statistical Association, 58. (1963)
4. Chaski, C. Who's At the Keyboard? Recent Results in Authorship Attribution. International Journal of Digital Evidence. Spring (2005)
5. Glover, A. and Hirst, G. Detecting stylistic inconsistencies in collaborative writing. In Sharples, Mike and van der Geest, Thea (eds.), The new writing environment: Writers at work in a world of technology. London: Springer-Verlag. (1996)
6. Holland, Cynthia Rose. Does synonym priming exist on a word completion task? Doctoral Thesis, Case Western Reserve University, Psychology. (1992)
7. Holmes, D. Authorship Attribution. Computers and the Humanities, 28, Kluwer Academic Publishers, Netherlands. (1994)
8. Huang, X. et al. The Sphinx-II Speech Recognition System. Computer Speech and Language. (1993)
9. Morton, A. The Authorship of Greek Prose. Journal of the Royal Statistical Society (A), 128. (1965)
10. Paek, S. et al. Integration of Visual and Text-Based Approaches for the Content Labeling and Classification of Photographs. ACM SIGIR. (1999)
11. Reiter, E. and S. Sripada. Contextual Influences on Near-Synonym Choice. Proceedings of INLG-2004. (2004)
12. Kaster, A., Siersdorfer, S., Gerhard, W. Combining Text and Linguistic Document Representations for Authorship Attribution. SIGIR Workshop: Stylistic Analysis of Text for Information Access (STYLE), Salvador, Bahia, Brazil. (2005)
13. Miller, George A. WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, November 1995. (1995)
14. Thisted, R. and Efron, B. Did Shakespeare Write a Newly-discovered Poem? Biometrika, 74. (1987)
15. Uzuner, Ö., Katz, B. A Comparative Study of Language Models for Book and Author Recognition. Lecture Notes in Computer Science, Volume 3651/2005. Springer-Verlag GmbH. (2005)


Prentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9) Nebraska Reading/Writing Standards, (Grade 9) 12.1 Reading The standards for grade 1 presume that basic skills in reading have been taught before grade 4 and that students are independent readers. For

More information

Rote rehearsal and spacing effects in the free recall of pure and mixed lists. By: Peter P.J.L. Verkoeijen and Peter F. Delaney

Rote rehearsal and spacing effects in the free recall of pure and mixed lists. By: Peter P.J.L. Verkoeijen and Peter F. Delaney Rote rehearsal and spacing effects in the free recall of pure and mixed lists By: Peter P.J.L. Verkoeijen and Peter F. Delaney Verkoeijen, P. P. J. L, & Delaney, P. F. (2008). Rote rehearsal and spacing

More information

Effect of Word Complexity on L2 Vocabulary Learning

Effect of Word Complexity on L2 Vocabulary Learning Effect of Word Complexity on L2 Vocabulary Learning Kevin Dela Rosa Language Technologies Institute Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA kdelaros@cs.cmu.edu Maxine Eskenazi Language

More information

learning collegiate assessment]

learning collegiate assessment] [ collegiate learning assessment] INSTITUTIONAL REPORT 2005 2006 Kalamazoo College council for aid to education 215 lexington avenue floor 21 new york new york 10016-6023 p 212.217.0700 f 212.661.9766

More information

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February

More information

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar

More information

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 141 ( 2014 ) 124 128 WCLTA 2013 Using Corpus Linguistics in the Development of Writing Blanka Frydrychova

More information

South Carolina English Language Arts

South Carolina English Language Arts South Carolina English Language Arts A S O F J U N E 2 0, 2 0 1 0, T H I S S TAT E H A D A D O P T E D T H E CO M M O N CO R E S TAT E S TA N DA R D S. DOCUMENTS REVIEWED South Carolina Academic Content

More information

An Online Handwriting Recognition System For Turkish

An Online Handwriting Recognition System For Turkish An Online Handwriting Recognition System For Turkish Esra Vural, Hakan Erdogan, Kemal Oflazer, Berrin Yanikoglu Sabanci University, Tuzla, Istanbul, Turkey 34956 ABSTRACT Despite recent developments in

More information

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE Submitted in partial fulfillment of the requirements for the degree of Sarjana Sastra (S.S.)

More information

EXECUTIVE SUMMARY. Online courses for credit recovery in high schools: Effectiveness and promising practices. April 2017

EXECUTIVE SUMMARY. Online courses for credit recovery in high schools: Effectiveness and promising practices. April 2017 EXECUTIVE SUMMARY Online courses for credit recovery in high schools: Effectiveness and promising practices April 2017 Prepared for the Nellie Mae Education Foundation by the UMass Donahue Institute 1

More information

A Case-Based Approach To Imitation Learning in Robotic Agents

A Case-Based Approach To Imitation Learning in Robotic Agents A Case-Based Approach To Imitation Learning in Robotic Agents Tesca Fitzgerald, Ashok Goel School of Interactive Computing Georgia Institute of Technology, Atlanta, GA 30332, USA {tesca.fitzgerald,goel}@cc.gatech.edu

More information

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

More information

correlated to the Nebraska Reading/Writing Standards Grades 9-12

correlated to the Nebraska Reading/Writing Standards Grades 9-12 correlated to the Nebraska Reading/Writing Standards Grades 9-12 CONTENTS CORRELATION: Grade 9... 1 Grade 10...21 Grade 11..39 Grade 12..58 McDougal Littell The Language of Literature correlated to the

More information

Large vocabulary off-line handwriting recognition: A survey

Large vocabulary off-line handwriting recognition: A survey Pattern Anal Applic (2003) 6: 97 121 DOI 10.1007/s10044-002-0169-3 ORIGINAL ARTICLE A. L. Koerich, R. Sabourin, C. Y. Suen Large vocabulary off-line handwriting recognition: A survey Received: 24/09/01

More information

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Marek Jaszuk, Teresa Mroczek, and Barbara Fryc University of Information Technology and Management, ul. Sucharskiego

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Ontologies vs. classification systems

Ontologies vs. classification systems Ontologies vs. classification systems Bodil Nistrup Madsen Copenhagen Business School Copenhagen, Denmark bnm.isv@cbs.dk Hanne Erdman Thomsen Copenhagen Business School Copenhagen, Denmark het.isv@cbs.dk

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

Multi-Lingual Text Leveling

Multi-Lingual Text Leveling Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

The MEANING Multilingual Central Repository

The MEANING Multilingual Central Repository The MEANING Multilingual Central Repository J. Atserias, L. Villarejo, G. Rigau, E. Agirre, J. Carroll, B. Magnini, P. Vossen January 27, 2004 http://www.lsi.upc.es/ nlp/meaning Jordi Atserias TALP Index

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

The Good Judgment Project: A large scale test of different methods of combining expert predictions

The Good Judgment Project: A large scale test of different methods of combining expert predictions The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania

More information

The Effectiveness of Realistic Mathematics Education Approach on Ability of Students Mathematical Concept Understanding

The Effectiveness of Realistic Mathematics Education Approach on Ability of Students Mathematical Concept Understanding International Journal of Sciences: Basic and Applied Research (IJSBAR) ISSN 2307-4531 (Print & Online) http://gssrr.org/index.php?journal=journalofbasicandapplied ---------------------------------------------------------------------------------------------------------------------------

More information

On document relevance and lexical cohesion between query terms

On document relevance and lexical cohesion between query terms Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,

More information

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach #BaselOne7 Deep search Enhancing a search bar using machine learning Ilgün Ilgün & Cedric Reichenbach We are not researchers Outline I. Periscope: A search tool II. Goals III. Deep learning IV. Applying

More information

Laboratorio di Intelligenza Artificiale e Robotica

Laboratorio di Intelligenza Artificiale e Robotica Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning

More information

Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation

Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation School of Computer Science Human-Computer Interaction Institute Carnegie Mellon University Year 2007 Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation Noboru Matsuda

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

More information

Corrective Feedback and Persistent Learning for Information Extraction

Corrective Feedback and Persistent Learning for Information Extraction Corrective Feedback and Persistent Learning for Information Extraction Aron Culotta a, Trausti Kristjansson b, Andrew McCallum a, Paul Viola c a Dept. of Computer Science, University of Massachusetts,

More information

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

More information

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu

More information

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s))

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s)) Ohio Academic Content Standards Grade Level Indicators (Grade 11) A. ACQUISITION OF VOCABULARY Students acquire vocabulary through exposure to language-rich situations, such as reading books and other

More information

Georgetown University at TREC 2017 Dynamic Domain Track

Georgetown University at TREC 2017 Dynamic Domain Track Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain

More information

Evolutive Neural Net Fuzzy Filtering: Basic Description

Evolutive Neural Net Fuzzy Filtering: Basic Description Journal of Intelligent Learning Systems and Applications, 2010, 2: 12-18 doi:10.4236/jilsa.2010.21002 Published Online February 2010 (http://www.scirp.org/journal/jilsa) Evolutive Neural Net Fuzzy Filtering:

More information

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature 1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details

More information

On-the-Fly Customization of Automated Essay Scoring

On-the-Fly Customization of Automated Essay Scoring Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,

More information

Evidence for Reliability, Validity and Learning Effectiveness

Evidence for Reliability, Validity and Learning Effectiveness PEARSON EDUCATION Evidence for Reliability, Validity and Learning Effectiveness Introduction Pearson Knowledge Technologies has conducted a large number and wide variety of reliability and validity studies

More information

content First Introductory book to cover CAPM First to differentiate expected and required returns First to discuss the intrinsic value of stocks

content First Introductory book to cover CAPM First to differentiate expected and required returns First to discuss the intrinsic value of stocks content First Introductory book to cover CAPM First to differentiate expected and required returns First to discuss the intrinsic value of stocks presentation First timelines to explain TVM First financial

More information

Vietnam War Multiple Choice Quiz

Vietnam War Multiple Choice Quiz Vietnam War Multiple Quiz Free PDF ebook Download: Vietnam War Quiz Download or Read Online ebook vietnam war multiple choice quiz in PDF Format From The Best User Guide Database The Vietnam War: Backwards

More information

Types of curriculum. Definitions of the different types of curriculum

Types of curriculum. Definitions of the different types of curriculum Types of Definitions of the different types of Leslie Owen Wilson. Ed. D. Contact Leslie When I asked my students what means to them, they always indicated that it means the overt or written thinking of

More information

Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability

Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Shih-Bin Chen Dept. of Information and Computer Engineering, Chung-Yuan Christian University Chung-Li, Taiwan

More information

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The

More information

Practice Examination IREB

Practice Examination IREB IREB Examination Requirements Engineering Advanced Level Elicitation and Consolidation Practice Examination Questionnaire: Set_EN_2013_Public_1.2 Syllabus: Version 1.0 Passed Failed Total number of points

More information

The Language of Football England vs. Germany (working title) by Elmar Thalhammer. Abstract

The Language of Football England vs. Germany (working title) by Elmar Thalhammer. Abstract The Language of Football England vs. Germany (working title) by Elmar Thalhammer Abstract As opposed to about fifteen years ago, football has now become a socially acceptable phenomenon in both Germany

More information

Higher Education Six-Year Plans

Higher Education Six-Year Plans Higher Education Six-Year Plans 2018-2024 House Appropriations Committee Retreat November 15, 2017 Tony Maggio, Staff Background The Higher Education Opportunity Act of 2011 included the requirement for

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Analysis: Evaluation: Knowledge: Comprehension: Synthesis: Application:

Analysis: Evaluation: Knowledge: Comprehension: Synthesis: Application: In 1956, Benjamin Bloom headed a group of educational psychologists who developed a classification of levels of intellectual behavior important in learning. Bloom found that over 95 % of the test questions

More information

Towards a Collaboration Framework for Selection of ICT Tools

Towards a Collaboration Framework for Selection of ICT Tools Towards a Collaboration Framework for Selection of ICT Tools Deepak Sahni, Jan Van den Bergh, and Karin Coninx Hasselt University - transnationale Universiteit Limburg Expertise Centre for Digital Media

More information

Erkki Mäkinen State change languages as homomorphic images of Szilard languages

Erkki Mäkinen State change languages as homomorphic images of Szilard languages Erkki Mäkinen State change languages as homomorphic images of Szilard languages UNIVERSITY OF TAMPERE SCHOOL OF INFORMATION SCIENCES REPORTS IN INFORMATION SCIENCES 48 TAMPERE 2016 UNIVERSITY OF TAMPERE

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

Finding Translations in Scanned Book Collections

Finding Translations in Scanned Book Collections Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial

More information

Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011

Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Cristian-Alexandru Drăgușanu, Marina Cufliuc, Adrian Iftene UAIC: Faculty of Computer Science, Alexandru Ioan Cuza University,

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

The Effect of Discourse Markers on the Speaking Production of EFL Students. Iman Moradimanesh

The Effect of Discourse Markers on the Speaking Production of EFL Students. Iman Moradimanesh The Effect of Discourse Markers on the Speaking Production of EFL Students Iman Moradimanesh Abstract The research aimed at investigating the relationship between discourse markers (DMs) and a special

More information

Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing

Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing D. Indhumathi Research Scholar Department of Information Technology

More information

On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

More information

How to Judge the Quality of an Objective Classroom Test

How to Judge the Quality of an Objective Classroom Test How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM

More information