Using Synonyms for Author Recognition
- Clifford Stevenson
Abstract. An approach for identifying authors using synonym sets is presented. Drawing on modern psycholinguistic research, we justify the basis of our theory. Having formally defined the operations needed to algorithmically determine authorship, we present the results of applying our method to a corpus of classic literature. We argue that this technique of author recognition is both accurate as an author identification tool and applicable to other domains in computer science, such as speaker recognition.

1 Introduction and Motivation

Current research in the area of Stylometry focuses on identifying idiosyncrasies in written literature in order to identify the author. We present a novel approach for identifying the authors of written text that is also applicable to identifying speakers from automatically transcribed discourse. In this paper, we argue that the words an author or speaker chooses when many synonyms are available are idiosyncratic enough to identify that person. Modern psycholinguistic evidence indicates that this is a well-founded approach.

Though previous methods of author attribution have focused on linguistic elements peculiar to written text, the use of synonyms does not require elements such as punctuation to be effective, and so it may be applied to automatically transcribed speech as well. Such a flexible approach is ideal for situations such as a Smart Home environment, where requests made in natural language could originate from sources as varied as a personal computer or a mounted microphone. Traditional methods might require both a vocal analyzer and a text-specific analyzer to identify the source; when using synonyms, no such dependence on disparate technologies is incurred.

The ability to discern the source of a document or statement also has value in the area of knowledge acquisition. One goal of knowledge acquisition is to gain the most accurate knowledge possible.
However, not all sources of information are equally reliable, so such a system could be designed not to trust them equally. The first step toward this type of learning is the ability to correctly identify the source of a piece of knowledge (e.g., a document). The presented method of author recognition aims to be adaptable to all of these applications.

2 Related Work

Author attribution is a well-studied area of artificial intelligence, and formal methods for determining authorship have even older roots. The field of Stylometry was thriving well before the turn of the twentieth century, with several documented methods of analyzing texts to settle disputed authorship. Linguistic idiosyncrasies that have been identified as characteristic of an author range from keyword counts to punctuation patterns.

2.1 Related Efforts in Stylometry

The classification of documents by content (as opposed to by author) is one area in which similar techniques have been employed, and many studies have shown keywords to be very effective there. Researchers such as Paek et al. [10] have taken this idea further, using keywords in image descriptions to classify the content of the images. In the field of Stylometry, several algorithmic approaches have been applied to author attribution. For instance, Brinegar [3] and Glover and Hirst [5] used distributions of word-length statistics to determine authorship, while others, including Morton [9], used sentence length. The number of distinct words in a set of documents was studied by Holmes [7] and by Thisted and Efron [14].

2.2 Psycholinguistic Foundations

In the development of this method, consideration was given not only to the empirical correctness of its results but also to the theoretical foundations on which it is built. Current psycholinguistic research suggests that synonym choice is a part of language that is affected by one's environment. WordNet, the lexical database developed by the Cognitive Science Laboratory at Princeton University, is itself rooted in modern psycholinguistic theory [13]; it is organized largely according to how humans are believed to understand words. It is no coincidence that synonym sets are at the heart of WordNet.
In Holland's 1991 study, when subjects were asked to think of as many synonyms as possible for a set of words, a small but significant priming effect was found [6]. That is to say, subjects were more likely to produce as synonyms words that they had previously seen. This suggests that the synonyms an individual produces are a product of experience, which differs from one person to the next.

Let us digress a moment to pursue the concept of priming further. Associative priming, the process by which one concept leads to another (e.g., a dog might cause one to think about a bone), is a result of spreading activation [1]. The way in which activation spreads through a neural network depends on which connections have been made and how often they have been exercised. According to Hebbian theory, neurons that fire together strengthen their connections and are therefore more likely to fire together in the future. Thus, from the perspective of cognitive psychology, it makes sense that though
many synonyms may be activated by a concept, the word at the forefront of one's mind will be based on experience. The uniqueness of this experience, and the corresponding uniqueness of synonym choice, may be exploited to determine authorship.

3 Theory

3.1 Definitions

As a simple formalization of this theory, let us begin by defining a set of authors $\alpha$ that have been encountered by our system before any identification is processed. We then define $\lambda_i$ as the lexicon corresponding to author $\alpha_i$, so that $|\lambda_i|$ is that author's vocabulary richness. Next, consider two functions which may be applied to a word $w$ in $\lambda$: $occ(w)$, the number of times word $w$ was encountered, and $syn(w)$, the number of synonyms for $w$.

Consider a threshold $\theta$, the minimum number of synonyms that a word must have to be considered an idiosyncratic identifier of an author. Note that it is the use of a sufficiently large value of $\theta$ that provides a reasonable running time for our algorithm. We now define the filtered lexicon $\sigma_i$ as the part of author $\alpha_i$'s lexicon whose words have sufficiently large synonym sets, such that $\sigma_i \subseteq \lambda_i$ and each word $\sigma_{ij}$ satisfies $syn(\sigma_{ij}) \ge \theta$.

Having considered the task of learning each author's style, we can examine the case of identifying the work of an unknown author $\alpha_u$, where $\alpha_u \in \alpha$. The heart of the algorithm is the intersection of the filtered lexicon of the unknown author with that of each known author:

$$\rho_i = \sigma_i \cap \sigma_u \tag{1}$$

Here, some special considerations must be made as to exactly how the intersection of these two lexicons is calculated. Since we have also associated a number of occurrences $occ(w)$ with each element of $\sigma$, we need to determine how to evaluate this function for an intersection. For our purposes, let the number of occurrences of a word used by both authors $i$ and $j$ be

$$occ(\sigma_i \cap \sigma_j) = \min\big(occ(\sigma_i),\ occ(\sigma_j)\big) \tag{2}$$

Finally, we calculate the match factor for each author:

$$\mu(i,u) = \sum_{j=1}^{|\rho_i|} occ(\rho_{ij}) \cdot syn(\rho_{ij}) \tag{3}$$

The hypothetical author $\alpha_i$ of the text corresponds to the maximum value of the match factor, where $n = |\alpha|$, such that

$$\mu(i,u) = \max\big(\mu(1,u),\ \ldots,\ \mu(i,u),\ \ldots,\ \mu(n,u)\big) \tag{4}$$
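Equations (1) through (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer is assumed (the paper does not specify one), and the synonym counts $syn(w)$ are supplied as a plain dictionary standing in for WordNet. Python's `Counter` intersection (`&`) keeps only keys present in both operands and takes the minimum count, which matches the occurrence rule of Eq. (2).

```python
import re
from collections import Counter

# syn(w) lookup. In the paper this comes from WordNet; here it is a plain
# dict supplied by the caller (a stand-in, not WordNet itself).
SynTable = dict[str, int]


def build_lexicon(text: str) -> Counter:
    """Return occ(w) for every word in a text, i.e. lambda_i with counts.

    Lowercasing and the [a-z']+ tokenizer are assumptions; the paper does
    not specify how words were extracted.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def filter_lexicon(lexicon: Counter, syn: SynTable, theta: int) -> Counter:
    """sigma_i: keep only words w with syn(w) >= theta."""
    return Counter({w: n for w, n in lexicon.items() if syn.get(w, 0) >= theta})


def match_factor(sigma_i: Counter, sigma_u: Counter, syn: SynTable) -> int:
    """mu(i,u) from Eqs. (1)-(3)."""
    # Counter '&' keeps only words present in both lexicons and takes the
    # minimum of the two counts, matching the occ() rule of Eq. (2).
    rho_i = sigma_i & sigma_u
    return sum(occ * syn.get(w, 0) for w, occ in rho_i.items())


def identify(known: dict[str, Counter], sigma_u: Counter, syn: SynTable) -> str:
    """Eq. (4): the known author whose match factor is largest."""
    return max(known, key=lambda author: match_factor(known[author], sigma_u, syn))


if __name__ == "__main__":
    syn = {"big": 3, "large": 2, "walk": 4, "the": 0}  # toy values, not WordNet's
    theta = 2
    known = {
        "A": filter_lexicon(build_lexicon("big big walk the"), syn, theta),
        "B": filter_lexicon(build_lexicon("large large large the"), syn, theta),
    }
    unknown = filter_lexicon(build_lexicon("big walk walk large"), syn, theta)
    print({a: match_factor(s, unknown, syn) for a, s in known.items()})  # {'A': 7, 'B': 2}
    print(identify(known, unknown, syn))  # A
```

In the toy run, author A shares "big" (min count 1, 3 synonyms) and "walk" (min count 1, 4 synonyms) with the unknown text, giving $\mu = 3 + 4 = 7$, while B shares only "large", giving 2; "the" is dropped by the threshold before any intersection is taken.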

3.2 Tractability

At first glance, calculating $\sigma_i \cap \sigma_u$ for every author in $\alpha$ may spark complexity concerns, but we have already addressed the problem of tractability by means of $\theta$. Recall that $\sigma$ is a subset of $\lambda$ and, as we will see, can be a very small subset indeed if the threshold is set high enough. Even though it may seem that we are cutting many possible matches, our theory holds that, for reasonably small $\theta$, we are cutting less important words: when choosing a word with $syn(w) < \theta$, the author was constrained by the lexicon of the language, because few synonyms were available.

3.3 Parallelization

Because the stages of training and, later, of discernment are independent of one another, this algorithm is an excellent candidate for parallelization (see Fig. 1). Each author, and in fact each document added to the training set $\alpha$, can be evaluated separately, leaving the merging of the resulting sets to a central dispatcher. Since the string operations needed to calculate the statistics are far more processor-intensive than computing the union of those sets, a very high speedup is possible. However, true to Amdahl's law, the stage of identifying the work of an unknown author depends on having all training data prepared. Still, each intersection $\sigma_i \cap \sigma_u$ and the subsequent calculation of $\mu(i,u)$ are independent of one another, giving yet another opportunity for parallelization. In the end, it is reasonable to assign the calculations associated with each individual author to a separate parallel process.

Fig. 1. An illustration of the major parts of the algorithm that may be run in parallel
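The two parallel stages described above (independent per-document counting merged by a dispatcher, then independent per-author scoring) can be sketched with Python's `concurrent.futures`. This is an illustrative sketch, not the authors' system: it uses `ThreadPoolExecutor` to show the structure, but CPython's global interpreter lock limits the speedup of CPU-bound counting under threads. Because the worker functions are top-level, swapping in `ProcessPoolExecutor` (same `map` interface, with the usual `if __name__ == "__main__":` guard on spawn-based platforms) should give true parallelism. The tokenizer is again an assumption, and `score_all` expects lexicons that have already been filtered by $\theta$.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_document(text: str) -> Counter:
    """Training work for a single document: occ(w) for each word.

    The tokenizer is an assumption; the paper does not specify one.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def train_author(documents: list[str], workers: int = 4) -> Counter:
    """Count each document in parallel, then merge at a central dispatcher."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_document, documents))
    merged = Counter()
    for counts in partial_counts:
        merged.update(counts)  # Counter.update adds counts rather than replacing
    return merged


def match_factor(sigma_i: Counter, sigma_u: Counter, syn: dict[str, int]) -> int:
    """mu(i,u) for already-filtered lexicons (Eqs. (1)-(3))."""
    rho_i = sigma_i & sigma_u  # intersection with min() counts, as in Eq. (2)
    return sum(occ * syn.get(w, 0) for w, occ in rho_i.items())


def score_all(known: dict[str, Counter], sigma_u: Counter,
              syn: dict[str, int], workers: int = 4) -> dict[str, int]:
    """Evaluate each author's intersection and match factor independently."""
    authors = list(known)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(match_factor,
                               [known[a] for a in authors],
                               [sigma_u] * len(authors),
                               [syn] * len(authors)))
    return dict(zip(authors, scores))
```

Only the merge in `train_author` and the final comparison of scores are serial, which is the Amdahl's-law bottleneck the section mentions.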

4 Implementation

4.1 WordNet

As our method requires the ability to determine the number of synonyms for a word, we chose to use WordNet¹ for this task. In development since 1985, WordNet is now one of the foremost lexical resources in computational linguistics. With over 118,000 word forms, it encompasses a substantial portion of the English language [13]. In WordNet, each word is linked to one or more senses, and these senses, in turn, reside in synonym sets. Thus, we can determine whether a word shares at least one of its senses with another word, which by definition makes the two words synonyms. Furthermore, this gives us the number of synonyms for a word: the count of unique members across all of the word's synonym sets.

Though unimportant from a theoretical standpoint, it should be noted that the version of WordNet used in this research was not the traditional C library but rather a normalized database format for PostgreSQL.² This allowed the number of synonyms for a word to be determined by executing a single SQL statement.

4.2 Corpus

The texts used to train the system consisted of works of classic literature by Charles Dickens and William Shakespeare. As these texts are in the public domain, they are freely available for review by the curious reader.³

Table 1. Texts included in our experimental corpus. The corpus comprised 286,898 words in total.

Set                 Total Words   Included Texts
Dickens Train       65,157        Battle of Life, Chimes, To be Read at Dusk
Dickens Test        80,832        Cricket on the Hearth, Three Ghost Stories, A Christmas Carol
Shakespeare Train   73,863        Comedy of Errors, Hamlet, Romeo and Juliet
Shakespeare Test    67,046        Julius Caesar, Henry V, Macbeth

¹ WordNet 2.1 is available for download from the Princeton Cognitive Science Laboratory at
² WordNet SQL Builder for WordNet 2.1, the application used to generate a PostgreSQL database of WordNet, is available for download at
³ The texts listed here are available from Project Gutenberg at
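The single-statement synonym count described in Section 4.1 can be illustrated with Python's built-in `sqlite3` module. The schema below is hypothetical and deliberately simplified (a `words` table and a `senses` table linking words to synonym sets); the paper does not describe the tables produced by WordNet SQL Builder, so the table and column names are assumptions, as is the choice to exclude the word itself from its own synonym count. Outside SQL, NLTK's WordNet interface offers a comparable route by collecting `lemma_names()` across `wordnet.synsets(word)`.

```python
import sqlite3

# Hypothetical, simplified schema; the real WordNet SQL Builder tables are
# not described in the paper, so these names are illustrative only.
SCHEMA = """
CREATE TABLE words  (wordid INTEGER PRIMARY KEY, lemma TEXT UNIQUE);
CREATE TABLE senses (wordid INTEGER, synsetid INTEGER);
"""

# syn(w): distinct words sharing at least one synonym set with w.
# Excluding w itself is an assumption; the paper does not say.
SYN_COUNT_SQL = """
SELECT COUNT(DISTINCT s2.wordid)
FROM words AS w1
JOIN senses AS s1 ON s1.wordid = w1.wordid
JOIN senses AS s2 ON s2.synsetid = s1.synsetid
WHERE w1.lemma = ? AND s2.wordid <> w1.wordid
"""


def syn(conn: sqlite3.Connection, word: str) -> int:
    """Number of synonyms for `word`, computed by one SQL statement."""
    (count,) = conn.execute(SYN_COUNT_SQL, (word,)).fetchone()
    return count


def load_toy_wordnet() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for WordNet (toy data, not real senses)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    lemmas = ["big", "large", "great", "heavy"]
    conn.executemany("INSERT INTO words (wordid, lemma) VALUES (?, ?)",
                     enumerate(lemmas, start=1))
    # synset 1: big, large, great; synset 2: big, heavy; synset 3: large, great
    senses = [(1, 1), (2, 1), (3, 1), (1, 2), (4, 2), (2, 3), (3, 3)]
    conn.executemany("INSERT INTO senses (wordid, synsetid) VALUES (?, ?)", senses)
    return conn
```

With this toy data, "big" has three synonyms (large, great, heavy), "large" has two (big, great), and a word absent from the table has zero, since an aggregate query without `GROUP BY` still returns one row.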

The selected works of Dickens include the so-called Christmas Stories, namely A Christmas Carol, The Chimes, The Cricket on the Hearth, and The Battle of Life, all written within the same decade. The other two Dickens texts, To be Read at Dusk and Three Ghost Stories, are short stories. The Shakespeare works analyzed are some of his more famous comedies, tragedies, and histories: The Comedy of Errors, Hamlet, Romeo and Juliet, Julius Caesar, Henry V, and Macbeth. These works span a 16-year period of his writing career.

In total, the corpus contained 286,898 words: the works of Shakespeare contributed 140,909 words, while Dickens's texts constituted 145,989. Each author used a relatively large vocabulary, though still a very small portion of the English language as a whole; Shakespeare's lexicon had over 13,000 words, while Dickens's had in excess of 12,000.

The corpus was divided into four sections. Most obviously, the texts were grouped by author. Secondly, each author's texts were divided in half, resulting in groups of about 72,000 ± 9% words. These sets were designated as train and test: the train set was used to acquire $\lambda$, the characteristic lexicon of an author, and a match factor $\mu(i,u)$ was then calculated for each train-test pair $\sigma_{Train} \cap \sigma_{Test}$.

5 Results

5.1 Author Identification Results

For our test data, the results showed a well-defined difference between the match factor of a correct pair and that of an incorrect pair (see Fig. 2 and Fig. 3). This trend is consistent across all tested values of the threshold $\theta$. In the cases of both Dickens and Shakespeare, each training set was matched with the test set written by the same author. Moving beyond the simple fact that the system produced the correct answers, let us note the margin by which the correct answer was ranked above the others. The strongest case was that of the Shakespeare training set versus the two test sets.
With a difference of roughly 17% between the matching and non-matching sets, there is little question that this test run was a success. In the case of the Dickens training set run against the test sets, the answer is less confident, with a 10% difference between the matching and non-matching values. However, upon further consideration, it makes sense that the two authors should have a good deal of their vocabulary in common. Though the unique qualities of an author's style may be subtle, we are still able to detect these subtleties and exploit them to determine authorship.
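The evaluation in Section 5.1 (rank the test sets by match factor for each training set, then report the margin of the winner) can be sketched as follows. The paper does not define how its "percent difference" is computed, so normalizing the gap by the matching (larger) score is an assumption made here, and the match-factor values used with it should be toy numbers, not the paper's results.

```python
def percent_difference(matching: float, non_matching: float) -> float:
    """Gap between two match factors as a percentage of the matching one.

    The paper does not define its 'percent difference'; normalizing by the
    matching (larger) score is an assumption made for this sketch.
    """
    return 100.0 * (matching - non_matching) / matching


def evaluate(mu: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """For each training set, pick the test set with the highest match factor
    and report its margin over the runner-up (needs at least 2 test sets)."""
    results = {}
    for train, row in mu.items():
        ranked = sorted(row, key=row.get, reverse=True)
        best, runner_up = ranked[0], ranked[1]
        results[train] = (best, percent_difference(row[best], row[runner_up]))
    return results
```

A correct identification corresponds to `evaluate` returning, for each training set, the test set by the same author; the second tuple element then measures how confident that decision is.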

[Figure: match factor µ(i,u) plotted against the threshold value θ for each train-test pair.]
Fig. 2. The match factor µ(i,u) correctly correlates each test set with the training set written by the same author. Thus, given a set known to be by a certain author, the system can discern one author from another.

[Figure: percent difference in match factors plotted against the threshold value θ, for the Shakespeare training set vs. the two test sets and the Dickens training set vs. the two test sets.]
Fig. 3. Percent difference in match factors between the matching and non-matching test sets for each training set, across threshold values θ.

[Figure: number of synonym matches plotted against the threshold value θ for each train-test pair.]
Fig. 4. By applying sufficiently large values of the synonym threshold θ, the number of synonym matches, and therefore the computational overhead, is greatly reduced. Due to our particular interest in words with large numbers of synonyms, cutting words with small synonym sets does not negatively affect the algorithm's ability to distinguish authors from one another.

[Figure: percentage reduction in synonym matches plotted against the threshold value θ for each train-test pair.]
Fig. 5. Even moderate values of θ yield in excess of 90% reduction in the synonym matches between authors.
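Curves of the kind shown in Fig. 4 and Fig. 5 can be produced by sweeping $\theta$ and counting the distinct words shared by a train-test pair whose synonym count meets the threshold. In this sketch the reduction is measured against the unfiltered ($\theta = 0$) intersection; the paper does not state its baseline, so that choice is an assumption, and the synonym table is a caller-supplied stand-in for WordNet.

```python
from collections import Counter


def synonym_matches(train: Counter, test: Counter,
                    syn: dict[str, int], theta: int) -> int:
    """Number of distinct words shared by both lexicons with syn(w) >= theta."""
    return sum(1 for w in train.keys() & test.keys() if syn.get(w, 0) >= theta)


def reduction_curve(train: Counter, test: Counter, syn: dict[str, int],
                    thetas: list[int]) -> dict[int, float]:
    """Percentage reduction in synonym matches at each theta, relative to the
    unfiltered (theta = 0) intersection. The baseline is an assumption."""
    baseline = synonym_matches(train, test, syn, 0)
    if baseline == 0:
        return {t: 0.0 for t in thetas}
    return {t: 100.0 * (1 - synonym_matches(train, test, syn, t) / baseline)
            for t in thetas}
```

Because only the key sets are intersected here, this measures the size of $\rho$ (the work the algorithm must do) rather than the match factor itself, which is what the threshold is meant to shrink.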

5.2 Threshold Performance Results

As expected, cutting words with few synonyms succeeded in reducing the size of $\rho$ for each author. In the case of Shakespeare's works, there were over 11,900 unique words; yet after filtering with a threshold of 30, only 539 words had to be examined, a 94% overall reduction in the size of $\rho$ (see Fig. 4 and Fig. 5). The threshold's success in keeping useful words and discarding the others is key to making this algorithm tractable for larger sets of authors. Even higher values of $\theta$ do not negatively impact the accuracy of discerning authorship.

6 Future Work

6.1 Multilingual Implementation

The next step in our work is to determine whether the method is successful in other languages. Since the only language-dependent part of our algorithm is WordNet, we simply need to query a lexical database for the target language. Currently, much work is being done on the creation and automatic generation of WordNet databases for various languages. One such effort is MEANING, a project that uses the web as a huge corpus in an effort to build a multilingual WordNet. The project has already produced a web-based interface for viewing its progress.⁴

6.2 Integration with an Automatic Speech Transcriber

One key point of this algorithm is that it does not depend on the peculiarities of the written word. That is, the system is not hindered by the missing or unreliable punctuation in the output of an automatic speech transcription system. Provided the transcription system can accurately generate text from speech, the results of our method of author recognition should carry over seamlessly to speaker recognition. One possible implementation involves Carnegie Mellon's Sphinx Live Decoder, which uses Hidden Markov Models to transcribe voice input to text on the fly [8]. This type of system would be ideal for situations in which input could come from disparate sources such as keyboards and microphones.

⁴ The demo of the MEANING project can be found at  The web-based demo can be accessed from this page.

6.3 Integration into a Smart Home Environment

In a Smart Home environment, it is key to provide a truly natural experience for the home's inhabitants. It follows that the Smart Home should be able to take vocal requests and to discern who is making them, which allows the system's responses to be more appropriate. For example, a truly intelligent Smart Home would not be likely to comply with a three-year-old child's demand for ten dozen cookies. An author/speaker recognition system using the methods proposed here is the first step toward making this scenario a reality.

7 Conclusions

Research using synonyms to recognize authors shows much promise. Because the method is grounded in psycholinguistic theory, we can be confident that it offers a solid foundation on which future work can be built. Furthermore, because this method can be optimized using higher threshold values and distributed processing, it can be used in situations where running time is a consideration. Having demonstrated its ability to accurately identify authors in this domain, we look forward to applying our new theory to more areas of usage.

References

1. Anderson, J.R. Cognitive Psychology and its Implications. New York: W.H. Freeman and Company. (1995)
2. Baayen, H., van Halteren, H., Neijt, A., Tweedie, F. An experiment in authorship attribution. JADT 2002: 6es Journées internationales d'Analyse statistique des Données Textuelles. (2002)
3. Brinegar, C. Mark Twain and the Quintus Curtius Snodgrass Letters: A Statistical Test of Authorship. Journal of the American Statistical Association, 58. (1963)
4. Chaski, C. Who's At the Keyboard? Recent Results in Authorship Attribution. International Journal of Digital Evidence. Spring (2005)
5. Glover, A. and Hirst, G. Detecting stylistic inconsistencies in collaborative writing. In Sharples, M. and van der Geest, T. (eds.), The New Writing Environment: Writers at Work in a World of Technology. London: Springer-Verlag. (1996)
6. Holland, C.R. Does synonym priming exist on a word completion task? Doctoral Thesis, Case Western Reserve University, Psychology. (1992)
7. Holmes, D. Authorship Attribution. Computers and the Humanities, 28. Kluwer Academic Publishers, Netherlands. (1994)
8. Huang, X. et al. The Sphinx-II Speech Recognition System. Computer Speech and Language. (1993)
9. Morton, A. The Authorship of Greek Prose. Journal of the Royal Statistical Society (A), 128. (1965)
10. Paek, S. et al. Integration of Visual and Text-Based Approaches for the Content Labeling and Classification of Photographs. ACM SIGIR. (1999)
11. Reiter, E. and Sripada, S. Contextual Influences on Near-Synonym Choice. Proceedings of INLG-2004. (2004)
12. Kaster, A., Siersdorfer, S., Weikum, G. Combining Text and Linguistic Document Representations for Authorship Attribution. SIGIR Workshop: Stylistic Analysis of Text for Information Access (STYLE), Salvador, Bahia, Brazil. (2005)
13. Miller, G.A. WordNet: A Lexical Database for English. Communications of the ACM, 38(11). (November 1995)
14. Thisted, R. and Efron, B. Did Shakespeare Write a Newly-discovered Poem? Biometrika, 74. (1987)
15. Uzuner, Ö., Katz, B. A Comparative Study of Language Models for Book and Author Recognition. Lecture Notes in Computer Science, Volume 3651. Springer-Verlag GmbH. (2005)
LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE Submitted in partial fulfillment of the requirements for the degree of Sarjana Sastra (S.S.)
More informationEXECUTIVE SUMMARY. Online courses for credit recovery in high schools: Effectiveness and promising practices. April 2017
EXECUTIVE SUMMARY Online courses for credit recovery in high schools: Effectiveness and promising practices April 2017 Prepared for the Nellie Mae Education Foundation by the UMass Donahue Institute 1
More informationA Case-Based Approach To Imitation Learning in Robotic Agents
A Case-Based Approach To Imitation Learning in Robotic Agents Tesca Fitzgerald, Ashok Goel School of Interactive Computing Georgia Institute of Technology, Atlanta, GA 30332, USA {tesca.fitzgerald,goel}@cc.gatech.edu
More informationQuickStroke: An Incremental On-line Chinese Handwriting Recognition System
QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents
More informationcorrelated to the Nebraska Reading/Writing Standards Grades 9-12
correlated to the Nebraska Reading/Writing Standards Grades 9-12 CONTENTS CORRELATION: Grade 9... 1 Grade 10...21 Grade 11..39 Grade 12..58 McDougal Littell The Language of Literature correlated to the
More informationLarge vocabulary off-line handwriting recognition: A survey
Pattern Anal Applic (2003) 6: 97 121 DOI 10.1007/s10044-002-0169-3 ORIGINAL ARTICLE A. L. Koerich, R. Sabourin, C. Y. Suen Large vocabulary off-line handwriting recognition: A survey Received: 24/09/01
More informationData Integration through Clustering and Finding Statistical Relations - Validation of Approach
Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Marek Jaszuk, Teresa Mroczek, and Barbara Fryc University of Information Technology and Management, ul. Sucharskiego
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationOntologies vs. classification systems
Ontologies vs. classification systems Bodil Nistrup Madsen Copenhagen Business School Copenhagen, Denmark bnm.isv@cbs.dk Hanne Erdman Thomsen Copenhagen Business School Copenhagen, Denmark het.isv@cbs.dk
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationThe MEANING Multilingual Central Repository
The MEANING Multilingual Central Repository J. Atserias, L. Villarejo, G. Rigau, E. Agirre, J. Carroll, B. Magnini, P. Vossen January 27, 2004 http://www.lsi.upc.es/ nlp/meaning Jordi Atserias TALP Index
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationThe Good Judgment Project: A large scale test of different methods of combining expert predictions
The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania
More informationThe Effectiveness of Realistic Mathematics Education Approach on Ability of Students Mathematical Concept Understanding
International Journal of Sciences: Basic and Applied Research (IJSBAR) ISSN 2307-4531 (Print & Online) http://gssrr.org/index.php?journal=journalofbasicandapplied ---------------------------------------------------------------------------------------------------------------------------
More informationOn document relevance and lexical cohesion between query terms
Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,
More informationDeep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach
#BaselOne7 Deep search Enhancing a search bar using machine learning Ilgün Ilgün & Cedric Reichenbach We are not researchers Outline I. Periscope: A search tool II. Goals III. Deep learning IV. Applying
More informationLaboratorio di Intelligenza Artificiale e Robotica
Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning
More informationPredicting Students Performance with SimStudent: Learning Cognitive Skills from Observation
School of Computer Science Human-Computer Interaction Institute Carnegie Mellon University Year 2007 Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation Noboru Matsuda
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationCorrective Feedback and Persistent Learning for Information Extraction
Corrective Feedback and Persistent Learning for Information Extraction Aron Culotta a, Trausti Kristjansson b, Andrew McCallum a, Paul Viola c a Dept. of Computer Science, University of Massachusetts,
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationDefragmenting Textual Data by Leveraging the Syntactic Structure of the English Language
Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu
More informationPAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s))
Ohio Academic Content Standards Grade Level Indicators (Grade 11) A. ACQUISITION OF VOCABULARY Students acquire vocabulary through exposure to language-rich situations, such as reading books and other
More informationGeorgetown University at TREC 2017 Dynamic Domain Track
Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain
More informationEvolutive Neural Net Fuzzy Filtering: Basic Description
Journal of Intelligent Learning Systems and Applications, 2010, 2: 12-18 doi:10.4236/jilsa.2010.21002 Published Online February 2010 (http://www.scirp.org/journal/jilsa) Evolutive Neural Net Fuzzy Filtering:
More information1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature
1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details
More informationOn-the-Fly Customization of Automated Essay Scoring
Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,
More informationEvidence for Reliability, Validity and Learning Effectiveness
PEARSON EDUCATION Evidence for Reliability, Validity and Learning Effectiveness Introduction Pearson Knowledge Technologies has conducted a large number and wide variety of reliability and validity studies
More informationcontent First Introductory book to cover CAPM First to differentiate expected and required returns First to discuss the intrinsic value of stocks
content First Introductory book to cover CAPM First to differentiate expected and required returns First to discuss the intrinsic value of stocks presentation First timelines to explain TVM First financial
More informationVietnam War Multiple Choice Quiz
Vietnam War Multiple Quiz Free PDF ebook Download: Vietnam War Quiz Download or Read Online ebook vietnam war multiple choice quiz in PDF Format From The Best User Guide Database The Vietnam War: Backwards
More informationTypes of curriculum. Definitions of the different types of curriculum
Types of Definitions of the different types of Leslie Owen Wilson. Ed. D. Contact Leslie When I asked my students what means to them, they always indicated that it means the overt or written thinking of
More informationDeveloping True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability
Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Shih-Bin Chen Dept. of Information and Computer Engineering, Chung-Yuan Christian University Chung-Li, Taiwan
More informationCOMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR
COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The
More informationPractice Examination IREB
IREB Examination Requirements Engineering Advanced Level Elicitation and Consolidation Practice Examination Questionnaire: Set_EN_2013_Public_1.2 Syllabus: Version 1.0 Passed Failed Total number of points
More informationThe Language of Football England vs. Germany (working title) by Elmar Thalhammer. Abstract
The Language of Football England vs. Germany (working title) by Elmar Thalhammer Abstract As opposed to about fifteen years ago, football has now become a socially acceptable phenomenon in both Germany
More informationHigher Education Six-Year Plans
Higher Education Six-Year Plans 2018-2024 House Appropriations Committee Retreat November 15, 2017 Tony Maggio, Staff Background The Higher Education Opportunity Act of 2011 included the requirement for
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationAnalysis: Evaluation: Knowledge: Comprehension: Synthesis: Application:
In 1956, Benjamin Bloom headed a group of educational psychologists who developed a classification of levels of intellectual behavior important in learning. Bloom found that over 95 % of the test questions
More informationTowards a Collaboration Framework for Selection of ICT Tools
Towards a Collaboration Framework for Selection of ICT Tools Deepak Sahni, Jan Van den Bergh, and Karin Coninx Hasselt University - transnationale Universiteit Limburg Expertise Centre for Digital Media
More informationErkki Mäkinen State change languages as homomorphic images of Szilard languages
Erkki Mäkinen State change languages as homomorphic images of Szilard languages UNIVERSITY OF TAMPERE SCHOOL OF INFORMATION SCIENCES REPORTS IN INFORMATION SCIENCES 48 TAMPERE 2016 UNIVERSITY OF TAMPERE
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationFinding Translations in Scanned Book Collections
Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationTHE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial
More informationDetecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011
Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Cristian-Alexandru Drăgușanu, Marina Cufliuc, Adrian Iftene UAIC: Faculty of Computer Science, Alexandru Ioan Cuza University,
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationThe Effect of Discourse Markers on the Speaking Production of EFL Students. Iman Moradimanesh
The Effect of Discourse Markers on the Speaking Production of EFL Students Iman Moradimanesh Abstract The research aimed at investigating the relationship between discourse markers (DMs) and a special
More informationFragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing
Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing D. Indhumathi Research Scholar Department of Information Technology
More informationOn the Combined Behavior of Autonomous Resource Management Agents
On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science
More informationHow to Judge the Quality of an Objective Classroom Test
How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM
More information