A Comparison of Two Text Representations for Sentiment Analysis


2010 International Conference on Computer Application and System Modeling (ICCASM 2010)

A Comparison of Two Text Representations for Sentiment Analysis

Jianxiong Wang, School of Computer Science & Educational Software, Guangzhou University, Guangzhou, China, xiong.ong@gmail.com
Andy Dong, Design Lab, University of Sydney, Sydney, Australia, a.dong@arch.usyd.edu.au

Abstract. This paper compares two representations of text within the same experimental setting for sentiment orientation analysis, and in particular focuses on the sensitivity of the analysis to sentence length. The two representations compared in this paper are bag-of-words (BoW) and the nine-dimensional vector (9Dim). The former represents text with a high-dimensional feature vector, which ignores grammatical structure and is lexicon-dependent. In contrast, the 9Dim representation encodes grammatical knowledge of clauses in sentences into a compact nine-dimensional vector, which is lexicon-independent. Text is composed of multiple sentences, since the grammatical structure of a single sentence or clause may not provide sufficient information for sentiment orientation classification. A convenient way to enrich the grammatical knowledge in a text is to compose the text from multiple sentences, thereby lengthening the sample. We consider the length of text to be an important factor in text classification. The aim of this paper is to demonstrate how the performance of text sentiment orientation classifiers improves when the length of the text comprising a training vector is varied. The experimental results indicate that the accuracy of the classifiers benefits from increasing text length, and also illustrate that the 9Dim method can provide results comparable to BoW under the same sentiment classification algorithm, support vector machines (SVM).

Keywords: sentiment analysis; text representations; bag-of-words; 9Dim

I.
INTRODUCTION

Sentiment analysis concentrates on classifying documents according to the opinions and emotions expressed by their authors. Judging a document's orientation as positive or negative is a common two-class problem in sentiment analysis [1-3], also known as sentiment orientation analysis in text classification. With the expansion of the internet, text classification has been found to be helpful in many respects. On the websites of some on-line e-commerce companies, visitors or customers are encouraged to leave comments or feedback. Summarising these comments or feedback with sentiment orientation analysis technology would help the websites stimulate existing suppliers, interest potential suppliers and customers, or even add value [3]. Text classification is also helpful in information retrieval [4] and could be applied to recognise and filter spam. Furthermore, by analysing on-line communication such as on-line forums, chat rooms, newsgroups, and the Dark Web [5], sentiment orientation analysis could provide support in tracking extremist groups, terrorists and hate groups [].

In this paper, we investigate the effectiveness of adopting the support vector machines classification algorithm for the sentiment orientation analysis problem. An interesting issue in this problem is the relationship between the performance of a text classifier and the length of the training text. For example, what is the optimal length for a training text example? With that, a trained text classifier for sentiment orientation analysis would achieve the best performance with less memory. Thus, apart from presenting the results from the two text representation methods, we also analyse this issue to acquire a deeper understanding of the results we obtained.

The remainder of this paper is organised as follows. Section II presents a review of related work in sentiment orientation analysis. Section III identifies research gaps and questions.
Data collection and processing of design text are described in Section IV. Section V presents the experiments used to compare the BoW and 9Dim methods. Section VI concludes with closing remarks.

II. RELATED WORK

In recent studies, the BoW text representation method has enjoyed much attention and achieved outstanding performance in sentiment analysis of semantic orientation in natural language []. The BoW representation expresses text without considering word order or word usage. BoW represents a document in this way: initially, a feature list (wordlist) is composed of all the words in the corpus the document belongs to. A numerical vector is created to represent the given document; each entry of the vector corresponds to a word contained in the wordlist. The vector is initialised as a zero-vector. When a word is contained in both the document and the wordlist, the corresponding entry of the vector is set to 1 to mark the appearance, or to a higher positive integer to indicate the frequency of the word in the document. The BoW method has been successfully adopted in both natural language processing [6, 7] and computer vision [8]. Let us take the following example from a document with two clauses: (A) "It is a great masterpiece." and (B) "Martin is a good designer." The composed wordlist is {Martin, It, is, a,

good, great, designer, masterpiece}. The representation vector for clause A is [0 1 1 1 0 1 0 1], and [1 0 1 1 1 0 1 0] is the vector for clause B.

Pang et al. [3] demonstrated a system with the BoW representation and the common machine learning classification algorithm support vector machines (SVM) on a two-class classification problem. They compared Naïve Bayes, maximum entropy classification and SVM classification techniques, machine learning methods known to be successful at topic classification tasks, on the semantic orientation of movie reviews. Within the theory of systemic-functional linguistics, Martin and White [9] provide a rigorous, network-based model for sentiment, which linguists characterise as the construal of emotions and interpersonal relations in language. The model has been partially implemented [10], and its performance improved by about 7% compared to results reported by other researchers on the same data set, movie reviews.

Given the superior performance of support vector machines in sentiment orientation analysis, we have opted to use support vector machines as the machine learning formalism for sentiment orientation analysis. The issue becomes one of ascertaining the best representation of the text for the machine learning algorithm. In our previous research [11, 12], we show how to represent a document by a nine-dimensional numerical vector (9Dim), which encodes grammatical knowledge and taxonomical semantic information about being about activities (Process), about objects (Product) or about agents (People), but is otherwise lexicon-independent. The structure of the 9Dim vector is as follows:

[PCN PCV PCA PDN PDV PDA PPN PPV PPA]

where PC = Process, PD = Product, PP = People, N = noun, V = verb, and A = adjective/adverb. According to the noun's pertinence ratio with Process, Product or People in a design context [11, 13], a weight is distributed into each of these three categories.
This pertinence ratio is a 3Dim numerical vector named K. For each rated sentence, part-of-speech tagging [14] provides the phrase structure trees and typed dependencies needed to obtain the grammatical relationships. A noun-based clustering algorithm is then applied. The basic idea is to identify every noun in a sentence and group all verbs and modifiers (adjectives and adverbs) connected to the noun together with it. Each noun is looked up (queried) in the WordNet [15] lexicographer database to ascertain the logical grouping that might indicate the appropriate category (Product, Process, People) for the word. The WordNet lexicographer database, with its syntactic categories and logical groupings, was used to categorise nouns as being about Product, Process or People. Verbs, adjectives and adverbs are categorised according to the category(ies) of the noun they relate to grammatically. These clusters of syntactically related words are called word groups. For the noun in each word group, rules were applied to identify which of the WordNet logical groupings would contain nouns in the categories [17]. Two correction factors are multiplied with the count of the frequency of occurrence of a word in the target clause: K1, which is inversely proportional to the number of possible Process-Product-People categories a WordNet logical grouping can belong to; and K, mentioned above. Since the correction factor K for a word may have up to three values, it is normally expressed as a vector of the form K(word) = [K_PC, K_PD, K_PP]. The semantic orientation (SO) of the words in each word group is calculated using the SO-PMI measure, which is in turn based on their pointwise mutual information (PMI) [14].
The strategy for calculating the SO-PMI is to compute the log-odds (1) of a canonical basket of positive words (Pwords) or negative words (Nwords) appearing with the target word, on the assumption that if a canonical good or bad word appears frequently with the target word, then the target word has a similar semantic orientation. The log odds that two words co-occur is:

PMI-IR(word1, word2) = log [ p(word1 & word2) / (p(word1) p(word2)) ]   (1)

In this study, we used a Google query with the NEAR operator to look up the co-occurrence of the target word with the canonical basket of positive and negative words. The SO-PMI based on the NEAR operator is described by (2). The semantic orientation of a word based on mutual co-occurrence with a canonical basket of positive and negative words is:

SO-PMI(word) = log [ (∏_{pword ∈ Pwords} hits(word NEAR pword) · ∏_{nword ∈ Nwords} hits(nword)) / (∏_{nword ∈ Nwords} hits(word NEAR nword) · ∏_{pword ∈ Pwords} hits(pword)) ]   (2)

We selected a basket of 12 canonical positive and negative words. Adjectives and adverbs were selected based on most frequent occurrence in written and spoken English according to the British National Corpus [11]. Because the written and spoken lists are published separately, we joined both lists and ordered them by frequency per million words. We selected only those adjectives and adverbs which were judged positive or negative modifiers according to the General Inquirer corpus. The basis for the selection of these frequently occurring words as the canonical words is the increased likelihood of finding documents which contain both the canonical word and the word for which the PMI-IR is being calculated. This increases the accuracy of the SO-PMI measurement. Table I lists the canonical Pwords and Nwords and their frequency per million words. The SO-PMI values of all unigrams (nouns, verbs, modifiers) in the target lexicon are pre-calculated and saved in a database to speed up the analysis. A rated sentence is processed with both grammatical and semantic analyses.
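Equation (2) can be sketched directly in Python, assuming the hit counts have already been retrieved (the paper queried Google with the NEAR operator). The function, the dictionary-backed lookups, and the toy counts below are our own illustrative assumptions, not the paper's implementation.

```python
import math

def so_pmi(word, pwords, nwords, hits, hits_near):
    """SO-PMI as in (2): hits(w) is the document count of w;
    hits_near(w1, w2) is the co-occurrence (NEAR) count."""
    num = math.prod(hits_near(word, p) for p in pwords) * math.prod(hits(n) for n in nwords)
    den = math.prod(hits_near(word, n) for n in nwords) * math.prod(hits(p) for p in pwords)
    return math.log(num / den)

# Made-up counts for a one-positive, one-negative basket.
counts = {"elegant": 1000, "good": 5000, "bad": 4000}
near = {("elegant", "good"): 120, ("elegant", "bad"): 15}
score = so_pmi("elegant", ["good"], ["bad"],
               hits=lambda w: counts[w],
               hits_near=lambda a, b: near[(a, b)])
print(score > 0)  # True: "elegant" leans positive
```

With these counts the ratio is (120 * 4000) / (15 * 5000) = 6.4, so the log-odds is positive, matching the intuition that a word seen mostly NEAR the positive basket gets a positive orientation.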
When all word groups in a sentence are processed, a complete 9-dimensional vector is generated. For a detailed description and implementation of the complete 9-dimensional vector, please refer to one of our previous papers [11].
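As a concrete illustration of how the pieces above combine, the following sketch assembles a 9Dim vector from word groups. The data layout (each group carrying its noun's pertinence vector K = [K_PC, K_PD, K_PP] and an SO-PMI score per member word, tagged N, V or A) and every name here are our own simplifying assumptions, not the paper's implementation.

```python
# Column offsets within each Process/Product/People block of the 9Dim vector.
POS = {"N": 0, "V": 1, "A": 2}

def nine_dim(word_groups):
    """Accumulate SO-PMI scores into [PCN PCV PCA PDN PDV PDA PPN PPV PPA],
    weighting each score by the noun's pertinence ratio K."""
    vec = [0.0] * 9
    for group in word_groups:
        k = group["K"]                      # [K_PC, K_PD, K_PP]
        for pos_tag, so in group["words"]:  # (POS tag, SO-PMI score)
            for cat in range(3):            # distribute across the 3 categories
                vec[cat * 3 + POS[pos_tag]] += k[cat] * so
    return vec

# e.g. for "Martin is a good designer": 'designer' is mostly about People.
group = {"K": [0.0, 0.2, 0.8], "words": [("N", 0.5), ("A", 1.2)]}
print(nine_dim([group]))
```

The design choice worth noting is that a single word group contributes to up to three of the nine slots, in proportion to K, rather than being forced into one category.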

TABLE I. CANONICAL POSITIVE AND NEGATIVE WORDS

Positive Words    Negative Words
good (176)        bad (64)
well (1119)       difficult (0)
great (635)       dark (104)
important (39)    cold (103)
able (304)        cheap (68)
clear (39)        dangerous (58)

III. RESEARCH GAPS AND QUESTIONS

Based on the literature review, we have identified several important research gaps. The BoW method ignores semantic knowledge about words and grammar. It relies on a very high-dimensional representation that hinges on training a system on a text domain with high coverage of the words that are likely to appear in the target corpus. On the other hand, the 9Dim representation embeds grammatical knowledge in a lower-dimensional vector and is lexicon-independent. It abstracts lexical knowledge toward potential sentiment content and does not need a connecting lexicon between the training corpora and the target corpora.

The differences between the BoW and 9Dim representations raise the following research questions. First, does the lower-dimensional representation method cost less memory in implementation? If so, we can obtain a text classifier with lower memory cost and ability comparable to its higher-dimensional counterpart. Secondly, 9Dim is a sentence-based representation for text, which means a single 9Dim vector comprised of data from multiple sentences can be treated as a training example. Intuitively, the information contained in a simple sentence is less than that embedded in a complex sentence. Text length is known to affect text classification: researchers have indicated that sentence length is an important factor in text classification when each vector represents one sentence [16]. If so, is the length of text (a paragraph or multiple sentences) represented in a single training vector an important factor in text classification as well?

IV.
DATA COLLECTION AND PROCESSING

This section presents the data collection process and the processing methods used to produce the tagged data for the training and validation of the computational system. To conduct this research, it was necessary to create labelled design text. In this research, we studied text from creative industries engaged in design. That is, we needed to create a new data set consisting of text about design works, the process of designing, and designers, labelled for semantic orientation and category. Adopting a popular way of creating data sets in computational linguistics [3], a cohort of three native English speakers with backgrounds in design-related disciplines (e.g., engineering, architecture, and computer science) was tasked with reading and categorising various design texts. The texts included formal and informal design text from various on-line sources and across various design-related disciplines. All design texts were collected by the author. Each coder was paid to classify the texts. The rating cohort was trained to identify the proper category and its semantic orientation according to the context. Training lasted for one hour. During coding, two of the three coders had to agree on the semantic meaning (category), the semantic orientation (orientation), and the value of the orientation, that is, positive or negative. Working in two-hour time blocks, the coders read various design texts, including formal design reports, reviews of designed works, reviews of designers, and transcripts of conversations of designers working together. After the rated text data was collected, spell-checked and grammar-checked, it was saved in a sentence pool for composing training data and testing data. Differences between data sets generated in this way could be statistically significant, but were small enough to be practically unimportant.

V.
ANALYSIS AND EXPERIMENTAL RESULTS

This section analyses the space complexity of the BoW and 9Dim methods, then presents the results of the experiments on the appraisal system with different data sets.

To compare the memory cost of the two representations, the standard method is to compare their space complexity. Space complexity is the limiting behaviour of the memory use of an algorithm as the size of the problem goes to infinity [17]. As discussed before, the implementation of BoW consists of two steps: 1) compose a vector for each paragraph using the wordlist; and 2) train and validate an SVM classifier with the represented vectors. For the first step, the space complexity depends on the length of the wordlist; if this length is l_feature, then the space complexity is O(l_feature). For the second step, the space complexity depends on the implementation of the SVM. SVM-light is the implementation adopted in this study; its complexity is O(n^2) [18], where n is the number of training examples. The total space complexity of the BoW implementation is O_BoW = O(l_feature) + O(n^2).

For the 9Dim method, there are three processing steps: 1) part-of-speech tagging; 2) looking up K to get the pertinence ratio for each selected word to compose the 9Dim vector; and 3) training and validating an SVM classifier with the represented vectors. For the first step, the space complexity is O(m) [14], where m is the number of rated sentences. For the second step, the space complexity depends on the length of K, l_K, so it is O(l_K). The third step is the same as the second step of BoW, O(n^2). The total space complexity of the 9Dim implementation is O_9Dim = O(m) + O(l_K) + O(n^2). Because n is the same in

both O_BoW and O_9Dim, and O(m), O(l_K) and O(l_feature) are the lower-order terms in their formulas, those terms can be ignored. Therefore O_9Dim = O(n^2) and O_BoW = O(n^2): the 9Dim and BoW representation methods have the same space complexity. However, in practice, due to the lower feature dimension of the 9Dim representation compared to the BoW representation, 9Dim has a lower memory cost in implementation. The feature dimension of the 9Dim representation is a constant, 9, whereas for BoW it is the length of the wordlist.

We implemented Pang's unigram experiment [3] and applied the same experimental settings to design text semantic orientation classification. Each sentence in the rated sentence pool is represented by a BoW vector over a 1111-word wordlist. The represented rated sentence pool is split into two parts, one for composing training examples and another for validation examples. The size of the training and validation example sets is set to 500. Each training or validation example is composed of one or more represented rated sentences chosen randomly from the training or validation sentence pool. The number of represented rated sentences in a training or validation example is adjusted gradually from one to 20. For each number, five iterations of training and validation are run; the accuracy bars of these experiments are shown in Figure 1.

For the 9Dim method experiment, the rated sentence pool is represented by the 9Dim representation. Similar experimental conditions were adopted from the experiment based on the BoW representation. The only differences were that the number of represented rated sentences in a training or validation example was adjusted gradually from one to 80 and the training-validation iteration number was set to 50.
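The composition of a training example from several represented rated sentences, as described above, might be sketched as follows. How the sentence vectors are merged into one example (here, summed entry-wise) and how a single label is derived (here, majority orientation) are our own assumptions; the pool below is synthetic.

```python
import random

def compose_example(sentence_pool, k, rng):
    """Build one training example from k rated sentences drawn at random.
    Each pool entry is (vector, orientation) with orientation in {-1, +1}."""
    sentences = rng.sample(sentence_pool, k)
    dim = len(sentences[0][0])
    vec = [sum(s[0][i] for s in sentences) for i in range(dim)]     # entry-wise sum
    label = 1 if sum(s[1] for s in sentences) >= 0 else -1          # majority orientation
    return vec, label

rng = random.Random(0)
# Synthetic pool of (9Dim vector, orientation) pairs standing in for rated sentences.
pool = [([rng.uniform(-1, 1) for _ in range(9)], rng.choice([-1, 1])) for _ in range(100)]
example, label = compose_example(pool, k=5, rng=rng)
print(len(example), label in (-1, 1))  # 9 True
```

Varying `k` from one upward reproduces the experimental knob in the text: the example dimension stays fixed (9 for 9Dim, the wordlist length for BoW) while each training vector aggregates more sentences.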
The accuracy bars of these 9Dim experiments are shown in Figure 2. Figure 1 and Figure 2 show that the accuracy of sentiment orientation classification improves when the number of represented rated sentences in each example is increased.

Figure 1. BoW-based design text sentiment orientation analysis.

Figure 2. 9Dim-based design text sentiment orientation analysis.

VI. CONCLUSIONS

In this paper, we compared two text representation methods for design text in two respects: the space complexity of their implementation and sentiment orientation classification. The results show that it is possible to encode semantic information and grammatical knowledge into a lower-dimensional vector to represent text for the purposes of sentiment classification. They also show that a grammatical-knowledge-embedding representation method can provide extra information for the classification algorithm to identify sentiment orientation and thereby reduce the space cost of the implementation. The complexity analysis indicates that the 9Dim representation method is superior to BoW in space complexity in practice, and provides comparable accuracy in classification. Nevertheless, the results from this study also point out that text length is an important factor in text classification. The reason may be that longer text contains more information about semantic orientation features. However, if the text is "too" long, it may include sentences of conflicting sentiment. So, while more sentences per vector may be desirable for training, a smaller number of sentences may be better for the classification stage.

ACKNOWLEDGMENT

This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP ). The first author would like to thank an early career grant from Guangzhou University. This research was carried out while the first author was studying at the University of Sydney as a PhD student.

REFERENCES

[1] P.D.
Turney, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), 2002.
[2] A. Abbasi, H. Chen, and A. Salem, Sentiment analysis in multiple languages: Feature selection for opinion classification

in Web forums, ACM Trans. Inf. Syst. 26(3), 2008.
[3] B. Pang, L. Lee, and S. Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP '02), 2002.
[4] W. Lam, M.E. Ruiz, and P. Srinivasan, Automatic text categorization and its application to text retrieval, IEEE Transactions on Knowledge and Data Engineering, 11(6), 1999.
[5] H. Chen, Intelligence and Security Informatics for International Security: Information Sharing and Data Mining. New York: Springer-Verlag, 2006.
[6] T.K. Landauer, P.W. Foltz, and D. Laham, An introduction to latent semantic analysis, Discourse Processes, 25, 1998.
[7] D.M. Blei, A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, 3, 2003.
[8] G. Wang, Y. Zhang, and L. Fei-Fei, Using dependent regions for object categorization in a generative framework, Proc. of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2006.
[9] J.R. Martin and P.R.R. White, The Language of Evaluation: Appraisal in English. New York: Palgrave Macmillan, 2005.
[10] C. Whitelaw, N. Garg, and S. Argamon, Using appraisal groups for sentiment analysis, Proc. of the 14th ACM International Conference on Information and Knowledge Management, New York: ACM, 2005.
[11] J. Wang and A. Dong, A case study of computing appraisals in design text, in J.S. Gero (Ed.), Design Computing and Cognition '08 (DCC'08), Springer Netherlands, 2008.
[12] J. Wang and A. Dong, How am I doing: computing the language of appraisal in design, Proc. of the 16th International Conference on Engineering Design (ICED'07), 2007.
[13] A. Dong, The Language of Design: Theory and Computation. London: Springer, 2009.
[14] M. Marneffe, B. MacCartney, and C.D.
Manning, Generating typed dependency parses from phrase structure parses, Proc. of the IEEE/ACL 2006 Workshop on Spoken Language Technology, 2006.
[15] C. Fellbaum, WordNet: An Electronic Lexical Database. Cambridge: MIT Press.
[16] E. Kelih, P. Grzybek, G. Antić, and E. Stadlober, Quantitative text typology: the impact of sentence length, Proc. of the 9th Annual Conference of the Gesellschaft für Klassifikation e.V., Berlin Heidelberg: Springer, 2006.
[17] U.S. National Institute of Standards and Technology, Algorithms and Theory of Computation Handbook, CRC Press LLC.
[18] T. Joachims, Making large-scale SVM learning practical, in Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999.


More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

Learning Methods for Fuzzy Systems

Learning Methods for Fuzzy Systems Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

Linguistic Variation across Sports Category of Press Reportage from British Newspapers: a Diachronic Multidimensional Analysis

Linguistic Variation across Sports Category of Press Reportage from British Newspapers: a Diachronic Multidimensional Analysis International Journal of Arts Humanities and Social Sciences (IJAHSS) Volume 1 Issue 1 ǁ August 216. www.ijahss.com Linguistic Variation across Sports Category of Press Reportage from British Newspapers:

More information

Advanced Grammar in Use

Advanced Grammar in Use Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Formulaic Language and Fluency: ESL Teaching Applications

Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language Terminology Formulaic sequence One such item Formulaic language Non-count noun referring to these items Phraseology The study

More information

Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis

More information

Vocabulary Usage and Intelligibility in Learner Language

Vocabulary Usage and Intelligibility in Learner Language Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand

More information

1. Introduction. 2. The OMBI database editor

1. Introduction. 2. The OMBI database editor OMBI bilingual lexical resources: Arabic-Dutch / Dutch-Arabic Carole Tiberius, Anna Aalstein, Instituut voor Nederlandse Lexicologie Jan Hoogland, Nederlands Instituut in Marokko (NIMAR) In this paper

More information

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282)

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282) B. PALTRIDGE, DISCOURSE ANALYSIS: AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC. 2012. PP. VI, 282) Review by Glenda Shopen _ This book is a revised edition of the author s 2006 introductory

More information

Evolution of Symbolisation in Chimpanzees and Neural Nets

Evolution of Symbolisation in Chimpanzees and Neural Nets Evolution of Symbolisation in Chimpanzees and Neural Nets Angelo Cangelosi Centre for Neural and Adaptive Systems University of Plymouth (UK) a.cangelosi@plymouth.ac.uk Introduction Animal communication

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

BULATS A2 WORDLIST 2

BULATS A2 WORDLIST 2 BULATS A2 WORDLIST 2 INTRODUCTION TO THE BULATS A2 WORDLIST 2 The BULATS A2 WORDLIST 21 is a list of approximately 750 words to help candidates aiming at an A2 pass in the Cambridge BULATS exam. It is

More information

Metadiscourse in Knowledge Building: A question about written or verbal metadiscourse

Metadiscourse in Knowledge Building: A question about written or verbal metadiscourse Metadiscourse in Knowledge Building: A question about written or verbal metadiscourse Rolf K. Baltzersen Paper submitted to the Knowledge Building Summer Institute 2013 in Puebla, Mexico Author: Rolf K.

More information

Loughton School s curriculum evening. 28 th February 2017

Loughton School s curriculum evening. 28 th February 2017 Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's

More information

Robust Sense-Based Sentiment Classification

Robust Sense-Based Sentiment Classification Robust Sense-Based Sentiment Classification Balamurali A R 1 Aditya Joshi 2 Pushpak Bhattacharyya 2 1 IITB-Monash Research Academy, IIT Bombay 2 Dept. of Computer Science and Engineering, IIT Bombay Mumbai,

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

A Bayesian Learning Approach to Concept-Based Document Classification

A Bayesian Learning Approach to Concept-Based Document Classification Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors

More information

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer

More information

Language Independent Passage Retrieval for Question Answering

Language Independent Passage Retrieval for Question Answering Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

Extracting and Ranking Product Features in Opinion Documents

Extracting and Ranking Product Features in Opinion Documents Extracting and Ranking Product Features in Opinion Documents Lei Zhang Department of Computer Science University of Illinois at Chicago 851 S. Morgan Street Chicago, IL 60607 lzhang3@cs.uic.edu Bing Liu

More information

Handling Sparsity for Verb Noun MWE Token Classification

Handling Sparsity for Verb Noun MWE Token Classification Handling Sparsity for Verb Noun MWE Token Classification Mona T. Diab Center for Computational Learning Systems Columbia University mdiab@ccls.columbia.edu Madhav Krishna Computer Science Department Columbia

More information

Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011

Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Cristian-Alexandru Drăgușanu, Marina Cufliuc, Adrian Iftene UAIC: Faculty of Computer Science, Alexandru Ioan Cuza University,

More information

Corpus Linguistics (L615)

Corpus Linguistics (L615) (L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

Developing Grammar in Context

Developing Grammar in Context Developing Grammar in Context intermediate with answers Mark Nettle and Diana Hopkins PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE The Pitt Building, Trumpington Street, Cambridge, United

More information

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and

More information

On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

More information

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach #BaselOne7 Deep search Enhancing a search bar using machine learning Ilgün Ilgün & Cedric Reichenbach We are not researchers Outline I. Periscope: A search tool II. Goals III. Deep learning IV. Applying

More information

Australia s tertiary education sector

Australia s tertiary education sector Australia s tertiary education sector TOM KARMEL NHI NGUYEN NATIONAL CENTRE FOR VOCATIONAL EDUCATION RESEARCH Paper presented to the Centre for the Economics of Education and Training 7 th National Conference

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

arxiv: v1 [cs.lg] 3 May 2013

arxiv: v1 [cs.lg] 3 May 2013 Feature Selection Based on Term Frequency and T-Test for Text Categorization Deqing Wang dqwang@nlsde.buaa.edu.cn Hui Zhang hzhang@nlsde.buaa.edu.cn Rui Liu, Weifeng Lv {liurui,lwf}@nlsde.buaa.edu.cn arxiv:1305.0638v1

More information

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Sriram Venkatapathy Language Technologies Research Centre, International Institute of Information Technology

More information

Conversational Framework for Web Search and Recommendations

Conversational Framework for Web Search and Recommendations Conversational Framework for Web Search and Recommendations Saurav Sahay and Ashwin Ram ssahay@cc.gatech.edu, ashwin@cc.gatech.edu College of Computing Georgia Institute of Technology Atlanta, GA Abstract.

More information

Procedia - Social and Behavioral Sciences 154 ( 2014 )

Procedia - Social and Behavioral Sciences 154 ( 2014 ) Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 154 ( 2014 ) 263 267 THE XXV ANNUAL INTERNATIONAL ACADEMIC CONFERENCE, LANGUAGE AND CULTURE, 20-22 October

More information

An Interactive Intelligent Language Tutor Over The Internet

An Interactive Intelligent Language Tutor Over The Internet An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This

More information

The Smart/Empire TIPSTER IR System

The Smart/Empire TIPSTER IR System The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of

More information

Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability

Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Shih-Bin Chen Dept. of Information and Computer Engineering, Chung-Yuan Christian University Chung-Li, Taiwan

More information

A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING

A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING Yong Sun, a * Colin Fidge b and Lin Ma a a CRC for Integrated Engineering Asset Management, School of Engineering Systems, Queensland

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus Language Acquisition Fall 2010/Winter 2011 Lexical Categories Afra Alishahi, Heiner Drenhaus Computational Linguistics and Phonetics Saarland University Children s Sensitivity to Lexical Categories Look,

More information

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

More information

Cross-lingual Short-Text Document Classification for Facebook Comments

Cross-lingual Short-Text Document Classification for Facebook Comments 2014 International Conference on Future Internet of Things and Cloud Cross-lingual Short-Text Document Classification for Facebook Comments Mosab Faqeeh, Nawaf Abdulla, Mahmoud Al-Ayyoub, Yaser Jararweh

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT

WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working

More information

LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume 11 : 12 December 2011 ISSN

LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume 11 : 12 December 2011 ISSN LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume ISSN 1930-2940 Managing Editor: M. S. Thirumalai, Ph.D. Editors: B. Mallikarjun, Ph.D. Sam Mohanlal, Ph.D. B. A. Sharada, Ph.D.

More information

Determining the Semantic Orientation of Terms through Gloss Classification

Determining the Semantic Orientation of Terms through Gloss Classification Determining the Semantic Orientation of Terms through Gloss Classification Andrea Esuli Istituto di Scienza e Tecnologie dell Informazione Consiglio Nazionale delle Ricerche Via G Moruzzi, 1 56124 Pisa,

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

Georgetown University at TREC 2017 Dynamic Domain Track

Georgetown University at TREC 2017 Dynamic Domain Track Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain

More information

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Some Principles of Automated Natural Language Information Extraction

Some Principles of Automated Natural Language Information Extraction Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract

More information

Cross-Lingual Text Categorization

Cross-Lingual Text Categorization Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es

More information

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

More information

Online Updating of Word Representations for Part-of-Speech Tagging

Online Updating of Word Representations for Part-of-Speech Tagging Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

More information

Automatic document classification of biological literature

Automatic document classification of biological literature BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. Copyedited and fully formatted PDF and full text (HTML) versions will be made available soon. Automatic

More information

SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

More information

TABE 9&10. Revised 8/2013- with reference to College and Career Readiness Standards

TABE 9&10. Revised 8/2013- with reference to College and Career Readiness Standards TABE 9&10 Revised 8/2013- with reference to College and Career Readiness Standards LEVEL E Test 1: Reading Name Class E01- INTERPRET GRAPHIC INFORMATION Signs Maps Graphs Consumer Materials Forms Dictionary

More information

FOREWORD.. 5 THE PROPER RUSSIAN PRONUNCIATION. 8. УРОК (Unit) УРОК (Unit) УРОК (Unit) УРОК (Unit) 4 80.

FOREWORD.. 5 THE PROPER RUSSIAN PRONUNCIATION. 8. УРОК (Unit) УРОК (Unit) УРОК (Unit) УРОК (Unit) 4 80. CONTENTS FOREWORD.. 5 THE PROPER RUSSIAN PRONUNCIATION. 8 УРОК (Unit) 1 25 1.1. QUESTIONS WITH КТО AND ЧТО 27 1.2. GENDER OF NOUNS 29 1.3. PERSONAL PRONOUNS 31 УРОК (Unit) 2 38 2.1. PRESENT TENSE OF THE

More information

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data Kurt VanLehn 1, Kenneth R. Koedinger 2, Alida Skogsholm 2, Adaeze Nwaigwe 2, Robert G.M. Hausmann 1, Anders Weinstein

More information