SENTIMENT CLASSIFICATION OF MOVIE REVIEWS USING LINGUISTIC PARSING
Brian Eriksson
CS Natural Language Processing Final Project Report

ABSTRACT

The problem of sentiment analysis requires a deeper understanding of the English language than previously established techniques in the field can provide. The Linguistic Tree Transformation Algorithm is introduced as a method to exploit the syntactic dependencies between words in a sentence and to disambiguate word senses. The algorithm is tested against the established Pang/Lee dataset and a new collection of Roger Ebert reviews. A new method of objective sentence removal is also introduced to improve established methods of sentiment analysis on full reviews with no manual extraction of objective sentences.

1. INTRODUCTION

The last few years have seen an explosion in the number of papers on the topic of sentiment analysis. This represents a fundamental shift in the area of Natural Language Processing. Previously, the underlying problem had been one of topic classification, where one is concerned only with what is being communicated. With sentiment analysis, a deeper understanding of the document must be extracted: the concern shifts from what is being communicated to how it is being communicated. Previous papers on this problem ([1-3]) ignore the fundamental richness of the English language used to communicate sentiment, and instead rely on established methods (N-grams, etc.) that throw many of these useful features away. New methods that use linguistic techniques to exploit the structure of English must be found to improve sentiment classification rates. Any new method must address two large problems in the area of sentiment analysis: the non-local dependencies problem and the word sense disambiguation problem.

2. PREVIOUS WORK

The cornerstone of work on sentiment analysis is Pang and Lee's 2002 paper [1].
The authors compare Naive Bayes, Maximum Entropy, and Support Vector Machine approaches to classifying the sentiment of movie reviews. They explain the relatively poor performance of these methods (versus a standard topic classification problem) as a result of sentiment analysis requiring a deeper understanding of the document under analysis. In 2005 ([3]), they returned to the topic and examined multi-class performance on a finer-scale star-rating dataset. They added a nearest-neighbor classifier to their collection of approaches, but the results still show great room for improvement.

A better approach is taken by Matsumoto et al. in [2]. The authors recognize that word order and syntactic relations between words are extremely important for sentiment classification, and therefore it is imperative that they not be discarded. The approach they propose involves taking each sentence of a review and constructing a dependency tree. This dependency tree is then pruned to create subtrees for classification. Each subtree graphs the connections between words while retaining their syntactic relationships and their order in the original sentence. One drawback to this approach is that a great number of these subtrees are produced in the training stage of the algorithm, and for performance reasons all but the N most frequent subtrees are discarded.

Outside the area of sentiment analysis, Michael Collins [4] focuses on classifying documents using linguistic parsing. Collins develops a distance metric for extracting dependency bigrams from linguistic tree structures. The classification rates are quite good, but the algorithm runs fairly slowly. A simpler tree-parsing algorithm may achieve similar classification rates while keeping the processing time practical.

3. NON-LOCAL DEPENDENCIES PROBLEM

One of the fundamental problems in extracting meaning from a sentence is the non-local dependency problem. Often, two words that are syntactically linked in a sentence are separated by several other words. In these cases, N-gram models with small N fail to extract a correlation between the two words. A new method must be devised to find pairs of words that are syntactically linked. A clear example of this problem is found in the following sentence (from [6]):

"In a movie this bad, one plot element is really idiotic."

To anyone reading the preceding sentence, it should be obvious that two sentimental ideas are being communicated: first, that the movie was bad; second, that the plot element was idiotic. Using standard N-gram approaches, a trigram model would be necessary to capture the dependency (movie, bad), and a 5-gram model would be necessary for (plot, idiotic). A powerful classifier for sentiment analysis would extract the non-local bigrams (movie, bad) and (plot, idiotic) while collecting a minimum number of other, sentiment-lacking bigrams.

4. WORD SENSE DISAMBIGUATION PROBLEM

The word sense disambiguation problem is mentioned by the authors of [1] as a fundamental problem in sentiment analysis. As an example, they use two sentences:

Sentence 1: "I love this story."
Sentence 2: "This is a love story."

It should be obvious to the reader that the first sentence communicates positive sentiment, while the second is an objective statement with neutral sentiment. The fundamental difference between the two sentences can be clearly seen at the linguistic level: parsing both into their standard Chomskyan grammar form yields the trees in Figures 1 and 2. The problem with standard Natural Language Processing techniques becomes apparent when a unigram model is applied to both sentences.

Figure 1. Positive Sentence Parse
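The near-identity of the two unigram scores can be sketched directly in code. This is a minimal sketch: the word probabilities below are illustrative placeholders, not estimates from any corpus, and only the ratio between the two scores matters.

```python
# Toy unigram model for the two example sentences. The probabilities
# are illustrative placeholders, not corpus estimates.
probs = {
    "i": 0.04, "love": 0.01, "this": 0.05, "story": 0.008,
    "is": 0.06, "a": 0.07,
}

def unigram_score(sentence):
    """p(S) = p(w1) * p(w2) * ... * p(wN) under the toy model."""
    score = 1.0
    for word in sentence.lower().split():
        score *= probs[word]
    return score

positive = unigram_score("I love this story")
neutral = unigram_score("This is a love story")

# Both scores share the factor p(this) * p(love) * p(story), so the
# model separates the sentences only by p(I) versus p(is) * p(a) --
# a difference that carries no sentiment information at all.
shared = probs["this"] * probs["love"] * probs["story"]
assert abs(positive / shared - probs["i"]) < 1e-9
assert abs(neutral / shared - probs["is"] * probs["a"]) < 1e-9
```

Whatever placeholder values are chosen, the shared factor cancels, which is exactly the weakness the formal derivation below makes precise.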
Unigram Model:

p(S) = p(w1) p(w2) ... p(wN)

p(I love this story) = p(I) p(love) p(this) p(story)
p(This is a love story) = p(This) p(is) p(a) p(love) p(story)

The two sentences have three words in common, so the probability model can be rewritten using the shared factor:

p_i = p(this) p(love) p(story)
p(I love this story) = p_i p(I)
p(This is a love story) = p_i p(is) p(a)

The resulting difference between the probability models of these two sentiment-differing sentences is very small.

Figure 2. Neutral Sentence Parse

The sentences are decomposed into Noun Phrases (NP), Verb Phrases (VP), and then individual word labels (NN - noun, VBZ - verb, etc.). The most important feature to notice is the label on the common word "love". In the positive sentence, the word is used as a verb, indicating an action the author of the sentence performed. In the neutral sentence, "love" is used as a noun, simply to indicate a thing (in this case, the story type), and therefore has no sentiment attached. A powerful classifier for sentiment analysis would take the label of each word into consideration.

5. LINGUISTIC TREE TRANSFORMATION ALGORITHM

With the word sense disambiguation and non-local dependencies problems at the forefront, a tree-pruning algorithm was developed. From [4], it was clear that a linguistic tree algorithm must remain reasonably simple for computation-time reasons. The work in [2] was a good starting point for determining how the trees should be pruned, but changes were needed to create a less sparse training feature set. After empirical analysis of many example movie reviews (from [6]) and examination of the corresponding tree structures of their sentences, it was determined that one of the first steps should be to remove all leaves not labeled as a noun, verb, or adjective. It was also observed that most movie review documents are filled with fairly verbose sentences; while retaining the overall tree structure is important, there must be some flattening mechanism to ensure that large, complex tree structures are simplified. With these two required actions (pruning and flattening) in mind, the following algorithm was devised.

The Linguistic Tree Transformation Algorithm:
1. Parse the sentence into the standard Chomskyan tree structure.
2. Pruning - Eliminate all leaves not labeled as a noun, verb, or adjective.
3. Pruning - Set phrase node labels to NULL.
4. Flattening - For each leaf, collapse it with its label value by concatenating the label value to the leaf node.
5. Flattening - For each node with only one leaf node, eliminate the node and raise the leaf one depth level in the tree.
6. Create a list of all single nouns, verbs, and adjectives.
7. Create a list of all noun-verb, noun-adjective, and verb-adjective pairs at the same tree depth.

Figure 3. Linguistic Tree Transformation - Step 1
Figure 4. Linguistic Tree Transformation - Step 2
Figure 5. Linguistic Tree Transformation - Step 3
Figure 6. Linguistic Tree Transformation - Step 5

Nouns: movie, plot, element
Verbs: is
Adjectives: bad, idiotic
Noun-Verb Pairs: (movie, is), (plot, is), (element, is)
Noun-Adjective Pairs: (movie, bad), (element, bad), (plot, bad), (movie, idiotic), (plot, idiotic), (element, idiotic)
Verb-Adjective Pairs: (is, idiotic), (is, bad)

Table 1. Lists extracted by the Linguistic Tree Transformation Algorithm using Figure 6.

As seen in Table 1, the two important bigram elements ({plot, idiotic} and {movie, bad}) have been extracted from the sentence, along with the classification labels (noun, verb, adjective) of the critical words. This shows how the Linguistic Tree Transformation Algorithm is a powerful tool for solving both the word sense disambiguation problem and the non-local dependencies problem.

6. OBJECTIVE SENTENCE REMOVAL ALGORITHM

Previous work on sentiment analysis ([1-3]) used datasets in which objective sentences had already been removed from the movie reviews. For sentiment analysis to become completely automated, an algorithm must be developed that takes objective sentences into account. In movie reviews, the vast majority of the document usually consists of an explanation of the plot. For a classifier this greatly corrupts the results, since two movies of differing quality but with the same plot would generally receive the same sentiment classification. After analysis of several full movie reviews (from [6]), it was determined that a simplified approach to removing objective sentences could be taken that would yield a satisfactory collection of subjective sentences.

The Objective Sentence Removal Algorithm:
1. Assume that the algorithm has prior knowledge of the movie title, the director's name, and the screenwriter's name.
2. Examine each sentence in the document. Replace each occurrence of the movie title with MOVIE, each occurrence of the director's name with DIRECTOR, and each occurrence of the screenwriter's name with SCREENWRITER.
3. Examine each sentence in the document; if the sentence does not contain at least one word from List 1, eliminate it from the document.
4. Create a document from the sentences that have not been eliminated.

MOVIE, DIRECTOR, SCREENWRITER, film, script, performance, plot
List 1. Word List for Objective Sentence Removal

For comparison purposes, the Objective Sentence Removal Algorithm was run on the Roger Ebert review of the film The China Syndrome ([6]).

The China Syndrome is a terrific thriller that incidentally raises the most unsettling questions about how safe nuclear power plants really are. But the movie is, above all, entertainment: well-acted, well-crafted, scary as hell. The director, James Bridges, uses an exquisite sense of timing and character development to bring us to the cliffhanger conclusion. The events leading up to the accident in The China Syndrome are indeed based on actual occurrences at nuclear plants.
Even the most unlikely mishap (a stuck needle on a graph causing engineers to misread a crucial water level) really happened at the Dresden plant outside Chicago. The key character is Godell (Jack Lemmon), a shift supervisor at a big nuclear power plant in Southern California. He lives alone, quietly, and can say without any self-consciousness that the plant is his life. He believes in nuclear power.

Text 1: Original Paragraph

Looking at this excerpt of the movie review, one can see several sentences related to the plot that contain no sentiment about the movie. Analyzing a sentence such as "He believes in nuclear power." would add no knowledge about the reviewer's thoughts on the film, and would therefore become noise to a sentiment classifier.

MOVIE is a terrific thriller that incidentally raises the most unsettling questions about how safe nuclear power plants really are. But the MOVIE is, above all, entertainment: well-acted, well-crafted, scary as hell. The director, DIRECTOR, uses an exquisite sense of timing and character development to bring us to the cliffhanger conclusion. The events leading up to the accident in MOVIE are indeed based on actual occurrences at nuclear plants.

Text 2: Paragraph after Algorithm

As seen in the text above, the Objective Sentence Removal Algorithm removes almost every objective sentence in the document, the exception being the last sentence, which is kept because it contains the name of the film. The algorithm retains all of the subjective sentences containing sentiment about the film.

7. RESULTS

The lists output by the Linguistic Tree Transformation Algorithm were arranged into frequency SVM model form (for use with the SVM-light software package [7]). Performance was tested against a frequency unigram SVM model and a frequency bigram SVM model (again using [7]).

The first dataset tested was the subjective-sentence-only Sentence Polarity Dataset v1.0 originally created by Pang and Lee. The dataset contains 5331 positive and 5331 negative processed sentences, with all objective sentences removed by user interaction. From this dataset, the first 4000 sentences were used to form a training set, and the remaining 1331 sentences were used to test accuracy.

Unigram SVM: 75.11%
Bigram SVM: 71.04%
Linguistic Tree Transform SVM: 84.09%

Table 2. Pang-Lee Accuracy Results

The second dataset consisted of 120 complete reviews (30 zero-star, 30 one-star, 30 three-star, and 30 four-star reviews) taken from [6]. All 120 reviews were formed by adding the header in Figure 7, followed by the original review with absolutely no modification. Because full reviews were used, the documents contained many objective sentences (plot description, etc.). The performance of the algorithms was tested both with and without the Objective Sentence Removal Algorithm. Both tests use 60 documents (30 positive (three- and four-star) and 30 negative (one- and zero-star) reviews) to train the classifier, and then test classification accuracy on the remaining 60 documents (30 positive and 30 negative reviews).

-film title- -director name- -screenwriter name-
Figure 7. Ebert Review Header

Unigram SVM: 65.00%
Bigram SVM: 63.33%
Linguistic Tree Transform SVM: 100.00%

Table 3. Ebert (w/o Objective Removal) Accuracy Results

Unigram SVM: 83.33%
Bigram SVM: 65.00%
Linguistic Tree Transform SVM: 100.00%

Table 4. Ebert (w/ Objective Removal) Accuracy Results

8. FUTURE DIRECTIONS

The development of this algorithm leaves many openings for future improvement. It was hypothesized that the inclusion of synonyms would improve accuracy rates, and the algorithm code currently includes functionality for using synonyms from the WordNet ([8]) software package.
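The synonym-expansion step can be sketched as follows. The synonym table here is a hypothetical stand-in for WordNet output (real lookups go through [8] and return far more lemmas); it only illustrates how expansion pulls rarely-written words into the feature set.

```python
# Hypothetical synonym table standing in for WordNet output; the
# entries mimic the kind of obscure lemmas WordNet can return.
SYNONYMS = {
    "good": ["goodness", "commodity", "soundly"],
    "bad": ["badness", "regretful", "spoiled"],
}

def expand_features(words):
    """Return the feature set for a sentence, augmented with every
    listed synonym of every word (the optional expansion step)."""
    features = set(words)
    for word in words:
        features.update(SYNONYMS.get(word, []))
    return features

expanded = expand_features(["good", "movie"])
# "goodness" now competes with words reviewers actually write,
# diluting the training features with terms absent from real reviews.
assert "goodness" in expanded and "movie" in expanded
```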
When implemented, the WordNet-modified accuracy rate was actually lower than without the synonym data. After analyzing the synonyms returned by the WordNet software package, it was determined that the software returns a large number of synonyms that most people would not use in regular conversation ("good" returns the word "goodness"), thus adding noise to the classification system. Because of this problem, the WordNet functionality was turned off in the algorithm. Future work could modify the WordNet data into a form useful for sentiment classification. Other possible directions are accounting for the use of sarcasm, taking antonyms when "not" appears before an adjective, and extending the algorithm to classify individual star ratings.

9. CONCLUSIONS

The results show the power of the two algorithms introduced in this paper. The Linguistic Tree Transformation Algorithm consistently performs better than the established N-gram methods, with a slightly-less-than-nine-percent improvement in classification accuracy on the Sentence Polarity Dataset v1.0. On the Roger Ebert dataset, the Linguistic Tree Transformation Algorithm achieves perfect classification both with and without objective sentence removal. The need for objective sentence removal, and the strength of the Objective Sentence Removal Algorithm, can be seen in the improvement of the N-gram methods' classification (Unigram: 65% to 83.33%; Bigram: 63.33% to 65%).

10. REFERENCES

[1] B. Pang, L. Lee, S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques," Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002.
[2] S. Matsumoto, H. Takamura, M. Okumura, "Sentiment Classification using Word Sub-sequences and Dependency Sub-trees," Proceedings of PAKDD, 2005.
[3] B. Pang, L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," Proceedings of the ACL, 2005.
[4] M. Collins, "A new statistical parser based on bigram lexical dependencies," Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 1996.
[5] D. Prescher, "A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars," The 15th European Summer School in Logic, Language and Information, 2003.
[6] R. Ebert, Roger Ebert Reviews.
[7] T. Joachims, "Making large-scale SVM Learning Practical," Advances in Kernel Methods - Support Vector Learning, 1999.
[8] C. Fellbaum, "WordNet: An Electronic Lexical Database," MIT Press, 1998.
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationLarge vocabulary off-line handwriting recognition: A survey
Pattern Anal Applic (2003) 6: 97 121 DOI 10.1007/s10044-002-0169-3 ORIGINAL ARTICLE A. L. Koerich, R. Sabourin, C. Y. Suen Large vocabulary off-line handwriting recognition: A survey Received: 24/09/01
More informationIntra-talker Variation: Audience Design Factors Affecting Lexical Selections
Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and
More informationAn Interactive Intelligent Language Tutor Over The Internet
An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationSome Principles of Automated Natural Language Information Extraction
Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract
More informationVocabulary Usage and Intelligibility in Learner Language
Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationExtracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models
Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationLoughton School s curriculum evening. 28 th February 2017
Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's
More informationImproved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form
Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationA Domain Ontology Development Environment Using a MRD and Text Corpus
A Domain Ontology Development Environment Using a MRD and Text Corpus Naomi Nakaya 1 and Masaki Kurematsu 2 and Takahira Yamaguchi 1 1 Faculty of Information, Shizuoka University 3-5-1 Johoku Hamamatsu
More informationParallel Evaluation in Stratal OT * Adam Baker University of Arizona
Parallel Evaluation in Stratal OT * Adam Baker University of Arizona tabaker@u.arizona.edu 1.0. Introduction The model of Stratal OT presented by Kiparsky (forthcoming), has not and will not prove uncontroversial
More informationAccuracy (%) # features
Question Terminology and Representation for Question Type Classication Noriko Tomuro DePaul University School of Computer Science, Telecommunications and Information Systems 243 S. Wabash Ave. Chicago,
More informationWhat the National Curriculum requires in reading at Y5 and Y6
What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the
More informationDetermining the Semantic Orientation of Terms through Gloss Classification
Determining the Semantic Orientation of Terms through Gloss Classification Andrea Esuli Istituto di Scienza e Tecnologie dell Informazione Consiglio Nazionale delle Ricerche Via G Moruzzi, 1 56124 Pisa,
More informationThree New Probabilistic Models. Jason M. Eisner. CIS Department, University of Pennsylvania. 200 S. 33rd St., Philadelphia, PA , USA
Three New Probabilistic Models for Dependency Parsing: An Exploration Jason M. Eisner CIS Department, University of Pennsylvania 200 S. 33rd St., Philadelphia, PA 19104-6389, USA jeisner@linc.cis.upenn.edu
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationCEFR Overall Illustrative English Proficiency Scales
CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey
More informationIterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages
Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationIntroduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and
More informationAnalysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion
More informationLTAG-spinal and the Treebank
LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationContext Free Grammars. Many slides from Michael Collins
Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures
More informationESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly
ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly Inflected Languages Classical Approaches to Tagging The slides are posted on the web. The url is http://chss.montclair.edu/~feldmana/esslli10/.
More informationDistant Supervised Relation Extraction with Wikipedia and Freebase
Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 98 (2016 ) 368 373 The 6th International Conference on Current and Future Trends of Information and Communication Technologies
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationMining Topic-level Opinion Influence in Microblog
Mining Topic-level Opinion Influence in Microblog Daifeng Li Dept. of Computer Science and Technology Tsinghua University ldf3824@yahoo.com.cn Jie Tang Dept. of Computer Science and Technology Tsinghua
More informationUsing Semantic Relations to Refine Coreference Decisions
Using Semantic Relations to Refine Coreference Decisions Heng Ji David Westbrook Ralph Grishman Department of Computer Science New York University New York, NY, 10003, USA hengji@cs.nyu.edu westbroo@cs.nyu.edu
More informationA Graph Based Authorship Identification Approach
A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico
More informationIntroduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.
to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about
More informationWhy Pay Attention to Race?
Why Pay Attention to Race? Witnessing Whiteness Chapter 1 Workshop 1.1 1.1-1 Dear Facilitator(s), This workshop series was carefully crafted, reviewed (by a multiracial team), and revised with several
More informationELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading
ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix
More informationAssessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2
Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu
More informationShort Text Understanding Through Lexical-Semantic Analysis
Short Text Understanding Through Lexical-Semantic Analysis Wen Hua #1, Zhongyuan Wang 2, Haixun Wang 3, Kai Zheng #4, Xiaofang Zhou #5 School of Information, Renmin University of China, Beijing, China
More information