Motivation, Methods and Evaluation. Sowmya Vajjala
|
|
- Janel Sullivan
- 5 years ago
- Views:
Transcription
1 Motivation, Methods and Evaluation (with Detmar Meurers) Center for Language Technology University of Gothenburg, Sweden 20 November / 29
2 What is readability analysis? We want to measure how difficult it is to read a text, based on properties of the text, using criteria which are data-induced: using corpora with graded texts theory-driven: constructs known to reflect complexity given a purpose, e.g., humans: to support reading and comprehension read texts at a specific level of language proficiency. carry out specific tasks (e.g., answer questions) etc., machines: evaluation of generation systems sometimes personalized to a user, through information- obtained directly (e.g., questionnaire), or indirectly (e.g, inferred from nature of a search query) 2 / 29
3 Why do we need readability for sentences? some application scenarios selecting appropriate sentences for language learners in CALL. (Segler 2007; Pilán et al. 2013, 2014) understanding the difficulty of survey questions (Lenzner 2013) predicting sentence fluency in Machine Translation (Chae & Nenkova 2009) for text simplification (Vajjala & Meurers 2014a; Dell Orletta et al. 2014) 3 / 29
4 Why do WE need it? identifying target sentences for text simplification. evaluating text simplification approaches. 4 / 29
5 : Overview Corpus: publicly accessible, sentence level corpora (texts not prepared by us) : from Vajjala & Meurers (2014b), that work well at a text level. : 1. binary classification (easy vs difficult) 2. apply document level regression model on sentences. 3. pair-wise ranking Evaluation: within and cross corpus evaluations with multiple real-life datasets 5 / 29
6 -1: Wikipedia-SimpleWikipedia Zhu et al. (2010) created a publicly available, sentence aligned corpus from Wikipedia and Simple Wikipedia. 80,000 pairs of sentences in simplified and unsimplified versions. Example pair: 1. Wiki: Chinese styles vary greatly from era to era and are traditionally named after the ruling dynasty. 2. Simple Wiki: There are many Chinese artistic styles, which are usually named after the ruling dynasty. 6 / 29
7 -2: OneStopEnglish.com OneStopEnglish (OSE) is an English teachers resource website published by the Macmillan Education Group. They publish Weekly News Lessons which consist of news articles sourced from The Guardian. The articles are rewritten by teaching experts for English language learners at three reading levels (elementary, intermediate, advanced) We obtained permission to collect articles and compiled a corpus of 76 article triplets (228 in total) 7 / 29
8 -2: OneStopEnglish.com sentence aligned corpus creation creation process: 1. parse the pdf files and extract text content. 2. split all texts into sentences. 3. compare sentences between versions and match them by their cosine similarity (Nelken & Shieber 2006). two versions of the corpus: 1. OSE2Corpus: 3000 sentence pairs. 2. OSE3Corpus: 850 sentence triplets. * contact me if anyone wants to use this corpus. 8 / 29
9 OSE Corpus: Example adv: In Beijing, mourners and admirers made their way to lay flowers and light candles at the Apple Store. inter: In Beijing, mourners and admirers came to lay flowers and light candles at the Apple Store. ele: In Beijing, people went to the Apple Store with flowers and candles. 9 / 29
10 -1 From Vajjala & Meurers (2014b) Lexical lexical richness features from Second Language Acquisition (SLA) research e.g., Type-Token ratio, noun variation,... POS density features e.g., # nouns/# words, # adverbs/# words,... traditional features and formulae e.g., # sentence length in words... Syntactic syntactic complexity features from SLA research. e.g., # dep. clauses/clause, average clause length,... other parse tree features e.g., # NPs per sentence, avg. parse tree height, / 29
11 -2 Morphological properties of words e.g., Does the word contain a stem along with an affix? abundant=abound+ant Age of Acquisition (AoA) average age-of-acquisition of words in a text Other Psycholinguistic features e.g., word abstractness Avg. number of senses per word (obtained from WordNet) 11 / 29
12 Sentence : Binary Classification Vajjala & Meurers (2014a) We started with training a sentence-level readability model on Wikipedia corpus: Binary classification: simple hard 65 68% accuracy, depending on training set size. increasing training sample size from 10K to 80K samples did not improve the accuracy much! As regression: r = 0.4 Why is it so bad? 12 / 29
13 What is the problem? What happens if we just apply a document level readability model on this corpus? Model (Vajjala & Meurers 2014b): outputs readability score on a scale of 1-5, 5 being difficult. Percentage of the total sentences at that level Distribution of reading levels of Normal and Simplified Wiki Simple Wiki Reading level 13 / 29
14 What can we infer? There are all sorts of sentences in both versions. Wikipedia has more sentences at higher reading levels than Simple Wikipedia. - Is this the reason binary classification failed? one idea: A simple sentence is only simpler than its unsimplified version. It can also still be hard. Simplification could be relative, not absolute. 14 / 29
15 Is Simplification Relative? How can we study this? One approach: compute reading levels of normal (N) and simplified (S) sentences using our document level readability model. evaluate simplification classification using the percentages of S<N, S=N and S>N the higher the percentage for S<N, the better the model is, at evaluating sentence level readability. Why?: Simplified versions are expected to be at a lower reading level than Normal versions!! How big must S N be to interpret it as a categorical difference in reading level? We call this the d-value. 15 / 29
16 What exactly is d-value? It is a measure of how fine-grained the model is in identifying reading-level differences between sentences. For example, let us say d = 0.3. Now, when N = 3.4, S = 3.2, S N = 0.2, <d S=N. If N=3.5, S=3.1, S N = 0.4, >d S<N. What is good for us?: the model should be able to identify as many pairs as possible as S<N. S= N is probably okay, but S>N is bad. 16 / 29
17 Influence of d Question 1: Comparison Does changing of Normal and the Simplified d-value affect our results? Percentage of the total samples S<N S=N S>N d-value desired scenario: percentage of (S<N) > (S=N) > (S>N) 17 / 29
18 Influence of N Question 2: How does the reading level of the unsimplified sentence (N) affect the results? 18 / 29
19 Influence of N Percentage of the total samples Question 2: How does the reading level of the unsimplified sentence (N) affect the results? Comparison at Higher Values of N S<N S=N S>N Percentage of the total samples Comparison at Lower Values of N S<N S=N S>N d-value For harder sentences (when N >=2.5) d-value For easier sentences (when N<2.5) 18 / 29
20 What do we learn from these graphs? The accuracy of relative comparison of reading levels of sentences depends on: 1. minimum S N required to identify a categorical difference (d). 2. reading level of the original, unsimplified sentence (N). It is difficult to identify simplifications for an already simple sentence. But, this approach works well for complex sentences. *more details about this: Vajjala & Meurers (2014a). 19 / 29
21 What Next? How about modeling this as pair-wise ranking instead? Why? 1. We do not have exact reading level annotations at sentence level. 2. But we know that the simplified version should have a lower reading level. ranking cares only about relative differences, not absolute differences. perhaps, an ideal learning method for this problem? 20 / 29
22 Pairwise : A Primer typically used in information retrieval, to rank search results by doing a pair-wise comparison between them. learning goal: minimize the number of ordering errors and mis-classified pairs. usual purpose: look at a pair of documents and rank them based on their relevance to the query. our purpose: learn a binary classifier that can tell which sentence is simpler, given a pair of sentences. 21 / 29
23 Pairwise : Evaluation errors: percentage of reversed pairs. accuracy: percentage of correctly ranked pairs. if there are two sentences (N - unsimplified, S - simplified), rank(s) > rank(n), it is counted as an error. if there are three sentences (A, I, B), and our system gives a readability ranking: I,B,A there are two ranking errors here. 1. I is ranked higher than A. 2. B is ranked higher than A. note: There are other measures of efficiency for ranking, that are tailored to information retrieval applications. 22 / 29
24 with Train-Test Data setup Training sets: 1. Wiki-Train: 2000 pairs of Wikipedia-SimpleWikipedia parallel sentences. 2. OSE2-Train: 2000 pairs of sentences from OSE corpus (advanced beginner, advanced intermediate, and intermediate beginner combinations.) 3. OSE3-Train: 750 sentence triplets from OSE corpus (each triplet has a single sentence in 3 versions). 4. WikiOSE: mixed training set, consisting of Wiki-Train and OSE2-Train. (size: 4000 pairs) Test sets: 1. Wiki-Test: pairs. 2. OSE2-Test: 1000 pairs. 3. OSE3-Test: 100 triplets. 23 / 29
25 Results algorithm: SVM rank results: training with 2-level datasets Training => Wiki-Train OSE2-Train Wiki-Test 81.8% 77.5% OSE2-Test 74.6% 81.5% OSE3-Test 74.7% 79.3% training with a 3 level corpus and a mixed corpus. Training => OSE-3Level-750 WikiOSE Wiki-Test-78Kpairs 78.6% 81.3% OSE2-Test 82.4% 80.7% OSE3-Test 79.6% 84.0% 24 / 29
26 How does this compare with the previous approach? vs Regression 1. with regression model, depending on d-value, we were: able to predict correctly in 60% of the cases. identified no difference between the sentence versions in 10% of the cases. identified the order wrongly in 30% of the cases. 2. ranking approach: We can predict the order correctly with 80% accuracy. works with multiple datasets and levels of simplification. clearly, ranking is working better than regression. 25 / 29
27 What features work with? training set: OSE-2 level, test set: OSE-2Level-Test. performance of feature groups: 1. Psycholinguistic % 2. Syntactic Complexity % 3. Celex % 4. Lexical Richness - 67% around 55-60% accuracy with good single features. 26 / 29
28 so far readability based ranking works well in distinguishing simplified and unsimplified versions of a sentence. features that worked at document level work well on sentences too with good accuracy. the approach can also make distinctions between multiple levels without losing on the performance. we get good results with cross-corpus and mixed-corpus evaluations too. its fairly generalizable to other texts that informational in nature (e.g., news, encyclopaedia articles etc.,) 27 / 29
29 Current and Future Work What feature selection approaches work for ranking? What linguistic properties change the most between Advanced to Intermediate vs Intermediate to Beginner? How do we eliminate correlated features? Using with actual automatic text-simplification approaches to evaluate them in a data-driven manner / 29
30 End of Story! Thank you for your patience :-) Questions? 29 / 29
31 References Chae, J. & A. Nenkova (2009). Predicting the fluency of text with shallow structural features: case studies of machine translation and human-written text. In Proceedings of the 12th Conference of the European Chapter of the ACL. Dell Orletta, F., M. Wieling, A. Cimino, G. Venturi & S. Montemagni (2014). Assessing the of : Which and? In Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications (BEA9). Baltimore, Maryland, USA: ACL, pp Lenzner, T. (2013). Are Formulas Valid Tools for Assessing Survey Question Difficulty? Sociological Methods and Research -,. Nelken, R. & S. M. Shieber (2006). Towards robust context-sensitive sentence alignment for monolingual corpora. In In 11th Conference of the European Chapter of the Association of Computational Linguistics. Assoc. for Computational Linguistics, pp Pilán, I., E. Volodina & R. Johansson (2013). Automatic Selection of Suitable for Language Learning Exercises. In L. Bradley & S. Thouësny (eds.), 20 Years of EUROCALL: Learning from the Past, Looking to the Future. Proceedings of the 2013 EUROCALL Conference. pp Pilán, I., E. Volodina & R. Johansson (2014). Rule-based and machine learning approaches for second language sentence-level readability. In Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications (BEA9). Baltimore, Maryland, USA: ACL, pp / 29
32 Segler, T. M. (2007). Investigating the Selection of Example for Unknown Target Words in ICALL Reading Texts for L2 German. Ph.D. thesis, Institute for Communicating and Collaborative Systems, School of Informatics, University of Edinburgh. URL Vajjala, S. & D. Meurers (2014a). Assessing the relative reading level of sentence pairs for text simplification. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL). ACL, Gothenburg, Sweden: Association for Computational Linguistics, pp Vajjala, S. & D. Meurers (2014b). Text Simplification: From Analyzing Documents to Identifying Sentential Simplifications. International Journal of Applied Linguistics, Special Issue on Current Research in and Text Simplification Thomas François and Delphine Bernhard. Zhu, Z., D. Bernhard & I. Gurevych (2010). A Monolingual Tree-based Translation Model for Sentence Simplification. In Proceedings of The 23rd International Conference on Computational Linguistics (COLING), August Beijing, China. pp / 29
Multi-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationThe Karlsruhe Institute of Technology Translation Systems for the WMT 2011
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationIntra-talker Variation: Audience Design Factors Affecting Lexical Selections
Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationA Semantic Similarity Measure Based on Lexico-Syntactic Patterns
A Semantic Similarity Measure Based on Lexico-Syntactic Patterns Alexander Panchenko, Olga Morozova and Hubert Naets Center for Natural Language Processing (CENTAL) Université catholique de Louvain Belgium
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationDistant Supervised Relation Extraction with Wikipedia and Freebase
Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationLip reading: Japanese vowel recognition by tracking temporal changes of lip shape
Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationTHE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING
SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,
More informationA Bayesian Learning Approach to Concept-Based Document Classification
Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors
More informationCS 598 Natural Language Processing
CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationAtypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty
Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty Julie Medero and Mari Ostendorf Electrical Engineering Department University of Washington Seattle, WA 98195 USA {jmedero,ostendor}@uw.edu
More informationTHE VERB ARGUMENT BROWSER
THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW
More informationExploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data
Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationGCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education
GCSE Mathematics B (Linear) Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education Mark Scheme for November 2014 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationCross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels
Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationCS 1103 Computer Science I Honors. Fall Instructor Muller. Syllabus
CS 1103 Computer Science I Honors Fall 2016 Instructor Muller Syllabus Welcome to CS1103. This course is an introduction to the art and science of computer programming and to some of the fundamental concepts
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationAlgebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview
Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationApplications of memory-based natural language processing
Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationGetting Started with Deliberate Practice
Getting Started with Deliberate Practice Most of the implementation guides so far in Learning on Steroids have focused on conceptual skills. Things like being able to form mental images, remembering facts
More informationAssessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2
Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu
More informationA heuristic framework for pivot-based bilingual dictionary induction
2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,
More informationCS 446: Machine Learning
CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationFlorida Reading Endorsement Alignment Matrix Competency 1
Florida Reading Endorsement Alignment Matrix Competency 1 Reading Endorsement Guiding Principle: Teachers will understand and teach reading as an ongoing strategic process resulting in students comprehending
More informationThe taming of the data:
The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More information1.11 I Know What Do You Know?
50 SECONDARY MATH 1 // MODULE 1 1.11 I Know What Do You Know? A Practice Understanding Task CC BY Jim Larrison https://flic.kr/p/9mp2c9 In each of the problems below I share some of the information that
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationChunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.
NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationCLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction
CLASSIFICATION OF PROGRAM Critical Elements Analysis 1 Program Name: Macmillan/McGraw Hill Reading 2003 Date of Publication: 2003 Publisher: Macmillan/McGraw Hill Reviewer Code: 1. X The program meets
More informationDefragmenting Textual Data by Leveraging the Syntactic Structure of the English Language
Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu
More informationInternational Conference on Education and Educational Psychology (ICEEPSY 2012)
Available online at www.sciencedirect.com Procedia - Social and Behavioral Sciences 69 ( 2012 ) 984 989 International Conference on Education and Educational Psychology (ICEEPSY 2012) Second language research
More informationLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word Similarity Balamurali A.R., Subhabrata Mukherjee, Akshat Malu and Pushpak Bhattacharyya Dept. of Computer Science and Engineering, IIT Bombay 6th International Global
More informationExtracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models
Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),
More informationTerm Weighting based on Document Revision History
Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465
More informationLQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization
LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationGACE Computer Science Assessment Test at a Glance
GACE Computer Science Assessment Test at a Glance Updated May 2017 See the GACE Computer Science Assessment Study Companion for practice questions and preparation resources. Assessment Name Computer Science
More informationFinding Translations in Scanned Book Collections
Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University
More informationRegression for Sentence-Level MT Evaluation with Pseudo References
Regression for Sentence-Level MT Evaluation with Pseudo References Joshua S. Albrecht and Rebecca Hwa Department of Computer Science University of Pittsburgh {jsa8,hwa}@cs.pitt.edu Abstract Many automatic
More informationPOS tagging of Chinese Buddhist texts using Recurrent Neural Networks
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important
More informationCaMLA Working Papers
CaMLA Working Papers 2015 02 The Characteristics of the Michigan English Test Reading Texts and Items and their Relationship to Item Difficulty Khaled Barkaoui York University Canada 2015 The Characteristics
More informationOn document relevance and lexical cohesion between query terms
Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,
More informationStacks Teacher notes. Activity description. Suitability. Time. AMP resources. Equipment. Key mathematical language. Key processes
Stacks Teacher notes Activity description (Interactive not shown on this sheet.) Pupils start by exploring the patterns generated by moving counters between two stacks according to a fixed rule, doubling
More informationThe Role of the Head in the Interpretation of English Deverbal Compounds
The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt
More informationAGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS
AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic
More informationCombining a Chinese Thesaurus with a Chinese Dictionary
Combining a Chinese Thesaurus with a Chinese Dictionary Ji Donghong Kent Ridge Digital Labs 21 Heng Mui Keng Terrace Singapore, 119613 dhji @krdl.org.sg Gong Junping Department of Computer Science Ohio
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationSEMAFOR: Frame Argument Resolution with Log-Linear Models
SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationA Right to Access Implies A Right to Know: An Open Online Platform for Research on the Readability of Law
A Right to Access Implies A Right to Know: An Open Online Platform for Research on the Readability of Law Michael Curtotti* Eric McCreathº * Legal Counsel, ANU Students Association & ANU Postgraduate and
More informationCompositional Semantics
Compositional Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Words, bag of words Sequences Trees Meaning Representing Meaning An important goal of NLP/AI: convert natural language
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationTHE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial
More informationOPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS
OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationOutline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt
Outline Using Web Data for Linguistic Purposes NCLT, Dublin City University Outline Outline 1 Corpora as linguistic tools 2 Limitations of web data Strategies to enhance web data 3 Corpora as linguistic
More informationBuild on students informal understanding of sharing and proportionality to develop initial fraction concepts.
Recommendation 1 Build on students informal understanding of sharing and proportionality to develop initial fraction concepts. Students come to kindergarten with a rudimentary understanding of basic fraction
More informationBANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS
Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.
More informationColumbia University at DUC 2004
Columbia University at DUC 2004 Sasha Blair-Goldensohn, David Evans, Vasileios Hatzivassiloglou, Kathleen McKeown, Ani Nenkova, Rebecca Passonneau, Barry Schiffman, Andrew Schlaikjer, Advaith Siddharthan,
More informationA Graph Based Authorship Identification Approach
A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico
More informationLanguage Acquisition Chart
Language Acquisition Chart This chart was designed to help teachers better understand the process of second language acquisition. Please use this chart as a resource for learning more about the way people
More informationVocabulary Usage and Intelligibility in Learner Language
Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand
More informationFAQ (Frequently Asked Questions)
FAQ (Frequently Asked Questions) Q. How can we contact the DIGITAL EDUCATION PROJECT and the NATIONAL DIGITAL SCHOOLBOOK LIBRARY PROGRAM for additional information and questions? A. VISIT OUR WEBSITE at
More informationScienceDirect. Malayalam question answering system
Available online at www.sciencedirect.com ScienceDirect Procedia Technology 24 (2016 ) 1388 1392 International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015) Malayalam
More informationSyntactic and Lexical Simplification: The Impact on EFL Listening Comprehension at Low and High Language Proficiency Levels
ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 5, No. 3, pp. 566-571, May 2014 Manufactured in Finland. doi:10.4304/jltr.5.3.566-571 Syntactic and Lexical Simplification: The Impact on
More informationAccurate Unlexicalized Parsing for Modern Hebrew
Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The
More informationInformatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy
Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference
More informationTowards a MWE-driven A* parsing with LTAGs [WG2,WG3]
Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general
More informationRover Races Grades: 3-5 Prep Time: ~45 Minutes Lesson Time: ~105 minutes
Rover Races Grades: 3-5 Prep Time: ~45 Minutes Lesson Time: ~105 minutes WHAT STUDENTS DO: Establishing Communication Procedures Following Curiosity on Mars often means roving to places with interesting
More information