Natural Language Processing
|
|
- Rolf Thomas
- 6 years ago
- Views:
Transcription
1 Natural Language Processing Part-of-Speech Tagging Joakim Nivre Uppsala University Department of Linguistics and Philology Natural Language Processing 1(13)
2 Parts of Speech I Basic grammatical categories used since antiquity: 1. Noun 2. Verb 3. Adjective 4. Adverb 5. Preposition 6. Pronoun 7. Conjunction 8. Interjection I Lots of debate in linguistics about their nature and universality I Nevertheless very robust and useful for NLP Natural Language Processing 2(13)
3 Part-of-Speech Tagging I Assign a part-of-speech tag to every word of a sentence Word Tag Holmes PROPN put VERB the DET keys NOUN on ADP the DET table NOUN. PUNCT Natural Language Processing 3(13)
4 Why is PoS tagging useful? I First step in a vast number of practical tasks 1. Text-to-speech how to pronounce lead or insult? 2. Parsing need to know if a word is NOUN or VERB 3. Information extraction finding names, relations, etc. I Used as a backoff model for word tokens (sparse data) Natural Language Processing 4(13)
5 Why is PoS tagging hard? I Lexical ambiguity: 1. Prince is expected to race/verb tomorrow 2. People wonder about the race/noun for outer space I Unknown words: 1. The rural Babbitt who bloviates about progress and growth Natural Language Processing 5(13)
6 How is it done? I Lexical information (the word itself) I Known words can be looked up in a lexicon listing possible tags for each word I Unknown words can be analyzed with respect to affixes, capitalization, special symbols, etc. I Contextual information (surrounding words) I A language model can rank tags in context I Many different models and techniques (more later) Natural Language Processing 6(13)
7 Quiz 1 I Consider the following incomplete sentence: She sent a letter to the... I Which parts of speech are likely to occur next? 1. ADJ 2. NOUN 3. VERB Natural Language Processing 7(13)
8 Tag Sets I There are many potential distinctions we can draw I Tag sets range from coarse-grained to fine-grained 1. Universal Dependencies: 17 tags 2. Penn Treebank, English: 45 tags 3. SUC, Swedish: 25 tags + features 150 tags I Choice of tag set may depend on application Natural Language Processing 8(13)
9 Universal Dependencies (UD) Open class words Closed class words Other ADJ adjective ADP preposition/postposition PUNCT punctuation ADV adverb AUX auxiliary verb SYM symbol INTJ interjection CONJ coordinating conjunction X unspecified NOUN noun DET determiner PROPN proper noun NUM numeral VERB verb PART particle PRON pronoun SCONJ subordinating conjunction Natural Language Processing 9(13)
10 Penn Treebank Natural Language Processing 10(13)
11 How hard is PoS tagging? Natural Language Processing 11(13)
12 Evaluation I Evaluation against a manually annotated gold standard I Evaluation metrics: I Accuracy = percentage of correctly tagged tokens I Separate results for ambiguous and/or unknown words I State of the art: I I I 96 98% for English news text What about Turkish? What about Twitter? Natural Language Processing 12(13)
13 Quiz 2 I Consider the following tagging: She/PRON won/verb the/det race/verb I What accuracy score would you give it? % 2. 75% 3. 50% Natural Language Processing 13(13)
14 Natural Language Processing Tagging Methods Joakim Nivre Uppsala University Department of Linguistics and Philology Natural Language Processing 1(14)
15 Part-of-Speech Tagging I Task: I Assign a part-of-speech tag to every word of a sentence I Useful tools and techniques: I Lexicon mapping words to possible tags I Linguistic rules for disambiguation in context I Statistical models of tags and words in context I Heuristics for handling unknown words I In this lecture: I Transformation-based tagging a rule-based approach I HMM tagging a statistical approach Natural Language Processing 2(14)
16 Transformation-Based Tagging I Assign each word its most frequent tag Prince is expected to race/noun tomorrow People wonder about the race/noun for outer space Prince is expected to run/verb tomorrow People wonder about the run/verb for charity Natural Language Processing 3(14)
17 Transformation-Based Tagging I Assign each word its most frequent tag Prince is expected to race/noun tomorrow People wonder about the race/noun for outer space Prince is expected to run/verb tomorrow People wonder about the run/verb for charity I Use a sequence of rules to refine the tagging 1. NOUN! VERB if preceding word is to 2. VERB! NOUN if preceding word is the Natural Language Processing 3(14)
18 Transformation-Based Tagging I Assign each word its most frequent tag Prince is expected to race/noun tomorrow People wonder about the race/noun for outer space Prince is expected to run/verb tomorrow People wonder about the run/verb for charity I Use a sequence of rules to refine the tagging 1. NOUN! VERB if preceding word is to 2. VERB! NOUN if preceding word is the Prince is expected to race/verb tomorrow People wonder about the race/noun for outer space Prince is expected to run/verb tomorrow People wonder about the run/noun for charity Natural Language Processing 3(14)
19 Transformation-Based Tagging I Learning a set of rules from a tagged corpus: 1. Define a set of rule templates 2. Assign every word its most frequent tag 3. Repeat until no further improvement: 3.1 Apply every rule to the current tagged corpus by itself 3.2 Add the best rule R to the sequence of rules 3.3 Transform the current tagged corpus using R Natural Language Processing 4(14)
20 Transformation-Based Tagging I Learning a set of rules from a tagged corpus: 1. Define a set of rule templates 2. Assign every word its most frequent tag 3. Repeat until no further improvement: 3.1 Apply every rule to the current tagged corpus by itself 3.2 Add the best rule R to the sequence of rules 3.3 Transform the current tagged corpus using R I Using the rules to tag a new text: 1. Assign every word its most frequent tag 2. For every rule R 1,...,R n in the learned sequence: 2.1 Transform the current tagged corpus using R i Natural Language Processing 4(14)
21 Quiz 1 I Consider the following initial taggings of the word light: light/verb the/det candle/noun see/verb the/det light/verb carry/verb the/det light/verb suitcase/noun... I And suppose we apply the following two rules in sequence: 1. VERB! NOUN if preceding word is DET 2. NOUN! ADJ if preceding word is DET and following word is NOUN I Which of the following statements are true? 1. All three occurrences are correctly tagged in the end 2. There is at least one error in the end tagging 3. Removing the second rule gives one more error in the end 4. Switching the order of the rules has no impact on the end result Natural Language Processing 5(14)
22 Statistical Tagging I Basic ideas: I Build a statistical model of words and their tags I Estimate model parameters from (tagged) corpus data I Use the model to assign the most probable tags to words I Example: I Part-of-speech tagging using Hidden Markov Models (HMM) Natural Language Processing 6(14)
23 Hidden Markov Models I Markov models are probabilistic sequence models used for problems such as: 1. Speech recognition 2. Spell checking 3. Part-of-speech tagging 4. Named entity recognition I Given a word sequence w 1,...,w n,wewantto find the most probable tag sequence t 1,...,t n : argmax t 1,...,t n P(t 1,...,t n w 1,...,w n ) Natural Language Processing 7(14)
24 Model Construction I Bayesian inversion: P(t 1,...,t n w 1,...,w n )= P(t 1,...,t n )P(w 1,...,w n t 1,...,t n ) P(w 1,...,w n ) I Submodels: 1. Prior: P(t 1,...,t n ) 2. Likelihood: P(w 1,...,w n t 1,...,t n ) 3. Marginal: P(w 1,...,w n ) canbeignoredinargmaxsearch Natural Language Processing 8(14)
25 Markov Assumptions I Context model (prior) P(t 1,...,t n )= I Lexical model (likelihood) ny P(t i t i k...,t i 1 ) i=1 ny P(w 1,...,w n t 1,...,t n )= P(w i t i ) i=1 Natural Language Processing 9(14)
26 Model Parameters I Contextual probabilities P(t i t i k,...,t i 1 ) I Lexical probabilities P(w i t i ) I We can estimate these probabilities from a tagged corpus: ˆP MLE (w i t i )= f (w i, t i ) f (t i ) ˆP MLE (t i t i k,...,t i 1 )= f (t i k,...,t i 1, t i ) f (t i k,...,t i 1 ) Natural Language Processing 10(14)
27 Computing Probabilities I The probability of a tagging: P(t 1,...,t n, w 1,...,w n )= ny P(t i t i k,...,t i 1 )P(w i t i ) i=1 I Finding the most probable tagging: argmax t 1,...,t n ny P(t i t i k,...,t i 1 )P(w i t i ) i=1 I This requires an efficient algorithm (more later) Natural Language Processing 11(14)
28 Example P(she PRON) = 0.1 P(PRON START) = 0.5 P(can AUX) = 0.2 P(AUX PRON) = 0.2 P(can NOUN) = P(NOUN PRON) = P(run VERB) = 0.01 P(VERB AUX) = 0.5 P(run NOUN) = P(NOUN AUX) = P(VERB NOUN) = 0.2 P(NOUN NOUN) = 0.1 Natural Language Processing 12(14)
29 Example P(she PRON) = 0.1 P(PRON START) = 0.5 P(can AUX) = 0.2 P(AUX PRON) = 0.2 P(can NOUN) = P(NOUN PRON) = P(run VERB) = 0.01 P(VERB AUX) = 0.5 P(run NOUN) = P(NOUN AUX) = P(VERB NOUN) = 0.2 P(NOUN NOUN) = 0.1 P(she/PRON can/aux run/verb) = = Natural Language Processing 12(14)
30 Example P(she PRON) = 0.1 P(PRON START) = 0.5 P(can AUX) = 0.2 P(AUX PRON) = 0.2 P(can NOUN) = P(NOUN PRON) = P(run VERB) = 0.01 P(VERB AUX) = 0.5 P(run NOUN) = P(NOUN AUX) = P(VERB NOUN) = 0.2 P(NOUN NOUN) = 0.1 P(she/PRON can/aux run/verb) = = P(she/PRON can/noun run/noun) = = Natural Language Processing 12(14)
31 Fundamental Problems I Decoding: I How do we compute the best tag sequence given parameters? I Learning: I How do we estimate the parameters? Natural Language Processing 13(14)
32 Quiz 2 I Consider this simple HMM for tagging: P(she PRON) = 0.1 P(PRON START) = 0.5 P(can AUX) = 0.2 P(AUX PRON) = 0.2 P(can NOUN) = P(NOUN PRON) = P(run VERB) = 0.01 P(VERB AUX) = 0.5 P(run NOUN) = P(NOUN AUX) = P(VERB NOUN) = 0.2 P(NOUN NOUN) = 0.1 I Which of the following statements are true? 1. The probability that can is a NOUN is The probability that the word after an AUX is not a VERB is P(she/PRON can/aux) > P(she/PRON can/noun) Natural Language Processing 14(14)
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationBULATS A2 WORDLIST 2
BULATS A2 WORDLIST 2 INTRODUCTION TO THE BULATS A2 WORDLIST 2 The BULATS A2 WORDLIST 21 is a list of approximately 750 words to help candidates aiming at an A2 pass in the Cambridge BULATS exam. It is
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly
ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly Inflected Languages Classical Approaches to Tagging The slides are posted on the web. The url is http://chss.montclair.edu/~feldmana/esslli10/.
More information11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation
tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each
More informationHeuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger
Page 1 of 35 Heuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger Kaihong Liu, MD, MS, Wendy Chapman, PhD, Rebecca Hwa, PhD, and Rebecca S. Crowley, MD, MS
More informationCS 598 Natural Language Processing
CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@
More informationAn Evaluation of POS Taggers for the CHILDES Corpus
City University of New York (CUNY) CUNY Academic Works Dissertations, Theses, and Capstone Projects Graduate Center 9-30-2016 An Evaluation of POS Taggers for the CHILDES Corpus Rui Huang The Graduate
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationSpecifying a shallow grammatical for parsing purposes
Specifying a shallow grammatical for parsing purposes representation Atro Voutilainen and Timo J~irvinen Research Unit for Multilingual Language Technology P.O. Box 4 FIN-0004 University of Helsinki Finland
More informationGrammars & Parsing, Part 1:
Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review
More informationCh VI- SENTENCE PATTERNS.
Ch VI- SENTENCE PATTERNS faizrisd@gmail.com www.pakfaizal.com It is a common fact that in the making of well-formed sentences we badly need several syntactic devices used to link together words by means
More informationWords come in categories
Nouns Words come in categories D: A grammatical category is a class of expressions which share a common set of grammatical properties (a.k.a. word class or part of speech). Words come in categories Open
More informationTraining and evaluation of POS taggers on the French MULTITAG corpus
Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction
More informationTaught Throughout the Year Foundational Skills Reading Writing Language RF.1.2 Demonstrate understanding of spoken words,
First Grade Standards These are the standards for what is taught in first grade. It is the expectation that these skills will be reinforced after they have been taught. Taught Throughout the Year Foundational
More informationarxiv:cmp-lg/ v1 7 Jun 1997 Abstract
Comparing a Linguistic and a Stochastic Tagger Christer Samuelsson Lucent Technologies Bell Laboratories 600 Mountain Ave, Room 2D-339 Murray Hill, NJ 07974, USA christer@research.bell-labs.com Atro Voutilainen
More informationDevelopment of the First LRs for Macedonian: Current Projects
Development of the First LRs for Macedonian: Current Projects Ruska Ivanovska-Naskova Faculty of Philology- University St. Cyril and Methodius Bul. Krste Petkov Misirkov bb, 1000 Skopje, Macedonia rivanovska@flf.ukim.edu.mk
More informationLanguage Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus
Language Acquisition Fall 2010/Winter 2011 Lexical Categories Afra Alishahi, Heiner Drenhaus Computational Linguistics and Phonetics Saarland University Children s Sensitivity to Lexical Categories Look,
More informationLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word Similarity Balamurali A.R., Subhabrata Mukherjee, Akshat Malu and Pushpak Bhattacharyya Dept. of Computer Science and Engineering, IIT Bombay 6th International Global
More informationCross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels
Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract
More information! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &,
! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &, 4 The Interaction of Knowledge Sources in Word Sense Disambiguation Mark Stevenson Yorick Wilks University of Shef eld University of Shef eld Word sense
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More information1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature
1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationBasic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1
Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationIntroduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.
to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationReading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5-
New York Grade 7 Core Performance Indicators Grades 7 8: common to all four ELA standards Throughout grades 7 and 8, students demonstrate the following core performance indicators in the key ideas of reading,
More informationWriting a composition
A good composition has three elements: Writing a composition an introduction: A topic sentence which contains the main idea of the paragraph. a body : Supporting sentences that develop the main idea. a
More informationHoughton Mifflin Reading Correlation to the Common Core Standards for English Language Arts (Grade1)
Houghton Mifflin Reading Correlation to the Standards for English Language Arts (Grade1) 8.3 JOHNNY APPLESEED Biography TARGET SKILLS: 8.3 Johnny Appleseed Phonemic Awareness Phonics Comprehension Vocabulary
More informationChapter 4: Valence & Agreement CSLI Publications
Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationUniversity of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma
University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of
More informationGrade 7. Prentice Hall. Literature, The Penguin Edition, Grade Oregon English/Language Arts Grade-Level Standards. Grade 7
Grade 7 Prentice Hall Literature, The Penguin Edition, Grade 7 2007 C O R R E L A T E D T O Grade 7 Read or demonstrate progress toward reading at an independent and instructional reading level appropriate
More informationAnalysis of Probabilistic Parsing in NLP
Analysis of Probabilistic Parsing in NLP Krishna Karoo, Dr.Girish Katkar Research Scholar, Department of Electronics & Computer Science, R.T.M. Nagpur University, Nagpur, India Head of Department, Department
More informationFirst Grade Curriculum Highlights: In alignment with the Common Core Standards
First Grade Curriculum Highlights: In alignment with the Common Core Standards ENGLISH LANGUAGE ARTS Foundational Skills Print Concepts Demonstrate understanding of the organization and basic features
More informationBANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS
Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.
More informationChunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.
NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and
More informationDeveloping Grammar in Context
Developing Grammar in Context intermediate with answers Mark Nettle and Diana Hopkins PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE The Pitt Building, Trumpington Street, Cambridge, United
More informationPOS tagging of Chinese Buddhist texts using Recurrent Neural Networks
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important
More informationMyths, Legends, Fairytales and Novels (Writing a Letter)
Assessment Focus This task focuses on Communication through the mode of Writing at Levels 3, 4 and 5. Two linked tasks (Hot Seating and Character Study) that use the same context are available to assess
More informationDEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS
DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za
More informationSample Goals and Benchmarks
Sample Goals and Benchmarks for Students with Hearing Loss In this document, you will find examples of potential goals and benchmarks for each area. Please note that these are just examples. You should
More informationPrediction of Maximal Projection for Semantic Role Labeling
Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba
More informationOpportunities for Writing Title Key Stage 1 Key Stage 2 Narrative
English Teaching Cycle The English curriculum at Wardley CE Primary is based upon the National Curriculum. Our English is taught through a text based curriculum as we believe this is the best way to develop
More informationThe Discourse Anaphoric Properties of Connectives
The Discourse Anaphoric Properties of Connectives Cassandre Creswell, Kate Forbes, Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi Λ, Bonnie Webber y Λ University of Pennsylvania 3401 Walnut Street Philadelphia,
More informationModeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures
Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,
More informationSenior Stenographer / Senior Typist Series (including equivalent Secretary titles)
New York State Department of Civil Service Committed to Innovation, Quality, and Excellence A Guide to the Written Test for the Senior Stenographer / Senior Typist Series (including equivalent Secretary
More informationLinguistic Variation across Sports Category of Press Reportage from British Newspapers: a Diachronic Multidimensional Analysis
International Journal of Arts Humanities and Social Sciences (IJAHSS) Volume 1 Issue 1 ǁ August 216. www.ijahss.com Linguistic Variation across Sports Category of Press Reportage from British Newspapers:
More informationUNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen
UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja
More informationThe Effect of Multiple Grammatical Errors on Processing Non-Native Writing
The Effect of Multiple Grammatical Errors on Processing Non-Native Writing Courtney Napoles Johns Hopkins University courtneyn@jhu.edu Aoife Cahill Nitin Madnani Educational Testing Service {acahill,nmadnani}@ets.org
More informationSurvey on parsing three dependency representations for English
Survey on parsing three dependency representations for English Angelina Ivanova Stephan Oepen Lilja Øvrelid University of Oslo, Department of Informatics { angelii oe liljao }@ifi.uio.no Abstract In this
More informationA Syllable Based Word Recognition Model for Korean Noun Extraction
are used as the most important terms (features) that express the document in NLP applications such as information retrieval, document categorization, text summarization, information extraction, and etc.
More informationDefragmenting Textual Data by Leveraging the Syntactic Structure of the English Language
Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu
More informationCORPUS ANALYSIS CORPUS ANALYSIS QUANTITATIVE ANALYSIS
CORPUS ANALYSIS Antonella Serra CORPUS ANALYSIS ITINEARIES ON LINE: SARDINIA, CAPRI AND CORSICA TOTAL NUMBER OF WORD TOKENS 13.260 TOTAL NUMBER OF WORD TYPES 3188 QUANTITATIVE ANALYSIS THE MOST SIGNIFICATIVE
More informationLTAG-spinal and the Treebank
LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)
More informationParsing Morphologically Rich Languages:
1 / 39 Rich Languages: Sandra Kübler Indiana University 2 / 39 Rich Languages joint work with Daniel Dakota, Wolfgang Maier, Joakim Nivre, Djamé Seddah, Reut Tsarfaty, Daniel Whyatt, and many more def.
More informationBasic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English.
Basic Syntax Doug Arnold doug@essex.ac.uk We review some basic grammatical ideas and terminology, and look at some common constructions in English. 1 Categories 1.1 Word level (lexical and functional)
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationCross-lingual Transfer Parsing for Low-Resourced Languages: An Irish Case Study
Cross-lingual Transfer Parsing for Low-Resourced Languages: An Irish Case Study Teresa Lynn 1,2, Jennifer Foster 1, Mark Dras 2 and Lamia Tounsi 1 1 CNGL, School of Computing, Dublin City University, Ireland
More informationCharacter Stream Parsing of Mixed-lingual Text
Character Stream Parsing of Mixed-lingual Text Harald Romsdorfer and Beat Pfister Speech Processing Group Computer Engineering and Networks Laboratory ETH Zurich {romsdorfer,pfister}@tik.ee.ethz.ch Abstract
More informationOakland Unified School District English/ Language Arts Course Syllabus
Oakland Unified School District English/ Language Arts Course Syllabus For Secondary Schools The attached course syllabus is a developmental and integrated approach to skill acquisition throughout the
More informationLoughton School s curriculum evening. 28 th February 2017
Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's
More informationProject in the framework of the AIM-WEST project Annotation of MWEs for translation
Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment
More informationExperts Retrieval with Multiword-Enhanced Author Topic Model
NAACL 10 Workshop on Semantic Search Experts Retrieval with Multiword-Enhanced Author Topic Model Nikhil Johri Dan Roth Yuancheng Tu Dept. of Computer Science Dept. of Linguistics University of Illinois
More informationLarge vocabulary off-line handwriting recognition: A survey
Pattern Anal Applic (2003) 6: 97 121 DOI 10.1007/s10044-002-0169-3 ORIGINAL ARTICLE A. L. Koerich, R. Sabourin, C. Y. Suen Large vocabulary off-line handwriting recognition: A survey Received: 24/09/01
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationNatural Language Processing. George Konidaris
Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans
More informationknarrator: A Model For Authors To Simplify Authoring Process Using Natural Language Processing To Portuguese
knarrator: A Model For Authors To Simplify Authoring Process Using Natural Language Processing To Portuguese Adriano Kerber Daniel Camozzato Rossana Queiroz Vinícius Cassol Universidade do Vale do Rio
More informationCan Human Verb Associations help identify Salient Features for Semantic Verb Classification?
Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung Universität Stuttgart Seminar für Sprachwissenschaft,
More informationA Ruled-Based Part of Speech (RPOS) Tagger for Malay Text Articles
A Ruled-Based Part of Speech (RPOS) Tagger for Malay Text Articles Rayner Alfred 1, Adam Mujat 1, and Joe Henry Obit 2 1 School of Engineering and Information Technology, Universiti Malaysia Sabah, Jalan
More informationAdvanced Grammar in Use
Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,
More informationThe Choice of Features for Classification of Verbs in Biomedical Texts
The Choice of Features for Classification of Verbs in Biomedical Texts Anna Korhonen University of Cambridge Computer Laboratory 15 JJ Thomson Avenue Cambridge CB3 0FD, UK alk23@cl.cam.ac.uk Yuval Krymolowski
More informationA Bayesian Learning Approach to Concept-Based Document Classification
Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors
More informationAdjectives tell you more about a noun (for example: the red dress ).
Curriculum Jargon busters Grammar glossary Key: Words in bold are examples. Words underlined are terms you can look up in this glossary. Words in italics are important to the definition. Term Adjective
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationBASIC ENGLISH. Book GRAMMAR
BASIC ENGLISH Book 1 GRAMMAR Anne Seaton Y. H. Mew Book 1 Three Watson Irvine, CA 92618-2767 Web site: www.sdlback.com First published in the United States by Saddleback Educational Publishing, 3 Watson,
More informationEAGLE: an Error-Annotated Corpus of Beginning Learner German
EAGLE: an Error-Annotated Corpus of Beginning Learner German Adriane Boyd Department of Linguistics The Ohio State University adriane@ling.osu.edu Abstract This paper describes the Error-Annotated German
More information4 th Grade Reading Language Arts Pacing Guide
TN Ready Domains Foundational Skills Writing Standards to Emphasize in Various Lessons throughout the Entire Year State TN Ready Standards I Can Statement Assessment Information RF.4.3 : Know and apply
More informationNCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches
NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationRefining the Design of a Contracting Finite-State Dependency Parser
Refining the Design of a Contracting Finite-State Dependency Parser Anssi Yli-Jyrä and Jussi Piitulainen and Atro Voutilainen The Department of Modern Languages PO Box 3 00014 University of Helsinki {anssi.yli-jyra,jussi.piitulainen,atro.voutilainen}@helsinki.fi
More informationTHE VERB ARGUMENT BROWSER
THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationVocabulary Usage and Intelligibility in Learner Language
Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand
More informationSAMPLE. Chapter 1: Background. A. Basic Introduction. B. Why It s Important to Teach/Learn Grammar in the First Place
Contents Chapter One: Background Page 1 Chapter Two: Implementation Page 7 Chapter Three: Materials Page 13 A. Reproducible Help Pages Page 13 B. Reproducible Marking Guide Page 22 C. Reproducible Sentence
More information5 th Grade Language Arts Curriculum Map
5 th Grade Language Arts Curriculum Map Quarter 1 Unit of Study: Launching Writer s Workshop 5.L.1 - Demonstrate command of the conventions of Standard English grammar and usage when writing or speaking.
More informationEmmaus Lutheran School English Language Arts Curriculum
Emmaus Lutheran School English Language Arts Curriculum Rationale based on Scripture God is the Creator of all things, including English Language Arts. Our school is committed to providing students with
More informationThe College Board Redesigned SAT Grade 12
A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.
More informationUniversiteit Leiden ICT in Business
Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:
More information