Problems in Current Text Simplification Research
1 Problems in Current Text Simplification Research
Wei Xu (UPenn), Chris Callison-Burch (UPenn), Courtney Napoles (JHU)
TACL EMNLP Sep
10 What is Text Simplification
INPUT: Applesauce is a puree made of apples.
OUT-1: Applesauce is a soft paste.
OUT-2: Applesauce is a paste. It is made of apples.
- paraphrasing
- deletion
- splitting
for children, disabled, non-native speakers
for other NLP tasks (MT, summarization)
16 Goal of Text Simplification
- grammaticality
- meaning preservation
- simplicity
INPUT: Applesauce is a puree made of apples.
OUT-1: Applesauce is a soft paste.
OUT-2: Applesauce is a paste. It is made of apples.
Human Evaluation (no reliable automatic evaluation yet)
23 Brief History of Sentence Simplification
Approaches: rule-based, machine translation
- 1997 Chandrasekar & Srinivas
- 1999 Dras (PhD thesis)
- 2000 Carroll, Minnen, Pearce, Canning, Devlin
- 2002 Canning (PhD thesis)
- 2004 Siddharthan (PhD thesis)
- 2010 Zhu, Bernhard, Gurevych
- 2011 Woodsend & Lapata
- 2011 Coster & Kauchak
- 2012 Wubben, van den Bosch, Krahmer
- 2014 Narayan & Gardent
- 2014 Siddharthan (Survey)
- 2014 Angrosh, Nomoto, Siddharthan
- 2014 Narayan (PhD thesis)
- Now Xu, Callison-Burch, Napoles (Opinion)
20 Parallel Wikipedia Corpus
24 Problems in Simplification Research
- State-of-the-art evaluation is suboptimal. But we have been doing this for the past 5 years*.
- Simple Wikipedia data has dominated in the past 5 years. But its quality was taken for granted. It limits the scope of research.
* (Angrosh et al. 2014) tried a comprehension quiz
27 Why is this important?
Simplification: Breakthrough on Sea
[Figure: two sailing diagrams with the wind direction marked, "a straight path upwind" and "a zigzag path upwind"]
Labels: better understanding, better review, more diverse research, better data and evaluation, better model
30 "Recently, there have been several attempts at addressing the text simplification task as a monolingual translation problem. However, they did not try to seek reasons for the success or the failure of their systems." Štajner, Béchara, Saggion (2015)
WHY DID THIS HAPPEN?
1. state-of-the-art competition
2. not easy to do
31 Opinion #1 Current evaluation doesn't tell us what's going on.
35 System Comparability
sub-systems: paraphrasing, deletion, splitting
evaluation criteria: grammaticality, meaning preservation, simplicity
not easy to measure
We need more controlled evaluation:
- evaluate sub-tasks separately
- target specific audience (e.g. year old)
36 Opinion #2 Simple Wikipedia is not that simple
40 "Specific questions that need addressing are: we need to better understand the quality of Simple English Wikipedia, a resource that has been used to train many SMT-based simplification systems" Advaith Siddharthan (2014 Survey)
WHAT'S NEW? We quantitatively and systematically answer this question.
42 Quality of Parallel Wikipedia Corpus*
- alignment error: 17%
- not simpler: 33%
- real simplification: 50%
44 Inaccuracy in Parallel Wikipedia Corpus*
- alignment error (17%): the two sentences have different meaning
- Best automatic sentence alignment gets about 0.7 F1 score (Hwang et al. 2015)
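To make the 0.7 F1 figure for sentence alignment concrete, here is a minimal sketch of how F1 over aligned sentence pairs is commonly computed. It is not taken from the slides or from Hwang et al. (2015). The function name `alignment_f1` and the toy index pairs are hypothetical, and it assumes an alignment is a set of (normal_index, simple_index) pairs.

```python
def alignment_f1(predicted, gold):
    """F1 of predicted sentence-alignment pairs against gold pairs.

    Both arguments are iterables of (normal_index, simple_index) tuples.
    This is the generic set-based F1, assumed here for illustration only.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up indices): 3 of 5 predicted pairs are correct,
# and 3 of the 4 gold pairs are found.
gold_pairs = {(0, 0), (1, 1), (2, 3), (4, 4)}
predicted_pairs = {(0, 0), (1, 1), (2, 2), (4, 4), (5, 6)}
f1 = alignment_f1(predicted_pairs, gold_pairs)
print(round(f1, 3))
```

In this toy case precision is 0.6 and recall is 0.75. An aligner scoring around 0.7 would therefore still leave a noticeable share of wrong or missing pairs in an automatically built corpus.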
46 Inadequacy in Parallel Wikipedia Corpus*
- not simpler (33%): sentences can have similar meaning without one being a simplification of the other
49 Inadequacy in Parallel Wikipedia Corpus*
- real simplification (aligned and simpler)? (50%): some sentences are simpler by only one word while the rest of the sentence is still complex
54 Issues with Parallel Wikipedia Corpus
- suboptimal for estimating translation probabilities
- suboptimal for developing automatic metrics
- suboptimal for tuning MT system
- unsuitable for document-level simplification
55 Opinion #3 New data can help
56 Newsela Dataset
- every article at 5 levels of simplification
- written by trained editors, comes with comprehension quizzes
Wei Xu, Chris Callison-Burch, Courtney Napoles. Problems in Current Text Simplification Research: New Data Can Help. TACL (2015)
57 Wikipedia* vs. Newsela
manual inspection of aligned sentence pairs
- Wikipedia*: alignment error 17%, not simpler 33%, real simplification 50%
- Newsela: alignment error 2%, not simpler 6%, real simplification 92%
58 Wikipedia* vs. Newsela
Good simplification needs more paraphrasing.
degree of paraphrasing
- Wikipedia*: deletion + paraphrase 24%, deletion only 42%, paraphrase only 34%
- Newsela: deletion + paraphrase 7%, deletion only 20%, paraphrase only 74%
59 Wikipedia* vs. Newsela
Good simplification could be much shorter.
[Figure: sentence length (#words), Normal vs. Simple]
see syntax analysis in the paper
60 Wikipedia* (total 2.6 million tokens) vs. Newsela (total 1.3 million tokens)
Good simplification uses a much smaller vocabulary.
vocabulary size (#unique words), Normal vs. Simple: 23,771 71,340 6,669 19,849 19,
- Wikipedia*: 18% reduction
- Newsela: 48% reduction
Example: chimpanzee, chimp
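As a rough illustration of what "vocabulary size (#unique words)" and "reduction" measure, the sketch below counts unique word types on each side of a parallel corpus and reports the relative drop. The tokenizer (lowercasing plus a simple regular expression) and the names `vocabulary` and `vocab_reduction` are assumptions for this example, not the procedure used in the paper. The sentences are the applesauce example from the opening slides.

```python
import re


def vocabulary(sentences):
    """Set of unique lowercased word types in a list of sentences."""
    return {
        token
        for sentence in sentences
        for token in re.findall(r"[a-z0-9']+", sentence.lower())
    }


def vocab_reduction(normal_sentences, simple_sentences):
    """Relative vocabulary reduction: 1 - |V_simple| / |V_normal|."""
    normal_size = len(vocabulary(normal_sentences))
    simple_size = len(vocabulary(simple_sentences))
    if normal_size == 0:
        raise ValueError("normal side has an empty vocabulary")
    return 1 - simple_size / normal_size


normal = ["Applesauce is a puree made of apples."]
simple = ["Applesauce is a soft paste."]
reduction = vocab_reduction(normal, simple)
print(f"{reduction:.0%} reduction")
```

On real corpora the result depends heavily on tokenization and casing choices, so numbers from a sketch like this would not be directly comparable with the 18% and 48% reported on the slide.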
61 Wikipedia* vs. Newsela
Good simplification reduces certain function word usage.
most significantly reduced words (weighted log-odds-ratio analysis w/ informative Dirichlet prior): commune, as and northern northwestern film ; southwestern footballer, and " of which as percent including director
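The slide names a weighted log-odds-ratio analysis with an informative Dirichlet prior but does not spell it out. A minimal sketch of the formulation usually attributed to Monroe et al. (2008) follows. For each word, the log-odds in each corpus are smoothed by prior pseudo-counts, and the difference is divided by its approximate standard deviation to give a z-score. The function name, the `prior_scale` and `floor` parameters, and the toy counts are all assumptions for illustration. The slides do not say which prior or settings were used.

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior, prior_scale=1.0, floor=0.01):
    """z-scored log-odds ratio of each word in corpus A versus corpus B.

    prior holds background word counts used as Dirichlet pseudo-counts
    (the "informative" prior). floor keeps every pseudo-count positive.
    Positive z means the word is over-represented in A.
    """
    vocab = set(counts_a) | set(counts_b) | set(prior)
    alpha = {w: prior_scale * prior.get(w, 0) + floor for w in vocab}
    alpha_0 = sum(alpha.values())
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())

    z_scores = {}
    for w in vocab:
        y_a, y_b = counts_a.get(w, 0), counts_b.get(w, 0)
        log_odds_a = math.log((y_a + alpha[w]) / (n_a + alpha_0 - y_a - alpha[w]))
        log_odds_b = math.log((y_b + alpha[w]) / (n_b + alpha_0 - y_b - alpha[w]))
        variance = 1 / (y_a + alpha[w]) + 1 / (y_b + alpha[w])
        z_scores[w] = (log_odds_a - log_odds_b) / math.sqrt(variance)
    return z_scores


# Toy counts (made up): "which" appears only on the normal side.
normal_counts = Counter({"which": 3, "where": 1, "approximately": 1, "the": 2})
simple_counts = Counter({"that": 2, "about": 1, "the": 3})
background = normal_counts + simple_counts  # assumed prior: pooled counts

z = weighted_log_odds(normal_counts, simple_counts, background)
for word, score in sorted(z.items(), key=lambda item: -item[1]):
    print(f"{word:15s} {score:+.2f}")
```

Words with the largest positive scores are the ones most strongly reduced on the simple side. Frequent words shared by both sides, such as "the" in the toy counts, end up close to zero because the prior and the variance term damp them.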
64 Wikipedia* vs. Newsela
Normal: Postal officials recently tried to, which could.
Simple: Postal officials recently tried to. That could.
[Figure: most significantly reduced words, Normal vs. Simple counts for "which", "where", "approximately"]
see syntax analysis in the paper
65 Wikipedia* vs. Newsela
Wikipedia is not suitable for full-document simplification.
document compression ratio (simple/normal): 3.19% 57.28%
see discourse analysis in the paper
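The slide does not say in which unit the document compression ratio (simple/normal) is measured. The sketch below assumes whitespace-separated word counts and uses hypothetical helper names. The two one-sentence "documents" are the Original and Simple-4 versions from the Newsela example in the backup slides, standing in for full articles.

```python
def word_count(document):
    """Number of whitespace-separated words in a list of sentences."""
    return sum(len(sentence.split()) for sentence in document)


def compression_ratio(simple_document, normal_document):
    """Length of the simple version divided by length of the normal version."""
    normal_words = word_count(normal_document)
    if normal_words == 0:
        raise ValueError("normal document is empty")
    return word_count(simple_document) / normal_words


original = [
    "Slightly more fourth-graders nationwide are reading proficiently "
    "compared with a decade ago, but only a third of them are now reading "
    "well, according to a new report."
]
simple_4 = [
    "Fourth-graders are better readers than 10 years ago.",
    "But few of them read well.",
]
ratio = compression_ratio(simple_4, original)
print(f"{ratio:.2%}")
```

A very low ratio means the simple article drops most of the original content rather than rewriting it. That is the concern behind the claim that Wikipedia is not suitable for full-document simplification.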
66 Opinion #1 Current evaluation doesn't tell us what's going on.
Opinion #2 Simple Wikipedia is not that simple.
Opinion #3 New data can help.
71 My Suggestions to reviewers:
- be open-minded to papers that may not follow the previous evaluation setup or may not outperform the state-of-the-art on Wikipedia
- be sympathetic towards papers especially on data construction*, data analysis* and automatic evaluation metrics
- read our paper
* (Pellow & Maxine, 2014 HCOMP; Marcelo & Specia, 2014 PITR)
77 My Suggestions to researchers:
- consider working on text simplification ("pre-BLEU age")
- improve evaluation
- make your system replicable
- read our paper
78 Thank you
Questions? Opinions?
Sponsor: NSF
Newsela data are available at https://newsela.com/data/
79 Back Up
80 Wikipedia* vs. Newsela
[Figure: change of discourse connectives (odds-ratio), simple cue words vs. complex conjunctions]
81 Reasons for Quality Issues in Parallel Wikipedia Corpus
- The Simple Wikipedia was created by volunteers with no specific objective;
- Articles in Simple Wikipedia do not necessarily map to Normal Wikipedia;
- As an encyclopedia, Wikipedia contains extremely difficult words and sentences.
82 Newsela Dataset
Original: Slightly more fourth-graders nationwide are reading proficiently compared with a decade ago, but only a third of them are now reading well, according to a new report.
Simple-1: Fourth-graders in most states are better readers than they were a decade ago. But only a third of them actually are able to read well, according to a new report.
Simple-2: Fourth-graders in most states are better readers than they were a decade ago. But only a third of them actually are able to read well, according to a new report.
Simple-3: Most fourth-graders are better readers than they were 10 years ago. But few of them can actually read well.
Simple-4: Fourth-graders are better readers than 10 years ago. But few of them read well.
83 Newsela Dataset
- 1,130 news articles
- Time: 2013 January ~ 2015 March
- Source: Chicago Tribune, Seattle Times, LA Times, The Baltimore Sun
- Original: 56k sentences
- Simple: 64k sentences
Problems in Current Text Simplification Research: New Data Can Help
Problems in Current Text Simplification Research: New Data Can Help Wei Xu 1 and Chris Callison-Burch 1 and Courtney Napoles 2 1 Computer and Information Science Department University of Pennsylvania {xwe,
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationIntra-talker Variation: Audience Design Factors Affecting Lexical Selections
Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and
More informationNoisy SMS Machine Translation in Low-Density Languages
Noisy SMS Machine Translation in Low-Density Languages Vladimir Eidelman, Kristy Hollingshead, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department of
More informationDomain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling
Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier 1, Andy Way 2, Josef van Genabith
More informationEffect of Word Complexity on L2 Vocabulary Learning
Effect of Word Complexity on L2 Vocabulary Learning Kevin Dela Rosa Language Technologies Institute Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA kdelaros@cs.cmu.edu Maxine Eskenazi Language
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationSyntactic and Lexical Simplification: The Impact on EFL Listening Comprehension at Low and High Language Proficiency Levels
ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 5, No. 3, pp. 566-571, May 2014 Manufactured in Finland. doi:10.4304/jltr.5.3.566-571 Syntactic and Lexical Simplification: The Impact on
More informationContent Language Objectives (CLOs) August 2012, H. Butts & G. De Anda
Content Language Objectives (CLOs) Outcomes Identify the evolution of the CLO Identify the components of the CLO Understand how the CLO helps provide all students the opportunity to access the rigor of
More informationGrade 11 Language Arts (2 Semester Course) CURRICULUM. Course Description ENGLISH 11 (2 Semester Course) Duration: 2 Semesters Prerequisite: None
Grade 11 Language Arts (2 Semester Course) CURRICULUM Course Description ENGLISH 11 (2 Semester Course) Duration: 2 Semesters Prerequisite: None Through the integrated study of literature, composition,
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationThe Karlsruhe Institute of Technology Translation Systems for the WMT 2011
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu
More informationThe Survey of Adult Skills (PIAAC) provides a picture of adults proficiency in three key information-processing skills:
SPAIN Key issues The gap between the skills proficiency of the youngest and oldest adults in Spain is the second largest in the survey. About one in four adults in Spain scores at the lowest levels in
More informationCROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2
1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationRe-evaluating the Role of Bleu in Machine Translation Research
Re-evaluating the Role of Bleu in Machine Translation Research Chris Callison-Burch Miles Osborne Philipp Koehn School on Informatics University of Edinburgh 2 Buccleuch Place Edinburgh, EH8 9LW callison-burch@ed.ac.uk
More informationApplications of memory-based natural language processing
Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal
More informationCross-lingual Text Fragment Alignment using Divergence from Randomness
Cross-lingual Text Fragment Alignment using Divergence from Randomness Sirvan Yahyaei, Marco Bonzanini, and Thomas Roelleke Queen Mary, University of London Mile End Road, E1 4NS London, UK {sirvan,marcob,thor}@eecs.qmul.ac.uk
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationOn-the-Fly Customization of Automated Essay Scoring
Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,
More informationPhysics 270: Experimental Physics
2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu
More informationMISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES
MISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES Students will: 1. Recognize main idea in written, oral, and visual formats. Examples: Stories, informational
More informationMany instructors use a weighted total to calculate their grades. This lesson explains how to set up a weighted total using categories.
Weighted Totals Many instructors use a weighted total to calculate their grades. This lesson explains how to set up a weighted total using categories. Set up your grading scheme in your syllabus Your syllabus
More informationReadability tools: are they useful for medical writers?
Readability tools: are they useful for medical writers? John Dixon MedComms Networking Event, 4th October, 2017 www.medcommsnetworking.com Libra Communications Training As I sincerely aspire to successfully
More informationLanguage Model and Grammar Extraction Variation in Machine Translation
Language Model and Grammar Extraction Variation in Machine Translation Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationAnalyzing Linguistically Appropriate IEP Goals in Dual Language Programs
Analyzing Linguistically Appropriate IEP Goals in Dual Language Programs 2016 Dual Language Conference: Making Connections Between Policy and Practice March 19, 2016 Framingham, MA Session Description
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationCross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels
Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract
More informationNumeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C
Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Using and applying mathematics objectives (Problem solving, Communicating and Reasoning) Select the maths to use in some classroom
More informationEvaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment
Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment Akiko Sakamoto, Kazuhiko Abe, Kazuo Sumita and Satoshi Kamatani Knowledge Media Laboratory,
More informationMultilingual and Cross-Lingual Complex Word Identification
Multilingual and Cross-Lingual Complex Word Identification Seid Muhie Yimam, Sanja Štajner, Martin Riedl, and Chris Biemann Language Technology Group, Department of Informatics, Universität Hamburg, Germany
More informationThe NICT Translation System for IWSLT 2012
The NICT Translation System for IWSLT 2012 Andrew Finch Ohnmar Htun Eiichiro Sumita Multilingual Translation Group MASTAR Project National Institute of Information and Communications Technology Kyoto,
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationTINE: A Metric to Assess MT Adequacy
TINE: A Metric to Assess MT Adequacy Miguel Rios, Wilker Aziz and Lucia Specia Research Group in Computational Linguistics University of Wolverhampton Stafford Street, Wolverhampton, WV1 1SB, UK {m.rios,
More informationImproving the impact of development projects in Sub-Saharan Africa through increased UK/Brazil cooperation and partnerships Held in Brasilia
Image: Brett Jordan Report Improving the impact of development projects in Sub-Saharan Africa through increased UK/Brazil cooperation and partnerships Thursday 17 Friday 18 November 2016 WP1492 Held in
More informationMYP Language A Course Outline Year 3
Course Description: The fundamental piece to learning, thinking, communicating, and reflecting is language. Language A seeks to further develop six key skill areas: listening, speaking, reading, writing,
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationMathematics process categories
Mathematics process categories All of the UK curricula define multiple categories of mathematical proficiency that require students to be able to use and apply mathematics, beyond simple recall of facts
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationDOES RETELLING TECHNIQUE IMPROVE SPEAKING FLUENCY?
DOES RETELLING TECHNIQUE IMPROVE SPEAKING FLUENCY? Noor Rachmawaty (itaw75123@yahoo.com) Istanti Hermagustiana (dulcemaria_81@yahoo.com) Universitas Mulawarman, Indonesia Abstract: This paper is based
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationRachel E. Baker, Ann R. Bradlow. Northwestern University, Evanston, IL, USA
LANGUAGE AND SPEECH, 2009, 52 (4), 391 413 391 Variability in Word Duration as a Function of Probability, Speech Style, and Prosody Rachel E. Baker, Ann R. Bradlow Northwestern University, Evanston, IL,
More informationCHAPTER 4: REIMBURSEMENT STRATEGIES 24
CHAPTER 4: REIMBURSEMENT STRATEGIES 24 INTRODUCTION Once state level policymakers have decided to implement and pay for CSR, one issue they face is simply how to calculate the reimbursements to districts
More informationLecture 2: Quantifiers and Approximation
Lecture 2: Quantifiers and Approximation Case study: Most vs More than half Jakub Szymanik Outline Number Sense Approximate Number Sense Approximating most Superlative Meaning of most What About Counting?
More informationCEFR Overall Illustrative English Proficiency Scales
CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey
More informationCarnegie Mellon University Department of Computer Science /615 - Database Applications C. Faloutsos & A. Pavlo, Spring 2014.
Carnegie Mellon University Department of Computer Science 15-415/615 - Database Applications C. Faloutsos & A. Pavlo, Spring 2014 Homework 2 IMPORTANT - what to hand in: Please submit your answers in hard
More informationThe Effect of Syntactic Simplicity and Complexity on the Readability of the Text
ISSN 798-769 Journal of Language Teaching and Research, Vol., No., pp. 8-9, September 2 2 ACADEMY PUBLISHER Manufactured in Finland. doi:.3/jltr...8-9 The Effect of Syntactic Simplicity and Complexity
More informationLanguage Acquisition Chart
Language Acquisition Chart This chart was designed to help teachers better understand the process of second language acquisition. Please use this chart as a resource for learning more about the way people
More informationAn ICT environment to assess and support students mathematical problem-solving performance in non-routine puzzle-like word problems
An ICT environment to assess and support students mathematical problem-solving performance in non-routine puzzle-like word problems Angeliki Kolovou* Marja van den Heuvel-Panhuizen*# Arthur Bakker* Iliada
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,