Language Modelling. Marco Kuhlmann Department of Computer and Information Science Partially based on material developed by David Chiang
|
|
- Gwendoline Teresa Horn
- 6 years ago
- Views:
Transcription
1 TDDE09, 729A27 Natural Language Processing (2017) Language Modelling Marco Kuhlmann Department of Computer and Information Science Partially based on material developed by David Chiang This work is licensed under a Creative Commons Attribution 4.0 International License.
2 Language models A language model is a model of what words are more or less likely to be generated in some language. More specifically, it is a model that predicts what the next word will be, given the words so far. Instead of on words, language models can also be defined on characters (or signs, or symbols).
3 Text classification using language models The word probabilities in the Naive Bayes classifier define a simple language model: class-specific language model
4 When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. Warren Weaver ( )
5 The Noisy Channel E R sender receiver noise language model Russian is noisy English. argmax E P(E R) = argmax E P(R E) P(E)
6 Autocomplete and autocorrect
7 Pinyin input method ping guo gong si ping
8 Shannon s game Shannon s game is like Hangman except that It s no fun at all. You may only guess one character at a time. When moving on to the next character, you lose all information about previously guessed characters. Claude Shannon ( ) Image source: Wikipedia
9 N-gram models
10 N-gram models An n-gram is a sequence of n words or characters. unigram, bigram, trigram, quadrigram An n-gram model is a language model where the probability of a word depends only on the n 1 immediately preceding words.
11 Unigram model A unigram language model is a bag-of-words model: Thus the probabilities of all words in the text are mutually independent.
12 Markov models The probability of each item depends only on the immediately preceding item. The probability of a sequence of items is the product of these conditional probabilities. For a well-defined model, we need to mark the beginning and the end of a sequence. Андрéй Мáрков ( ) Image source: Wikipedia
13 Probability of a sequence of words beginning-of-sentence end-of-sentence
14 Bigram models A bigram model is a Markov model on sequences of words: Thus the probability of a word depends only on the immediately preceding word.
15 Formal definition of an n-gram model n V the model s order (1 = unigram, 2 = bigram, ) a set of possible words (character); the vocabulary P(w u) a probability that specifies how likely it is to observe the word w after the context (n 1)-gram u one value for each combination of a word w and a context u
16 Simple uses of n-gram models Prediction To predict the next word, we can choose the word that has the highest probability among all possible words w: predicted word = argmax w P(w preceding words) Generation We can generate a random sequence of words where each word w is sampled with probability P(w preceding words).
17 P(w 1 w 1 ) w 1 P(w 1 BOS) P(EOS w 1 ) BOS P(w 2 w 1 ) P(w 1 w 2 ) EOS w 2 P(w 2 BOS) P(EOS w 2 ) P(w 2 w 2 )
18 Learning n-gram models
19 Estimating unigram probabilities P(Sherlock) c(sherlock) count of the unigram Sherlock N total number of unigrams (tokens)
20 Estimating bigram probabilities P(Holmes Sherlock) c(sherlock Holmes) count of the bigram Sherlock Holmes c(sherlock w) count of bigrams starting with Sherlock
21 Estimating unigram and bigram probabilities This is the count of the unigram w, divided by the total number of tokens. This is the count of the bigram uw, divided by the count of bigrams starting with u.
22 A problem with maximum likelihood estimation Shakespeare s collected works contain ca. 31,000 word types. There are 961 million different bigrams with these words. In his texts we only find 300,000 bigrams. This means that 99.97% of all theoretically possible bigrams have count 0. Under a bigram model, each sentence containing one of those bigrams will receive a probability of zero. Zero probabilities destroy information!
23 Notation N c(w) c(uw) c(u ) V V k+ V k number of word tokens, excluding BOS count of unigram (word) w count of bigram uw count of bigrams starting with u number of word types, including UNK number of word types seen at k times number of word types seen exactly k times Source: Chen and Goodman (1998)
24 Additive smoothing We can do add-k smoothing as for the Naive Bayes classifier: why?
25 A problem with additive smoothing Chiang looks at a sample of 55,708,861 words of English. In this sample, the word the appears 3,579,493 times. Add-one smoothing yields P(the) = How many times does the model expect the word the to appear in an equal-sized sample? Why is that a problem?
26 A problem with additive smoothing We have only a constant amount of probability mass that we can distribute among the word types. Therefore, although we are adding to the count of every word type, we are not adding to the probability of every word type. probabilities still need to sum to one We take away a certain percentage of the probability mass from each word type and redistribute it equally to all word types.
27 Additive smoothing for unigram probabilities The formula for add-k smoothing of unigrams, can be written as a mixture of the maximum-likelihood estimate and the uniform distribution over word types: where
28 Witten Bell smoothing for unigram probabilities Writing V 1+ for the number of word types seen at least once, where gives us Witten Bell smoothing. This is another form of additive smoothing and can also be written as where
29 Absolute discounting for unigram probabilities Intuitively, smoothing should not decrease the expected count of a word (relative to its empirical count) by more than about one. In absolute discounting, we subtract from the count of every seen word type and distribute the total gain equally to all types: where
30 Smoothing bigram probabilities When smoothing unigram probabilities, we took away from the ML estimate and gave back equally to all word types. For bigram probabilities, a better option is to give back to word types proportional to their unigram probability.
31 Smoothing bigram probabilities Witten Bell smoothing Absolute discounting
32 Unknown words In addition to new bigrams, a new text may even contain completely new words. For these, smoothing will not help. One way to deal with this is to introduce a special word type UNK, and smooth it like any other word type in the vocabulary. For additive smoothing: Hallucinate k occurrences of the unknown word. At test time, we replace every unknown word with UNK.
33 Evaluation of n-gram models
34 Intrinsic and extrinsic evaluation Intrinsic evaluation How does the method or model score with respect to a given evaluation measure? in classification: accuracy, precision, recall Extrinsic evaluation How much does the method or model help the application in which it is embedded? predictive input, machine translation, speech recognition
35 Intrinsic evaluation of language models, intuition Learn a language model from a set of training sentences and use it to compute the probability of a set of test sentences. If the language model is good, the probability of the test sentences should be high. This assumes that the training sample and the test sample are similar. In the following, for simplicity we assume that we have only one, potentially very long test sentence.
36 The problem with different sentence lengths Under a Markov model, the probability of a sentence is the product of the bigram probabilities. Therefore, all other things being equal, the probability of a sentence decreases with the sentence length. This makes it hard to compare the probability of the test data to the probability of the training data. Intuitively, we would like to average over sentence length.
37 Perplexity and entropy The perplexity of a language model P on a test sample x 1, x N is This measure is not easy to understand intuitively. We will therefore focus on the term in the exponent: This measure is known as the entropy of the test sample.
38 From probabilities to surprisal Instead of computing probabilities for the test sentence, we will compute negative log probabilities. P(w c) becomes log P(w c) Intuitively, this measures how surprised we are about seeing the test sentence, given our language model. high probability = low surprisal We can then simply divide by the number of words in the sentence to average over sentence length.
39 Negative log probabilities log p p
40 Entropy and smoothing When smoothing a language model, we are redistributing probability mass to observations we have never made. This leaves a smaller amount of the probablity mass to the observations that we actually did make during learning. When we evaluate the smoothed model on the training data, its entropy will therefore be higher than without smoothing.
41 The problem with unknown words The held-out data will in general contain unknown words words that we have not seen in the training data. Because we are multiplying probabilities, a single unknown word will bring down the probability of the held-out data to zero. Zero probabilities destroy information! The conclusion is that we should never compare language models with different vocabularies.
42 Edit distance
43 Autocomplete and autocorrect
44 Edit distance Many misspelled words are quite similar to the correctly spelled words; there are typically only a few mistakes. lingvisterma, word prefiction Given a misspelled word, we want to propose one or several similar words and propose the most probable one. This idea requires a measure of orthographic similarity between two words.
45 Edit operations We can measure the similarity between two words by the number of operations needed to transform one into the other. Here we assume three types of operations: insertion deletion substitution add a letter before or after another one delete a letter from the word substitute a letter for another one
46 Edit operations, example How many edits does it take to go from intention to execution? intention delete the letter i substitute e for n substitute x for t insert the letter c substitute u for n ntention etention exention execntion execution
47 Letter alignments i n t e * n t i o n * e x e c u t i o n i n t e n * t i o n e x * e c u t i o n
48 Levenshtein distance Each edit operation is assigned a cost: The cost for insertion and deletion is 1. The cost for substitution is 0 if the substituted letter is the same as the original one, and 1 in all other cases. The Levenshtein distance between two words is the minimal cost for transforming one word into the other.
49 Computing the Levenshtein distance We would like to find a sequence of operations which transforms one word into the other and has minimal cost. The search space for this problem is huge in fact, in theory there are infinitely many sequences of operations. However, if we are only interested in sequences with minimal cost, we can solve the problem using dynamic programming. Wagner Fisher algorithm
50 Wagner Fisher algorithm The Wagner Fisher algorithm is a dynamic programming algorithm that computes the Levenshtein distance of two words. dynamic programming = recursion + memoisation Its central data structure is a matrix L. Each cell in L will hold the Levenshtein distance for two prefixes of the input words. The cell is filled for longer and longer prefixes, from prefixes of length zero all the way up to complete words.
51 n o i t n e t n i # # e x e c u t i o n We want to transform intention into execution.
52 L(0, 0) n o i t n e t n i # 0 # e x e c u t i o n The cost of transforming the empty string into the empty string is zero.
53 L(i, 0) n 9 o 8 i 7 t 6 n 5 e 4 t 3 n 2 i 1 # 0 # e x e c u t i o n We can transform intention to the empty string by deleting all characters, one after the other.
54 L(4, 3) n o i t n e t n i # # e x e c u t i o n In the general case there are three possibilities. We want to pick the possibility that yields the minimal cost.
55 L(4, 3) n o i t n e t n i # # e x e c u t i o n How can we transform inte into exe? Possibility 1: Remove the last e and transform int into exe.
56 L(4, 3) n o i t n e t n i # # e x e c u t i o n How can we transform inte into exe? Possibility 2: Transform inte till ex and insert the final e.
57 L(4, 3) n o i t n e t n i # # e x e c u t i o n How can we transform inte into exe? Possibility 3: Substitute e for e and transform int till ex.
58 L(4, 3) n o i t n e t n i # # e x e c u t i o n Possibility 3 gives the minimal score. We store a back pointer to remember this.
59 L(9, 9) n o i t n e t n i # # e x e c u t i o n The Levenshtein distance for this pair of sentences is 5.
60 L(9, 9) n o i t n e t n i # # e x e c u t i o n To find a sequence of operations that witnesses this distance, we follow the backpointers.
61 Computational complexity Let m, n denote the lengths of the two words. The memory required by the Wagner Fisher algorithm is in O(mn); this corresponds to the size of the matrix L. Can be improved to O(max(m, n)). The runtime required by the Wagner Fisher algorithm is in O(mn); this is the number of cells that need to be filled.
62 Other measures of edit distance Practical systems for spelling correction typically use more finegrained weights than the ones that we use here. s instead of a is more probable than d instead of a We can still use the same algorithm for computing the Levenshtein distance; we only have to change the weights. An even more realistic measure is the Damerau Levenshteindistance, which even permits transposition, with cost 1. transposition = switching the positions of two adjacent characters
Lecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationDisambiguation of Thai Personal Name from Online News Articles
Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationGrade 6: Correlated to AGS Basic Math Skills
Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationIntroducing the New Iowa Assessments Mathematics Levels 12 14
Introducing the New Iowa Assessments Mathematics Levels 12 14 ITP Assessment Tools Math Interim Assessments: Grades 3 8 Administered online Constructed Response Supplements Reading, Language Arts, Mathematics
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationDublin City Schools Mathematics Graded Course of Study GRADE 4
I. Content Standard: Number, Number Sense and Operations Standard Students demonstrate number sense, including an understanding of number systems and reasonable estimates using paper and pencil, technology-supported
More informationPage 1 of 11. Curriculum Map: Grade 4 Math Course: Math 4 Sub-topic: General. Grade(s): None specified
Curriculum Map: Grade 4 Math Course: Math 4 Sub-topic: General Grade(s): None specified Unit: Creating a Community of Mathematical Thinkers Timeline: Week 1 The purpose of the Establishing a Community
More informationAGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS
AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic
More informationIterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages
Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationSTUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH
STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationExploration. CS : Deep Reinforcement Learning Sergey Levine
Exploration CS 294-112: Deep Reinforcement Learning Sergey Levine Class Notes 1. Homework 4 due on Wednesday 2. Project proposal feedback sent Today s Lecture 1. What is exploration? Why is it a problem?
More informationFocus of the Unit: Much of this unit focuses on extending previous skills of multiplication and division to multi-digit whole numbers.
Approximate Time Frame: 3-4 weeks Connections to Previous Learning: In fourth grade, students fluently multiply (4-digit by 1-digit, 2-digit by 2-digit) and divide (4-digit by 1-digit) using strategies
More informationArtificial Neural Networks written examination
1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14
More informationGrade 5 + DIGITAL. EL Strategies. DOK 1-4 RTI Tiers 1-3. Flexible Supplemental K-8 ELA & Math Online & Print
Standards PLUS Flexible Supplemental K-8 ELA & Math Online & Print Grade 5 SAMPLER Mathematics EL Strategies DOK 1-4 RTI Tiers 1-3 15-20 Minute Lessons Assessments Consistent with CA Testing Technology
More informationStatewide Framework Document for:
Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance
More informationNumeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C
Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Using and applying mathematics objectives (Problem solving, Communicating and Reasoning) Select the maths to use in some classroom
More informationMath 96: Intermediate Algebra in Context
: Intermediate Algebra in Context Syllabus Spring Quarter 2016 Daily, 9:20 10:30am Instructor: Lauri Lindberg Office Hours@ tutoring: Tutoring Center (CAS-504) 8 9am & 1 2pm daily STEM (Math) Center (RAI-338)
More informationIntroduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationInvestigation on Mandarin Broadcast News Speech Recognition
Investigation on Mandarin Broadcast News Speech Recognition Mei-Yuh Hwang 1, Xin Lei 1, Wen Wang 2, Takahiro Shinozaki 1 1 Univ. of Washington, Dept. of Electrical Engineering, Seattle, WA 98195 USA 2
More informationESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly
ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly Inflected Languages Classical Approaches to Tagging The slides are posted on the web. The url is http://chss.montclair.edu/~feldmana/esslli10/.
More informationPre-Algebra A. Syllabus. Course Overview. Course Goals. General Skills. Credit Value
Syllabus Pre-Algebra A Course Overview Pre-Algebra is a course designed to prepare you for future work in algebra. In Pre-Algebra, you will strengthen your knowledge of numbers as you look to transition
More informationOPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS
OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,
More informationK5 Math Practice. Free Pilot Proposal Jan -Jun Boost Confidence Increase Scores Get Ahead. Studypad, Inc.
K5 Math Practice Boost Confidence Increase Scores Get Ahead Free Pilot Proposal Jan -Jun 2017 Studypad, Inc. 100 W El Camino Real, Ste 72 Mountain View, CA 94040 Table of Contents I. Splash Math Pilot
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationThis scope and sequence assumes 160 days for instruction, divided among 15 units.
In previous grades, students learned strategies for multiplication and division, developed understanding of structure of the place value system, and applied understanding of fractions to addition and subtraction
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationExtending Place Value with Whole Numbers to 1,000,000
Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit
More informationLecture 10: Reinforcement Learning
Lecture 1: Reinforcement Learning Cognitive Systems II - Machine Learning SS 25 Part III: Learning Programs and Strategies Q Learning, Dynamic Programming Lecture 1: Reinforcement Learning p. Motivation
More informationAP Chemistry
AP Chemistry 2016-2017 Welcome to AP Chemistry! I am so excited to have you in this course next year! To get geared up for the class, there are some things that you need to do this summer. None of it is
More informationThe Good Judgment Project: A large scale test of different methods of combining expert predictions
The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationEDEXCEL FUNCTIONAL SKILLS PILOT TEACHER S NOTES. Maths Level 2. Chapter 4. Working with measures
EDEXCEL FUNCTIONAL SKILLS PILOT TEACHER S NOTES Maths Level 2 Chapter 4 Working with measures SECTION G 1 Time 2 Temperature 3 Length 4 Weight 5 Capacity 6 Conversion between metric units 7 Conversion
More informationClickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft
More informationSouth Carolina College- and Career-Ready Standards for Mathematics. Standards Unpacking Documents Grade 5
South Carolina College- and Career-Ready Standards for Mathematics Standards Unpacking Documents Grade 5 South Carolina College- and Career-Ready Standards for Mathematics Standards Unpacking Documents
More informationMachine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler
Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina
More informationRadius STEM Readiness TM
Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and
More informationTA Script of Student Test Directions
TA Script of Student Test Directions SMARTER BALANCED PAPER-PENCIL Spring 2017 ELA Grade 6 Paper Summative Assessment School Test Coordinator Contact Information Name: Email: Phone: ( ) Cell: ( ) Visit
More informationReducing Features to Improve Bug Prediction
Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science
More informationNCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches
NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science
More informationAlgebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview
Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best
More informationProbability Therefore (25) (1.33)
Probability We have intentionally included more material than can be covered in most Student Study Sessions to account for groups that are able to answer the questions at a faster rate. Use your own judgment,
More informationTransfer Learning Action Models by Measuring the Similarity of Different Domains
Transfer Learning Action Models by Measuring the Similarity of Different Domains Hankui Zhuo 1, Qiang Yang 2, and Lei Li 1 1 Software Research Institute, Sun Yat-sen University, Guangzhou, China. zhuohank@gmail.com,lnslilei@mail.sysu.edu.cn
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationAre You Ready? Simplify Fractions
SKILL 10 Simplify Fractions Teaching Skill 10 Objective Write a fraction in simplest form. Review the definition of simplest form with students. Ask: Is 3 written in simplest form? Why 7 or why not? (Yes,
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationNetpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models
Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.
More informationCROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2
1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis
More informationRendezvous with Comet Halley Next Generation of Science Standards
Next Generation of Science Standards 5th Grade 6 th Grade 7 th Grade 8 th Grade 5-PS1-3 Make observations and measurements to identify materials based on their properties. MS-PS1-4 Develop a model that
More informationInformatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy
Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference
More informationClerical Skills Level II
Passaic County Technical Institute Clerical Skills Level II School of Business Submitted by: Marie Easton Maria Matano June 2010 1 CLERICAL SKILLS II I. RATIONALE Clerical Skills II covers a variety of
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationFourth Grade. Reporting Student Progress. Libertyville School District 70. Fourth Grade
Fourth Grade Libertyville School District 70 Reporting Student Progress Fourth Grade A Message to Parents/Guardians: Libertyville Elementary District 70 teachers of students in kindergarten-5 utilize a
More informationRANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S
N S ER E P S I M TA S UN A I S I T VER RANKING AND UNRANKING LEFT SZILARD LANGUAGES Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A-1997-2 UNIVERSITY OF TAMPERE DEPARTMENT OF
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationMath-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade
Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade The third grade standards primarily address multiplication and division, which are covered in Math-U-See
More informationAbbreviated text input. The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.
Abbreviated text input The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Published Version Accessed Citable Link Terms
More informationExcel Intermediate
Instructor s Excel 2013 - Intermediate Multiple Worksheets Excel 2013 - Intermediate (103-124) Multiple Worksheets Quick Links Manipulating Sheets Pages EX5 Pages EX37 EX38 Grouping Worksheets Pages EX304
More informationCorrective Feedback and Persistent Learning for Information Extraction
Corrective Feedback and Persistent Learning for Information Extraction Aron Culotta a, Trausti Kristjansson b, Andrew McCallum a, Paul Viola c a Dept. of Computer Science, University of Massachusetts,
More informationA New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation
A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationNumber Line Moves Dash -- 1st Grade. Michelle Eckstein
Number Line Moves Dash -- 1st Grade Michelle Eckstein Common Core Standards CCSS.MATH.CONTENT.1.NBT.C.4 Add within 100, including adding a two-digit number and a one-digit number, and adding a two-digit
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationCal s Dinner Card Deals
Cal s Dinner Card Deals Overview: In this lesson students compare three linear functions in the context of Dinner Card Deals. Students are required to interpret a graph for each Dinner Card Deal to help
More informationNoisy Channel Models for Corrupted Chinese Text Restoration and GB-to-Big5 Conversion
Computational Linguistics and Chinese Language Processing vol. 3, no. 2, August 1998, pp. 79-92 79 Computational Linguistics Society of R.O.C. Noisy Channel Models for Corrupted Chinese Text Restoration
More informationSenior Stenographer / Senior Typist Series (including equivalent Secretary titles)
New York State Department of Civil Service Committed to Innovation, Quality, and Excellence A Guide to the Written Test for the Senior Stenographer / Senior Typist Series (including equivalent Secretary
More informationCHAPTER 4: REIMBURSEMENT STRATEGIES 24
CHAPTER 4: REIMBURSEMENT STRATEGIES 24 INTRODUCTION Once state level policymakers have decided to implement and pay for CSR, one issue they face is simply how to calculate the reimbursements to districts
More informationVersion Space. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Version Space Term 2012/ / 18
Version Space Javier Béjar cbea LSI - FIB Term 2012/2013 Javier Béjar cbea (LSI - FIB) Version Space Term 2012/2013 1 / 18 Outline 1 Learning logical formulas 2 Version space Introduction Search strategy
More informationDistant Supervised Relation Extraction with Wikipedia and Freebase
Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationLanguage and Computers. Writers Aids. Introduction. Non-word error detection. Dictionaries. N-gram analysis. Isolated-word error correction
Spelling & grammar We are all familiar with spelling & grammar correctors They are used to improve document quality They are not typically used to provide feedback L245 (Based on Dickinson, Brew, & Meurers
More informationSchool of Innovative Technologies and Engineering
School of Innovative Technologies and Engineering Department of Applied Mathematical Sciences Proficiency Course in MATLAB COURSE DOCUMENT VERSION 1.0 PCMv1.0 July 2012 University of Technology, Mauritius
More informationLarge vocabulary off-line handwriting recognition: A survey
Pattern Anal Applic (2003) 6: 97 121 DOI 10.1007/s10044-002-0169-3 ORIGINAL ARTICLE A. L. Koerich, R. Sabourin, C. Y. Suen Large vocabulary off-line handwriting recognition: A survey Received: 24/09/01
More informationEvidence for Reliability, Validity and Learning Effectiveness
PEARSON EDUCATION Evidence for Reliability, Validity and Learning Effectiveness Introduction Pearson Knowledge Technologies has conducted a large number and wide variety of reliability and validity studies
More informationWhat the National Curriculum requires in reading at Y5 and Y6
What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationarxiv: v1 [math.at] 10 Jan 2016
THE ALGEBRAIC ATIYAH-HIRZEBRUCH SPECTRAL SEQUENCE OF REAL PROJECTIVE SPECTRA arxiv:1601.02185v1 [math.at] 10 Jan 2016 GUOZHEN WANG AND ZHOULI XU Abstract. In this note, we use Curtis s algorithm and the
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationGrade 4. Common Core Adoption Process. (Unpacked Standards)
Grade 4 Common Core Adoption Process (Unpacked Standards) Grade 4 Reading: Literature RL.4.1 Refer to details and examples in a text when explaining what the text says explicitly and when drawing inferences
More informationMontana Content Standards for Mathematics Grade 3. Montana Content Standards for Mathematical Practices and Mathematics Content Adopted November 2011
Montana Content Standards for Mathematics Grade 3 Montana Content Standards for Mathematical Practices and Mathematics Content Adopted November 2011 Contents Standards for Mathematical Practice: Grade
More informationRover Races Grades: 3-5 Prep Time: ~45 Minutes Lesson Time: ~105 minutes
Rover Races Grades: 3-5 Prep Time: ~45 Minutes Lesson Time: ~105 minutes WHAT STUDENTS DO: Establishing Communication Procedures Following Curiosity on Mars often means roving to places with interesting
More informationCOPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS
COPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS Joris Pelemans 1, Kris Demuynck 2, Hugo Van hamme 1, Patrick Wambacq 1 1 Dept. ESAT, Katholieke Universiteit Leuven, Belgium
More informationIf we want to measure the amount of cereal inside the box, what tool would we use: string, square tiles, or cubes?
String, Tiles and Cubes: A Hands-On Approach to Understanding Perimeter, Area, and Volume Teaching Notes Teacher-led discussion: 1. Pre-Assessment: Show students the equipment that you have to measure
More informationMeasurement. When Smaller Is Better. Activity:
Measurement Activity: TEKS: When Smaller Is Better (6.8) Measurement. The student solves application problems involving estimation and measurement of length, area, time, temperature, volume, weight, and
More informationThe Strong Minimalist Thesis and Bounded Optimality
The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this
More informationBRAZOSPORT COLLEGE LAKE JACKSON, TEXAS SYLLABUS. POFI 1301: COMPUTER APPLICATIONS I (File Management/PowerPoint/Word/Excel)
BRAZOSPORT COLLEGE LAKE JACKSON, TEXAS SYLLABUS POFI 1301: COMPUTER APPLICATIONS I (File Management/PowerPoint/Word/Excel) COMPUTER TECHNOLOGY & OFFICE ADMINISTRATION DEPARTMENT CATALOG DESCRIPTION POFI
More information