Rule Learning (1): Classification Rules
COMP9417 Machine Learning and Data Mining (14s1)
Rule Learning (1): Classification Rules
March 19, 2014
Acknowledgement: Material derived from slides for the book Machine Learning, Tom M. Mitchell, McGraw-Hill, and the book Data Mining, Ian H. Witten and Eibe Frank, Morgan Kaufmann.
Aims

This lecture will enable you to describe machine learning approaches to the problem of discovering rules from data. Following it you should be able to:
- define a representation for rules
- describe the decision table and 1R approaches
- outline overfitting avoidance in rule learning using pruning
- reproduce the basic sequential covering algorithm

Relevant WEKA programs: OneR, ZeroR, DecisionTable, DecisionStump, PART, Prism, JRip, Ridor
Introduction

Machine Learning specialists often prefer certain models of data: decision trees, neural networks, nearest-neighbour, ...

Potential Machine Learning users often prefer certain models of data: spreadsheets, 2D plots, OLAP, ...
Introduction

In applications of machine learning, specialists may find that users:
- find it hard to understand what some representations for models mean
- expect to see in models similar types of patterns to those they can find using manual methods
- have other ideas about kinds of representations for models they think would help them

Message: very simple models may be useful at first to help users understand what is going on in the data. Later, we can use representations for models which may allow for greater predictive accuracy.
Data set for Weather

outlook   temperature  humidity  windy  play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     cool         normal    false  yes
rainy     cool         normal    true   no
overcast  cool         normal    true   yes
sunny     mild         high      false  no
sunny     cool         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no
Decision Tables

A simple representation for a model is to use the same format as the input - a decision table. Just look up the attribute values of an instance in the table to find the class value. This is rote learning, or memorization - no generalization! However, by selecting a subset of the attributes we can compress the table and classify new instances.

A decision table has:
1. a schema, a set of attributes
2. a body, a multiset of labelled instances, each with a value for each attribute and for the label

(A multiset is a set which can have repeated elements.)
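To make this concrete, here is a minimal sketch in Python (illustrative only, not the WEKA DecisionTable implementation): the schema is a chosen attribute subset, and the body is compressed by mapping each combination of schema values to the majority class seen in training.

```python
from collections import Counter

def build_table(instances, labels, schema):
    """instances: list of attribute->value dicts; schema: attribute subset."""
    cells = {}
    for x, y in zip(instances, labels):
        key = tuple(x[a] for a in schema)
        cells.setdefault(key, []).append(y)
    # Compress: keep only the majority label for each cell.
    return {k: Counter(v).most_common(1)[0][0] for k, v in cells.items()}

def classify(table, schema, x, default):
    # Unseen value combinations fall back to a default class.
    return table.get(tuple(x[a] for a in schema), default)
```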
Learning Decision Tables

Best-first search for the schema giving the decision table with least error:

1. i := 0
2. attribute set A_i := A
3. schema S_i := ∅
4. Do
     Find the best attribute a ∈ A_i to add to S_i by minimising the cross-validation estimate of error E_i
     A_i := A_i \ {a}
     S_i := S_i ∪ {a}
     i := i + 1
   While E_i is reducing
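The search loop can be sketched as greedy forward selection; here error is an assumed function returning a cross-validation error estimate for the table built on a given schema. (A true best-first search keeps a queue of candidate subsets rather than committing greedily; this hill-climbing version is a simplification.)

```python
def learn_schema(attributes, error):
    """Greedy forward selection of a decision-table schema."""
    schema, best_err = [], error([])
    remaining = list(attributes)
    while remaining:
        # Try adding each remaining attribute; keep the best candidate.
        cand = min(remaining, key=lambda a: error(schema + [a]))
        cand_err = error(schema + [cand])
        if cand_err >= best_err:   # stop once error is no longer reducing
            break
        schema.append(cand)
        remaining.remove(cand)
        best_err = cand_err
    return schema
```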
LOOCV

Leave-one-out cross-validation. Given a data set, we often wish to estimate the error on new data of a model learned from this data set. What can we do? We can use a holdout set, a subset of the data set which is NOT used for training but is used to test our model. Often a 2:1 split of training:test data is used. BUT this means only 2/3 of the data set is available to learn our model... So in LOOCV, for n examples, we repeatedly leave 1 out and train on the remaining n - 1 examples. Doing this n times, the mean error of all the train-and-test iterations is our estimate of the true error of our model.
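A minimal LOOCV sketch, assuming a train(X, y) function that returns a model with a predict(x) method:

```python
def loocv_error(X, y, train):
    """Mean error over n leave-one-out train-and-test runs."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Leave example i out; train on the remaining n - 1 examples.
        model = train(X[:i] + X[i+1:], y[:i] + y[i+1:])
        errors += model.predict(X[i]) != y[i]
    return errors / n
```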
k-fold Cross-Validation

A problem with LOOCV - we have to learn a model n times for n examples in our data set. Is this really necessary? Partition the data set into k equal-size disjoint subsets. Each of these k subsets in turn is used as the test set while the remainder are used as the training set. The mean error of all the train-and-test iterations is our estimate of the true error of our model. k = 10 is a reasonable choice (or k = 3 if the learning takes a long time). Ensuring the class distribution in each subset is the same as that of the complete data set is called stratification. We'll see cross-validation again...
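Stratification can be sketched by dealing example indices out class by class, so each fold roughly preserves the overall class distribution:

```python
from collections import defaultdict

def stratified_folds(y, k):
    """Split indices 0..len(y)-1 into k folds, stratified by label."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    j = 0
    for indices in by_class.values():
        for i in indices:           # round-robin within each class
            folds[j % k].append(i)
            j += 1
    return folds
```

Each of the k folds then serves once as the test set, exactly as described above.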
Decision Table for play

Best-first search for feature set, terminated after 5 non-improving subsets. Evaluation (for feature selection): CV (leave one out).

Rules:
==================================
outlook   humidity  play
==================================
sunny     normal    yes
overcast  normal    yes
rainy     normal    yes
rainy     high      yes
overcast  high      yes
sunny     high      no
==================================
Decision Table for play

Unfortunately, it is not particularly good at predicting play...

=== Stratified cross-validation ===
Correctly Classified Instances        %
Incorrectly Classified Instances      %

However, on a number of real-world domains the decision table has been shown to give predictive accuracy competitive with the C4.5 decision-tree learner, while using a simpler model representation.
Representing Rules

General form of a rule: Antecedent → Consequent

- The antecedent (pre-condition) is a series of tests or constraints on attributes (like the tests at decision tree nodes).
- The consequent (post-condition or conclusion) gives the class value or a probability distribution on class values (like the leaf nodes of a decision tree).

Rules of this form (with a single conclusion) are classification rules. The antecedent is true if the logical conjunction of its constraints is true; the rule then fires and gives the class in the consequent. A rule also has a procedural interpretation: If antecedent Then consequent.
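A rule of this form can be represented directly; a minimal sketch, with the antecedent as a conjunction of attribute = value tests:

```python
class Rule:
    """A classification rule: a conjunction of tests and a class label."""
    def __init__(self, tests, label):
        self.tests = tests   # e.g. {"outlook": "sunny", "humidity": "high"}
        self.label = label   # e.g. "no"

    def fires(self, x):
        # True iff every constraint in the conjunction holds for x.
        return all(x.get(a) == v for a, v in self.tests.items())

r = Rule({"outlook": "sunny", "humidity": "high"}, "no")
print(r.fires({"outlook": "sunny", "humidity": "high", "windy": "true"}))  # True
```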
Sets of Rules

Rule1 ∨ Rule2 ∨ ... : think of a set of rules as a logical disjunction. A problem: this can give rise to conflicts:

Rule1: att1=red ∧ att2=circle → yes
Rule2: att2=circle ∧ att3=heavy → no

The instance (red, circle, heavy) is classified as both yes and no! Either give no conclusion, or the conclusion of the rule with highest coverage. Another problem: some instances may not be covered by any rule. Either give no conclusion, or the majority class of the training set.
Rules vs. Trees

Both problems on the previous slide can be solved by using ordered rules with a default class, e.g. a decision list: If ... Then ... Else If ... Then ... (sketched below). However, this is essentially back to trees (which don't suffer from these problems, due to their fixed order of execution). So why not just use trees? Rules can be modular (independent nuggets of information) whereas trees are not (easily) made of independent components. Rules can be more compact than trees - see the lecture on Decision Tree Learning.
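A decision list over such rules is then just first-match-wins evaluation with a default class, which removes both the conflict and the coverage problem; a sketch using the Rule class above:

```python
def classify_with_list(rules, x, default):
    # Rules are tried in order; the first that fires wins.  The default
    # class handles instances that no rule covers.
    for rule in rules:
        if rule.fires(x):
            return rule.label
    return default
```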
Rules vs. Trees

How would you represent these rules as a tree, if each attribute w, x, y and z can have values 1, 2 or 3?

If x = 1 and y = 1 Then class = a
If z = 1 and w = 1 Then class = a
Otherwise class = b
1R

A simple rule learner which has nonetheless proved very competitive in some domains. Called 1R or OneR for "1-rule", it is a one-level decision tree (aka DecisionStump) expressed as a set of rules that all test the same single attribute:

For each attribute a
  For each value v of a, make a rule:
    count how often each class appears
    find the most frequent class c
    set the rule to assign class c for attribute-value a = v
  Calculate the error rate of the rules for a
Choose the set of rules with the lowest error rate
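A compact sketch of this pseudocode (illustrative, not the WEKA OneR implementation), where instances are attribute→value dicts:

```python
from collections import Counter, defaultdict

def one_r(instances, labels):
    best = None
    for a in instances[0]:                      # each attribute
        counts = defaultdict(Counter)
        for x, y in zip(instances, labels):
            counts[x[a]][y] += 1                # class counts per value
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(rules[x[a]] != y for x, y in zip(instances, labels))
        if best is None or errors < best[2]:
            best = (a, rules, errors)
    return best                                 # (attribute, value->class, errors)
```

On the weather data this selects outlook with 4/14 training errors, matching the table on the next slide.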
1R on play

attribute    rules            errors  total errors
outlook      sunny → no       2/5     4/14
             overcast → yes   0/4
             rainy → yes      2/5
temperature  hot → no         2/4     5/14
             mild → yes       2/6
             cool → yes       1/4
humidity     high → no        3/7     4/14
             normal → yes     1/7
windy        false → yes      2/8     5/14
             true → no        3/6
1R on play

Two rule sets tie with the smallest number of errors; the first one is:

outlook: sunny    -> no
         overcast -> yes
         rainy    -> yes

(10/14 instances correct)
1R on play

Things are more complicated with missing or numeric attributes:
- treat missing as a separate value
- discretize numeric attributes by choosing breakpoints for threshold tests

However, too many breakpoints cause overfitting, so a parameter specifies the minimum number of examples lying between two thresholds. On a numeric version of humidity this gives a rule set of the form:

humidity: < ...  -> yes
          < ...  -> no
          >= ... -> yes

(11/14 instances correct)
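A much-simplified sketch of the discretization idea: sweep the sorted values and place a breakpoint where the class changes, but only once the current bucket holds at least a minimum number of examples. (The real procedure tracks the majority class of each interval; this version assumes each bucket is dominated by one class.)

```python
def breakpoints(values, labels, min_bucket=3):
    """Candidate thresholds for a numeric attribute."""
    pairs = sorted(zip(values, labels))
    cuts = []
    size, cls = 1, pairs[0][1]
    for prev, (v, y) in zip(pairs, pairs[1:]):
        if y != cls and size >= min_bucket:
            cuts.append((prev[0] + v) / 2)   # threshold midway between values
            size, cls = 1, y
        else:
            size += 1                        # absorb into the current bucket
    return cuts
```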
ZeroR

What is this? Simply the 1R method but testing zero attributes instead of one. What does it do? It predicts the majority class in the training set (the mean for numeric prediction). What is the point? It serves as a baseline for comparing classifier performance. Stop and think about it: it is a most-general classifier, having no constraints on attributes. Usually, it will be too general (e.g. always "play"). So we could try 1R, which is less general (more specific)... What does this process of moving from ZeroR to 1R resemble?
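ZeroR needs only a few lines:

```python
from collections import Counter

def zero_r(labels):
    # Ignore the attributes entirely; always predict the majority class.
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: majority
```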
Learning Disjunctive Sets of Rules

Method 1: Learn a decision tree, convert it to rules
- can be slow for large and noisy datasets
- improvements: e.g. C5.0, Weka PART

Method 2: Sequential covering algorithm:
1. Learn one rule with high accuracy, any coverage
2. Remove positive examples covered by this rule
3. Repeat
Sequential Covering Algorithm

Sequential-covering(Target_attribute, Attributes, Examples, Threshold)
  Learned_rules ← {}
  Rule ← learn-one-rule(Target_attribute, Attributes, Examples)
  while performance(Rule, Examples) > Threshold do
    Learned_rules ← Learned_rules + Rule
    Examples ← Examples − {examples correctly classified by Rule}
    Rule ← learn-one-rule(Target_attribute, Attributes, Examples)
  Learned_rules ← sort Learned_rules according to performance over Examples
  return Learned_rules
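The same algorithm as a Python sketch, assuming learn_one_rule and performance functions, and examples given as (instance, label) pairs:

```python
def sequential_covering(examples, learn_one_rule, performance, threshold):
    remaining, learned = list(examples), []
    rule = learn_one_rule(remaining)
    while remaining and performance(rule, remaining) > threshold:
        learned.append(rule)
        # Remove the examples this rule already classifies correctly.
        remaining = [(x, y) for x, y in remaining
                     if not (rule.fires(x) and rule.label == y)]
        if not remaining:
            break
        rule = learn_one_rule(remaining)
    # Sort the final rule set by performance over the full data set.
    return sorted(learned, key=lambda r: performance(r, examples), reverse=True)
```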
Learn One Rule

IF THEN PlayTennis=yes

IF Wind=weak THEN PlayTennis=yes
IF Wind=strong THEN PlayTennis=no
IF Humidity=normal THEN PlayTennis=yes
IF Humidity=high THEN PlayTennis=no
...

IF Humidity=normal ∧ Wind=weak THEN PlayTennis=yes
IF Humidity=normal ∧ Wind=strong THEN PlayTennis=yes
IF Humidity=normal ∧ Outlook=sunny THEN PlayTennis=yes
IF Humidity=normal ∧ Outlook=rain THEN PlayTennis=yes
...
Algorithm Learn One Rule

Learn-One-Rule(Target_attribute, Attributes, Examples)
  // Returns a single rule which covers some of the
  // positive examples and none of the negatives.
  Pos := positive Examples
  Neg := negative Examples
  BestRule := ∅
  if Pos ≠ ∅ then
    NewAnte := most general rule antecedent possible
    NewRuleNeg := Neg
    while NewRuleNeg ≠ ∅ do
      for ClassVal in Target_attribute values do
        NewCons := (Target_attribute = ClassVal)
        // Add a new literal to specialize NewAnte, i.e. possible
        // constraints of the form att = val for att ∈ Attributes
        Candidate_literals := generate candidates
        Best_literal := argmax over L ∈ Candidate_literals of
                        Performance(SpecializeAnte(NewAnte, L) → NewCons)
        add Best_literal to NewAnte
        NewRule := NewAnte → NewCons
        if Performance(NewRule) > Performance(BestRule)
          then BestRule := NewRule
        endif
        NewRuleNeg := subset of NewRuleNeg that satisfies NewAnte
      endfor
    endwhile
  endif
  return BestRule
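A greedy sketch of this search for the two-class case, reusing the Rule class from earlier; performance(tests, examples) is an assumed scoring function for a candidate antecedent:

```python
def learn_one_rule(examples, attributes, target_class, performance):
    """Greedy general-to-specific search for one rule predicting target_class."""
    tests, covered = {}, list(examples)
    while any(y != target_class for _, y in covered):
        # Candidate literals: attribute = value tests not yet in the antecedent.
        candidates = {(a, x[a]) for x, _ in covered
                      for a in attributes if a not in tests}
        if not candidates:
            break                     # nothing left to specialise on
        a, v = max(candidates,
                   key=lambda t: performance({**tests, t[0]: t[1]}, examples))
        tests[a] = v
        covered = [(x, y) for x, y in covered if x[a] == v]
    return Rule(tests, target_class)
```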
Learn One Rule

Called a covering approach because at each stage a rule is identified that covers some of the instances.
- the evaluation function Performance(Rule) is unspecified
- a simple measure would be the number of negatives not covered by the antecedent, i.e. |Neg| − |NewRuleNeg|
- the consequent could then be the most frequent value of the target attribute among the examples covered by the antecedent
- this is sure not to be the best measure of performance!
Example: generating a rule

(Three scatter plots of class-a and class-b instances in the x-y plane show a rule being specialised in stages:)

If true then class = a
If x > 1.2 then class = a
If x > 1.2 and y > 2.6 then class = a
Subtleties: Learn One Rule

1. May use beam search
2. Easily generalizes to multi-valued target functions
3. Choose an evaluation function to guide the search:
   - Entropy (i.e., information gain)
   - Sample accuracy: n_c / n, where n_c = correct rule predictions and n = all predictions
   - m-estimate: (n_c + m·p) / (n + m), where p is a prior estimate of the class probability and m weights the prior - think of this as an approximation to a Bayesian evaluation function
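The two accuracy-style measures as small helpers, together with entropy; n_c is the number of correct predictions among the n examples a rule covers:

```python
import math

def sample_accuracy(n_c, n):
    return n_c / n

def m_estimate(n_c, n, p, m):
    # Shrinks the observed accuracy towards the prior p; weight m controls
    # how strongly, which matters when a rule covers few examples.
    return (n_c + m * p) / (n + m)

def entropy(class_counts):
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total)
                for c in class_counts if c > 0)
```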
Aspects of Sequential Covering Algorithms

- Sequential covering learns rules singly. Decision tree induction learns all disjuncts simultaneously.
- Sequential covering chooses between all attribute-value pairs at each specialisation step (i.e. between subsets of the examples covered). Decision tree induction only chooses between all attributes (i.e. between partitions of the examples w.r.t. the added attribute).
- Assuming the final rule set contains on average n rules with k conditions each, sequential covering requires n · k primitive selection decisions. Choosing an attribute at an internal node of a decision tree equates to choosing attribute-value pairs for the conditions of all corresponding rules.
- If data is plentiful, then the greater flexibility of choosing attribute-value pairs might be desired and might lead to better performance.
Aspects of Sequential Covering Algorithms

- If a general-to-specific search is chosen, start from a single node. If a specific-to-general search is chosen, then for a set of examples we need to determine what the starting nodes are.
- Depending on the number of conditions expected in rules, relative to the number of conditions in the examples, most general rules may be closer to the target than most specific rules.
- General-to-specific sequential covering is a generate-and-test approach: all syntactically permitted specialisations are generated and tested against the data. Specific-to-general is typically example-driven, constraining the hypotheses generated.
- Variations on performance evaluation are often implemented: entropy, m-estimate, relative frequency, significance tests (e.g. likelihood ratio).
Rules with exceptions

Idea: allow rules to have exceptions. Example: a rule for the iris data:

If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor

New instance:

Sepal length  Sepal width  Petal length  Petal width  Type
...           ...          ...           ...          Iris-setosa

Modified rule:

If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor
EXCEPT if petal-width < 1.0 then Iris-setosa
Exceptions to exceptions to exceptions...

default: Iris-setosa
except if petal-length ≥ 2.45 and petal-length < ... and petal-width < 1.75
       then Iris-versicolor
       except if petal-length ≥ 4.95 and petal-width < 1.55
              then Iris-virginica
       else if sepal-length < 4.95 and sepal-width ≥ 2.45
              then Iris-virginica
else if petal-length ≥ 3.35
       then Iris-virginica
       except if petal-length < 4.85 and sepal-length < 5.95
              then Iris-versicolor
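Such ripple-down structures can be sketched as a recursive node type: if a node's test fires, its exceptions are tried first and may override its conclusion (the else-chains above correspond to trying sibling nodes in order):

```python
class RDRNode:
    """One rule with (possibly nested) exceptions."""
    def __init__(self, test, conclusion, exceptions=()):
        self.test, self.conclusion = test, conclusion
        self.exceptions = list(exceptions)

    def classify(self, x, default=None):
        if self.test(x):
            # An exception that fires overrides this node's conclusion.
            for e in self.exceptions:
                verdict = e.classify(x)
                if verdict is not None:
                    return verdict
            return self.conclusion
        return default

rule = RDRNode(lambda x: 2.45 <= x["petal-length"] < 4.45, "Iris-versicolor",
               [RDRNode(lambda x: x["petal-width"] < 1.0, "Iris-setosa")])
print(rule.classify({"petal-length": 3.0, "petal-width": 0.4}))  # Iris-setosa
```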
Advantages of using exceptions

- Rules can be updated incrementally
  - easy to incorporate new data
  - easy to incorporate domain knowledge
- People often think in terms of exceptions
- Each conclusion can be considered just in the context of the rules and exceptions that lead to it
  - this locality property is important for understanding large rule sets
  - normal rule sets don't offer this advantage
Advantages of using exceptions

"Default ... except if ... then ..." is logically equivalent to "if ... then ... else ...", where the else specifies the default. But exceptions offer a psychological advantage. Assumption: the defaults and the tests early on apply more widely than the exceptions further down; exceptions reflect special cases.
Induct-RDR

Gaines & Compton (1995): learns Ripple-Down Rules from examples.

INDUCT's significance measure for a rule: the probability that a completely random rule with the same coverage would do at least as well.
- A random rule R selects t cases at random from the data set.
- How likely is it that p of these belong to the correct class?
- The probability is given by the hypergeometric distribution (see next slide),
- approximated by the incomplete beta function.
- Works well if the target function suits the rules-with-exceptions bias.
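A sketch of the significance computation using SciPy's hypergeometric distribution (an assumed dependency): for a data set of M cases of which K belong to the rule's class, and a rule covering t cases with p of them correct:

```python
from scipy.stats import hypergeom

def rule_significance(M, K, t, p):
    """P(a random rule covering t of M cases gets >= p from the K-case class)."""
    return hypergeom.sf(p - 1, M, K, t)   # sf(p - 1) = P(X >= p)

# e.g. a rule covering 5 cases, all correctly play=yes, on the 14-case weather data
print(rule_significance(M=14, K=9, t=5, p=5))
```

Small values indicate rules unlikely to have arisen by chance.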
Induct-RDR

(Figure: the hypergeometric test for rule induction - Witten & Gaines)
Issues for Classification Rule Learning Programs

- Sequential or simultaneous covering of data?
- General → specific, or specific → general?
- Generate-and-test, or example-driven?
- Whether and how to post-prune?
- What statistical evaluation function?
Summary of Classification Rule Learning

- A major class of representations (AI, business rules, RuleML, ...)
- Rule interpretation may need care
- Many common learning issues: search, evaluation, overfitting, etc.
- Can be related to numeric prediction by threshold functions
- Lifted to first-order representations in Inductive Logic Programming