Lecture 3: Transcripts - Basic Concepts (1) and Decision Trees (1)
- Claire McGee
- 5 years ago
Basic concepts

1. Welcome to Lecture 3. We will start Lecture 3 by introducing some basic notions and terminology.

2. These are the references for this presentation.

3. Let's start with the concept of induction. In simple words, induction is the process of reaching a general conclusion from specific examples: it is the process that allows us to generalize from specific examples. We said previously that generalization is a key concept of machine learning. More specifically:

4. The goal of inductive ML is to use data to induce (to work out) a model. The goodness of this model is evaluated on unseen data, that is, data that has not been used during model construction. If the model performs well on unseen data, we say that the model generalizes well.

5. Let's see a pictorial representation of induction. We have input data that have been annotated with a label, say by humans, and our goal is to predict the label of unlabelled data. From the labelled data, we try to figure out which features are best for our classification problem: we work out a feature representation and extract the features. We then feed these features to a machine learning algorithm, which induces a model. The induced model should be capable of predicting the correct label of unseen data, that is, data that does not belong to the input data.

6. This is another way to visualize the input data. In this case our problem is to predict the class of an iris flower, so we have a dataset containing features, feature values and class labels. Based on these training examples, we want to induce a model that can give us a reliable prediction for the flower instance at the top of the slide.

7. To measure the performance of our classifier, we can use several techniques. One technique is to divide the data we have into 2 parts: a training set and a test set.
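A minimal Python sketch of this holdout scheme, using the 80/20 proportion discussed in this slide. It assumes the labelled examples are stored as (x, label) pairs; the helper names `holdout_split`, `train_majority` and `error_rate` are hypothetical, the data is made up, and the model is a ZeroR-style majority-class baseline (the classifier that appears later in the Weka screenshot), chosen only to keep the sketch short.

```python
import random
from collections import Counter


def holdout_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the labelled examples and cut off a test set.

    Returns (training set, test set); with 1000 examples and the default
    test_fraction this gives 800 training and 200 test examples.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)  # random split, reproducible via seed
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def train_majority(train):
    """ZeroR-style baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def error_rate(model, data):
    """Fraction of (x, y) pairs where model(x) differs from the true label y."""
    return sum(1 for x, y in data if model(x) != y) / len(data)


# Hypothetical labelled data: 1000 (x, label) pairs, about 25% "yes".
data = [(x, "yes" if x % 4 == 0 else "no") for x in range(1000)]
train, test = holdout_split(data)
model = train_majority(train)         # induced from the 800 training examples only
test_error = error_rate(model, test)  # test labels are used only for scoring
print(len(train), len(test))          # 800 200
```

The test labels never reach `train_majority`; they are read only inside `error_rate`, which mirrors the rule that test labels are used to check predictions, not to learn.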
Suppose we have 1000 examples of iris flowers. We might select 800 of these as training data and set aside 200 as test data. We induce our model only from the 800 training examples, then run the induced model on the 200 test examples separately and compute our test error on them. Note that when we split up our data, the examples in the test set do have a class label, but these labels are not used for learning: they are used only to check whether the predictions made by the induced model are correct. The performance of the model on these 200 examples indicates how well the model will do in the future on unseen, unlabelled examples. Statistics tells us that if the sample is large enough, it is representative of our data and the estimate we obtain from it is reliable. Commonly used splits are 80% training data and 20% test data, or 90% training and 10% test data. Other proportions are also seen, for example 50% and 50%, which is usually not recommended. There is no mathematical rule for deciding the split; it depends on your data. NEVER EVER MANIPULATE test data: if you do, your results will be invalid. Remember also that the test data must come from the same class distribution as the training data. You cannot just use data with a different class distribution, because the model was induced under the training distribution and its performance on such data would not indicate how well it really works.

8. ML uses formal models that might perform well on our data; in this course we are going to study some of them. Choosing one model rather than another is up to us. A model tells us what sort of things we can learn, that is, what our inductive bias is. We said that the inductive bias is the set of a priori assumptions that govern a model: the assumptions the learning algorithm makes that allow it to learn. For example, the inductive bias of decision trees is that we can split data into branches and nodes, and that the root node is the most similar to the things we want to learn. The inductive bias of the perceptron is that the data must be linearly separable, and so on.

9. Learning algorithms have parameters associated with them. A parameter is a kind of setting: for example, in a decision tree a parameter can regulate the order in which questions are asked. Models can have many parameters, and finding the best combination of parameters is not trivial. Parameters are usually adjusted based on the training data.

10. Learning algorithms also have additional settings called hyperparameters. A hyperparameter is a parameter that controls other parameters of the model. Hyperparameters cannot be adjusted directly using the training data; the process is more complex. Split your data into 70% training data, 10% development data and 20% test data.

11. For each possible setting of the hyperparameters, train a model using that setting on the training data and compute the model's error rate on the development data. From this collection of models, choose the one that achieves the lowest error rate on the development data. Evaluate that model on the test data to estimate future test performance.

12. Accuracy is the proportion (or percentage) of correctly predicted labels over all predictions. Accuracy alone is sometimes quite misleading: a model may have relatively "high" accuracy because it predicts the most frequent class labels fairly accurately, while making all sorts of mistakes on the classes that are actually critical to the application. However, we can always compute precision and recall for each class label and analyze the performance on individual class labels, or average the values to get the overall precision and recall.
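Slide 12 says precision and recall can be computed for each class label and then averaged. A minimal Python sketch of that, using the standard definitions that slides 13 and 14 spell out (precision = TP / (TP + FP), recall = TP / (TP + FN), F-measure = their harmonic mean), each class treated one-vs-rest. Assumptions: labels are plain Python values, a metric with a zero denominator is reported as 0.0, and the average is a macro-average (an unweighted mean over classes), which is only one common choice; other tools may weight classes by frequency instead. The spam-filter data is made up.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Proportion of predictions that match the true labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def per_class_report(actual, predicted):
    """Precision, recall and F-measure for every label, one-vs-rest."""
    confusion = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    labels = sorted(set(actual) | set(predicted))
    report = {}
    for c in labels:
        tp = confusion[(c, c)]
        fp = sum(confusion[(a, c)] for a in labels if a != c)  # wrongly labelled c
        fn = sum(confusion[(c, p)] for p in labels if p != c)  # c labelled as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f)
    return report


def macro_average(report):
    """Unweighted mean of precision, recall and F-measure over the labels."""
    rows = list(report.values())
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))


# Hypothetical spam-filter results (yes = spam), for illustration only.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
predicted = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]
print(accuracy(actual, predicted))  # 0.75
report = per_class_report(actual, predicted)
```

Here the "yes" class gets TP = 2, FP = 1 and FN = 1, so its precision and recall are both 2/3, while the "no" class scores 0.8 on both; the per-class view shows the weaker class that the single accuracy figure hides.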
13. Machine Learning has borrowed some terminology from Information Retrieval (IR). On the screen you can see the definitions used in IR: there are 4 labels to categorize the results. Let's take a binary classification problem, the spam filter: is an email spam, yes or no? Our classifier must predict whether an email is spam or not, and the class labels we use are respectively yes and no. The categories we can use to categorize the results are:
- TP = True Positive, i.e. the number of positive (spam) examples that have been given the yes label.
- TN = True Negative, i.e. the number of negative examples that have been given the no label.
- FP = False Positive, i.e. the number of negative examples that have been labelled as positive.
- FN = False Negative, i.e. the number of positive examples that have been labelled as negative by our model.

14. Given these four numbers, we can define the following metrics: precision, recall and F-measure. In a classification task, the precision for a class is the number of true positives (i.e. the number of items correctly labelled as belonging to the positive class) divided by the total number of elements labelled as belonging to the positive class (i.e. the sum of true positives and false positives, where false positives are items incorrectly labelled as belonging to the class). Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, where false negatives are items that were not labelled as belonging to the positive class but should have been). In short:
- Precision: of all the instances predicted with a given label X, how many were predicted correctly? Precision = TP / (TP + FP).
- Recall: of all the instances that should have label X, how many were correctly captured? Recall = TP / (TP + FN).

The F-measure (or F-score) combines precision and recall into a single score: it is defined as the harmonic mean of precision and recall, F = 2 × Precision × Recall / (Precision + Recall).

15. This is a list of the metrics.

16. The confusion matrix is a handy tool to see what kinds of mistakes a classifier makes and how often it makes them. It is a table that presents both the class distribution in the data and the classifier's predicted class distribution, with a breakdown of error types. Usually the rows are the observed (actual) class labels and the columns are the predicted class labels. Each cell contains the number of predictions made by the classifier that fall into that cell.

17. If a classification system has been trained to distinguish between cats, dogs and rabbits, a confusion matrix summarizes the results of testing the algorithm for further inspection. Assuming a sample of 27 animals (8 cats, 6 dogs and 13 rabbits), the resulting confusion matrix could look like the table on the screen. In this confusion matrix, of the 8 actual cats, the system predicted that three were dogs, and of the six dogs, it predicted that one was a rabbit and two were cats. We can see from the matrix that the system has trouble distinguishing between cats and dogs, but can distinguish rabbits from the other types of animals pretty well. All correct guesses are located on the diagonal of the table, so it is easy to inspect the table visually for errors: they are the values outside the diagonal.

18. We said we could use development data to set hyperparameters. The main disadvantage is that we use up some training data just to estimate one or two hyperparameters. An alternative is to use cross-validation. In 10-fold cross-validation you break your training data up into 10 equally sized partitions. You train a learning algorithm on 9 of them and test it on the remaining 1. You do this 10 times, each time holding out a different partition as the test data.
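A minimal Python sketch of n-fold cross-validation used to choose a hyperparameter. Assumptions, all hypothetical: examples are (x, label) pairs; the folds can optionally be stratified (the idea of slide 20 applied to each fold) by shuffling each class's examples and dealing them out round-robin; and the learner `train_threshold` is a toy 1-D classifier whose threshold (a parameter) is learned from the training folds, while `offset` is a hyperparameter picked by cross-validation. Calling `make_folds` with k equal to the number of examples and `stratified=False` gives the leave-one-out scheme of slide 19.

```python
import random
from collections import defaultdict


def make_folds(labels, k, stratified=True, seed=0):
    """Split example indices into k folds of (nearly) equal size.

    With stratified=True each class's indices are shuffled and dealt out
    round-robin, so every fold keeps roughly the class proportions of the
    whole sample.
    """
    if not 1 < k <= len(labels):
        raise ValueError("need 1 < k <= number of examples")
    rng = random.Random(seed)
    if stratified:
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        groups = list(by_class.values())
    else:
        groups = [list(range(len(labels)))]
    folds = [[] for _ in range(k)]
    position = 0
    for group in groups:
        rng.shuffle(group)
        for i in group:
            folds[position % k].append(i)
            position += 1
    return folds


def cross_val_error(train_fn, setting, data, k, stratified=True):
    """Average test error over k folds; each fold is held out exactly once."""
    folds = make_folds([y for _, y in data], k, stratified)
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [ex for i, ex in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        model = train_fn(train, setting)  # fit on the other k - 1 folds
        errors.append(sum(model(x) != y for x, y in test) / len(test))
    return sum(errors) / len(errors)


def train_threshold(train, offset):
    """Toy learner (hypothetical): the threshold is learned from the training
    data as the midpoint of the two class means; `offset` is a hyperparameter
    that shifts it and is not learned from the data."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2 + offset
    return lambda x: 1 if x >= threshold else 0


data = [(x, 1 if x >= 60 else 0) for x in range(100)]  # made-up 1-D data
settings = [-20, -10, 0, 5, 10]
cv_error = {s: cross_val_error(train_threshold, s, data, k=10) for s in settings}
best_offset = min(settings, key=cv_error.get)  # lowest average error wins
loo_folds = make_folds([y for _, y in data], k=len(data), stratified=False)
```

Every example is held out exactly once, so all labelled data is used both for fitting and for scoring; the price is training the learner k times per hyperparameter setting.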
Typical choices for n in n-fold cross-validation are 2, 5 and 10; 10-fold cross-validation is the most common. After running cross-validation, you can use the hyperparameter values selected by cross-validation, that is, the ones with the lowest average error across the folds.

19. Leave-one-out (LOO) is a simple form of cross-validation: each training set is created by taking all the samples except one, and the test set is the sample left out. In other words, it is n-fold cross-validation with n equal to the number of examples.

20. Suppose we are training a classifier to predict which of 2 classes, C1 and C2, examples belong to, and we have one sample randomly drawn from the original population. We divide the sample into a training set and a test set. Suppose it turns out that most of the examples in the training set belong to C1 and most of those in the test set belong to C2. This is not good. We must ensure that the proportion of each class in each set is the same as the proportion in the original sample. This is called stratification.

21. This is a screenshot from the Weka package, where you can choose among the different test options.

22. This screenshot shows the kind of output you get from Weka. In this example, a classifier called ZeroR has classified the iris dataset (on the left-hand side); on the right-hand side you can see the results. In this video clip we have covered accuracy (correctly classified instances), precision, recall, F-measure and the confusion matrix. We will learn about other metrics in the next lectures.

23. Underfitting: the model has not learned enough from the data and is unable to generalize. Overfitting: the model has learned too many idiosyncrasies (noise) and is unable to generalize.

24. Our goal when we choose a machine learning model is that it does well on future, unseen data. The way in which we measure performance should depend on the problem we are trying to solve. There should be a strong relationship between the data that our algorithm sees at training time and the data it sees at test time.

25. Not everything is learnable:
- noise at the feature level;
- noise at the class label level;
- features that are insufficiently representative;
- labels that are controversial;
- an inductive bias that is not appropriate for the kind of problem we are trying to learn.

26. Now some simple quizzes.

Decision Trees 1

1. Part 1a: In this video clip we are going to talk about a simple and intuitive learning model: the decision tree.

2. There will be 2 lectures on decision trees. In today's lecture, I will explain how a decision tree works and cover some basic characteristics of this model, such as greediness, the divide and conquer notion, the inductive bias, the loss function, the expected loss and the empirical error; at the end we will summarize induction.

3. We said previously that we could simplify the concept of learning in ML by saying that we want to make informed guesses about the future. The past is represented by the examples stored in the training set, and the future by unseen examples. We can evaluate the generalization ability of our learner by using a test set. In the figure we have classified examples of iris flowers divided into three classes (setosa, versicolor, virginica); each example is represented by measurements. The purpose of a machine learning model is then to guess correctly the class of a previously unseen iris flower based only on its measurements, which might differ in some respects from the measurements stored in the training set. So we want to make a good prediction based on our previous experience of irises, and our experience is formalized in the dataset.

4. Now let's take a more specific example, using the same problem presented in Daumé's book. Our problem is now to predict whether a student will like a course or not, based on his/her ratings of previous courses.
We could make predictions by asking the student yes/no questions: for instance, does the new course belong to the Systems program? Has the student liked most previous Systems courses? And so on. We could then build a tree-like diagram like the one you see on the slide.

5. When we build our supervised decision tree learning model, we do not ask questions directly to the students. Instead, we use training data to answer the questions. Essentially, we have a dataset similar to the one on the screen, where each row is an example paired with the correct answer. In this dataset, the column Rating is the class. Interpret the classes as Like (the student liked the course) if the rating is 0, +1 or +2, and Hate if the student did not like the course and rated it -2 or -1. So the ratings in this specific dataset are the class labels, the column names are questions (our features), and the responses, yes and no, are the feature values. We have here a feature representation that we assume is useful for solving our classification problem. With this data, we could build many possible trees. Since we do not want to spend months deciding which of these trees is the best one, we proceed greedily.

7. Being greedy, in this context, means: if you could ask only one question, which question would you ask? Which is the most useful question? We can start by depicting the usefulness of questions in histograms. Look at the histograms on the screen: each histogram shows the frequency of like/hate labels for each possible value of a feature. From these histograms, you can see that asking the first question (is the course easy or not?) is not useful, because there is no clear divide between yes and no. On the contrary, asking "is this a Systems course?" (the fourth histogram on the screen) is useful: if the answer is no, you can be fairly confident that the student liked the course, and if the answer is yes, that the student hated it. Now pick a random example from this dataset and ask this question. If you get the answer no, you would be inclined to say that the class label of the example is like; if you get the answer yes, you would be inclined to think the class label is hate. Try to use this feature and these assumptions to make informed guesses on the examples of the dataset: you will guess right many times. So, if you choose this feature, you can make reliable informed guesses. Repeat the computation for each of the available features and score each one, for example by the number of training examples you would guess correctly by predicting the majority label for each answer. When you have to choose which feature to consider first, choose the one with the highest score. In this way you choose the ROOT node of the decision tree.

8. How do we choose subsequent features? Here is where the notion of divide and conquer is applied. When you ask the first question, "Is the course a Systems course?", you partition the data into 2 sets: the no set and the yes set. This is the divide step: you get 2 partitions. In the conquer step, you repeat the same process you applied to choose the first feature, separately on the examples under the no branch and on those under the yes branch of the tree.

9. At some point, we realize that asking additional questions becomes redundant, or that we have run out of questions. In both cases, we create a LEAF NODE and guess the most prevalent answer in the training data we are looking at (the examples that reached that node).
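The greedy, divide-and-conquer procedure of slides 7 to 9 can be sketched in Python roughly as follows. This is a minimal sketch, assuming yes/no features stored as a dict of booleans per example and scoring each feature by how many training examples it would classify correctly with a majority vote on each branch; the names `train_tree`, `feature_score` and `predict` are hypothetical, and the toy dataset is made up (it is not the one on the slide).

```python
from collections import Counter


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def feature_score(data, feature):
    """Examples guessed correctly if we ask only `feature` and answer each
    branch with its majority label (the histogram idea of slide 7)."""
    score = 0
    for answer in (False, True):
        labels = [y for x, y in data if x[feature] == answer]
        if labels:
            score += Counter(labels).most_common(1)[0][1]
    return score


def train_tree(data, features):
    """Greedy divide and conquer: pick the best question, split, recurse."""
    labels = [y for _, y in data]
    guess = majority(labels)
    if len(set(labels)) == 1 or not features:  # redundant, or out of questions
        return ("leaf", guess)
    best = max(features, key=lambda f: feature_score(data, f))
    no = [(x, y) for x, y in data if not x[best]]  # divide
    yes = [(x, y) for x, y in data if x[best]]
    rest = [f for f in features if f != best]
    # Conquer each branch; an empty branch falls back to the parent's guess.
    left = train_tree(no, rest) if no else ("leaf", guess)
    right = train_tree(yes, rest) if yes else ("leaf", guess)
    return ("node", best, left, right)


def predict(tree, x):
    """Follow the answers from the root down to a leaf."""
    while tree[0] == "node":
        _, feature, no_branch, yes_branch = tree
        tree = yes_branch if x[feature] else no_branch
    return tree[1]


# Made-up course data: is the course easy? is it a Systems course?
data = [
    ({"easy": True, "systems": False}, "like"),
    ({"easy": False, "systems": False}, "like"),
    ({"easy": True, "systems": True}, "hate"),
    ({"easy": False, "systems": True}, "hate"),
    ({"easy": True, "systems": False}, "like"),
    ({"easy": False, "systems": True}, "hate"),
]
tree = train_tree(data, ["easy", "systems"])
print(tree)  # ('node', 'systems', ('leaf', 'like'), ('leaf', 'hate'))
```

On this toy data "easy" scores 4 and "systems" scores 6, so "systems" becomes the root, and both branches are already pure, so they become leaves.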
10. The goal of the decision tree learning model is to figure out which questions to ask, in what order, and what answer to predict once you have asked enough questions. The inductive bias of decision trees assumes that the things we want to learn to predict are more like the root node and less like the other branch nodes.

11. We will talk more about the basic characteristics of decision trees in the next video clip.

12. Part 1b: Welcome back to decision trees, part 1. Let's now start with an informal definition of the decision tree model. A decision tree is a flowchart-like structure, where each internal (non-terminal) node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf (terminal) node holds a class label. The topmost node in a tree is the root node.

14. Let's now formalize the definition. We know that the performance of a learning algorithm should be measured on unseen data. We measure performance with a function called the loss function: the price paid for inaccurate predictions. In classification problems, loss means misclassifications, or wrong predictions: how bad are our system's predictions in comparison to the truth? In particular, if y is the truth and ŷ (y-hat) is the system's prediction, then the function l(y, ŷ) is a measure of error. Note that the loss function is something we must decide on based on the goals of learning, and there are many loss functions we could use. Let's use the simplest one here: the zero-one loss. If y is equal to ŷ, the system's classification is correct, so the loss is 0. If y is not equal to ŷ, the classification is incorrect, so we count one error (a loss of 1).
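A minimal Python sketch of the zero-one loss, together with its average over a labelled sample (which is how the training error of slide 18 is computed). The predictor `f` and the four data points are hypothetical, for illustration only.

```python
def zero_one_loss(y, y_hat):
    """Zero-one loss: 0 if the prediction is correct, 1 otherwise."""
    return 0 if y == y_hat else 1


def training_error(f, data):
    """Average zero-one loss of predictor f over N labelled pairs (x, y)."""
    return sum(zero_one_loss(y, f(x)) for x, y in data) / len(data)


def f(x):
    """Hypothetical predictor: 'like' for small x, 'hate' otherwise."""
    return "like" if x < 3 else "hate"


data = [(1, "like"), (2, "hate"), (4, "hate"), (5, "like")]
print(training_error(f, data))  # 0.5
```

Two of the four predictions are wrong (x = 2 and x = 5), so the average loss is 2 / 4 = 0.5.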
15. Distribution: Now that we have defined our loss function, we need to consider where the data (training and test) come from. We talked about distributions before and focused on the normal distribution, a bell-shaped distribution of data. If we knew a priori what our data-generating distribution is, our learning problem would be easier. Here, however, we make no assumptions about what the distribution D looks like: we assume that we do not know what D is. Perhaps the hardest thing about machine learning is that we don't know what D is: all we get is a random sample from it, and this random sample is our training data. We can say that the data-generating distribution is a probability distribution D over input/output pairs. If we write x for the input (the examples/instances) and y for the output (the rating), then D is a distribution over (x, y) pairs. Remember that our problem is to guess the rating of an unseen example. A useful way to think about D is that it gives high probability to reasonable (x, y) pairs and low probability to unreasonable ones.

16. Expected loss: We are given access to training data, which is a random sample of input/output pairs drawn from D. Based on this training data, we need to induce a function f that maps new inputs to corresponding predictions. The key property f should have is that it does well on future examples that are also drawn from D. Formally, its expected loss ε (epsilon) over the distribution D with respect to l should be as small as possible: we should minimize the expected loss, that is, make as few errors as possible.

17. Now let's read and analyse the formula. Epsilon is, by definition, equal to blackboard-bold E (the expectation) over pairs (x, y) drawn from script D of l(y, f(x)). This corresponds to the sum (big sigma means sum), over all pairs (x, y), of D(x, y) times l(y, f(x)).
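Written out in LaTeX, the spoken formula of slide 17 reads as follows. The symbols are those used in the lecture (ε, D, l, f); writing the expectation as a sum assumes D is discrete, and for a continuous D the sum would become an integral.

```latex
\epsilon \;\triangleq\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\, l(y, f(x)) \,\big]
\;=\; \sum_{(x,y)} \mathcal{D}(x,y)\, l(y, f(x))
```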
This is exactly the weighted average loss over all the (x, y) pairs, weighted by their probability under the distribution D. In practical terms, this formula gives the average loss we would observe if we drew many (x, y) pairs from D.

18. Training error: The difficulty in minimizing the expected loss is that we don't know anything about the distribution D, but we do know that our training data contains a certain number N of (x, y) pairs. So, to compute our training error epsilon-hat (the hat means that it is an estimate), we replace the expectation over D with an average over the N training examples: we sum the loss over the training examples and multiply by 1 over capital N. We get the formula you see on the screen: the training error epsilon-hat is, by definition, equal to 1 over N times the sum from n = 1 to capital N of l(y_n, f(x_n)). That is, our training error is simply our average error over the training data. The challenge is that our learned function needs to generalize beyond the training data to future data that it might not have seen yet.

19. The training error is sometimes called the empirical error. Terminology can be confusing: the empirical error can be called the training error, the test error or the observed error, depending on whether it is the error on a training set, a test set or a more general set. Watch out! Formulas can be written in different notation styles. For example, the formula on this slide is the formula for the empirical error given by Alpaydin: the empirical error is the proportion of training instances where the predictions of h (the hypothesis, i.e. the informed guess) do not match the required values given in big X (the training set). The formula should be read in this way: the empirical error of the hypothesis h given the training set X is the sum, over the training instances x, of an indicator that is 1 when the prediction of h differs from the required class label r and 0 otherwise; this counts the mistakes, and dividing the count by N gives the proportion.

20. Induction: Putting it all together, we get a formal definition of inductive machine learning: given a loss function l and a sample d from some unknown distribution D, you must compute a function f that has low expected error ε over D with respect to l.

21. OK, we stop here today. Thank you for your attention.

Terminology

DEFINITION OF 'DISCRETE DISTRIBUTION': The statistical or probabilistic properties of observable, pre-defined values that form either a finite or a countably infinite set. Unlike a continuous distribution, which has uncountably many possible outcomes, a discrete distribution takes values in a finite or countably infinite set of possible observations. Discrete distributions are frequently used in statistical modeling and computer programming. Also known as a "discrete probability distribution".

BREAKING DOWN 'DISCRETE DISTRIBUTION'
Examples of discrete probability distributions include the binomial distribution (with a finite set of values) and the Poisson distribution (with a countably infinite set of values). The concept of probability distributions and the random variables they describe are the underpinnings of probability theory and statistical analysis.

Terminology: Ordered Pairs

Here is another way to think about functions: write the input and output of a function as an "ordered pair", such as (4, 16). They are called ordered pairs because the input always comes first and the output second: (input, output). So it looks like this: (x, f(x)).
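A minimal Python sketch of the two examples above: the probability mass functions of the binomial and Poisson distributions, with the binomial distribution listed as ordered pairs (value, probability). The parameters (n = 4, p = 0.5, λ = 2) are arbitrary illustrative choices, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p); defined on the finite set 0..n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam); defined on 0, 1, 2, ... (countably infinite)."""
    return exp(-lam) * lam**k / factorial(k)


# A discrete distribution written as ordered pairs (value, probability).
binomial = [(k, binomial_pmf(k, 4, 0.5)) for k in range(5)]
print(binomial)  # [(0, 0.0625), (1, 0.25), (2, 0.375), (3, 0.25), (4, 0.0625)]

# The Poisson support never ends, but the probabilities of the first
# few dozen values already sum to (almost exactly) 1.
poisson_mass = sum(poisson_pmf(k, 2.0) for k in range(60))
```

The binomial list is complete after five pairs, whereas the Poisson distribution can only ever be listed partially; this is the finite versus countably infinite distinction made in the definition.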
Objectives: CPS122 Lecture: Identifying Responsibilities; CRC Cards last revised March 16, 2015 1. To show how to use CRC cards to identify objects and find responsibilities Materials: 1. ATM System example
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationCHAPTER 4: REIMBURSEMENT STRATEGIES 24
CHAPTER 4: REIMBURSEMENT STRATEGIES 24 INTRODUCTION Once state level policymakers have decided to implement and pay for CSR, one issue they face is simply how to calculate the reimbursements to districts
More informationGrade 6: Correlated to AGS Basic Math Skills
Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationGACE Computer Science Assessment Test at a Glance
GACE Computer Science Assessment Test at a Glance Updated May 2017 See the GACE Computer Science Assessment Study Companion for practice questions and preparation resources. Assessment Name Computer Science
More informationRadius STEM Readiness TM
Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and
More informationTruth Inference in Crowdsourcing: Is the Problem Solved?
Truth Inference in Crowdsourcing: Is the Problem Solved? Yudian Zheng, Guoliang Li #, Yuanbing Li #, Caihua Shan, Reynold Cheng # Department of Computer Science, Tsinghua University Department of Computer
More informationGCE. Mathematics (MEI) Mark Scheme for June Advanced Subsidiary GCE Unit 4766: Statistics 1. Oxford Cambridge and RSA Examinations
GCE Mathematics (MEI) Advanced Subsidiary GCE Unit 4766: Statistics 1 Mark Scheme for June 2013 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge and RSA) is a leading UK awarding body, providing
More informationCSL465/603 - Machine Learning
CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationIT Students Workshop within Strategic Partnership of Leibniz University and Peter the Great St. Petersburg Polytechnic University
IT Students Workshop within Strategic Partnership of Leibniz University and Peter the Great St. Petersburg Polytechnic University 06.11.16 13.11.16 Hannover Our group from Peter the Great St. Petersburg
More informationNotes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1
Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial
More informationCPS122 Lecture: Identifying Responsibilities; CRC Cards. 1. To show how to use CRC cards to identify objects and find responsibilities
Objectives: CPS122 Lecture: Identifying Responsibilities; CRC Cards last revised February 7, 2012 1. To show how to use CRC cards to identify objects and find responsibilities Materials: 1. ATM System
More informationUNIT ONE Tools of Algebra
UNIT ONE Tools of Algebra Subject: Algebra 1 Grade: 9 th 10 th Standards and Benchmarks: 1 a, b,e; 3 a, b; 4 a, b; Overview My Lessons are following the first unit from Prentice Hall Algebra 1 1. Students
More informationIterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages
Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer
More informationIndividual Component Checklist L I S T E N I N G. for use with ONE task ENGLISH VERSION
L I S T E N I N G Individual Component Checklist for use with ONE task ENGLISH VERSION INTRODUCTION This checklist has been designed for use as a practical tool for describing ONE TASK in a test of listening.
More informationUsing the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the SAT
The Journal of Technology, Learning, and Assessment Volume 6, Number 6 February 2008 Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationTHE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial
More informationName: Class: Date: ID: A
Name: Class: _ Date: _ Test Review Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Members of a high school club sold hamburgers at a baseball game to
More informationACTL5103 Stochastic Modelling For Actuaries. Course Outline Semester 2, 2014
UNSW Australia Business School School of Risk and Actuarial Studies ACTL5103 Stochastic Modelling For Actuaries Course Outline Semester 2, 2014 Part A: Course-Specific Information Please consult Part B
More information12- A whirlwind tour of statistics
CyLab HT 05-436 / 05-836 / 08-534 / 08-734 / 19-534 / 19-734 Usable Privacy and Security TP :// C DU February 22, 2016 y & Secu rivac rity P le ratory bo La Lujo Bauer, Nicolas Christin, and Abby Marsh
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationProof Theory for Syntacticians
Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax
More informationGCSE. Mathematics A. Mark Scheme for January General Certificate of Secondary Education Unit A503/01: Mathematics C (Foundation Tier)
GCSE Mathematics A General Certificate of Secondary Education Unit A503/0: Mathematics C (Foundation Tier) Mark Scheme for January 203 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge and RSA)
More informationHoughton Mifflin Online Assessment System Walkthrough Guide
Houghton Mifflin Online Assessment System Walkthrough Guide Page 1 Copyright 2007 by Houghton Mifflin Company. All Rights Reserved. No part of this document may be reproduced or transmitted in any form
More informationVersion Space. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Version Space Term 2012/ / 18
Version Space Javier Béjar cbea LSI - FIB Term 2012/2013 Javier Béjar cbea (LSI - FIB) Version Space Term 2012/2013 1 / 18 Outline 1 Learning logical formulas 2 Version space Introduction Search strategy
More informationEnd-of-Module Assessment Task
Student Name Date 1 Date 2 Date 3 Topic E: Decompositions of 9 and 10 into Number Pairs Topic E Rubric Score: Time Elapsed: Topic F Topic G Topic H Materials: (S) Personal white board, number bond mat,
More informationAssociation Between Categorical Variables
Student Outcomes Students use row relative frequencies or column relative frequencies to informally determine whether there is an association between two categorical variables. Lesson Notes In this lesson,
More informationarxiv: v1 [math.at] 10 Jan 2016
THE ALGEBRAIC ATIYAH-HIRZEBRUCH SPECTRAL SEQUENCE OF REAL PROJECTIVE SPECTRA arxiv:1601.02185v1 [math.at] 10 Jan 2016 GUOZHEN WANG AND ZHOULI XU Abstract. In this note, we use Curtis s algorithm and the
More informationTest How To. Creating a New Test
Test How To Creating a New Test From the Control Panel of your course, select the Test Manager link from the Assessments box. The Test Manager page lists any tests you have already created. From this screen
More informationWord learning as Bayesian inference
Word learning as Bayesian inference Joshua B. Tenenbaum Department of Psychology Stanford University jbt@psych.stanford.edu Fei Xu Department of Psychology Northeastern University fxu@neu.edu Abstract
More informationINTERMEDIATE ALGEBRA Course Syllabus
INTERMEDIATE ALGEBRA Course Syllabus This syllabus gives a detailed explanation of the course procedures and policies. You are responsible for this information - ask your instructor if anything is unclear.
More informationCertified Six Sigma Professionals International Certification Courses in Six Sigma Green Belt
Certification Singapore Institute Certified Six Sigma Professionals Certification Courses in Six Sigma Green Belt ly Licensed Course for Process Improvement/ Assurance Managers and Engineers Leading the
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationCritical Thinking in Everyday Life: 9 Strategies
Critical Thinking in Everyday Life: 9 Strategies Most of us are not what we could be. We are less. We have great capacity. But most of it is dormant; most is undeveloped. Improvement in thinking is like
More informationCONSTRUCTION OF AN ACHIEVEMENT TEST Introduction One of the important duties of a teacher is to observe the student in the classroom, laboratory and
CONSTRUCTION OF AN ACHIEVEMENT TEST Introduction One of the important duties of a teacher is to observe the student in the classroom, laboratory and in other settings. He may also make use of tests in
More informationExploration. CS : Deep Reinforcement Learning Sergey Levine
Exploration CS 294-112: Deep Reinforcement Learning Sergey Levine Class Notes 1. Homework 4 due on Wednesday 2. Project proposal feedback sent Today s Lecture 1. What is exploration? Why is it a problem?
More informationFunctional Skills Mathematics Level 2 assessment
Functional Skills Mathematics Level 2 assessment www.cityandguilds.com September 2015 Version 1.0 Marking scheme ONLINE V2 Level 2 Sample Paper 4 Mark Represent Analyse Interpret Open Fixed S1Q1 3 3 0
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Ch 2 Test Remediation Work Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) High temperatures in a certain
More informationIf we want to measure the amount of cereal inside the box, what tool would we use: string, square tiles, or cubes?
String, Tiles and Cubes: A Hands-On Approach to Understanding Perimeter, Area, and Volume Teaching Notes Teacher-led discussion: 1. Pre-Assessment: Show students the equipment that you have to measure
More informationAnalyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio
SCSUG Student Symposium 2016 Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio Praneth Guggilla, Tejaswi Jha, Goutam Chakraborty, Oklahoma State
More informationLecture 10: Reinforcement Learning
Lecture 1: Reinforcement Learning Cognitive Systems II - Machine Learning SS 25 Part III: Learning Programs and Strategies Q Learning, Dynamic Programming Lecture 1: Reinforcement Learning p. Motivation
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationIntroduction to Simulation
Introduction to Simulation Spring 2010 Dr. Louis Luangkesorn University of Pittsburgh January 19, 2010 Dr. Louis Luangkesorn ( University of Pittsburgh ) Introduction to Simulation January 19, 2010 1 /
More informationMaximizing Learning Through Course Alignment and Experience with Different Types of Knowledge
Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February
More informationUsing computational modeling in language acquisition research
Chapter 8 Using computational modeling in language acquisition research Lisa Pearl 1. Introduction Language acquisition research is often concerned with questions of what, when, and how what children know,
More informationUniversity of Groningen. Systemen, planning, netwerken Bosman, Aart
University of Groningen Systemen, planning, netwerken Bosman, Aart IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationMajor Milestones, Team Activities, and Individual Deliverables
Major Milestones, Team Activities, and Individual Deliverables Milestone #1: Team Semester Proposal Your team should write a proposal that describes project objectives, existing relevant technology, engineering
More informationChunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.
NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and
More informationLinking the Ohio State Assessments to NWEA MAP Growth Tests *
Linking the Ohio State Assessments to NWEA MAP Growth Tests * *As of June 2017 Measures of Academic Progress (MAP ) is known as MAP Growth. August 2016 Introduction Northwest Evaluation Association (NWEA
More informationInformatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy
Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference
More informationExtending Place Value with Whole Numbers to 1,000,000
Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit
More informationLecturing Module
Lecturing: What, why and when www.facultydevelopment.ca Lecturing Module What is lecturing? Lecturing is the most common and established method of teaching at universities around the world. The traditional
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationE-3: Check for academic understanding
Respond instructively After you check student understanding, it is time to respond - through feedback and follow-up questions. Doing this allows you to gauge how much students actually comprehend and push
More informationAn Empirical and Computational Test of Linguistic Relativity
An Empirical and Computational Test of Linguistic Relativity Kathleen M. Eberhard* (eberhard.1@nd.edu) Matthias Scheutz** (mscheutz@cse.nd.edu) Michael Heilman** (mheilman@nd.edu) *Department of Psychology,
More informationStatewide Framework Document for:
Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance
More informationMathematics Success Grade 7
T894 Mathematics Success Grade 7 [OBJECTIVE] The student will find probabilities of compound events using organized lists, tables, tree diagrams, and simulations. [PREREQUISITE SKILLS] Simple probability,
More informationBackwards Numbers: A Study of Place Value. Catherine Perez
Backwards Numbers: A Study of Place Value Catherine Perez Introduction I was reaching for my daily math sheet that my school has elected to use and in big bold letters in a box it said: TO ADD NUMBERS
More information