Combining Multiple Models


Combining Multiple Models

Lecture Outline:
- Bagging
- Boosting
- Stacking
- Using Unlabeled Data

Reading: Chapter 7.5, Witten and Frank, 2nd ed.; Nigam, McCallum, Thrun & Mitchell, "Text Classification from Labeled and Unlabeled Data using EM", Machine Learning, 39, 2000.

COM3250

Combining Multiple Models

- When making critical decisions, people usually consult several experts rather than just one.
- A model generated by an ML technique over some training data can be viewed as an expert.
- It is natural to ask: can we combine the judgements of multiple models to get a decision that is more reliable than that of any single one on its own?
- The answer is yes (though not always).
- A disadvantage is that the resulting combined models may be hard to understand and analyse.

Why Combining Models Works

- Suppose (ideally) we have an infinite number of independent training sets of the same size, from which we train an infinite number of classifiers (using one learning scheme), which are used to classify a test instance via majority vote.
- Such a combined classifier will still make errors, depending on how well the ML method fits the problem and on noise in the data.
- If we average the error rate of the combined classifier across an infinite number of independently chosen test examples, we arrive at the bias of the learning algorithm for the learning problem: the residual error that cannot be eliminated regardless of the number of training sets.
- A second source of error arises from the use, in practice, of finite data sets, which inevitably are not fully representative of the entire instance population.
- The average of this error over all training sets of a given size and all test sets is the variance of the learning method for the problem.
- Total error is the sum of bias and variance (the bias-variance decomposition).
- Combining classifiers reduces the variance component of the error.
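The variance-reduction effect of majority voting can be illustrated with a small simulation. This is a minimal sketch, not from the lecture: it assumes each of 11 classifiers is independently correct with probability 0.7, and estimates how often a strict majority of them is correct.

```python
import random

random.seed(0)

def vote_accuracy(p_correct, n_voters, trials=10000):
    """Estimate the accuracy of a majority vote over n_voters independent
    classifiers, each correct with probability p_correct."""
    wins = 0
    for _ in range(trials):
        # count how many of the voters classify this instance correctly
        correct = sum(random.random() < p_correct for _ in range(n_voters))
        if correct > n_voters / 2:      # strict majority is correct
            wins += 1
    return wins / trials

single = vote_accuracy(0.7, 1)
ensemble = vote_accuracy(0.7, 11)
print(single, ensemble)    # the 11-voter ensemble is clearly more accurate
```

The independence assumption is the idealisation in the argument above; in practice classifiers trained on overlapping data make correlated errors, so the gain is smaller.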

Bagging

- Stands for "bootstrap aggregating".
- A process whereby a single combined classifier is constructed from a number of classifiers.
- Each classifier is learned by applying a single learning scheme to multiple artificial training datasets that are derived from a single, original training dataset.
- The artificial datasets are obtained by randomly sampling with replacement from the original dataset, creating new datasets of the same size.
- The sampling procedure deletes some instances and replicates others.
- E.g. a decision tree learner could be applied to k artificial datasets derived by this random sampling procedure, resulting in k decision trees.
- The combined classifier works by applying each of the learned classifiers (e.g. the k decision trees) to novel instances and deciding their classification by majority vote.
- For numeric prediction, final values are determined by averaging classifier outputs.

A Bagging Algorithm

Model generation:
  Let n be the number of instances in the training data.
  For each of t iterations:
    Sample n instances with replacement from the training data.
    Apply the learning algorithm to the sample.
    Store the resulting model.

Classification:
  For each of the t models:
    Predict the class of the instance using the model.
  Return the class that has been predicted most often.

Bagging produces a combined model that often performs significantly better than a single model built from the original data set, and never performs substantially worse.
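The algorithm above can be sketched directly in Python. This is an illustrative sketch, not lecture code: `learn` stands for any learning scheme that maps a dataset to a model, and `majority_class_learner` is a deliberately trivial stand-in learner invented for the example.

```python
import random
from collections import Counter

def bagging_predict(train, learn, x, t=10, seed=42):
    """Bagging sketch: train t models on bootstrap samples of `train`,
    then classify instance x by majority vote.  `learn` maps a dataset
    to a model (a callable from instance to class label)."""
    rng = random.Random(seed)
    n = len(train)
    models = []
    for _ in range(t):
        # sample n instances with replacement from the training data
        sample = [train[rng.randrange(n)] for _ in range(n)]
        models.append(learn(sample))
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]   # class predicted most often

# Toy "learner": always predict the majority class of its sample.
def majority_class_learner(sample):
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label

data = [(0, "a"), (1, "a"), (2, "a"), (3, "a"), (4, "b")]
pred = bagging_predict(data, majority_class_learner, x=99)
print(pred)   # "a"
```

Replacing the toy learner with a real one (e.g. a decision tree inducer) gives bagged trees as described above.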

Randomisation

- Bagging generates an ensemble of classifiers by introducing randomness into the learner's input.
- Some learning algorithms have randomness built in. For example, perceptrons start out with randomly assigned connection weights which are then adjusted during training. One way to make such algorithms more stable is to run them several times with different random number seeds and combine the classifier predictions by voting/averaging.
- A random element can be added to most learning algorithms. E.g. for decision trees, instead of picking the best attribute to split on at each node, randomly pick one of the best n attributes.
- Randomisation requires more work than bagging, because the learning algorithm has to be modified; however, it can be applied to a wider range of learners. For example:
  - Bagging fails with stable learners, those whose output is insensitive to small changes in input, such as kNN.
  - However, randomisation can be applied by, e.g., selecting different randomly chosen subsets of attributes on which to base the classifiers.
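The last point, randomising a stable learner by giving each ensemble member a random subset of the attributes, can be sketched as follows. This is a hypothetical illustration: the learner is a bare 1-nearest-neighbour classifier, and the function names and toy data are invented for the example.

```python
import random
from collections import Counter

def random_subspace_ensemble(train, x, n_models=5, n_feats=1, seed=0):
    """Randomisation sketch for a stable learner (1-NN): each ensemble
    member sees only a random subset of the attributes, so the members
    differ even though 1-NN itself barely changes under resampling."""
    rng = random.Random(seed)
    dims = list(range(len(x)))
    votes = []
    for _ in range(n_models):
        feats = rng.sample(dims, n_feats)         # random attribute subset
        def dist(a):                              # distance on chosen attributes only
            return sum((a[f] - x[f]) ** 2 for f in feats)
        _, label = min(train, key=lambda inst: dist(inst[0]))
        votes.append(label)
    return Counter(votes).most_common(1)[0][0]    # majority vote

train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
pred = random_subspace_ensemble(train, (0.05, 0.1))
print(pred)   # "a"
```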

Boosting

- Like bagging, works by combining, via voting or averaging, multiple models produced by a single learning scheme.
- Unlike bagging, does not derive models from artificially produced datasets generated by random sampling. Instead it builds models iteratively, where each model takes into account the performance of the last.
- Boosting encourages subsequent models to emphasise examples badly handled by earlier ones; it builds classifiers whose strengths complement each other.
- In AdaBoost.M1 this is achieved by using the notion of a weighted instance:
  - Error is computed by taking into account the weights of misclassified instances, rather than just the proportion of misclassified instances.
  - By increasing the weight of misclassified instances following the training of one model, the next model can be made to attend to these instances.
  - Final classification is determined by weighted voting across all the classifiers, where the weighting is based on classifier performance: in the AdaBoost.M1 case, on the error of the individual classifiers.

The AdaBoost.M1 Algorithm

Model generation:
  Assign equal weight to each training instance.
  For each of t iterations:
    Apply the learning algorithm to the weighted dataset and store the resulting model.
    Compute the error e of the model on the weighted dataset and store the error.
    If e = 0 or e >= 0.5 then terminate model generation.
    For each instance i in the dataset:
      If i is classified correctly by the model
        then weight_i <- weight_i * e / (1 - e).
    Normalise the weights of all instances so that their summed weight remains constant.

Classification:
  Assign a weight of zero to all classes.
  For each of the t (or fewer) models:
    Add -log(e / (1 - e)) to the weight of each class predicted by the model.
  Return the class with the highest weight.
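The pseudocode above can be sketched in Python. This is an illustrative sketch under stated assumptions: `learn(X, y, w)` stands for any weak learner that accepts instance weights, and the `stump` learner and toy data below are invented for the example (with separable data the first stump is already perfect, so the loop stops early, which exercises the e = 0 branch).

```python
import math
from collections import defaultdict

def adaboost_m1(X, y, learn, t=10):
    """AdaBoost.M1 sketch.  `learn(X, y, w)` returns a model (callable)
    trained on the weighted dataset.  Returns a weighted-vote classifier."""
    n = len(X)
    w = [1.0 / n] * n                          # equal initial weights
    models = []
    for _ in range(t):
        model = learn(X, y, w)
        wrong = [model(x) != label for x, label in zip(X, y)]
        e = sum(wi for wi, bad in zip(w, wrong) if bad)   # weighted error
        if e >= 0.5:
            break                              # too weak: discard and stop
        if e == 0:
            models.append((model, 1.0))        # perfect model: store and stop
            break
        beta = e / (1 - e)
        # down-weight correctly classified instances, then renormalise
        w = [wi * (beta if not bad else 1.0) for wi, bad in zip(w, wrong)]
        s = sum(w)
        w = [wi / s for wi in w]
        models.append((model, math.log(1 / beta)))   # vote weight -log(e/(1-e))

    def classify(x):
        scores = defaultdict(float)
        for model, alpha in models:
            scores[model(x)] += alpha
        return max(scores, key=scores.get)
    return classify

# Toy weak learner: best single-threshold decision stump on a 1-D attribute.
def stump(X, y, w):
    best = None
    for thr in X:
        for lo, hi in (("a", "b"), ("b", "a")):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (lo if xi < thr else hi) != yi)
            if best is None or err < best[0]:
                best = (err, thr, lo, hi)
    _, thr, lo, hi = best
    return lambda x: lo if x < thr else hi

X = [0, 1, 2, 3, 4, 5]
y = ["a", "a", "a", "b", "b", "b"]
clf = adaboost_m1(X, y, stump, t=5)
print([clf(x) for x in X])
```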

Boosting: Observations

- Boosting often performs substantially better than bagging.
- Unlike bagging, which never produces a combined classifier that is substantially worse than a single classifier built from the same data, boosting can sometimes do so (overfitting).
- Interestingly, performing more boosting iterations after the error on the training data has dropped to zero can further improve performance on new test data.
  - This seems to contradict Occam's razor (prefer the simpler hypothesis), since more iterations lead to a more complex hypothesis which does not explain the training data any better.
  - However, more iterations improve the classifier's confidence in its predictions: the difference between the estimated probability of the true class and that of the most likely predicted class other than the true class (called the margin).
- Boosting allows powerful combined classifiers to be built from simple ones, provided they achieve less than 50% error on the reweighted data.
  - Such simple learners are called weak learners. Examples are decision stumps (one-level decision trees) or OneR (a single conjunctive rule).
- Good example: Weka's decision stump on the mushroom data; try it without, then with, boosting.

Stacking (1)

- Bagging and boosting combine multiple models produced by one learning scheme.
- Stacking is normally used to combine models built by different learning algorithms.
- Rather than simply voting, stacking attempts to learn which classifiers are the reliable ones, using a metalearner.
- Inputs to the metalearner are instances built from the outputs of the level-0, or base-level, models.
- These level-1 instances consist of one attribute for each level-0 learner: the class the level-0 learner predicts for the level-0 instance.
- From these instances the level-1 model makes the final prediction.
- During training, the level-1 model is given instances consisting of the level-0 predictions for level-0 instances plus the actual class of the instance.
- However, if the predictions of the level-0 learners over the data they were trained on are used, the result will be a metalearner trained to prefer classifiers that overfit the training data.

Stacking (2)

- To avoid overfitting, the level-1 instances must be formed either from level-0 predictions over instances that were held out from level-0 training, or from predictions on the instances in the test folds, if cross-validation was used for training at level 0.
- Stacking can be extended to deal with level-0 classifiers that produce probability distributions over output class labels, and with numeric prediction rather than classification.
- While any ML algorithm could be used at level 1, simple level-1 algorithms such as linear regression have proved best.
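The held-out variant of stacking can be sketched as follows. This is a hypothetical illustration, not from the lecture: the two base learners and the metalearner (which simply learns to trust whichever level-0 model was most accurate on the held-out set) are invented toys standing in for real learning algorithms.

```python
def stack_train(train_set, holdout, base_learners, meta_learn):
    """Stacking sketch.  Level-0 models are trained on `train_set`; their
    predictions on the held-out instances (plus the true classes) become
    the level-1 training instances for the metalearner."""
    level0 = [learn(train_set) for learn in base_learners]
    meta_X = [[m(x) for m in level0] for x, _ in holdout]  # one attribute per level-0 model
    meta_y = [label for _, label in holdout]
    meta = meta_learn(meta_X, meta_y)
    return lambda x: meta([m(x) for m in level0])

# Toy metalearner: trust the level-0 model most accurate on the held-out set.
def best_model_meta(meta_X, meta_y):
    k = len(meta_X[0])
    accs = [sum(row[i] == label for row, label in zip(meta_X, meta_y))
            for i in range(k)]
    best = max(range(k), key=lambda i: accs[i])
    return lambda preds: preds[best]

# Two toy base "learners" with very different reliability.
def always_a(data):
    return lambda x: "a"

def threshold_learner(data):
    return lambda x: "a" if x < 2 else "b"

train_set = [(0, "a"), (3, "b")]
holdout = [(1, "a"), (4, "b"), (5, "b")]
clf = stack_train(train_set, holdout, [always_a, threshold_learner],
                  best_model_meta)
print(clf(10))   # "b": the metalearner learned to trust the threshold model
```

A real metalearner (e.g. linear regression over class probabilities, as the slide suggests) would weight the base models rather than pick just one.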

Using Unlabeled Data

- Labeled training data, i.e. data with an associated target class, is always limited: it frequently requires extensive/expensive manual annotation or cleaning.
- However, large amounts of unlabeled data may be readily available: pre-classified text is hard to get (e.g. catalogued news articles), while unclassified text is very easy to get.
- Is there any way we can utilise unlabeled training data to improve a classifier?

Using Unlabeled Data: Clustering for Classification

- One possibility is to couple a probabilistic classifier, such as Naïve Bayes, with Expectation-Maximisation (EM) iterative probabilistic clustering.
- Suppose we have labelled training data L plus unlabelled training data U. Proceed as follows:
  - Train a Naïve Bayes classifier on L.
  - Repeat until convergence:
    - (E-step) Use the current classifier to estimate the component mixture for each instance in U (i.e. the probability that each mixture component generated each instance).
    - (M-step) Re-estimate the classifier using the estimated component mixture for each instance in L + U.
  - Output a classifier that predicts labels for unlabelled instances.
  (after Nigam et al. 2000)
- Experiments show such a learner can attain performance equivalent to a traditional learner using less than 1/3 of the labeled training examples, together with 5 times as many unlabeled examples.
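The E-step/M-step loop above can be sketched with a deliberately simplified model. This is an illustrative sketch, not Nigam et al.'s system: instead of Naïve Bayes over text, the "classifier" here is just a weighted class mean per 1-D instance with a Gaussian-style soft assignment, and all names and data are invented for the example.

```python
import math

def train(data):
    """M-step stand-in: weighted class means from (instance, class
    distribution) pairs.  Hard labels are one-point distributions."""
    tot, wsum = {}, {}
    for x, dist in data:
        for c, p in dist.items():
            tot[c] = tot.get(c, 0.0) + p * x
            wsum[c] = wsum.get(c, 0.0) + p
    return {c: tot[c] / wsum[c] for c in tot}

def predict_proba(model, x):
    """E-step stand-in: class distribution proportional to exp(-(x - mean)^2)."""
    scores = {c: math.exp(-(x - m) ** 2) for c, m in model.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def em_semi_supervised(L, U, n_iter=10):
    """EM loop sketch (after Nigam et al. 2000): train on labelled data L,
    then alternate E-steps (soft-label U) and M-steps (retrain on L + U)."""
    model = train(L)                                       # labelled data only
    for _ in range(n_iter):
        soft = [(x, predict_proba(model, x)) for x in U]   # E-step
        model = train(L + soft)                            # M-step on L + U
    return model

L = [(0.0, {"a": 1.0}), (10.0, {"b": 1.0})]   # one labelled example per class
U = [0.5, 1.0, 1.5, 9.0, 9.5, 10.5]           # plenty of unlabelled data
model = em_semi_supervised(L, U)
final = predict_proba(model, 1.2)
print(max(final, key=final.get))   # "a"
```

The unlabelled points pull each class mean towards the bulk of its cluster, which is the mechanism by which unlabelled data sharpens the classifier.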

Using Unlabeled Data: Co-training

- Suppose there are two independent perspectives, or views (feature sets), on a classification task. E.g. for web page classification: the web page's content; links to the web page from other pages.
- Co-training exploits these two perspectives:
  - Train model A using perspective 1 on the labelled data.
  - Train model B using perspective 2 on the labelled data.
  - Label the unlabeled data using model A and model B separately.
  - For each model, select the example it most confidently labels positively and the one it most confidently labels negatively, and add these to the pool of labeled examples.
  - Repeat the whole process, training both models on the augmented pool of labeled examples, until there are no more unlabeled examples.
- There is some experimental evidence to indicate that co-training using Naïve Bayes as the learner outperforms an approach that learns a single model using all the features from both perspectives.
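The co-training loop can be sketched as follows. This is a minimal illustration with invented names and data: instances are pairs (one value per view), each "learner" is a 1-nearest-neighbour model on its own view whose confidence shrinks with distance, and each model promotes its most confident example per class rather than per positive/negative label.

```python
def co_train(labeled, unlabeled, learn_a, learn_b, rounds=5):
    """Co-training sketch: each model is trained on its own view of the
    labelled pool, then promotes its most confidently labelled example
    per class from the unlabelled pool.  `learn_*` map the pool to a
    model returning (class, confidence) for an instance."""
    pool, U = list(labeled), list(unlabeled)
    classes = sorted({c for _, c in pool})
    for _ in range(rounds):
        for learn in (learn_a, learn_b):
            if not U:
                return pool
            model = learn(pool)
            preds = {x: model(x) for x in U}
            for target in classes:       # most confident example per class
                cands = [x for x in U if preds[x][0] == target]
                if cands:
                    best = max(cands, key=lambda x: preds[x][1])
                    pool.append((best, target))
                    U.remove(best)
    return pool

def make_view_learner(view):
    """1-NN on a single view; confidence decays with distance."""
    def learn(pool):
        def model(x):
            d, c = min((abs(x[view] - p[view]), lab) for p, lab in pool)
            return c, 1.0 / (1.0 + d)
        return model
    return learn

labeled = [((0, 0), "a"), ((9, 9), "b")]
unlabeled = [(1, 2), (2, 1), (8, 7), (7, 8)]
pool = co_train(labeled, unlabeled, make_view_learner(0), make_view_learner(1))
print(sorted(pool))
```

Each view bootstraps the other: examples one model labels confidently become training data for both on the next round.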

Using Unlabeled Data: Co-EM

- Co-EM:
  - trains model A using perspective 1 on the labeled data
  - uses model A to probabilistically label all the unlabeled data
  - trains model B using perspective 2 on the original labeled data plus the unlabeled data tentatively labeled using model A
  - uses model B to probabilistically relabel all the data for use in retraining model A
  - iterates until the classifiers converge.
- Co-EM appears to perform consistently better than co-training (because it does not commit to class labels, but re-estimates their probabilities at each iteration).
- Co-training/co-EM are limited to applications where multiple perspectives on the data are available:
  - there is some recent evidence that this split perspective can be artificially manufactured (e.g. by random selection of features, though feature independence is preferred)
  - there are some recent arguments/evidence that co-training using models derived by different classifiers (instead of from different feature sets) also works.

Summary

- Multiple learned models may be combined in various ways to produce classifiers whose performance is superior to that of a single model on its own:
  - Bagging trains multiple models using a single learning scheme on multiple training sets artificially derived from a single data set through random deletion and repetition of instances; the final classification is arrived at by simple majority voting.
  - Boosting builds multiple models using a single learning scheme iteratively over a single data set, where the instances are re-weighted between iterations so that subsequent models pay more attention to instances misclassified by earlier models; the final classification is arrived at by weighted voting of all the classifiers, each vote weighted by classifier performance.
  - Stacking combines the models built by different learning schemes by training a metalearner to decide amongst the predictions of the base-level learners.
- Unlabeled data can be utilised to improve the performance of classifiers, or to allow them to attain equivalent performance using less (expensive) labeled training data. Approaches include:
  - learning over probabilistically clustered unlabeled data (Naïve Bayes + EM)
  - co-training and co-EM, which assume different perspectives (feature views) over the same data, with models/estimates iteratively improved over the unlabeled data.


More information

Gender and socioeconomic differences in science achievement in Australia: From SISS to TIMSS

Gender and socioeconomic differences in science achievement in Australia: From SISS to TIMSS Gender and socioeconomic differences in science achievement in Australia: From SISS to TIMSS, Australian Council for Educational Research, thomson@acer.edu.au Abstract Gender differences in science amongst

More information

Evolution of Symbolisation in Chimpanzees and Neural Nets

Evolution of Symbolisation in Chimpanzees and Neural Nets Evolution of Symbolisation in Chimpanzees and Neural Nets Angelo Cangelosi Centre for Neural and Adaptive Systems University of Plymouth (UK) a.cangelosi@plymouth.ac.uk Introduction Animal communication

More information