# Chapter 9: Linear Regression


## 9.1 Introduction

In this class, we have looked at a variety of different models and learning methods, such as finite state machines, sequence models, and classification methods. However, in all of the cases that we have looked at so far, the unobserved variables we have been trying to predict have been finite and discrete. When we looked at Naive Bayes, we tried to predict whether something was in a positive or a negative class. When we looked at Hidden Markov Models and Conditional Random Fields, we tried to figure out the part-of-speech tags of individual words. These methods have all been attempting to predict which class a particular latent variable belongs to. For Naive Bayes, we motivated our problem with a decision between two classes, though that number is arbitrary: you could easily add a third class called neutral, or have a problem that naturally has dozens of classes. For part-of-speech sequence tagging in English, we often choose from a set of 36 tags. The common thread is that in all of these methods, we are choosing among a fixed set of classes.

We now move on to look at what happens when we care about predicting a value that is not from a discrete set of possibilities, but rather is continuous. For instance, if we are given an essay written in English, rather than predicting a letter grade, could we instead predict the percent score? Or, given a movie review, could we predict the average number of stars users rate it, rather than just saying whether the review is positive or negative?

## 9.2 Linear Regression

Linear regression is a powerful tool for predicting continuous outputs. In general, to predict an output that is continuous, the goal is to find a function that transforms an input into an output. Linear regression solves this problem by finding a linear function. Given an input $x$, we are trying to find a function $f(x)$ that predicts an output $y$. This is the same way we have formulated many of the learning methods discussed in this class: we would like to find $f(x) = y$ for some observed data $x$ that predicts unknown data $y$. In methods discussed previously, like Naive Bayes, $y$ is a class, such as 0 or 1, and cannot take any other values. For linear regression, we still would like to find a function, but now $y$ can take on a range of values.

### Supervised Learning

The majority of the models that we have discussed in this class are supervised, and linear regression is no different. Supervised learning methods are trained on a data set where the values we are trying to predict ($y$) are known. After training on data where the values are known, the model predicts values for data where they are unknown. Of the methods we have looked at (Naive Bayes, Logistic Regression, the Perceptron, and Topic Modeling), all are supervised learning methods except for Topic Modeling. In your homework, you have always been given supervised learning problems where the data contains a training file; you then report your predicted output compared to the gold-standard labels in your testing file. Linear regression is no different: it is a method for predicting unseen values on a test set, having been given known values in a training set. The difference is that the output values are continuous and our learned function is linear.

## 9.3 Linear Models

Formally, we define linear regression as predicting a value $\hat{y} \in \mathbb{R}$ from $m$ different features $x \in \mathbb{R}^m$. We would like to learn a function $f(x) = y$ that is in a linear form. Specifically, we are interested in finding:

$$\hat{y} = \beta_0 + \beta \cdot x$$
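To make the form of the model concrete, here is a minimal sketch of the prediction function in Python (assuming NumPy; the names `predict`, `beta0`, and `beta` are illustrative, not from the text):

```python
import numpy as np

def predict(beta0, beta, x):
    """Linear model: y_hat = beta_0 + beta . x (dot product over m features)."""
    return beta0 + np.dot(beta, x)

# A single data point with m = 2 features:
y_hat = predict(1.0, np.array([2.0, 0.5]), np.array([3.0, 4.0]))
print(y_hat)  # 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
```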

Given $n$ training examples $\langle x_i, y_i \rangle$ for $1 \le i \le n$, where each $x_i$ has $m$ features, our goal is to estimate the parameters $\langle \beta_0, \beta \rangle$.

### Parameter Estimation

The goal in linear regression is to find a line that generalizes our data well. In other words, we would like to find the $\langle \beta_0, \beta \rangle$ that minimize the error on our training set. We do this by minimizing the sum of the squared errors:

$$\hat{\beta} = \arg\min_{\langle \beta_0, \beta \rangle} \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta \cdot x_i) \right)^2$$

The intuition here is that we would like to minimize the difference between $y_i$ and the value predicted by $\beta_0 + \beta \cdot x_i$. In other words, $x_i$ is linearly transformed by $\beta$ and $\beta_0$ to give a predicted value for $y_i$. Across all $n$ training examples, we want the sum of the differences between our predicted and actual values to be at a minimum.

We need to estimate the values of our $\beta$s. To do this, we take the cost function we just defined and apply the gradient descent algorithm. This requires taking a partial derivative with respect to each of our $m$ $\beta$s. After some algebraic manipulation, we are left with the LMS, or Least Mean Squares, update rule. For more information on gradient descent and deriving the update rules, the textbook *Pattern Recognition and Machine Learning* has some nice explanations beyond the scope of this lecture (Bishop 2009).
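As a rough illustration of how gradient descent applies to this cost function, here is a sketch in Python (NumPy assumed). The batch updates, fixed learning rate, and fixed epoch count are simplifying assumptions for illustration; see Bishop (2009) for the full derivation of the update rules:

```python
import numpy as np

def fit(X, y, lr=0.01, epochs=1000):
    """Estimate (beta0, beta) by gradient descent on the squared-error cost.

    X is an (n, m) matrix of real-valued features; y is an (n,) target vector.
    """
    n, m = X.shape
    beta0, beta = 0.0, np.zeros(m)
    for _ in range(epochs):
        error = y - (beta0 + X @ beta)   # residual for each training example
        # Step each parameter along the negative gradient of the cost
        beta0 += lr * error.mean()
        beta += lr * (X.T @ error) / n
    return beta0, beta
```

In practice, the learning rate and number of epochs would be tuned, and one could instead update on one example at a time, which is the stochastic form usually associated with the LMS rule.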

### Features

In our definition, we said that we would like to find $y$ given $m$ features. We must define a set number of features, and the assumption in linear regression is that they are independent. Linear regression then finds the linear combination of these features that best predicts our training data. We have defined our general function to be:

$$\hat{y} = \beta_0 + \beta \cdot x$$

For the case where $m = 1$, that is, where we have only one feature, our function is merely:

$$\hat{y} = \beta_0 + \beta_1 x_1$$

This is simply the function for a line, $y = mx + b$, where $m = \beta_1$ and $b = \beta_0$. Simply put, we are mapping a single-dimensional input to another dimension ($x$ to $y$). We defined $x \in \mathbb{R}^m$, and for the case where $m = 1$, we are simply defining $x$ to take a real value.

For the slightly more difficult case where $m = 2$, we are now mapping values from a 2-dimensional space into another dimension. In other words, with 2 features for each data point, our function becomes:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

This function is still linear (hence the term linear regression), though its graph is now a plane in 3-dimensional space. In general, we are mapping from an $m$-dimensional space to a single-dimensional, continuous value space.

### NLP Features

In some form or another, many of you will be familiar with linear regression, even if only from a high school science class where you fit a line to some data points in an experiment. However, most of you will be familiar with this problem from an application different from Natural Language Processing, particularly one where your $x$ values are real numbers, in a continuous space themselves. To make use of linear regression in NLP, we must make sure that our feature values are real numbers as well. That is why we defined $x \in \mathbb{R}^m$. Mapping our features to real values is actually not a difficult problem, and many features we are interested in are already real numbers. For instance, getting computers to automatically grade essays is a topic of increasing interest, especially for standardized tests. We may hypothesize …
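As a sketch of what mapping an essay to real-valued features might look like, here is an illustrative Python function. The specific features (length, average word length, a rough sentence count) are hypothetical choices for illustration, not features prescribed by the text:

```python
def essay_features(text):
    """Map an essay to a list of real-valued features (illustrative only)."""
    words = text.split()
    n_words = max(len(words), 1)  # avoid dividing by zero on empty input
    return [
        float(len(words)),                         # essay length in words
        sum(len(w) for w in words) / n_words,      # average word length
        float(sum(text.count(p) for p in ".!?")),  # rough sentence count
    ]
```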

If we predict a score for an essay from 0-100 or a movie review from 0-10, we could possibly get values that are negative or even above our range. That is just an issue we need to be aware of: as a simple post-processing step, we can clip the value using min and max functions to make sure we stay in the appropriate range. However, in general, if our training data is representative of our testing data, this should only be an issue for a very small number of cases.

Along these lines, even though linear regression predicts a real-valued output, we can still use it in some classification tasks. If there is any ordinality among the classes, we can use linear regression and then map the output to a class. For instance, let's say that we have training essays that are given grades of A, B, C, and D. We can define A to be scores of 90-100, B as 80-89, C as 70-79, and D as 60-69. We then take our training data and take the midpoints of these ranges, so any A essay would be 95. Assuming our data is distributed rather evenly, or that grades average to the middle of the range (the teacher thought the work was A material on average for A grades, not that all A's were just borderline B's), we can use this as our training data. We have mapped classes (grades) into numerical scores. We can then use linear regression to predict values for testing data, with a simple post-processing step of mapping the numerical score back to a letter grade, as in the sketch below. In general, you can do this with any sort of data that is ordinal, regardless of whether it is a classification task, and depending on your problem, it may actually yield better results than some classification methods.
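A minimal sketch of these two post-processing steps in Python, assuming a 0-100 essay-score scale; the grade boundaries follow the ranges above, and the midpoints for B, C, and D are inferred from the same pattern as the 95 given for A:

```python
def clip(score, low=0.0, high=100.0):
    """Clamp a regression output so it stays in the valid score range."""
    return max(low, min(high, score))

# Class-to-score mapping used to build the training data
GRADE_MIDPOINTS = {"A": 95.0, "B": 85.0, "C": 75.0, "D": 65.0}

def score_to_grade(score):
    """Map a predicted numeric score back to a letter grade."""
    score = clip(score)
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    return "D"
```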

## 9.4 Overfitting

In general, when we are trying to learn models, we need to worry about overfitting our data. Recall that overfitting is when we make our model fit our training data so perfectly that it does not generalize well to our testing data. Most real-world data sets are noisy. If our trained model fits the data too well, we have estimated our parameters as if there were no noise, and our model will not fit our testing data as well. There are multiple ways to deal with the overfitting problem, and regularization is one very common method of doing so.

When learning a model from data, we always have to be careful about overfitting, but the problem is particularly prevalent in Natural Language Processing. We have formally defined this problem such that we have $m$ different features for our $x$ values. If $m$ gets too large relative to the $n$ training examples we have, we will necessarily overfit our data, as each parameter we learn will tend towards fitting just one of our $n$ examples perfectly. So, we should make sure to always choose $m \ll n$.

The reason the potential to overfit is so prevalent in Natural Language Processing is due to how we often choose features. For instance, in many of the methods that we have looked at so far, we choose a vocabulary size and treat each word as an independent feature. It is not uncommon to have tens of thousands of unique words in even a modest-sized training corpus. If we assign each word a unique feature, our number of features ($m$) can easily outgrow the number of training examples ($n$). Regardless of any other ways we try to prevent overfitting, such as regularization, we must be aware of our feature set size and choose an appropriate feature set initially.

### Regularization

We talked about regularization earlier in this class when we discussed Logistic Regression. Regularization attempts to prevent overfitting our training data by adding an additional penalty term to our objective function. This added term acts orthogonally to the first term in our objective function, which is modeled on the data. There are whole classes of regularizers, but in general, they aim to impose penalties on our parameters. Often these have the goal of driving as many of our parameters to 0 (or as close to it as possible) without degrading our performance on the training data.

To implement regularization, when defining our objective function for parameter estimation, we include a regularization term. Generally, we give it a weight $\lambda$, which is either given (often chosen through trial and error) or tuned, often with a held-out set of data or cross-validation. We choose a regularization term based on some desired properties. $\ell_2$ regularization is one of the most commonly used regularization methods. $\ell_1$ is also frequently chosen as a regularizer due to its property of driving many of the parameters to 0.

### $\ell_2$ Regularization

The $\ell_2$ norm is simply the Euclidean distance from the origin to a point in our $m$-dimensional space: the value of each of our parameters is squared, the squares are summed, and finally the square root is taken. This is exactly the distance metric commonly taught in grade school. Here is our objective function modified with the addition of an $\ell_2$ regularizer:

$$\hat{\beta} = \arg\min_{\langle \beta_0, \beta \rangle} \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta \cdot x_i) \right)^2 + \lambda \sqrt{\sum_{j=1}^{m} \beta_j^2}$$

### $\ell_1$ Regularization

The $\ell_1$ norm is the Taxicab or Manhattan distance: the distance traveled if you can only move along a single axis at a time. Think of a taxi in Manhattan that must drive down an avenue and then down a street rather than cutting diagonally through a block. This regularizer has the nice property of driving many of our parameters to 0. In linear regression, we have made the assumption that our features are independent; one of the intuitions behind $\ell_1$ regularization is that if two features are actually dependent, one of them will be driven to zero. Again, here is our updated objective function, now with $\ell_1$ regularization:

$$\hat{\beta} = \arg\min_{\langle \beta_0, \beta \rangle} \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta \cdot x_i) \right)^2 + \lambda \sum_{j=1}^{m} |\beta_j|$$
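To make the two penalties concrete, here is a sketch of the regularized objective in Python (NumPy assumed; names are illustrative). Following the formulas above, the $\ell_2$ term is the unsquared Euclidean norm, and the intercept $\beta_0$ is left out of the penalty:

```python
import numpy as np

def regularized_loss(beta0, beta, X, y, lam, penalty="l2"):
    """Squared-error cost plus a weighted l1 or l2 penalty on beta."""
    residuals = y - (beta0 + X @ beta)
    data_term = np.sum(residuals ** 2) / (2 * len(y))
    if penalty == "l2":
        reg = np.sqrt(np.sum(beta ** 2))  # Euclidean distance from the origin
    else:
        reg = np.sum(np.abs(beta))        # Manhattan (taxicab) distance
    return data_term + lam * reg
```

Note that many libraries formulate the $\ell_2$ penalty slightly differently, penalizing the squared norm $\sum_j \beta_j^2$ rather than its square root; the effect on the learned parameters is similar in spirit.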

## References

Bishop, Christopher M. (2009). *Pattern Recognition and Machine Learning*. Springer.
