# Chapter 9: Linear Regression


## 9.1 Introduction

In this class, we have looked at a variety of different models and learning methods, such as finite state machines, sequence models, and classification methods. However, in all of the cases we have looked at so far, the unobserved variables we have been trying to predict have been finite and discrete. When we looked at Naive Bayes, we tried to predict whether something belonged to a positive or a negative class. When we looked at Hidden Markov Models and Conditional Random Fields, we tried to figure out the part-of-speech tags of individual words. These methods have all been attempting to predict which class a particular latent variable belongs to. For Naive Bayes, we motivated our problem with a decision between two classes, though that number is arbitrary: you could easily add a third class called neutral, or have a problem that naturally has dozens of classes. For our part-of-speech sequence tagging in English, we often choose from a set of 36 tags. The common thread is that in all of these methods, we are choosing among a fixed number of classes.

We now move on to look at what happens when we care about predicting a value that is not from a discrete set of possibilities but rather is continuous. For instance, if we are given an essay written in English, rather than predicting a letter grade, could we instead predict the percent score? Or, given a movie review, could we predict the average number of stars users rate it, rather than just saying whether the review is positive or negative?

## 9.2 Linear Regression

Linear regression is a powerful tool for predicting continuous outputs. In general, to predict an output that is continuous, the goal is to find a function that transforms an input into an output. Linear regression solves this problem by finding a linear function. Given an input x, we are trying to find a function, f(x), that predicts an output y. This is the same way we formulate many of the learning methods discussed in this class: we would like to find f(x) = y for some observed data x that predicts unknown data y. In methods discussed previously, like Naive Bayes, y is often a class, such as 0 or 1, and cannot take any other values. For linear regression, we still would like to find a function, but now y can take on a range of values.

### Supervised Learning

The majority of the models that we have discussed in this class are supervised, and linear regression is no different. Supervised learning models are simply methods that are trained using a data set where the values we are trying to predict (y) are known. After training on data where the values are known, the model tries to predict values for data where they are unknown. Of the methods we have looked at (Naive Bayes, Logistic Regression, the Perceptron, and Topic Modeling), all are supervised learning methods except for Topic Modeling. In your homework, you have always been given supervised learning problems where the data contains a training file; you then report your predicted output compared to the gold-standard labels in your testing file. Linear regression is no different. Again, it is a method for predicting unseen values on a test set, having been given known values in a training set. The difference is that the output values are continuous and our learned function is linear.
## 9.3 Linear Models

Formally, we define linear regression as predicting a value $\hat{y} \in \mathbb{R}$ from $m$ different features $x \in \mathbb{R}^m$. We would like to learn a function, $f(x) = y$, that is in a linear form. Specifically, we are interested in finding:

$$\hat{y} = \beta_0 + \beta^T x$$
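This prediction function is trivial to compute once the parameters are known. As a minimal sketch (the intercept, weights, and feature values below are made-up illustrative numbers, not from the chapter):

```python
import numpy as np

def predict(beta0, beta, x):
    """Linear prediction: y_hat = beta0 + beta . x"""
    return beta0 + np.dot(beta, x)

# Hypothetical example with m = 2 features
beta0 = 1.0                   # intercept
beta = np.array([2.0, -0.5])  # one weight per feature
x = np.array([3.0, 4.0])      # a single data point

print(predict(beta0, beta, x))  # 1.0 + 2.0*3.0 + (-0.5)*4.0 = 5.0
```

The same function works unchanged for any number of features $m$, since the dot product handles the sum over feature weights.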

Given $n$ training examples $\langle x_i, y_i \rangle$ for $1 \le i \le n$, where each $x_i$ has $m$ features, our goal is to estimate the parameters $\langle \beta_0, \beta \rangle$.

### Parameter Estimation

The goal in linear regression is to find a line that generalizes our data well. In other words, we would like to find the $\langle \beta_0, \beta \rangle$ that minimize the error on our training set. We do this by minimizing the sum of the squared errors:

$$\hat{\beta} = \arg\min_{\langle \beta_0, \beta \rangle} \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta^T x_i) \right)^2$$

The intuition here is that we would like to minimize the difference between $y_i$ and the value predicted by $\beta_0 + \beta^T x_i$. In other words, $x_i$ is linearly transformed by $\beta$ and $\beta_0$ to give a predicted value for $y_i$. Across all $n$ training examples, we want the sum of the squared differences between our predicted and actual values to be at a minimum.

We need to estimate the values of our $\beta$s. To do this, we take the cost function we just defined and apply the gradient descent algorithm to our problem. This requires taking a partial derivative with respect to each of our $m$ $\beta$s. After some algebraic manipulation, we are left with the LMS, or Least Mean Squares, update rule. For more information on gradient descent and deriving the update rules, the textbook Pattern Recognition and Machine Learning has some nice explanations beyond the scope of this lecture (Bishop 2009).
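As a sketch of what this estimation looks like in practice (batch gradient descent on the squared-error cost; the data set, learning rate, and iteration count below are arbitrary choices for illustration, not the chapter's):

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.05, epochs=2000):
    """Batch gradient descent on the cost (1/2n) * sum of squared errors.

    X: (n, m) feature matrix, y: (n,) targets. Returns (beta0, beta)."""
    n, m = X.shape
    beta0, beta = 0.0, np.zeros(m)
    for _ in range(epochs):
        error = (beta0 + X @ beta) - y       # predicted minus actual
        grad_beta0 = error.mean()            # partial derivative wrt beta0
        grad_beta = X.T @ error / n          # partial derivatives wrt each beta_j
        beta0 -= lr * grad_beta0
        beta -= lr * grad_beta
    return beta0, beta

# Noise-free data generated from y = 2x + 1, so the fit should recover it
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
beta0, beta = fit_linear_regression(X, y)
print(round(beta0, 2), round(beta[0], 2))  # approximately 1.0 and 2.0
```

Each step moves every parameter a small amount against its partial derivative; on this tiny example the procedure converges to the line the data was generated from.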

### Features

In our definition, we said that we would like to find $y$ given $m$ features. We must define a fixed set of features, and the assumption in linear regression is that they are independent. Linear regression then finds a linear combination of these features that best predicts our training data. We have defined our general function to be:

$$\hat{y} = \beta_0 + \beta^T x$$

For the case where $m = 1$, or in other words we only have one feature, our function is merely:

$$\hat{y} = \beta_0 + \beta_1 x_1$$

This is simply the function for a line, $y = mx + b$, where $m = \beta_1$ and $b = \beta_0$. Simply put, we are mapping a single-dimensional input to another dimension ($x$ to $y$). We defined $x \in \mathbb{R}^m$, and for the case where $m = 1$, we are simply defining $x$ to take a real value. For the slightly more difficult case where $m = 2$, we are now mapping values from a 2-dimensional space into another dimension. In other words, with 2 features for each data point, our function reduces to:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

This is now a plane in 3-dimensional space, though the model is still linear (hence the term linear regression). In general, we are mapping from an $m$-dimensional space to a single-dimensional, continuous value space.

### NLP Features

In some form or another, many of you will be familiar with linear regression, even if only from a high school science class where you fit a line to some data points in an experiment. However, most of you will be familiar with this problem from an application different from Natural Language Processing, particularly one where your $x$ values are real numbers, in a continuous space themselves. To make use of linear regression in NLP, we must also make sure that our feature values are real numbers; that is why we defined $x \in \mathbb{R}^m$. Mapping our features to real values is actually not a difficult problem, and many features we are interested in are already real numbers. For instance, getting computers to automatically grade essays is a topic of increasing interest, especially for standardized tests. We may hypothesize

or a movie review from 0-10, we could possibly get values that are negative or even above our range. That is just an issue we need to be aware of. As a simple post-processing step, we can clip the value using min and max functions to make sure we stay in the appropriate range. However, in general, if our training data is representative of our testing data, this should only be an issue for a very small number of cases.

Along these lines, even though linear regression predicts a real-valued output, we can still use it in some classification tasks. If there is any ordinality among the classes, we can use linear regression and then map the output to a class. For instance, let's say that we have training essays that are given grades of A, B, C, and D. We can define A to be scores of 90-100, B as 80-89, C as 70-79, and D as 60-69. We then take our training data and use the midpoints of these ranges, so any A essay would be scored 95. Assuming our data is distributed rather evenly, or that grades average to the middle of the range (the teacher thought the work was A material on average for A grades, not that all A's were just borderline B's), we can use this as our training data. We have mapped classes (grades) into numerical scores. We can then use linear regression to predict values for testing data, with a simple post-processing step of mapping the numerical score back to a letter grade. In general, you can do this with any sort of data that is ordinal, regardless of whether it is a classification task, and depending on your problem, it may actually yield better results than some classification methods.

## 9.4 Overfitting

In general, when we are trying to learn models, we need to worry about overfitting our data. Recall that overfitting is when we make our model fit our training data so perfectly that it does not generalize well to our testing data. Most real-world data sets are noisy.
If our trained model fits the data too well, we have modeled our parameters as if there were no noise, so our model will not fit our testing data as well. There are multiple different ways to deal with the overfitting problem, and regularization is one very common method of doing so.

When learning a model from data, we always have to be careful about overfitting, but the problem is particularly prevalent in Natural Language Processing. We have formally defined this problem such that we have $m$ different features for our $x$ values. If $m$ gets too large relative to the $n$ training examples we have, we will necessarily overfit our data, as each parameter we learn will tend towards fitting just one of our $n$ examples perfectly.
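To make this concrete, here is a small synthetic sketch (the data is pure random noise, so there is nothing real to learn): with as many features as training examples, least squares drives training error to essentially zero while test error stays large.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                        # training examples
y_train = rng.normal(size=n)  # targets are pure noise
y_test = rng.normal(size=n)

for m in (2, 10):             # few features vs. m == n features
    X_train = rng.normal(size=(n, m))
    X_test = rng.normal(size=(n, m))
    # Least-squares fit on the training data only
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_err = np.mean((X_train @ beta - y_train) ** 2)
    test_err = np.mean((X_test @ beta - y_test) ** 2)
    print(f"m={m:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

With $m = n$ the fitted model interpolates the noise exactly (near-zero training error), yet generalizes no better than before: each parameter has effectively memorized one training example.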

So, we should make sure to always choose $m \ll n$. The reason the potential to overfit is so prevalent in Natural Language Processing is due to how we often choose features. For instance, in many of the methods that we have looked at so far, we choose a vocabulary size and treat each word as an independent feature. It is not uncommon to have tens of thousands of unique words in even a modest-sized training corpus. If we assign each word a unique feature, our number of features ($m$) can easily outgrow the number of training examples ($n$). Regardless of any other ways we try to prevent overfitting, such as regularization, we must be aware of our feature set size and choose an appropriate feature set initially.

### Regularization

We talked about regularization earlier in this class when we discussed Logistic Regression. Regularization attempts to prevent overfitting the training data, when learning a model, by adding an additional penalty term to our objective function. This added term acts orthogonally to the first term in our objective function, which is modeled on the data. There are whole classes of regularizers, but in general they aim to impose penalties on our parameters. Often these have the goal of driving as many of our parameters to 0 (or as close to it as possible) without degrading our performance on the training data. To implement regularization, when defining our objective function for parameter estimation, we include a regularization term. Generally, we give it a weight, which is either given (often through trial and error) or tuned,

often with a held-out set of data or cross-validation. We choose a regularization term based on some desired properties. $\ell_2$ regularization is one of the most commonly used regularization methods. $\ell_1$ is also frequently chosen as a regularizer due to its property of driving many of the parameters to 0.

#### $\ell_2$ Regularization

$\ell_2$ regularization is simply the Euclidean distance from the origin to a point in our $m$-dimensional space: the value of each of our parameters is squared, the squares are summed together, and finally the square root is taken. This can easily be interpreted as the distance metric commonly taught in grade school. Here is our objective function modified with the addition of an $\ell_2$ regularizer, weighted by $\lambda$:

$$\hat{\beta} = \arg\min_{\langle \beta_0, \beta \rangle} \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta^T x_i) \right)^2 + \lambda \sqrt{\sum_{j=1}^{m} \beta_j^2}$$

#### $\ell_1$ Regularization

$\ell_1$ regularization is the Taxicab or Manhattan distance. It is the distance if you can only move along a single axis at a time. Think of it as a taxi in Manhattan that must drive down an avenue and then down a street rather than cutting diagonally through a block. This regularizer has the nice property of driving many of our parameters to 0. In linear regression, we have made the assumption that our features are independent. One of the intuitions behind $\ell_1$ regularization is that if two features are actually dependent, one of them will be driven to zero. Again, here is our updated objective function, now with $\ell_1$ regularization:

$$\hat{\beta} = \arg\min_{\langle \beta_0, \beta \rangle} \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta^T x_i) \right)^2 + \lambda \sum_{j=1}^{m} |\beta_j|$$

## References

Bishop, Christopher M. (2009). *Pattern Recognition and Machine Learning*, 8th printing. Springer.
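The sparsity intuition above can be illustrated with a small sketch. This is not the chapter's algorithm: it uses coordinate descent with soft-thresholding for the $\ell_1$ penalty, and the closed form for the squared $\ell_2$ penalty common in ridge regression (rather than the square-root form above); the data and $\lambda$ are made up. With two perfectly dependent (duplicated) features, $\ell_1$ drives one coefficient exactly to zero, while $\ell_2$ splits the weight between them:

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, iters=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * sum(|b_j|) by coordinate descent."""
    n, m = X.shape
    beta = np.zeros(m)
    for _ in range(iters):
        for j in range(m):
            # Partial residual: remove feature j's current contribution
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            # Soft-thresholding: this is what sets coefficients exactly to 0
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

def ridge_closed_form(X, y, lam):
    """Minimize (1/2n)||y - Xb||^2 + (lam/2)||b||^2 in closed form."""
    n, m = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(m), X.T @ y / n)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
X = np.column_stack([x, x])              # two perfectly dependent features
y = 3 * x + rng.normal(scale=0.1, size=50)

b_lasso = lasso_coordinate_descent(X, y, lam=0.1)
b_ridge = ridge_closed_form(X, y, lam=0.1)
print("lasso:", b_lasso)                 # one coefficient is exactly 0
print("ridge:", b_ridge)                 # weight shared between the duplicates
```

This matches the intuition in the text: under $\ell_1$, one of two dependent features is driven to zero and the other carries the signal, whereas the $\ell_2$ penalty prefers many small, nonzero weights.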


### CSE 190 Lecture 8. Data Mining and Predictive Analytics. Recommender Systems

CSE 190 Lecture 8 Data Mining and Predictive Analytics Recommender Systems Why recommendation? The goal of recommender systems is To help people discover new content Why recommendation? The goal of recommender

### Learning Feature-based Semantics with Autoencoder

Wonhong Lee Minjong Chung wonhong@stanford.edu mjipeo@stanford.edu Abstract It is essential to reduce the dimensionality of features, not only for computational efficiency, but also for extracting the

(Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

### Exploration vs. Exploitation. CS 473: Artificial Intelligence Reinforcement Learning II. How to Explore? Exploration Functions

CS 473: Artificial Intelligence Reinforcement Learning II Exploration vs. Exploitation Dieter Fox / University of Washington [Most slides were taken from Dan Klein and Pieter Abbeel / CS188 Intro to AI

### CS545 Machine Learning

Machine learning and related fields CS545 Machine Learning Course Introduction Machine learning: the construction and study of systems that learn from data. Pattern recognition: the same field, different

### Ensemble Learning Model selection Statistical validation

Ensemble Learning Model selection Statistical validation Ensemble Learning Definition Ensemble learning is a process that uses a set of models, each of them obtained by applying a learning process to a

### Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

### CS446: Machine Learning Spring Problem Set 5

CS446: Machine Learning Spring 2017 Problem Set 5 Handed Out: March 30 th, 2017 Due: April 11 th, 2017 Feel free to talk to other members of the class in doing the homework. I am more concerned that you

### Ensemble Learning CS534

Ensemble Learning CS534 Ensemble Learning How to generate ensembles? There have been a wide range of methods developed We will study to popular approaches Bagging Boosting Both methods take a single (base)

### Introduction to Machine Learning

Introduction to Machine Learning CMSC 422 MARINE CARPUAT marine@cs.umd.edu What is this course about? Machine learning studies algorithms for learning to do stuff By finding (and exploiting) patterns in

### 3.1. Supervised Learning

This chapter discusses concepts that are relevant to the work presented in this thesis. The sections that follow discuss basic concepts about supervised machine learning and active learning. Section 3.1

### Bootstrapping. Giri Iyengar. April 11, Cornell University Giri Iyengar (Cornell Tech) Bootstrapping April 11, / 21

Bootstrapping Giri Iyengar Cornell University gi43@cornell.edu April 11, 2018 Giri Iyengar (Cornell Tech) Bootstrapping April 11, 2018 1 / 21 Overview 1 Bias-Variance trade-off and Cross Validation 2 Bootstrapping

### Ling/CSE 472: Introduction to Computational Linguistics. 4/11/17 Evaluation

Ling/CSE 472: Introduction to Computational Linguistics 4/11/17 Evaluation Overview Why do evaluation? Basic design consideration Data for evaluation Metrics for evaluation Precision and Recall BLEU score

### Machine Learning for Computer Vision

Prof. Daniel Cremers Machine Learning for Computer PD Dr. Rudolph Triebel Lecturers PD Dr. Rudolph Triebel rudolph.triebel@in.tum.de Room number 02.09.059 (Fridays) Main lecture MSc. Ioannis John Chiotellis

### Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

### Introduction to Computational Linguistics

Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Guestrin (2013) University of Washington April 10, 2018 1 / 30 This and last lecture: bird s eye view Next lecture: understand precision

### Programming Social Robots for Human Interaction. Lecture 4: Machine Learning and Pattern Recognition

Programming Social Robots for Human Interaction Lecture 4: Machine Learning and Pattern Recognition Zheng-Hua Tan Dept. of Electronic Systems, Aalborg Univ., Denmark zt@es.aau.dk, http://kom.aau.dk/~zt

### Document Embeddings via Recurrent Language Models

Document Embeddings via Recurrent Language Models Andrew Giel BS Computer Science agiel@cs.stanford.edu Ryan Diaz BS Computer Science ryandiaz@cs.stanford.edu Abstract Document embeddings serve to supply

### Mood Detection with Tweets

Mood Detection with Tweets Wen Zhang 1, Geng Zhao 2 and Chenye (Charlie) Zhu 3 1 Stanford University, zhangwen@stanford.edu 2 Stanford University, gengz@stanford.edu 3 Stanford University, chenye@stanford.edu

### Introduction to Machine Learning & Its Application in Healthcare Lecture 4 Oct 3, 2018 Presentation by: Leila Karimi

Introduction to Machine Learning & Its Application in Healthcare Lecture 4 Oct 3, 2018 Presentation by: Leila Karimi 1 What Is Machine Learning? A branch of artificial intelligence, concerned with the

### Predicting Success of Restaurants in Las Vegas

Predicting Success of Restaurants in Las Vegas Sang Goo Kang and Viet Vo Stanford University sanggookang@stanford.edu vtvo@stanford.edu Abstract Yelp has played a crucial role in influencing business success