# Introduction to Machine Learning for NLP I

## Transcription

1 Introduction to Machine Learning for NLP I Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Introduction to Machine Learning for NLP I 1 / 49

2 Outline

1. This Course
2. Overview
3. Machine Learning: Definition; Data (Experience); Tasks; Performance Measures
4. Linear Regression: Overview and Cost Function
5. Summary

3 Course Overview

- Foundations of machine learning: loss functions, linear regression, logistic regression, gradient-based optimization, neural networks and backpropagation
- Deep learning tools in Python: Numpy, Theano, Keras, (some) Tensorflow?, (some) Pytorch?
- Applications: Word Embeddings, Sentiment Analysis, Relation Extraction, (some) Machine Translation?
- Practical projects (NLP related, to be agreed on during the course)

4 Lecture Times, Tutorials

- Course homepage: dl-nlp.github.io
- 9-11 is nominally the lecture slot, and the other slot the tutorial slot, but we will not stick to that allocation.
- We will sometimes have longer Q&A-style/interactive tutorial sessions, sometimes more lectures (see next slide).
- Tutor: Simon Schäfer. He will discuss exercise sheets in the tutorials and help you with the projects.

5 Plan

| Date | 9-11 slot | Second slot | Ex. sheet |
|---|---|---|---|
| 10/18 | Overview / ML Intro I | ML Intro I | Linear algebra chapter |
| 10/25 | Linear algebra Q&A / ML II | ML II | Probability chapter |
| 11/1 | public holiday | | |
| 11/8 | Probability Q&A / ML III | Numpy | Numpy |
| 11/15 | ML IV / Theano Intro | Convolution | Theano I |
| 11/22 | Embeddings / CNNs & RNNs for NLP | Numpy Q&A | Read LSTM/RNN |
| 11/29 | LSTM (reading group) | Theano I Q&A | Theano II |
| 12/6 | Keras | Keras | Keras |
| 12/13 | DL for Relation Prediction | Theano II Q&A | Relation Prediction |
| 12/20 | Word Vectors | Project Topics | Project Assignments |
| 1/10 | Keras Q&A, Rel.Extr. Q&A | Tensorflow | |
| 1/17 | optimization methods / pytorch | Help with projects | |
| 1/24 | Other Work at CIS / LMU, Neural MT | Help with projects | |
| 1/31 | Project presentations | presentations | |
| 2/7 | Project presentations | presentations | |

6 Formalities

- This class is graded by a project.
- The grade of the project is the average of: the grade of the code written for the project, the grade of the project documentation / mini-report, and the grade of the presentation about your project.
- You have to pass all three elements in order to pass the course.
- Bonus points: the grade can be improved by 0.5 absolute grades through the exercise sheets before New Year.

Formula:

$$g_{\text{project}} = \frac{g_{\text{project-code}} + g_{\text{project-report}} + g_{\text{project-presentation}}}{3}$$

$$g_{\text{final}} = \operatorname{round}(g_{\text{project}} - 0.5\,x)$$

where $x$ is the fraction of points reached in the exercises (between 0 and 1), and round selects the closest value of 1; 1.3; 1.7; 2; ...; 3.7; 4.

7 Exercise sheets, Projects, Presentations

- 6 ECTS, 14 weeks: average workload 13 hrs / week (3 in class, 10 at home).
- In the first weeks, spend enough time to read and prepare so that you are not lost later.
- From mid-November to mid-December: programming assignments. Coding takes time and can be frustrating (but rewarding)!
- Exercise sheets: work on non-programming exercise sheets individually. For exercise sheets that contain programming parts, submit in teams of 2 or 3.
- Projects: a list of topics will be proposed by me (implement a deep learning technique applied to information extraction or another NLP task). Own ideas are also possible but need to be discussed with me. Work in groups of two or three.
- Project report: 3 pages / team member.

8 Good project code ...

- ... shows that you master the techniques taught in the lectures and exercises.
- ... shows that you can make your own decisions, e.g. adapt model / task / training data etc. if necessary.
- ... is well-structured and easy to understand (telling variable names, meaningful modularization; avoid code duplication and dead code).
- ... is correct (especially: train/dev/test splits, evaluation).
- ... is within the scope of this lecture (time-wise it should not exceed 5 × 10h).

9 A good project presentation ...

- ... is short (10 min. per person, plus Q&A per team).
- ... similar to the report, contains the problem statement, motivation, model, and results.
- ... is targeted to your fellow students, who do not know details beforehand.
- ... contains interesting stuff: unexpected observations? conclusions / recommendations? did you deviate from some common practice?
- ... demonstrates that all team members worked together on the project.

Possible outline:

- Background / Motivation
- Formal characterization of techniques used
- Technical Approach and Difficulties
- Experiments, Results and Interpretation

10 A good project report ...

- ... is concise (3 pages / person) and clear.
- ... motivates and describes the model that you have implemented and the results that you have obtained.
- ... shows that you can correctly describe the concepts taught in this class.
- ... contains interesting stuff: unexpected observations? conclusions / recommendations? did you deviate from some common practice?

12 Machine Learning

Machine learning for natural language processing: Why? What are the advantages and disadvantages compared to alternatives?

- Accuracy
- Coverage
- Resources required (data, expertise, human labour)
- Reliability / Robustness
- Explainability

(Figure: phrase-structure grammar fragment with categories such as NP, VP, V, Det, NN.)

13 Deep Learning

Learn complex functions that are (recursively) composed of simpler functions. Many parameters have to be estimated.

14 Deep Learning

Main advantage: feature learning.

- Models learn to capture the most essential properties of the data (according to some performance measure) as intermediate representations.
- No need to hand-craft feature extraction algorithms.

15 Neural Networks

- First training methods for deep nonlinear NNs appeared in the 1960s (Ivakhnenko and others).
- Increasing interest in NN technology (again) since around 5 years ago ("Neural Network Renaissance"): orders of magnitude more data and faster computers now.
- Many successes: image recognition and captioning, speech recognition, NLP and machine translation (demo of Bahdanau / Cho / Bengio system), game playing (AlphaGo), ...

16 Machine Learning

Deep Learning builds on general Machine Learning concepts:

$$\underset{\theta \in H}{\operatorname{argmin}} \sum_{i=1}^{m} L(f(x_i; \theta), y_i)$$

Fitting data vs. generalizing from data.

(Figure: three plots of prediction against feature.)
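As a minimal sketch of the objective on this slide, the snippet below searches a small, finite hypothesis space $H$ for the parameter that minimizes the summed loss. The one-parameter model $f(x; \theta) = \theta x$, the squared loss, the toy data and all function names are assumptions chosen for illustration; they are not taken from the course material.

```python
def squared_loss(prediction, target):
    """Per-example loss L(f(x; theta), y); squared error is an assumed choice."""
    return (prediction - target) ** 2

def empirical_risk(theta, xs, ys, loss):
    """Sum of per-example losses for the assumed model f(x; theta) = theta * x."""
    return sum(loss(theta * x, y) for x, y in zip(xs, ys))

# Hypothetical toy data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# A finite hypothesis space H, searched exhaustively (illustration only).
hypothesis_space = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
theta_hat = min(hypothesis_space, key=lambda t: empirical_risk(t, xs, ys, squared_loss))
```

Exhaustive search only works for tiny hypothesis spaces; gradient-based optimization, listed in the course overview, replaces it for real models.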

19 A Definition

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell 1997)

- Learning: attaining the ability to perform a task.
- A set of examples ("experience") represents a more general task.
- Examples are described by features: sets of numerical properties that can be represented as vectors in $\mathbb{R}^n$.

21 Data

"A computer program is said to learn from experience E [...], if its performance [...] improves with experience E."

- Dataset: collection of examples.
- Design matrix $X \in \mathbb{R}^{n \times m}$, where $n$ is the number of examples and $m$ the number of features.
- Example: $X_{i,j}$ is the count of feature $j$ (e.g. a stem form) in document $i$.
- Unsupervised learning: model $X$, or find interesting properties of $X$. Training data: only $X$.
- Supervised learning: predict specific additional properties from $X$. Training data: label vector $y \in \mathbb{R}^n$ together with $X$.
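The count-based design matrix described on this slide can be sketched in a few lines. The toy corpus, the variable names and the helper `design_matrix` are hypothetical; the documents are assumed to be tokenized and stemmed already.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of (already stemmed) tokens.
documents = [
    ["learn", "model", "learn"],
    ["model", "data"],
    ["data", "data", "learn"],
]

# Fix a feature order: one column per distinct stem, sorted for reproducibility.
vocabulary = sorted({token for doc in documents for token in doc})

def design_matrix(docs, vocab):
    """Return X as a list of rows; X[i][j] is the count of vocab[j] in docs[i]."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows

X = design_matrix(documents, vocabulary)
n, m = len(X), len(vocabulary)  # n examples (rows), m features (columns)
```

Each row of `X` is one example and each column one feature, matching the $n \times m$ convention of the slide.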

22 Data

Low training error does not mean good generalization. The algorithm may overfit.

(Figure: two plots of prediction against feature.)

23 Data Splits

Best practice: split the data into training, cross-validation and test set ("cross-validation set" = "development set").

- Optimize low-level parameters (feature weights ...) on the training set.
- Select models and hyper-parameters (type of machine learning model, number of features, regularization, priors) on the cross-validation set.
- It is possible to overfit both in the training stage and in the model selection stage!
- Report the final score on the test set only after the model has been selected!
- Don't report the error on the training or cross-validation set as your model performance!
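One possible way to produce the three disjoint sets is sketched below. The 80/10/10 proportions, the fixed seed and the function name `split_dataset` are assumptions for illustration, not a prescription from the course.

```python
import random

def split_dataset(examples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle once, then cut into disjoint train / dev / test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(train_frac * len(shuffled))
    n_dev = int(dev_frac * len(shuffled))
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # everything that is left over
    return train, dev, test

# Hypothetical dataset of 100 example ids.
train, dev, test = split_dataset(range(100))
```

The test part should be touched only once, after model and hyper-parameters have been chosen on the dev part.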

25 Machine Learning Tasks

"A computer program is said to learn [...] with respect to some class of tasks T [...] if its performance at tasks in T [...] improves [...]"

Types of tasks:

- Classification
- Regression
- Structured prediction
- Anomaly detection
- Synthesis and sampling
- Imputation of missing values
- Denoising
- Clustering
- Reinforcement learning
- ...

26 Machine Learning Tasks: Typical Examples & Examples from Recent NLP Research

What are the most important conferences relevant to the intersection of ML and NLP?

27 Task: Classification

Which of $k$ classes does an example belong to? $f : \mathbb{R}^n \to \{1, \dots, k\}$

- Typical example: categorize image patches. Feature vector: color intensities for each pixel; derived features. Output categories: predefined set of labels.
- Typical example: spam classification. Feature vector: high-dimensional, sparse vector; each dimension indicates the occurrence of a particular word, or other e-mail-specific information. Output categories: spam vs. ham.

28 Task: Classification

EMNLP 2017: Given a person name in a sentence that contains keywords related to police ("officer", "police" ...) and to killing ("killed", "shot"), was the person a civilian killed by police?

29 Task: Regression

Predict a numerical value given some input. $f : \mathbb{R}^n \to \mathbb{R}$

Typical examples:

- Predict the risk of an insurance customer.
- Predict the value of a stock.

30 Task: Regression

ACL 2017: Given a response in a multi-turn dialogue, predict how natural the response is, on a scale from 1 to 5.

31 Task: Structured Prediction

Predict a multi-valued output with special inter-dependencies and constraints. Often involves search and problem-specific algorithms.

Typical examples:

- Part-of-speech tagging
- Syntactic parsing
- Protein folding

32 Task: Structured Prediction

ACL 2017: Jointly find all relations of interest in a sentence by tagging arguments and combining them.

33 Task: Reinforcement Learning

In reinforcement learning, the model (also called agent) needs to select a series of actions, but only observes the outcome (reward) at the end. The goal is to predict actions that will maximize the outcome.

EMNLP 2017: The computer negotiates with humans in natural language in order to maximize its points in a game.

34 Task: Anomaly Detection

Detect atypical items or events. Common approach: estimate the density and identify items that have low probability.

Examples:

- Quality assurance
- Detection of criminal activity

Often items categorized as outliers are sent to humans for further scrutiny.
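The density-based approach can be sketched as follows, assuming a single numeric feature and a Gaussian density; both are simplifying assumptions made for this illustration. The measurements, the threshold and the function names are hypothetical.

```python
import math
import statistics

def gaussian_log_density(value, mean, stdev):
    """Log of the normal density N(value; mean, stdev^2)."""
    z = (value - mean) / stdev
    return -0.5 * z * z - math.log(stdev * math.sqrt(2 * math.pi))

def find_anomalies(train_values, new_values, threshold):
    """Flag new values whose log density under the fitted Gaussian is below threshold."""
    mean = statistics.mean(train_values)
    stdev = statistics.stdev(train_values)
    return [v for v in new_values if gaussian_log_density(v, mean, stdev) < threshold]

# Hypothetical measurements from a production line, all close to 10.
normal_items = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
flagged = find_anomalies(normal_items, [10.1, 9.9, 14.0], threshold=-5.0)
```

In practice the threshold would be tuned on held-out data, and the flagged items would be passed on for human inspection as the slide suggests.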

35 Task: Anomaly Detection

ACL 2017: Schizophrenia patients can be detected by their non-standard use of metaphors and more extreme sentiment expressions.

36 Supervised and Unsupervised Learning

- Unsupervised learning: learn interesting properties, such as the probability distribution $p(x)$.
- Supervised learning: learn a mapping from $x$ to $y$, typically by estimating $p(y \mid x)$.
- Supervised learning in an unsupervised way:

$$p(y \mid x) = \frac{p(x, y)}{\sum_{y'} p(x, y')}$$
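The identity $p(y \mid x) = p(x, y) / \sum_{y'} p(x, y')$ can be checked on a small count table. In this sketch the joint distribution is estimated by relative frequencies; the observations and names are invented for illustration.

```python
from collections import Counter

# Hypothetical observations of (feature value x, label y) pairs.
observations = [
    ("free", "spam"), ("free", "spam"), ("free", "ham"),
    ("meeting", "ham"), ("meeting", "ham"), ("meeting", "spam"),
    ("meeting", "ham"),
]

joint_counts = Counter(observations)
total = sum(joint_counts.values())
# Unsupervised step: estimate the joint distribution p(x, y) by relative frequency.
joint = {pair: count / total for pair, count in joint_counts.items()}

def conditional(x, y):
    """p(y | x) = p(x, y) / sum over y' of p(x, y'); x must have been observed."""
    marginal = sum(p for (x_other, _), p in joint.items() if x_other == x)
    return joint.get((x, y), 0.0) / marginal
```

Here `conditional("free", "spam")` divides the joint estimate 2/7 by the marginal 3/7, so the supervised quantity falls out of a model that was fitted without treating the label as special.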

38 Performance Measures

"A computer program is said to learn [...] with respect to some [...] performance measure P, if its performance [...] as measured by P, improves [...]"

- Quantitative measure of algorithm performance.
- Task-specific.

42 Discrete Loss Functions

- Can be used to measure classification performance.
- Not applicable to measure density estimation or regression performance.
- Accuracy: proportion of examples for which the model produces the correct output. 0-1 loss = error rate = 1 - accuracy.
- Accuracy may be inappropriate for skewed label distributions, where the relevant category is rare. In that case use the F1-score:

$$F_1 = \frac{2 \cdot \text{Prec} \cdot \text{Rec}}{\text{Prec} + \text{Rec}}$$
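The following sketch illustrates why accuracy can mislead on skewed labels. The label vectors are made up, and returning an F1 of 0 when there are no true positives is a convention assumed here.

```python
def accuracy(gold, predicted):
    """Proportion of examples with the correct output."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

def f1_score(gold, predicted, positive=1):
    """F1 = 2 * Prec * Rec / (Prec + Rec) for the class `positive`."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0  # assumed convention: no true positives means F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical skewed data: only 1 of 10 examples belongs to the relevant class.
gold = [1] + [0] * 9
always_negative = [0] * 10      # never predicts the rare class
one_hit = [1, 1] + [0] * 8      # finds the rare example, plus one false alarm
```

Both predictors reach an accuracy of 0.9 on this data, but only `one_hit` has a non-zero F1-score, because it actually finds the rare class.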

43 Discrete vs. Continuous Loss Functions

Discrete loss functions cannot indicate how wrong a wrong decision for one example is.

Continuous loss functions ...

- ... are more widely applicable.
- ... are often easier to optimize (differentiable).
- ... can also be applied to discrete tasks (classification).

Sometimes algorithms are optimized using one loss (e.g. hinge loss) and evaluated using another loss (e.g. F1-score).

44 Examples for Continuous Loss Functions

- Density estimation: (negative) log probability of the example.
- Regression: squared error.
- Classification: the loss $L(y_i \cdot f(x_i))$ is a function of label × prediction, with label $\in \{-1, 1\}$ and prediction $\in \mathbb{R}$.
  - Correct prediction: $y_i \cdot f(x_i) > 0$
  - Wrong prediction: $y_i \cdot f(x_i) \le 0$
  - Examples: zero-one loss, hinge loss, logistic loss, ...
- The loss on a data set is the sum of the per-example losses.
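The three classification losses named on this slide can all be written as functions of the margin $y_i \cdot f(x_i)$. The sketch below uses their standard textbook forms; the labels and scores are made-up values.

```python
import math

def zero_one_loss(margin):
    """1 for a wrong prediction (margin <= 0), 0 for a correct one."""
    return 0.0 if margin > 0 else 1.0

def hinge_loss(margin):
    """Zero once the margin reaches 1, growing linearly below that."""
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    """Smooth and strictly positive; differentiable everywhere."""
    return math.log(1.0 + math.exp(-margin))

def dataset_loss(loss, labels, scores):
    """Loss on a data set: sum of per-example losses of label * prediction."""
    return sum(loss(y * s) for y, s in zip(labels, scores))

# Hypothetical labels in {-1, 1} and real-valued predictions f(x_i).
labels = [1, -1, 1]
scores = [2.0, 0.5, 0.3]
```

With these values the margins are 2.0, -0.5 and 0.3. The zero-one loss only registers the one wrong decision, while the hinge loss also penalizes the correct but low-margin third example.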

46 Linear Regression

For one instance:

- Input: vector $x \in \mathbb{R}^n$
- Output: scalar $y \in \mathbb{R}$ (actual output: $y$; predicted output: $\hat{y}$)
- Linear function:

$$\hat{y} = w^T x = \sum_{j=1}^{n} w_j x_j$$

47 Linear Regression

Linear function:

$$\hat{y} = w^T x = \sum_{j=1}^{n} w_j x_j$$

- Parameter vector $w \in \mathbb{R}^n$
- Weight $w_j$ decides if the value of feature $j$ increases or decreases the prediction $\hat{y}$.

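48 Linear Regression

For the whole data set: use matrix $X$ and vector $y$ to stack instances on top of each other. Typically the first column contains all 1 for the intercept (bias, shift) term. Here $m$ is the number of instances and $n$ the number of columns of $X$.

$$X = \begin{pmatrix} 1 & x_{12} & x_{13} & \dots & x_{1n} \\ 1 & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{m2} & x_{m3} & \dots & x_{mn} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$$

For the entire data set, predictions are stacked on top of each other: $\hat{y} = Xw$

- Estimate parameters using $X^{(\text{train})}$ and $y^{(\text{train})}$.
- Make high-level decisions (which features ...) using $X^{(\text{dev})}$ and $y^{(\text{dev})}$.
- Evaluate the resulting model using $X^{(\text{test})}$ and $y^{(\text{test})}$.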
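A plain-Python sketch of the stacked prediction $\hat{y} = Xw$ with a leading column of ones follows. The feature values, the weights and the helper names are hypothetical, and lists of lists stand in for a real matrix type.

```python
def add_bias_column(features):
    """Prepend a constant 1 to every row so that w[0] acts as the intercept."""
    return [[1.0] + list(row) for row in features]

def predict(X, w):
    """Compute y_hat = X w: one dot product per row (instance) of X."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]

# Hypothetical raw features for two instances, and an arbitrary weight vector.
raw_features = [[2.0, 3.0], [0.0, 1.0]]
X = add_bias_column(raw_features)
w = [0.5, 1.0, -2.0]  # intercept, then one weight per feature
y_hat = predict(X, w)
```

The same computation is a single matrix-vector product in a numerical library; the explicit loops are only meant to make the row-wise structure visible.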

49 Simple Example: Housing Prices

Predict Munich property prices (in 1K Euros) from just one feature: square meters of property.

[The numeric entries of $X$ and $y$ on this slide are not preserved.]

The prediction is $\hat{y} = Xw$, i.e. $\hat{y}_i = w_1 + w_2 x_{i2}$ for each property $i$.

- $w_1$ will contain costs incurred in any property acquisition.
- $w_2$ will contain the remaining average price per square meter.

[The optimal parameters $w$ and the resulting $\hat{y}$ for this example are likewise not preserved.]

50 Linear Regression: Mean Squared Error

The mean squared error of a training (or test) data set is the mean of the squared differences between the predictions and labels of all $m$ instances:

$$\text{MSE}^{(\text{train})} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i^{(\text{train})} - y_i^{(\text{train})} \right)^2$$

In matrix notation:

$$\text{MSE}^{(\text{train})} = \frac{1}{m} \left\| \hat{y}^{(\text{train})} - y^{(\text{train})} \right\|_2^2 = \frac{1}{m} \left\| X^{(\text{train})} w - y^{(\text{train})} \right\|_2^2$$
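The mean squared error can be computed directly from its definition. The numbers below are invented for illustration and are not the values from the housing slide: each row of `X_train` holds a 1 for the intercept and a size in square meters, and `y_train` holds prices in 1K Euros.

```python
def mean_squared_error(X, w, y):
    """MSE = (1/m) * sum_i (y_hat_i - y_i)^2 with y_hat = X w."""
    predictions = [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y)

# Invented toy data: bias column plus square meters; prices in 1K Euros.
X_train = [[1.0, 50.0], [1.0, 80.0], [1.0, 100.0]]
y_train = [300.0, 450.0, 550.0]

w_exact = [50.0, 5.0]  # fixed cost of 50, plus 5 per square meter
w_off = [0.0, 5.0]     # same slope, but the fixed cost is ignored

mse_exact = mean_squared_error(X_train, w_exact, y_train)
mse_off = mean_squared_error(X_train, w_off, y_train)
```

With `w_exact` every prediction matches its label, so the error is zero. With `w_off` every prediction is 50 too low, giving an error of 2500. This is the quantity that parameter estimation tries to minimize.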

52 Summary

- Deep Learning: many successes in recent years; feature learning instead of feature engineering; builds on general machine learning concepts.
- Machine learning definition: data, task, cost function.
- Machine learning tasks: classification, regression, ...
- Linear regression: output depends linearly on input; cost function: mean squared error.
- Next up: estimating the parameters.


### Jeff Howbert Introduction to Machine Learning Winter

Classification Ensemble e Methods 1 Jeff Howbert Introduction to Machine Learning Winter 2012 1 Ensemble methods Basic idea of ensemble methods: Combining predictions from competing models often gives

### Exploration vs. Exploitation. CS 473: Artificial Intelligence Reinforcement Learning II. How to Explore? Exploration Functions

CS 473: Artificial Intelligence Reinforcement Learning II Exploration vs. Exploitation Dieter Fox / University of Washington [Most slides were taken from Dan Klein and Pieter Abbeel / CS188 Intro to AI

### Principles of Machine Learning

Principles of Machine Learning Lab 5 - Optimization-Based Machine Learning Models Overview In this lab you will explore the use of optimization-based machine learning models. Optimization-based models

### Knowledge Representation and Reasoning with Deep Neural Networks. Arvind Neelakantan

Knowledge Representation and Reasoning with Deep Neural Networks Arvind Neelakantan UMass Amherst: David Belanger, Rajarshi Das, Andrew McCallum and Benjamin Roth Google Brain: Martin Abadi, Dario Amodei,

### Two Ideas For Structured Data: Reward augmented maximum likelihood Order matters. Samy Bengio, and the Brain team

Two Ideas For Structured Data: Reward augmented maximum likelihood Order matters Samy Bengio, and the Brain team Reward augmented maximum likelihood for neural structured prediction Mohammad Norouzi, Samy

### COMS 4771 Introduction to Machine Learning. Nakul Verma

COMS 4771 Introduction to Machine Learning Nakul Verma Machine learning: what? Study of making machines learn a concept without having to explicitly program it. Constructing algorithms that can: learn

### COMP 551 Applied Machine Learning Lecture 12: Ensemble learning

COMP 551 Applied Machine Learning Lecture 12: Ensemble learning Associate Instructor: Herke van Hoof (herke.vanhoof@mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551

### Modelling Time Series Data with Theano. Charles Killam, LP.D. Certified Instructor, NVIDIA Deep Learning Institute NVIDIA Corporation

Modelling Time Series Data with Theano Charles Killam, LP.D. Certified Instructor, NVIDIA Deep Learning Institute NVIDIA Corporation 1 DEEP LEARNING INSTITUTE DLI Mission Helping people solve challenging

### Introduction to Machine Learning

Introduction to Machine Learning Hamed Pirsiavash CMSC 678 http://www.csee.umbc.edu/~hpirsiav/courses/ml_fall17 The slides are closely adapted from Subhransu Maji s slides Course background What is the

### Lecture 6: Course Project Introduction and Deep Learning Preliminaries

CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 6: Course Project Introduction and Deep Learning Preliminaries Outline for Today Course projects What

### An Introduction to Deep Learning. Labeeb Khan

An Introduction to Deep Learning Labeeb Khan Special Thanks: Lukas Masuch @lukasmasuch +lukasmasuch Lead Software Engineer: Machine Intelligence, SAP The Big Players Companies The Big Players Startups

### COMP 527: Data Mining and Visualization. Danushka Bollegala

COMP 527: Data Mining and Visualization Danushka Bollegala Introductions Lecturer: Danushka Bollegala Office: 2.24 Ashton Building (Second Floor) Email: danushka@liverpool.ac.uk Personal web: http://danushka.net/

### Santa Monica College--- Spring 2016 Department of Mathematics MATH 54(#2730) Elementary Statistics Friday, 8:00am 12:05pm, Room MC74

Santa Monica College--- Spring 2016 Department of Mathematics MATH 54(#2730) Elementary Statistics Friday, 8:00am 12:05pm, Room MC74 Instructor: Melanie Xie Office Hours: Friday, 7:00 am 7:55am, Room MC74

### CS 2750: Machine Learning. Other Topics. Prof. Adriana Kovashka University of Pittsburgh April 13, 2017

CS 2750: Machine Learning Other Topics Prof. Adriana Kovashka University of Pittsburgh April 13, 2017 Plan for last lecture Overview of other topics and applications Reinforcement learning Active learning

### Stay Alert!: Creating a Classifier to Predict Driver Alertness in Real-time

Stay Alert!: Creating a Classifier to Predict Driver Alertness in Real-time Aditya Sarkar, Julien Kawawa-Beaudan, Quentin Perrot Friday, December 11, 2014 1 Problem Definition Driving while drowsy inevitably

### CS Machine Learning

CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

### Azure Machine Learning. Designing Iris Multi-Class Classifier

Media Partners Azure Machine Learning Designing Iris Multi-Class Classifier Marcin Szeliga 20 years of experience with SQL Server Trainer & data platform architect Books & articles writer Speaker at numerous

### 10707 Deep Learning. Russ Salakhutdinov. Language Modeling. h0p://www.cs.cmu.edu/~rsalakhu/10707/ Machine Learning Department

10707 Deep Learning Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu h0p://www.cs.cmu.edu/~rsalakhu/10707/ Language Modeling Neural Networks Online Course Disclaimer: Some of the material

### Survey Analysis of Machine Learning Methods for Natural Language Processing for MBTI Personality Type Prediction

Survey Analysis of Machine Learning Methods for Natural Language Processing for MBTI Personality Type Prediction Brandon Cui (bcui19@stanford.edu) 1 Calvin Qi (calvinqi@stanford.edu) 2 Abstract We studied

### Deep Learning Fun with TensorFlow. Martin Andrews Red Cat Labs

Deep Learning Fun with TensorFlow Martin Andrews Red Cat Labs Outline About me + Singapore community + Workshops Something in-the-news : Actual talk content Including lots of code (show of hands?) Deep

### E9 205 Machine Learning for Signal Processing

E9 205 Machine Learning for Signal Processing Introduction to Machine Learning of Sensory Signals 14-08-2017 Instructor - Sriram Ganapathy (sriram@ee.iisc.ernet.in) Teaching Assistant - Aravind Illa (aravindece77@gmail.com).

### Era of AI (Deep Learning) and harnessing its true potential

Era of AI (Deep Learning) and harnessing its true potential Artificial Intelligence (AI) AI Augments our brain with infallible memories and infallible calculators Humans and Computers have become a tightly

### Artificial Neural Networks

Artificial Neural Networks Outline Introduction to Neural Network Introduction to Artificial Neural Network Properties of Artificial Neural Network Applications of Artificial Neural Network Demo Neural

### Deep Learning and its application to CV and NLP. Fei Yan University of Surrey June 29, 2016 Edinburgh

Deep Learning and its application to CV and NLP Fei Yan University of Surrey June 29, 2016 Edinburgh Overview Machine learning Motivation: why go deep Feed-forward networks: CNN Recurrent networks: LSTM

### Machine Learning for NLP

Natural Language Processing SoSe 2014 Machine Learning for NLP Dr. Mariana Neves April 30th, 2014 (based on the slides of Dr. Saeedeh Momtazi) Introduction Field of study that gives computers the ability

### Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler

Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina

### Machine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. January 11, 2011

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 11, 2011 Today: What is machine learning? Decision tree learning Course logistics Readings: The Discipline

### Neural Text Summarization

Neural Text Summarization Urvashi Khandelwal Department of Computer Science Stanford University urvashik@stanford.edu Abstract Generation based text summarization is a hard task and recent deep learning

### Improved Word and Symbol Embedding for Part-of-Speech Tagging

Improved Word and Symbol Embedding for Part-of-Speech Tagging Nicholas Altieri, Sherdil Niyaz, Samee Ibraheem, and John DeNero {naltieri,sniyaz,sibraheem,denero}@berkeley.edu Abstract State-of-the-art

### Lecture 1. Introduction Bastian Leibe Visual Computing Institute RWTH Aachen University

Advanced Machine Learning Lecture 1 Introduction 20.10.2015 Bastian Leibe Visual Computing Institute RWTH Aachen University http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de Organization Lecturer

### Overview COEN 296 Topics in Computer Engineering Introduction to Pattern Recognition and Data Mining Course Goals Syllabus

Overview COEN 296 Topics in Computer Engineering to Pattern Recognition and Data Mining Instructor: Dr. Giovanni Seni G.Seni@ieee.org Department of Computer Engineering Santa Clara University Course Goals

### Machine Learning : Hinge Loss

Machine Learning Hinge Loss 16/01/2014 Machine Learning : Hinge Loss Recap tasks considered before Let a training dataset be given with (i) data and (ii) classes The goal is to find a hyper plane that

### Artificial Neural Networks. Andreas Robinson 12/19/2012

Artificial Neural Networks Andreas Robinson 12/19/2012 Introduction Artificial Neural Networks Machine learning technique Learning from past experience/data Predicting/classifying novel data Biologically

### Let the data speak: Machine Learning methods for data editing and imputation. Paper by: Felibel Zabala Presented by: Amanda Hughes

Let the data speak: Machine Learning methods for data editing and imputation Paper by: Felibel Zabala Presented by: Amanda Hughes September 2015 Objective Machine Learning (ML) methods can be used to help

### Article from. Predictive Analytics and Futurism December 2015 Issue 12

Article from Predictive Analytics and Futurism December 2015 Issue 12 The Third Generation of Neural Networks By Jeff Heaton Neural networks are the phoenix of artificial intelligence. Right now neural

### Homework III Using Logistic Regression for Spam Filtering

Homework III Using Logistic Regression for Spam Filtering Introduction to Machine Learning - CMPS 242 By Bruno Astuto Arouche Nunes February 14 th 2008 1. Introduction In this work we study batch learning

### Decision Tree for Playing Tennis

Decision Tree Decision Tree for Playing Tennis (outlook=sunny, wind=strong, humidity=normal,? ) DT for prediction C-section risks Characteristics of Decision Trees Decision trees have many appealing properties

### Modelling Sentence Pair Similarity with Multi-Perspective Convolutional Neural Networks ZHUCHENG TU CS 898 SPRING 2017 JULY 17, 2017

Modelling Sentence Pair Similarity with Multi-Perspective Convolutional Neural Networks ZHUCHENG TU CS 898 SPRING 2017 JULY 17, 2017 1 Outline Motivation Why do we want to model sentence similarity? Challenges

### INTRODUCTION TO MACHINE LEARNING. Machine Learning: What s The Challenge?

INTRODUCTION TO MACHINE LEARNING Machine Learning: What s The Challenge? Goals of the course Identify a machine learning problem Use basic machine learning techniques Think about your data/results What

### PG DIPLOMA IN MACHINE LEARNING & AI 11 MONTHS ONLINE

& PG DIPLOMA IN MACHINE LEARNING & AI 11 MONTHS ONLINE UpGrad is an online education platform to help individuals develop their professional potential in the most engaging learning environment. Online

### Mocking the Draft Predicting NFL Draft Picks and Career Success

Mocking the Draft Predicting NFL Draft Picks and Career Success Wesley Olmsted [wolmsted], Jeff Garnier [jeff1731], Tarek Abdelghany [tabdel] 1 Introduction We started off wanting to make some kind of

### Introduction: Convolutional Neural Networks for Visual Recognition.

Introduction: Convolutional Neural Networks for Visual Recognition boris.ginzburg@intel.com 1 Acknowledgments This presentation is heavily based on: http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php http://deeplearning.net/reading-list/tutorials/

### Deep Learning for Amazon Food Review Sentiment Analysis

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### Programming Assignment2: Neural Networks

Programming Assignment2: Neural Networks Problem :. In this homework assignment, your task is to implement one of the common machine learning algorithms: Neural Networks. You will train and test a neural

### WING-NUS at CL-SciSumm 2017: Learning from Syntactic and Semantic Similarity for Citation Contextualization

WING-NUS at CL-SciSumm 2017: Learning from Syntactic and Semantic Similarity for Citation Contextualization Animesh Prasad School of Computing, National University of Singapore, Singapore a0123877@u.nus.edu