Introduction to Machine Learning for NLP I


Benjamin Roth, CIS LMU München

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

Course Overview
Foundations of machine learning: loss functions, linear regression, logistic regression, gradient-based optimization, neural networks and backpropagation.
Deep learning tools in Python: Numpy, Theano, Keras, (some) Tensorflow?, (some) Pytorch?
Applications: word embeddings, sentiment analysis, relation extraction, (some) machine translation?
Practical projects (NLP related, to be agreed on during the course).

Lecture Times, Tutorials
Course homepage: dl-nlp.github.io
9-11 is supposed to be the lecture slot, and 11-12 the tutorial slot...
... but we will not stick to that allocation: we will sometimes have longer Q&A-style/interactive tutorial sessions, sometimes more lectures (see next slide).
Tutor: Simon Schäfer. He will discuss the exercise sheets in the tutorials and help you with the projects.

Plan

Date  | 9-11 slot                          | 11-12 slot         | E. sheet
10/18 | Overview / ML Intro I              | ML Intro I         | Linear algebra chapter
10/25 | Linear algebra Q&A / ML II         | ML II              | Probability chapter
11/1  | public holiday                     |                    |
11/8  | Probability Q&A / ML III           | Numpy              | Numpy
11/15 | ML IV / Theano Intro               | Convolution        | Theano I
11/22 | Embeddings / CNNs & RNNs for NLP   | Numpy Q&A          | Read LSTM/RNN
11/29 | LSTM (reading group)               | Theano I Q&A       | Theano II
12/6  | Keras                              | Keras              | Keras
12/13 | DL for Relation Prediction         | Theano II Q&A      | Relation Prediction
12/20 | Word Vectors                       | Project Topics     | Project Assignments
1/10  | Keras Q&A, Rel.Extr. Q&A           |                    | Tensorflow
1/17  | optimization methods / Pytorch     | Help with projects |
1/24  | Other Work at CIS / LMU, Neural MT | Help with projects |
1/31  | Project presentations              | presentations      |
2/7   | Project presentations              | presentations      |

Formalities
This class is graded by a project. The project grade is the average of:
- the grade of the code written for the project,
- the grade of the project documentation / mini-report,
- the grade of the presentation about your project.
You have to pass all three elements in order to pass the course.
Bonus points: the grade can be improved by up to 0.5 absolute grades through the exercise sheets submitted before New Year. Formula:

$g_{\text{project}} = \frac{g_{\text{project-code}} + g_{\text{project-report}} + g_{\text{project-presentation}}}{3}$

$g_{\text{final}} = \text{round}(g_{\text{project}} - 0.5x)$

where $x$ is the fraction of points reached in the exercises (between 0 and 1), and round selects the closest value of 1; 1.3; 1.7; 2; 2.3; 2.7; 3; 3.3; 3.7; 4.
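Since the formula is easy to misread, here is a small illustrative sketch of it in Python. The full grade scale is an assumption based on the standard German scale, and `final_grade` is a hypothetical helper, not part of the course materials:

```python
# Sketch of the bonus formula above (assumed grade scale, illustration only).
GRADE_SCALE = [1.0, 1.3, 1.7, 2.0, 2.3, 2.7, 3.0, 3.3, 3.7, 4.0]

def final_grade(code, report, presentation, exercise_fraction):
    """exercise_fraction: fraction of exercise points reached, in [0, 1]."""
    g_project = (code + report + presentation) / 3
    g_bonus = g_project - 0.5 * exercise_fraction
    # round to the closest grade on the scale
    return min(GRADE_SCALE, key=lambda g: abs(g - g_bonus))

# Example: project graded 2.0 / 1.7 / 2.3, with 80% of the exercise points
print(final_grade(2.0, 1.7, 2.3, 0.8))  # -> 1.7
```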

Exercise Sheets, Projects, Presentations
6 ECTS, 14 weeks: average work load 13 hrs/week (3 in class, 10 at home).
In the first weeks, spend enough time to read and prepare so that you are not lost later.
From mid-November to mid-December: programming assignments. Coding takes time, and can be frustrating (but rewarding)!
Exercise sheets
- Work on non-programming exercise sheets individually.
- For exercise sheets that contain programming parts, submit in teams of 2 or 3.
Projects
- A list of topics will be proposed by me: implement a deep learning technique applied to information extraction (or another NLP task).
- Own ideas are also possible, but need to be discussed with me.
- Work in groups of two or three.
- Project report: 3 pages per team member.

Good project code...
... shows that you master the techniques taught in the lectures and exercises.
... shows that you can make your own decisions: e.g. adapt the model / task / training data etc. if necessary.
... is well-structured and easy to understand (telling variable names, meaningful modularization; avoid code duplication and dead code).
... is correct (especially: train/dev/test splits, evaluation).
... is within the scope of this lecture (time-wise it should not exceed 5 × 10h).

A good project presentation...
... is short (10 min. per person + 15 min. Q&A per team).
... similar to the report, contains the problem statement, motivation, model, and results.
... is targeted to your fellow students, who do not know the details beforehand.
... contains interesting stuff: unexpected observations? conclusions / recommendations? did you deviate from some common practice?
... demonstrates that all team members worked together on the project.
Possible outline: Background / Motivation; Formal characterization of techniques used; Technical Approach and Difficulties; Experiments, Results and Interpretation.

A good project report...
... is concise (3 pages per person) and clear.
... motivates and describes the model that you have implemented and the results that you have obtained.
... shows that you can correctly describe the concepts taught in this class.
... contains interesting stuff: unexpected observations? conclusions / recommendations? did you deviate from some common practice?

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

Machine Learning
Machine learning for natural language processing: Why? What are the advantages and disadvantages compared to the alternatives?
- Accuracy
- Coverage
- Resources required (data, expertise, human labour)
- Reliability / Robustness
- Explainability
[Figure: example rules of a hand-written grammar, e.g. S → NP VP, VP → V NP, NP → Det NN]

Deep Learning
Learn complex functions that are (recursively) composed of simpler functions.
Many parameters have to be estimated.

Deep Learning
Main advantage: feature learning.
Models learn to capture the most essential properties of the data (according to some performance measure) as intermediate representations.
No need to hand-craft feature extraction algorithms.

Neural Networks
First training methods for deep nonlinear NNs appeared in the 1960s (Ivakhnenko and others).
Increasing interest in NN technology (again) over the last five years ("Neural Network Renaissance"): orders of magnitude more data and faster computers now.
Many successes:
- Image recognition and captioning
- Speech recognition
- NLP and machine translation (demo of the Bahdanau / Cho / Bengio system)
- Game playing (AlphaGo)
- ...

Machine Learning
Deep Learning builds on general Machine Learning concepts:

$\underset{\theta \in H}{\operatorname{argmin}} \sum_{i=1}^{m} L(f(x_i; \theta), y_i)$

Fitting data vs. generalizing from data.
[Figure: three prediction-vs-feature plots illustrating fitting vs. generalizing.]

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

A Definition
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell 1997)
Learning: attaining the ability to perform a task.
A set of examples ("experience") represents a more general task.
Examples are described by features: sets of numerical properties that can be represented as vectors in $\mathbb{R}^n$.

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

Data
"A computer program is said to learn from experience E [...], if its performance [...] improves with experience E."
Dataset: a collection of examples.
Design matrix $X \in \mathbb{R}^{n \times m}$
- n: number of examples
- m: number of features
Example: $X_{i,j}$ = count of feature j (e.g. a stem form) in document i.
Unsupervised learning: model X, or find interesting properties of X. Training data: only X.
Supervised learning: predict specific additional properties from X. Training data: label vector $y \in \mathbb{R}^n$ together with X.
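A minimal sketch of how such a count-based design matrix and label vector can be built with Numpy (the toy corpus, labels, and whitespace tokenization are invented here for illustration):

```python
import numpy as np

# Toy corpus and labels (hypothetical data, for illustration only)
docs = ["the movie was great", "the plot was bad", "great acting , great plot"]
y = np.array([1.0, 0.0, 1.0])  # label vector, one entry per document

# Vocabulary = feature set; X[i, j] counts feature j in document i
vocab = sorted({w for d in docs for w in d.split()})
X = np.zeros((len(docs), len(vocab)))
for i, d in enumerate(docs):
    for w in d.split():
        X[i, vocab.index(w)] += 1

print(X.shape)  # (n examples, m features) -> (3, 8)
```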

Data
Low training error does not mean good generalization. The algorithm may overfit.
[Figure: two prediction-vs-feature plots, contrasting a well-generalizing fit with an overfitted one.]

Data Splits
Best practice: split the data into training, cross-validation and test set (cross-validation set = development set).
- Optimize low-level parameters (feature weights, ...) on the training set.
- Select models and hyper-parameters on the cross-validation set (type of machine learning model, number of features, regularization, priors).
- It is possible to overfit both in the training and in the model selection stage!
- Report the final score on the test set only after the model has been selected!
- Don't report the error on the training or cross-validation set as your model performance!
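A minimal sketch of such a three-way split in plain Numpy (the 60/20/20 proportions and the helper name are assumptions, not prescriptions from the slides):

```python
import numpy as np

def train_dev_test_split(X, y, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into train / dev (cross-validation) / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_dev = int(len(X) * dev_frac)
    test, dev, train = idx[:n_test], idx[n_test:n_test + n_dev], idx[n_test + n_dev:]
    return (X[train], y[train]), (X[dev], y[dev]), (X[test], y[test])

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
train, dev, test = train_dev_test_split(X, y)
# Tune weights on train, pick hyper-parameters on dev,
# and touch test only once, for the final report.
```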

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

Machine Learning Tasks
"A computer program is said to learn [...] with respect to some class of tasks T [...] if its performance at tasks in T [...] improves [...]"
Types of tasks:
- Classification
- Regression
- Structured prediction
- Anomaly detection
- Synthesis and sampling
- Imputation of missing values
- Denoising
- Clustering
- Reinforcement learning
- ...

Machine Learning Tasks: Typical Examples & Examples from Recent NLP Research
What are the most important conferences relevant to the intersection of ML and NLP?

Task: Classification
Which of k classes does an example belong to?
$f: \mathbb{R}^n \rightarrow \{1, \dots, k\}$
Typical example: categorize image patches.
- Feature vector: color intensities for each pixel; derived features.
- Output categories: predefined set of labels.
Typical example: spam classification.
- Feature vector: high-dimensional, sparse vector. Each dimension indicates the occurrence of a particular word, or other email-specific information.
- Output categories: spam vs. ham.

Task: Classification
EMNLP 2017: Given a person name in a sentence that contains keywords related to police ("officer", "police", ...) and to killing ("killed", "shot"), was the person a civilian killed by police?

Task: Regression
Predict a numerical value given some input.
$f: \mathbb{R}^n \rightarrow \mathbb{R}$
Typical examples:
- Predict the risk of an insurance customer.
- Predict the value of a stock.

Task: Regression
ACL 2017: Given a response in a multi-turn dialogue, predict a value (on a scale from 1 to 5) for how natural the response is.

Task: Structured Prediction
Predict a multi-valued output with special inter-dependencies and constraints.
Typical examples:
- Part-of-speech tagging
- Syntactic parsing
- Protein folding
Often involves search and problem-specific algorithms.

Task: Structured Prediction
ACL 2017: Jointly find all relations of interest in a sentence by tagging the arguments and combining them.

Task: Reinforcement Learning
In reinforcement learning, the model (also called the agent) needs to select a series of actions, but only observes the outcome (reward) at the end. The goal is to predict actions that will maximize the outcome.
EMNLP 2017: The computer negotiates with humans in natural language in order to maximize its points in a game.

Task: Anomaly Detection
Detect atypical items or events.
Common approach: estimate the density and identify items that have low probability.
Examples:
- Quality assurance
- Detection of criminal activity
Often, items categorized as outliers are sent to humans for further scrutiny.

Task: Anomaly Detection
ACL 2017: Schizophrenia patients can be detected by their non-standard use of metaphors and more extreme sentiment expressions.

Supervised and Unsupervised Learning
Unsupervised learning: learn interesting properties, such as the probability distribution $p(x)$.
Supervised learning: learn a mapping from $x$ to $y$, typically by estimating $p(y|x)$.
Supervised learning in an unsupervised way:

$p(y|x) = \frac{p(x, y)}{\sum_{y'} p(x, y')}$
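A tiny numerical illustration (with a made-up 2×2 joint distribution) of reading $p(y|x)$ off the joint $p(x, y)$ by normalizing over $y$:

```python
import numpy as np

# Hypothetical joint distribution p(x, y): rows index x, columns index y
p_xy = np.array([[0.3, 0.1],   # x = 0
                 [0.2, 0.4]])  # x = 1

# p(y|x) = p(x, y) / sum_y' p(x, y')  -- normalize each row
p_y_given_x = p_xy / p_xy.sum(axis=1, keepdims=True)
print(p_y_given_x)  # [[0.75 0.25], [0.333... 0.666...]]
```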

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

Performance Measures
"A computer program is said to learn [...] with respect to some [...] performance measure P, if its performance [...] as measured by P, improves [...]"
A quantitative measure of algorithm performance. Task-specific.

Discrete Loss Functions
Can be used to measure classification performance.
Not applicable to measure density estimation or regression performance.
Accuracy: the proportion of examples for which the model produces the correct output.
0-1 loss = error rate = 1 - accuracy.
Accuracy may be inappropriate for skewed label distributions, where the relevant category is rare:

$F_1 = \frac{2 \cdot \text{Prec} \cdot \text{Rec}}{\text{Prec} + \text{Rec}}$
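A short sketch (with invented gold and predicted labels) computing accuracy, precision, recall and F1 for a skewed binary task:

```python
import numpy as np

gold = np.array([1, 0, 0, 0, 0, 0, 0, 0, 1, 0])  # positive class is rare
pred = np.array([1, 0, 0, 1, 0, 0, 0, 0, 0, 0])

accuracy = (pred == gold).mean()              # 0.8, looks fine...
tp = ((pred == 1) & (gold == 1)).sum()        # true positives
prec = tp / (pred == 1).sum()                 # 1 / 2 = 0.5
rec = tp / (gold == 1).sum()                  # 1 / 2 = 0.5
f1 = 2 * prec * rec / (prec + rec)            # 0.5
print(accuracy, f1)  # accuracy hides how badly the rare class is handled
```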

Discrete vs. Continuous Loss Functions
Discrete loss functions cannot indicate how wrong a wrong decision for one example is.
Continuous loss functions...
... are more widely applicable.
... are often easier to optimize (differentiable).
... can also be applied to discrete tasks (classification).
Sometimes algorithms are optimized using one loss (e.g. hinge loss) and evaluated using another loss (e.g. F1-score).

Examples of Continuous Loss Functions
Density estimation: log probability of an example.
Regression: squared error.
Classification: the loss $L(y_i f(x_i))$ is a function of label · prediction, with label $\in \{-1, 1\}$ and prediction $\in \mathbb{R}$.
- Correct prediction: $y_i f(x_i) > 0$
- Wrong prediction: $y_i f(x_i) \leq 0$
- Zero-one loss, hinge loss, logistic loss, ...
The loss on a data set is the sum of the per-example losses.
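A compact sketch of the three classification losses named above, each written as a function of the margin $m = y_i f(x_i)$ (the example margins are made up):

```python
import numpy as np

def zero_one_loss(margin):
    # 1 if the prediction is wrong (margin <= 0), else 0 -- discrete, flat almost everywhere
    return (margin <= 0).astype(float)

def hinge_loss(margin):
    # linear penalty for margins below 1
    return np.maximum(0.0, 1.0 - margin)

def logistic_loss(margin):
    # smooth and differentiable everywhere
    return np.log(1.0 + np.exp(-margin))

margins = np.array([-2.0, -0.5, 0.5, 2.0])  # y_i * f(x_i) for four examples
for loss in (zero_one_loss, hinge_loss, logistic_loss):
    # dataset loss = sum of per-example losses
    print(loss.__name__, loss(margins).sum())
```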

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

Linear Regression
For one instance:
- Input: vector $x \in \mathbb{R}^n$
- Output: scalar $y \in \mathbb{R}$ (actual output: $y$; predicted output: $\hat{y}$)
Linear function:

$\hat{y} = w^T x = \sum_{j=1}^{n} w_j x_j$

Linear Regression
Linear function:

$\hat{y} = w^T x = \sum_{j=1}^{n} w_j x_j$

Parameter vector $w \in \mathbb{R}^n$.
Weight $w_j$ decides whether the value of feature $j$ increases or decreases the prediction $\hat{y}$.

Linear Regression
For the whole data set: use matrix $X$ and vector $y$ to stack the instances on top of each other. Typically the first column contains all 1s for the intercept (bias, shift) term.

$X = \begin{pmatrix} 1 & x_{12} & x_{13} & \dots & x_{1n} \\ 1 & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & & & & \vdots \\ 1 & x_{m2} & x_{m3} & \dots & x_{mn} \end{pmatrix} \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$

For the entire data set, the predictions are stacked on top of each other: $\hat{y} = Xw$
- Estimate the parameters using $X^{(train)}$ and $y^{(train)}$.
- Make high-level decisions (which features, ...) using $X^{(dev)}$ and $y^{(dev)}$.
- Evaluate the resulting model using $X^{(test)}$ and $y^{(test)}$.
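In Numpy, the stacked prediction is a single matrix-vector product (the toy numbers are invented here):

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.0],   # first column: all 1s for the intercept
              [1.0, 0.0, 3.0],
              [1.0, 1.0, 1.0]])
w = np.array([0.5, 2.0, -1.0])   # w[0] is the intercept weight

y_hat = X @ w                    # all m predictions at once
print(y_hat)                     # [ 4.5 -2.5  1.5]
```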

Simple Example: Housing Prices
Predict Munich property prices (in 1K Euros) from just one feature: square meters of the property.

$X = \begin{pmatrix} 1 & 450 \\ 1 & 900 \\ 1 & 1350 \end{pmatrix} \qquad y = \begin{pmatrix} 730 \\ 1300 \\ 1700 \end{pmatrix}$

The prediction is:

$\hat{y} = \begin{pmatrix} w_1 + 450 w_2 \\ w_1 + 900 w_2 \\ w_1 + 1350 w_2 \end{pmatrix} = \begin{pmatrix} 1 & 450 \\ 1 & 900 \\ 1 & 1350 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = Xw$

$w_1$ will contain the costs incurred in any property acquisition; $w_2$ will contain the remaining average price per square meter. The optimal parameters for the above case are:

$w = \begin{pmatrix} 273.3 \\ 1.08 \end{pmatrix} \qquad \hat{y} = \begin{pmatrix} 759.1 \\ 1245.1 \\ 1731.1 \end{pmatrix}$
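The optimal $w$ can be checked with Numpy's least-squares solver; a quick sketch (the exact decimals differ slightly from the slide's rounded values):

```python
import numpy as np

X = np.array([[1.0, 450.0],
              [1.0, 900.0],
              [1.0, 1350.0]])
y = np.array([730.0, 1300.0, 1700.0])

# Least-squares fit: find w minimizing ||Xw - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)      # -> [273.33..., 1.077...], matching the slide's (273.3, 1.08)
print(X @ w)  # fitted prices, close to the slide's rounded y_hat
```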

Linear Regression: Mean Squared Error
The mean squared error of a training (or test) data set is the average of the squared differences between the predictions and the labels of all m instances:

$\text{MSE}^{(train)} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i^{(train)} - y_i^{(train)} \right)^2$

In matrix notation:

$\text{MSE}^{(train)} = \frac{1}{m} \left\| \hat{y}^{(train)} - y^{(train)} \right\|_2^2 = \frac{1}{m} \left\| X^{(train)} w - y^{(train)} \right\|_2^2$
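Continuing the housing sketch above, both forms of the MSE (elementwise and via the squared L2 norm) give the same value:

```python
import numpy as np

X = np.array([[1.0, 450.0], [1.0, 900.0], [1.0, 1350.0]])
y = np.array([730.0, 1300.0, 1700.0])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

m = len(y)
mse_elementwise = ((X @ w - y) ** 2).mean()        # (1/m) * sum of squared errors
mse_matrix = np.linalg.norm(X @ w - y) ** 2 / m    # (1/m) * ||Xw - y||_2^2
print(mse_elementwise, mse_matrix)                 # identical values
```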

Outline
1 This Course
2 Overview
3 Machine Learning: Definition, Data (Experience), Tasks, Performance Measures
4 Linear Regression: Overview and Cost Function
5 Summary

Summary
Deep learning:
- many successes in recent years
- feature learning instead of feature engineering
- builds on general machine learning concepts
Machine learning definition: data, task, cost function.
Machine learning tasks: classification, regression, ...
Linear regression:
- output depends linearly on the input
- cost function: mean squared error
Next up: estimating the parameters.