
Automatic Speech Recognition (CS753) Introduction to Machine Learning (CS419M) Lecture 1: What and why? Jan 5, 2018

What is Machine Learning? Ability of computers to learn from data or past experience. data: Comes from various sources such as sensors, domain knowledge, experimental runs, etc. learn: Make intelligent predictions or decisions based on data.

Pigeon Superstition Video link: https://www.youtube.com/watch?v=ttfqlkgwe2u

What is Machine Learning? (contd.) 1. Supervised learning

Supervised Learning Given a labeled set of input-output pairs, D = {(x_i, y_i)}_{i=1}^N, the objective is to learn a function mapping the inputs x to the outputs y. Inputs can be complex objects such as images, sentences, speech signals, etc., and typically take the form of features. Outputs are either categorical (classification tasks) or real-valued (regression tasks). More on these concepts in later classes.
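
To make this concrete, here is a minimal, assumed sketch in Python using scikit-learn (one of the toolkits listed later in this lecture). The digits dataset, the logistic-regression classifier and the train/test split are illustrative choices, not part of the slides.

```python
# Minimal supervised-learning sketch (illustrative): learn a mapping from
# labeled pairs D = {(x_i, y_i)} where x_i is a feature vector and y_i a label.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Inputs: images of handwritten digits as feature vectors.
# Outputs: digit labels (categorical -> a classification task).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)   # one possible choice of classifier
model.fit(X_train, y_train)                 # learn the mapping from x to y

print("Accuracy on unseen examples:", model.score(X_test, y_test))
```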

Image recognition Image from ImageNet classification with deep CNNs, Krizhevsky et al.

What is Machine Learning? (contd.) 1. Supervised learning: decision trees, neural networks, etc. 2. Unsupervised learning

Unsupervised Learning Given a set of inputs, D = {x_i}_{i=1}^N, discover some patterns in the data. Most common example: Clustering.
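
A minimal, assumed clustering sketch in Python, using k-means from scikit-learn (k-means appears later in the syllabus); the synthetic 2-D data and the choice of two clusters are illustrative assumptions.

```python
# Minimal unsupervised-learning sketch (illustrative): cluster unlabeled inputs.
import numpy as np
from sklearn.cluster import KMeans

# D = {x_i}: unlabeled 2-D points drawn around two centres (assumed data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
               rng.normal(loc=(3.0, 3.0), scale=0.5, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments for the first 10 points:", kmeans.labels_[:10])
print("Estimated cluster centres:\n", kmeans.cluster_centers_)
```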

What is Machine Learning? (contd.) 2. Unsupervised learning: k-means clustering, etc. 3. Reinforcement learning

When do we need ML? (I) For tasks that are easily performed by humans but are complex for computer systems to emulate. Vision: Identify faces in a photograph, objects in a video or still image, etc. Natural language: Translate a sentence from Hindi to English, question answering, etc. Speech: Recognise spoken words, speak sentences naturally. Game playing: Play games like chess. Robotics: Walking, jumping, displaying emotions, etc. Driving a car, flying a plane, navigating a maze, etc.

Relationship between AI, ML, DL Image from: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

When do we need ML? (II) For tasks that are beyond human capabilities. Analysis of large and complex datasets. E.g. IBM Watson's Jeopardy-playing machine. Image credit: https://i.ytimg.com/vi/p18edakuc1u/maxresdefault.jpg

When do we need ML? (II) For tasks that are beyond human capabilities. Analysis of large and complex datasets. E.g. Autopilot controls. Image credit: https://media.newyorker.com/photos/59095c8a019dfc3494e9f7b9/16:9/w_1200,h_630,c_limit/hazards-of-autopilot.jpg

ML and Statistics? Glossary (Machine learning term ↔ Statistics term):
network, graphs ↔ model
weights ↔ parameters
learning ↔ fitting
generalization ↔ test set performance
supervised learning ↔ regression/classification
unsupervised learning ↔ density estimation, clustering
large grant = $1,000,000 ↔ large grant = $50,000
nice place to have a meeting: Snowbird, Utah, French Alps ↔ nice place to have a meeting: Las Vegas in August
Glossary from: http://statweb.stanford.edu/~tibs/stat315a/glossary.pdf

Course Specifics

Pre-requisites Officially: No prerequisites. But it would help if you've taken Data Structures and Algorithms, Data Analysis and Interpretation, or any equivalent course taught by the respective departments. When to take this course: you should be comfortable with probability, linear algebra, multivariable calculus and programming.

Course webpage https://www.cse.iitb.ac.in/~pjyothi/cs419/

Course logistics Reading: All mandatory reading will be freely available online and posted on the course website. Textbooks (available online): 1. Understanding Machine Learning. Shai Shalev-Shwartz and Shai Ben-David. Cambridge University Press. 2017. 2. The Elements of Statistical Learning. Trevor Hastie, Robert Tibshirani and Jerome Friedman. Second Edition. 2009. Communication: We will use Moodle to communicate about the course. Attendance: You are strongly advised to attend all lectures. A lot of the material covered in class will not be on the slides. (Also, points for participation.)

Course TAs 1. Himanshu Agarwal, M.Tech. II 2. Aniket Kuiri, M.Tech. II 3. Pooja Palod, M.Tech. II 4. Sunandini Sanyal, M.Tech. I 5. Anupama Vijjapu, M.Tech. I 6. Saurabh Garg, B.Tech. IV 7. Tanmay Parekh, B.Tech. IV

Course Syllabus Provide an overview of machine learning and well-known techniques. We will briefly cover some ML applications as well. Some Topics: Basic foundations of ML, classification/regression, Naive Bayes classifier, linear and logistic regression Supervised learning: Decision trees, support vector machines, neural networks, etc. Unsupervised learning: k-means clustering, etc. Brief introduction to ML applications in computer vision, speech and natural language processing.

Evaluation (subject to change) Assignments: 4-5 assignments contributing 40% of your grade. The late policy for assignments will vary and will be announced along with each assignment. Assignments will contain programming questions. Midsem and final exam: 15% + 25%. Participation: 5%. Four in-class quizzes will be handed out at random; you should have attempted at least two to receive full points for participation.

Academic Integrity Policy Write what you know. Use your own words. If you refer to *any* external material, *always* cite your sources. Follow proper citation guidelines. If you're caught plagiarising or copying, the penalty is much higher than simply omitting that question. In short: just not worth it. Image credit: https://www.flickr.com/photos/kurok/22196852451

Evaluation Project Grading: Constitutes 15% of the total grade. (Exceptional projects could get extra credit. Details posted on the website.) Team: 2-3 members. Individual projects are highly discouraged. Project details: Apply the techniques you studied in class to any interesting problem of your choice. Think of a problem early and work on it throughout the course. Project milestones will be posted on Moodle. Examples of project ideas: auto-complete code, generate song lyrics, help IRCTC predict ticket prices, etc. Feel free to be creative; consult with the TAs/me on whether it's feasible.

Datasets abound Kaggle: https://www.kaggle.com/datasets Another good resource: http://deeplearning.net/datasets/ Popular resource for ML beginners: http://archive.ics.uci.edu/ml/index.php Interesting datasets for computational journalists: http://cjlab.stanford.edu/2015/09/30/lab-launch-and-data-sets/ Speech and language resources: www.openslr.org/ and so do ML libraries/toolkits: scikit-learn, OpenCV, Keras, TensorFlow, NLTK, etc.
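
For instance, a one-line fetch is often all it takes to start experimenting. This is an assumed sketch: fetch_openml and the dataset name "mnist_784" are illustrative choices, and any of the resources above would work similarly.

```python
# Illustrative sketch: pull a ready-made labeled dataset into Python.
from sklearn.datasets import fetch_openml

# 'mnist_784' (70,000 handwritten-digit images, 784 pixels each) is an
# assumed example dataset hosted on OpenML.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
print(X.shape, y.shape)  # feature vectors and their labels
```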

Some basic concepts

Typical ML approach How do we approach an ML problem? Modeling: Use a model to represent the task

Modeling [Figure: an example graphical model relating words, phone states and articulatory features such as lip opening, tongue tip and glottis]

Typical ML approach (contd.) Decoding/Inference: Given a model, answer questions with respect to the model

Inference Given an observed set of values, how accurately can we predict the identity of a word? [Figure: a table of frame-by-frame articulatory feature observations that the model turns into a word list weighted by probabilities]

Typical ML approach (contd.) Learning: The model could be parameterized and the parameters are learned using data
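
A small, assumed example tying the three steps together, using a toy linear-regression problem in NumPy; the synthetic data and the choice of a linear model are illustrative, not part of the slides.

```python
# Illustrative example of modeling, learning and inference on a toy task.
import numpy as np

# Modeling: assume outputs are generated as y = w*x + b (a parameterized model).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)   # synthetic data (assumed)

# Learning: estimate the parameters (w, b) from data via least squares.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Inference: use the learned model to answer questions about new inputs.
x_new = 4.2
print(f"Learned w={w:.2f}, b={b:.2f}; prediction for x={x_new}: {w * x_new + b:.2f}")
```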

How do we know if our model's any good? Generalization: Does the trained model produce good predictions on examples beyond the training set? We should be careful not to overfit the training data. Occam's Razor: All other things being equal, pick the simplest solution. These concepts will be made more precise in later classes.
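
A small, assumed sketch of how generalization is checked in practice: hold out some data and compare error on the training examples with error on the held-out examples. The polynomial-fitting setup is an illustrative stand-in for "simpler vs. more complex model".

```python
# Illustrative sketch: a model that fits the training data very well can still
# generalize poorly; we check by measuring error on held-out (test) examples.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=30)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=30)   # noisy synthetic data
x_train, y_train, x_test, y_test = x[:20], y[:20], x[20:], y[20:]

for degree in (3, 15):   # a simple model vs. a much more flexible one
    coeffs = np.polyfit(x_train, y_train, deg=degree)    # fit polynomial to train set
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```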

No free lunch theorem There is no single best model that works optimally for all kinds of problems (Wolpert 1997). No learning is possible without some prior assumptions about the problem at hand. We need many different types of models to cover the variety of problems in the real world. Each model will have a range of algorithms that can be used to train it.