Machine Learning for Data Science (CS4786) Lecture 1


Machine Learning for Data Science (CS4786) Lecture 1
Tu-Th 10:10 to 11:25 AM, Phillips Hall 101
Instructor: Karthik Sridharan

THE AWESOME TAs
1 Esin Durmus
2 Vlad Niculae
3 Jonathan Simon
4 Ashudeep Singh
5 Yu Sun [TA consultant]
6 Yechuan (Jeff) Tian
7 Felix Wu

COURSE INFORMATION
The course webpage is the official source of information: http://www.cs.cornell.edu/courses/cs4786/2016sp
Join Piazza: https://piazza.com/class/ijxdhmmko1h130
TA office hours will start from next week.
While the course is not coding intensive, you will need to do some light coding.

COURSE INFORMATION
Assignments are worth 60% of the grade; the two competitions are worth 40% of the grade.
TA office hours will start from next week.
The course is not coding intensive, though light coding is needed (language of your choice).

ASSIGNMENTS
Diagnostic assignment A0 is out: for our calibration. 3% of the assignment grade is allotted just for handing in A0 (we won't be grading the solutions). Students who want to take the course for credit need to submit it; only then will you be added to CMS. Hand in your assignment at the beginning of class on 4th Feb. It has to be done individually.
Three assignments: A1, A2, and A3. These can be done in groups of size at most 4, with only one write-up/submission per group.
Diagnostic assignment P1 (sometime mid-semester). It has to be done individually and is worth 10% of the class grade.

COMPETITIONS
Two competitions/challenges:
A clustering/data visualization challenge
A prediction challenge with a focus on feature extraction/selection
These will be hosted on an in-class Kaggle! Grades for the projects focus more on your thought process (demonstrated through your reports); Kaggle scores only factor into part of the grade. Groups of size at most 4.

Let's get started...

DATA DELUGE
Each time you use your credit card: who purchased what, where, and when
Netflix, Hulu, smart TVs: what different groups of people like to watch
Social networks like Facebook, Twitter, ...: who is friends with whom, what these people post or tweet about
Millions of photos and videos, many of them tagged
Wikipedia and all the news websites: pretty much most of human knowledge

Guess?

Social Network of Marvel Comic Characters!

What can we learn from all this data?

WHAT IS MACHINE LEARNING? Use data to automatically learn to perform tasks better.

WHERE IS IT USED? Movie Rating Prediction

WHERE IS IT USED? Pedestrian Detection

WHERE IS IT USED? Market Predictions

WHERE IS IT USED? Spam Classification

MORE APPLICATIONS
Each time you use your search engine
Autocomplete: blame machine learning for bad spellings
Biometrics: the reason you shouldn't smile
Recommendation systems: what you may like to buy, based on what your friends and their friends buy
Computer vision: self-driving cars, automatically tagging photos
Topic modeling: automatically categorizing documents/emails by topic, or music by genre
...

TOPICS WE WILL COVER
1 Dimensionality Reduction: Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), random projections, compressed sensing (CS), ...
2 Clustering and Mixture Models: k-means clustering, Gaussian mixture models, single-link clustering, spectral clustering, ...
3 Probabilistic Modeling & Graphical Models: probabilistic modeling, MLE vs. MAP vs. Bayesian approaches, inference and learning in graphical models, Latent Dirichlet Allocation (LDA), Hidden Markov Models (HMMs), ...

UNSUPERVISED LEARNING
Given (unlabeled) data, find useful information, patterns, or structure.
Dimensionality reduction/compression: compress the data set by removing redundancy and retaining only useful information
Clustering: find meaningful groupings in the data
Topic modeling: discover topics/groups with which we can tag data points

DIMENSIONALITY REDUCTION
You are provided with n data points, each in R^d.
Goal: compress the data into n points in R^K, where K << d.
Retain as much information about the original data set as possible
Retain desired properties of the original data set
E.g. PCA, compressed sensing, ...

PRINCIPAL COMPONENT ANALYSIS (PCA)
Eigenfaces: write down each data point as a linear combination of a small number of basis vectors.
A data-specific compression scheme.
One of the early successes was in face recognition: classification based on nearest neighbor in the reduced-dimension space.
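To make the idea concrete, here is a minimal PCA sketch in Python with numpy; this is an illustration of the general technique, not code from the course, and the data and variable names are made up. We center the data, take the top-K right singular vectors as the basis, and write each point as a K-coefficient combination of them.

    import numpy as np

    # Toy data: n points in R^d (rows are data points; random stand-in data).
    rng = np.random.default_rng(0)
    n, d, K = 100, 50, 5
    X = rng.normal(size=(n, d))

    # Center, then get the top-K principal directions from the SVD.
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:K]                      # (K, d): basis vectors (principal components)

    Y = (X - mu) @ W.T              # (n, K): compressed representation
    X_hat = Y @ W + mu              # approximate reconstruction in R^d
    print("relative reconstruction error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))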

COMPRESSED SENSING
From the compressive sensing camera: can we compress directly while receiving the input?
We now have cameras that directly sense/record compressed information... and very fast! Time is spent only on reconstructing from the compressed information.
Especially useful for capturing high-resolution MRIs.
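As a rough sketch of the sense-then-reconstruct pipeline (my toy example; the slides don't prescribe a recovery algorithm), the following senses a sparse signal with far fewer random linear measurements than dimensions, then recovers it with scikit-learn's orthogonal matching pursuit:

    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(0)
    d, m, s = 200, 60, 5                 # signal dim, measurements, sparsity

    # A signal that is s-sparse: only s of its d coordinates are nonzero.
    x = np.zeros(d)
    support = rng.choice(d, size=s, replace=False)
    x[support] = rng.normal(size=s)

    # Compress while sensing: y = A x, with m << d random measurements.
    A = rng.normal(size=(m, d)) / np.sqrt(m)
    y = A @ x

    # Reconstruction: exploit sparsity to recover x from (A, y).
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=s).fit(A, y)
    print("recovery error:", np.linalg.norm(x - omp.coef_))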

DATA VISUALIZATION
2D projection
Helps visualize data points (in relation to each other)
Preserves relative distances among data points (at least the nearby ones)
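The distance-preservation point can be checked numerically. Here is a small sketch (my illustration, using the random projections mentioned in the topics list) showing that a random linear map roughly preserves pairwise distances; for an actual 2D plot one would more typically project onto the top two principal components.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, K = 50, 1000, 100

    X = rng.normal(size=(n, d))
    W = rng.normal(size=(d, K)) / np.sqrt(K)   # random projection matrix
    Y = X @ W                                  # projected points in R^K

    # Pairwise distance before vs. after projecting: the ratio is close to 1.
    i, j = 3, 7
    orig = np.linalg.norm(X[i] - X[j])
    proj = np.linalg.norm(Y[i] - Y[j])
    print(f"original {orig:.2f}, projected {proj:.2f}, ratio {proj / orig:.3f}")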

CLUSTERING
K-means clustering
Given just the data points, group them into natural clusters. Roughly speaking:
Points within a cluster must be close to each other
Points in different clusters must be well separated
Helps bin data points, but is generally hard to do.
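Here is a minimal sketch of Lloyd's algorithm, the standard heuristic for k-means (my toy implementation, not course code):

    import numpy as np

    def kmeans(X, k, n_iters=50, seed=0):
        # Alternate: assign each point to its nearest center, then move each
        # center to the mean of the points assigned to it.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return labels, centers

    # Two well-separated blobs; k-means should recover them as the clusters.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(6, 1, size=(50, 2))])
    labels, centers = kmeans(X, k=2)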

TELL ME WHO YOUR FRIENDS ARE...
Cluster nodes in a graph. Analysis of social network data.
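The slide doesn't name a method, but one standard way to cluster graph nodes is spectral clustering (which appears in the topics list). Here is a toy sketch, my own, for the simplest two-cluster case: split the graph by the sign of the Laplacian's second eigenvector.

    import numpy as np

    def two_way_spectral_cut(A):
        # A: symmetric adjacency matrix. Build the unnormalized graph
        # Laplacian and split nodes by the sign of its second-smallest
        # eigenvector (the Fiedler vector); for k > 2 clusters one would
        # instead run k-means on the first k eigenvectors.
        L = np.diag(A.sum(axis=1)) - A
        eigvals, eigvecs = np.linalg.eigh(L)
        return (eigvecs[:, 1] > 0).astype(int)

    # Two 4-node cliques joined by a single edge.
    A = np.zeros((8, 8))
    A[:4, :4] = 1
    A[4:, 4:] = 1
    np.fill_diagonal(A, 0)
    A[3, 4] = A[4, 3] = 1
    print(two_way_spectral_cut(A))   # two groups of four, e.g. [0 0 0 0 1 1 1 1]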

TOPIC MODELLING
A probabilistic generative model for documents.
Each document has a fixed distribution over topics; each topic has a fixed distribution over the words belonging to it.
Unlike clustering, the groups are non-exclusive.
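To make the generative story concrete, here is a toy sketch in the spirit of LDA (the vocabulary, topics, and numbers are all made up for illustration): for each word position, sample a topic from the document's topic distribution, then sample a word from that topic's word distribution.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["ball", "game", "score", "vote", "party", "law"]

    # Each topic is a fixed distribution over words (rows sum to 1).
    topics = np.array([
        [0.40, 0.30, 0.25, 0.02, 0.02, 0.01],   # topic 0: "sports"
        [0.02, 0.03, 0.05, 0.30, 0.30, 0.30],   # topic 1: "politics"
    ])

    def generate_document(theta, length=8):
        # theta: this document's fixed distribution over topics.
        words = []
        for _ in range(length):
            z = rng.choice(len(topics), p=theta)        # pick a topic
            w = rng.choice(len(vocab), p=topics[z])     # pick a word from it
            words.append(vocab[w])
        return " ".join(words)

    print(generate_document(theta=np.array([0.8, 0.2])))  # mostly sports words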

SUPERVISED LEARNING
Training data comes as input-output pairs (x, y).
Based on this data, we learn a mapping from the input space to the output space.
Goal: given a new input instance x, predict the outcome y accurately, based on the given training data.
Classification, regression.
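As a minimal concrete instance of learning a mapping from (x, y) pairs (my example; the slides don't single out any method), here is a one-nearest-neighbor classifier:

    import numpy as np

    def predict_1nn(X_train, y_train, x):
        # Predict the label of x as the label of the closest training point.
        dists = np.linalg.norm(X_train - x, axis=1)
        return y_train[dists.argmin()]

    # Tiny training set: class 0 near the origin, class 1 near (1, 1).
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(predict_1nn(X_train, y_train, np.array([0.95, 0.9])))  # -> 1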

WHAT WE WON'T COVER
Feature extraction is a problem/domain-specific art; we won't cover it in class.
We won't cover optimization methods for machine learning.
Implementation tricks and details won't be covered.
There are literally thousands of methods; we will only cover a few!

WHAT YOU CAN TAKE HOME
How to think about a learning problem and formulate it
Well-known methods, and how and why they work
Hopefully we can give you intuition about the choice of methods/approaches to try on a given problem

DIMENSIONALITY REDUCTION
Given data x_1, ..., x_n in R^d, compress the data points into a low-dimensional representation y_1, ..., y_n in R^K, where K << d.

WHY DIMENSIONALITY REDUCTION?
For computational ease
As input to a supervised learning algorithm
Before clustering, to remove redundant information and noise
Data visualization
Data compression
Noise reduction

DIMENSIONALITY REDUCTION
Desired properties:
1 Original data can be (approximately) reconstructed
2 Distances between data points are preserved
3 Relevant information is preserved
4 Redundant information is removed
5 Models our prior knowledge about the real world
Based on the choice of desired property and formalism, we get different methods.

SNEAK PEEK
Linear projections
Principal component analysis