COMP 441/552: Large Scale Machine Learning

Rice University
Anshumali Shrivastava
anshumali AT rice.edu

About
Instructor: Anshumali Shrivastava
Email: anshumali AT rice.edu
Class Timing: Monday/Wednesday/Friday, 11am to 11:50am
TA: Chen Luo, cl67 @ rice.
Class Location: DCH 1062
Office Hours: TBD
Website: http://www.cs.rice.edu/~as143/comp441_spring17/index.html
Discussions and Announcements: Canvas

Grading: Total (105%)
- Project (group of 2): 50% [1]
- 4-5 assignments (best 4 will be considered): 25% [2]
- 2 quizzes: 15% [3]
- 1 scribe (individual): 10%
- Participation and discussion forums: 5%

[1] Grads have a higher bar.
[2] Extra sections for grads.
[3] Grads will have more questions.
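The weights above add up to 105 points, with only the best 4 assignments counted. A minimal sketch of that arithmetic follows. The function names, the equal weighting within assignments and within quizzes, and the example scores are all assumptions made for illustration, not part of the course policy.

```python
# Weights taken from the grading slide; they sum to 105 by design.
WEIGHTS = {
    "project": 50,
    "assignments": 25,
    "quizzes": 15,
    "scribe": 10,
    "participation": 5,
}


def best_k_mean(scores, k=4):
    """Average of the k highest scores (each score is a fraction in [0, 1])."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


def course_total(project, assignments, quizzes, scribe, participation):
    """Return the course total in points (maximum 105).

    Every argument is a fraction in [0, 1], or a list of such fractions.
    Equal weighting within assignments and within quizzes is an assumption.
    """
    parts = {
        "project": project,
        "assignments": best_k_mean(assignments, k=4),
        "quizzes": sum(quizzes) / len(quizzes),
        "scribe": scribe,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


# Hypothetical student: the weakest of five assignments (0.5) is dropped.
example = course_total(
    project=0.9,
    assignments=[1.0, 0.5, 0.75, 1.0, 0.75],
    quizzes=[1.0, 0.5],
    scribe=1.0,
    participation=1.0,
)
```

For this hypothetical student the total works out to 93.125 out of a possible 105: 45 for the project, 21.875 for assignments, 11.25 for quizzes, 10 for the scribe, and 5 for participation.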

IMPORTANT
Work in groups of 2. Both members get the same marks, so choose wisely.
Proposals due: 23rd Jan.
Midterm presentation: 6th and 8th March
Final presentation: 19th and 21st April
Final reports due: 1st May

What Should A Project Be Like?
Ideally publishable in top-tier conferences (ICML, NIPS, KDD, etc.).
- Take a popular ML algorithm with a recent benchmark method/implementation. Make it (5x+) faster using parallelism/approximations, or reduce its memory footprint.
- An end-to-end implementation of an ML algorithm with support for multi-core/GPUs/multi-node, with 2-5 benchmarks. (Less risky)
- Novel estimators/algorithms with some theoretical analysis or large-scale evaluations. Must show an advantage over existing methods.
- Creating (or having access to) a unique large-scale dataset (mostly from the web) for a novel task. Organize it: label generation/creation, cleaning, etc. Run 3-4 (or more) intuitive benchmarks on it.
- Important: benchmark your proposal against 2-3 recent popular methods on performance and accuracy. Evaluate on multiple, large datasets.
- Beating the best published accuracy on a popular dataset. (Risky)
You can use your existing project if it involves large-scale ML.
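The "5x+ faster" target and the benchmarking requirement both come down to measuring a candidate against a baseline on the same input. A minimal sketch of such a harness is shown here. The helper names (`best_time`, `compare`) and the toy duplicate-pair task are hypothetical stand-ins, and a real project would compare an accuracy metric on held-out data rather than exact equality of outputs.

```python
import time


def best_time(fn, *args, repeats=5):
    """Smallest wall-clock time (seconds) over several runs of fn(*args)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(baseline, candidate, data, repeats=5):
    """Check that both methods agree, then report baseline_time / candidate_time."""
    agree = baseline(data) == candidate(data)
    t_base = best_time(baseline, data, repeats=repeats)
    t_cand = best_time(candidate, data, repeats=repeats)
    # Guard against a zero reading from the timer on very fast candidates.
    return {"agree": agree, "speedup": t_base / max(t_cand, 1e-12)}


# Toy stand-ins (hypothetical): count pairs of equal items in a list.
def slow_pairs(items):
    """Quadratic baseline: compare every pair directly."""
    n = len(items)
    return sum(1 for i in range(n) for j in range(i + 1, n) if items[i] == items[j])


def fast_pairs(items):
    """Linear candidate: count occurrences, then sum c choose 2."""
    counts = {}
    for x in items:
        counts[x] = counts.get(x, 0) + 1
    return sum(c * (c - 1) // 2 for c in counts.values())


data = [i % 50 for i in range(1000)]
report = compare(slow_pairs, fast_pairs, data)
```

Taking the minimum over repeated runs is one common way to reduce timing noise. The measured speedup depends on the machine and input size, so it should be reported alongside both.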

Projects
What should not be aimed for:
- A standard ML problem on existing data.
- "I proposed XYZ algorithm and it works on this (small) dataset," but there are no baselines.
- "I got 5% or less improvement over standard methods on some small dataset" (usually fewer than a million examples).
Most important component of the class. Start now!
How will it work?
- Form a group. I can help coordinate.
- Formulate the problem and project.
- Get it approved by the instructor.
- We have a few pre-defined, concrete projects. Come talk.

Other Requirements
Assignments
- 4-5 bi-weekly assignments, due on Friday in class. Only 4 will be counted.
Scribes
- Each student will scribe 1 lecture, starting next week (the 16th).
- Scribes are due, by email, within 5 days of the class (e.g., the scribe for the 16th is due on the 21st).
- Choose dates soon. (Spreadsheet link soon.)
- LaTeX template on the website.
Exams
- Two 10-15 min in-class quizzes (will be announced).
- No finals. No mid-terms.

Some Problems
- How can we search through billions of webpages quickly?
- What goes on behind recommendation engines?
- Deep Learning.

Some Problems Contd.
- How to deal with massive graphs?
- Many more...

Some Broad Topics
- Sketching and Streaming.
- Hashing and Randomized Algorithms.
- Optimization for Big Data.
- Kernel Features.
- Submodular Optimization.
- Recommender Systems.
- Mining Massive Graphs.
- Deep Learning.
- Active Learning and Crowdsourcing.
- Online Learning and Multi-Armed Bandits.
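To give a flavor of the first two topics, here is a small Count-Min sketch, a standard streaming data structure that approximates item counts in fixed memory using several hash functions. It is not taken from the course materials. The class name, the default `width` and `depth`, and the use of salted BLAKE2 hashes are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never fall below the true count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a hash of the item salted by the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                str(item).encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        """Increment the item's counter in every row."""
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        """Minimum over rows: collisions can only inflate a counter."""
        return min(self.table[row][col] for row, col in self._columns(item))


# Hypothetical stream: two frequent items plus many rare ones.
stream = ["a"] * 100 + ["b"] * 10 + [f"x{i}" for i in range(500)]
sketch = CountMinSketch()
for item in stream:
    sketch.add(item)
```

Memory stays at `width * depth` counters no matter how many distinct items arrive. The price is that `sketch.estimate("a")` may overshoot the true count of 100 when other items collide with it in every row, but it can never undershoot.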

Textbook
No standard textbook: ML is a fast-evolving field, and most topics are still under development.
Lecture scribes, with references, will be made available.
You may look at:
- Mining of Massive Datasets (free online book)
- Scaling up Machine Learning: Parallel and Distributed Approaches (Ron Bekkerman et al.)

Next: Some Probability