What is Data Science?


What is Data Science? Peter Diao, SAMSI Field of Dreams 2017 November 4, 2017

Two Ways to Define a Field
1. "A mathematician, like a painter or a poet, is a maker of patterns. If his patterns are more permanent than theirs, it is because they are made with ideas." - Hardy, English mathematician, 1877-1947
2. Mathematics is what mathematicians happen to be studying.

Data Science as a term is getting very popular

Data Science Outpaces Data

Science is not looking too good

Driven by Desire to Capitalize on Growth in Data Sets

Data Scientist as Job

What do they do?
Image taken from R for Data Science by Grolemund and Wickham (a free introduction to practical data science skills!). Your undergraduate days are a perfect time to acquire such practical skills. They could be helpful for employment and are also very handy for the analysis of scientific data.

80% of the Time Is Spent Importing and Tidying Data
From OpenIntro Statistics by Diez, Barr, Cetinkaya-Rundel. Columns: variables or features; rows: cases or examples.

Visualizing
From OpenIntro Statistics by Diez, Barr, Cetinkaya-Rundel. Scatterplots are still the best for visualizing relationships.

Model: Mathematical Relationships
The most famous is simple linear regression, in which we try to find the line $y = b_0 + b_1 x$ that minimizes the sum of the squared errors for the data we are trying to fit.

A Log Transformation was needed here

Communicate Take a look at this famous visualization of Gapminder. What transformation did he use on the x-axis and how does it change the story?

So Far
Employers are looking for: coding skills, math skills, and skills in hacking together solutions.

What is Data Science?
Using data to solve a problem.
1. Using website traffic data to design a better website.
2. Using data on social network users to suggest contacts.
3. Using mobile phone data to track the formation of urban slums in developing countries.
4. Using text mining and sentiment analysis to see how the public feels about a stock in order to trade stocks.
5. Using a database of high-level Go play in order to make a machine capable of beating the world's best Go players.
6. Using facial recognition software to identify individuals in order to pay for things.
7. Using ratings for previously seen movies to make suggestions for movies a person may like.
8. Using voice data to compile a national artificial intelligence to identify individuals by their voice.
9. Using brain activity patterns to identify interesting components of the brain that function together.

Simple Linear Regression
Given a finite data set $\{(x_i, y_i)\}_{i=1}^{n}$, find $b_0$ and $b_1$ so that
$$L(b_0, b_1) := \sum_{i=1}^{n} (y_i - b_1 x_i - b_0)^2$$
is minimized. Notice that $L$ is a convex function, so every local minimum is a global minimum; the minimizer is unique as long as the $x_i$ are not all equal.
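For this particular loss the minimizer can be written down directly by setting both partial derivatives of $L$ to zero. The sketch below is one way to do that in plain Python; the function names `fit_line` and `loss` and the four data points are made up for illustration and are not from the talk.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing sum((y_i - b1*x_i - b0)**2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x; zero means all x values coincide.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the minimizer is not unique")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def loss(b0, b1, xs, ys):
    """The sum of squared errors L(b0, b1)."""
    return sum((y - b1 * x - b0) ** 2 for x, y in zip(xs, ys))


# Illustrative data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1, loss(b0, b1, xs, ys))
```

Because the example points lie exactly on a line, the fitted loss is zero; with noisy data the same formulas give the least-squares line.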

Optimization as main tool!
Using the gradient, which is a generalization of the derivative to multiple dimensions, we can find a way to descend on the surface step by step. Take Multivariable Calculus! Since our loss function $L(b_0, b_1)$ is convex, we will eventually reach the line of best fit (given a small enough step size). Take Convex Optimization!
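As a rough illustration of descending step by step, here is a minimal gradient descent sketch for the squared-error loss $L(b_0, b_1)$. The step size, iteration count, starting point, and data are assumptions chosen so that this small example converges; they are not values from the talk.

```python
def gradient(b0, b1, xs, ys):
    """Partial derivatives of L(b0, b1) = sum((y - b1*x - b0)**2)."""
    g0 = 0.0
    g1 = 0.0
    for x, y in zip(xs, ys):
        residual = y - b1 * x - b0
        g0 += -2.0 * residual       # dL/db0
        g1 += -2.0 * residual * x   # dL/db1
    return g0, g1


def gradient_descent(xs, ys, step=0.01, iterations=5000):
    """Start at (0, 0) and repeatedly move against the gradient."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0, g1 = gradient(b0, b1, xs, ys)
        b0 -= step * g0
        b1 -= step * g1
    return b0, b1


# Illustrative data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
gd_b0, gd_b1 = gradient_descent(xs, ys)
print(gd_b0, gd_b1)
```

The step size matters: if it is too large the iterates overshoot and diverge even though the loss is convex, and if it is too small convergence is slow.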

Stereotypical Prediction
1. The variable you want predicted, $Y$ (say, the price of Tesla stock tomorrow).
2. The features used to predict, $X_1, X_2, \ldots, X_k$ (say, the weather, the stock prices of 100 different related stocks on the previous day, etc.).
3. The form of the prediction function and the parameters defining it, $F_\theta : X_1 \times X_2 \times \cdots \times X_k \to Y$ (this varies for every kind of prediction strategy).
4. Large quantities of training data.
5. A loss function based on the data, $L(\theta)$, which we are trying to minimize in order to find the best $F_\theta$.
6. An optimization algorithm for minimizing $L(\theta)$.
7. Validating the function on test data.
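The seven ingredients above can be sketched end to end on a toy problem. Everything specific here is an assumption for illustration: the data are synthetic (one feature, $y = 3x$ plus noise), the model is the one-parameter family $F_\theta(x) = \theta x$, and the names `predict`, `mean_squared_loss`, and `minimize` are hypothetical.

```python
import random

random.seed(0)

# Steps 1, 2 and 4: a target y, one feature x, and (synthetic) data.
data = []
for _ in range(200):
    x = random.uniform(0.0, 10.0)
    data.append((x, 3.0 * x + random.gauss(0.0, 1.0)))
train, test = data[:150], data[150:]   # hold out 50 rows for step 7


# Step 3: the form of the prediction function F_theta.
def predict(theta, x):
    return theta * x


# Step 5: a loss function based on the data.
def mean_squared_loss(theta, rows):
    return sum((y - predict(theta, x)) ** 2 for x, y in rows) / len(rows)


# Step 6: an optimization algorithm (plain gradient descent).
def minimize(rows, step=0.001, iterations=2000):
    theta = 0.0
    for _ in range(iterations):
        grad = sum(-2.0 * (y - predict(theta, x)) * x for x, y in rows) / len(rows)
        theta -= step * grad
    return theta


theta = minimize(train)

# Step 7: validate on data the optimizer never saw.
train_loss = mean_squared_loss(theta, train)
test_loss = mean_squared_loss(theta, test)
print(theta, train_loss, test_loss)
```

Comparing the training loss with the test loss is the point of step 7: a model that only memorized the training rows would show a much larger loss on the held-out rows.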

Everything is a Long Vector
How do we teach a robot to recognize images as either a cat or a non-cat? This sounds like a biology problem. How can we formulate this as a mathematics problem?
$\mathbb{R}^{3 \times 1000 \times 1000}$ is the space of 1000 by 1000 RGB images.
$C \subset \mathbb{R}^{3 \times 1000 \times 1000}$ is the cat subset.
Try to learn the classifier function $f_C : \mathbb{R}^{3000000} \to \{1, -1\}$ so that $f_C(x) = 1 \iff x \in C$.
Let us play in a playground: playground.tensorflow.org/
Take Linear Algebra!
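To make the "long vector" idea concrete, the sketch below flattens a tiny 3-channel image into a single list of numbers and shows the shape of a function with values in {1, -1}. The 2 by 2 image, the helper names, and the linear rule are all made up for illustration; a real cat classifier would have to be learned from data and would not look like this hand-written rule.

```python
def flatten_image(image):
    """Turn channels x rows x columns nested lists into one flat list."""
    return [value for channel in image for row in channel for value in row]


def dimension(channels, height, width):
    """Number of coordinates in the flattened image vector."""
    return channels * height * width


def linear_classifier(weights, bias, x):
    """A hypothetical stand-in for f_C: returns 1 or -1 from a long vector."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# A made-up 2 x 2 RGB image: one list per color channel.
tiny_image = [
    [[255, 0], [0, 0]],      # red channel
    [[0, 255], [0, 0]],      # green channel
    [[0, 0], [255, 255]],    # blue channel
]
vector = flatten_image(tiny_image)
print(len(vector), dimension(3, 2, 2))
print(dimension(3, 1000, 1000))

# Arbitrary weights, only to show the input and output types of a classifier.
weights = [1.0] * len(vector)
print(linear_classifier(weights, -500.0, vector))
```

The same flattening applied to a 1000 by 1000 RGB image gives a vector with three million coordinates, which is the space the classifier on this slide takes as input.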

Many Different Kinds of Classifiers Out There
Helpful examples at http://scikit-learn.org/stable/index.html
Learn the scikit-learn package of Python!

Amazing Idea: Learning the Predictors $X_1, \ldots, X_k$
Say we want to classify 32 × 32 faces. That means 1024 features or dimensions. Hard problem! Curse of dimensionality.
Dimension Reduction or Representation Learning. Take Linear Algebra! (Mattias Scholz PhD Thesis 2006)
$k$ Eigenfaces

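Representation Learning + Prediction
Now we can classify faces: raw images to Eigenface basis coordinates to prediction,
$$\mathbb{R}^{32 \times 32} \to X_1 \times \cdots \times X_k \to Y.$$
We learn the feature representation $F : \mathbb{R}^{32 \times 32} \to X_1 \times \cdots \times X_k$ first. Then we learn the classifier $X_1 \times \cdots \times X_k \to Y$.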
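The two-stage idea (first learn a low-dimensional representation, then learn a classifier on it) can be sketched in plain Python. This is a toy under loud assumptions: the "faces" are synthetic 32 by 32 patterns, not real images; the principal directions are only approximated by power iteration with deflation; and the classifier is a simple nearest-centroid rule chosen for brevity. None of these choices come from the talk.

```python
import random

SIDE = 32           # 32 x 32 images, so 1024 raw features
DIM = SIDE * SIDE


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def synthetic_face(label, rng):
    """Made-up stand-in for a face: bright left half (1) or right half (-1)."""
    image = []
    for _row in range(SIDE):
        for col in range(SIDE):
            bright = (col < SIDE // 2) == (label == 1)
            image.append((1.0 if bright else 0.0) + rng.gauss(0.0, 0.1))
    return image


def top_components(centered, k, iterations=20, seed=0):
    """Approximate k orthonormal principal directions by power iteration."""
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        for _ in range(iterations):
            scores = [dot(row, v) for row in centered]           # X v
            v = [sum(s * row[j] for s, row in zip(scores, centered))
                 for j in range(DIM)]                            # X^T (X v)
            for c in components:                                 # deflation
                p = dot(v, c)
                v = [a - p * b for a, b in zip(v, c)]
            norm = dot(v, v) ** 0.5
            v = [a / norm for a in v]
        components.append(v)
    return components


rng = random.Random(1)
labels = [1, -1] * 10
faces = [synthetic_face(label, rng) for label in labels]
mean_face = [sum(column) / len(faces) for column in zip(*faces)]
centered = [[a - m for a, m in zip(face, mean_face)] for face in faces]
eigenfaces = top_components(centered, k=2)


def represent(face):
    """F: raw 1024-pixel image -> k eigenface coordinates."""
    shifted = [a - m for a, m in zip(face, mean_face)]
    return [dot(shifted, e) for e in eigenfaces]


# Second stage: a nearest-centroid classifier on the k coordinates.
centroids = {}
for label in (1, -1):
    coords = [represent(f) for f, l in zip(faces, labels) if l == label]
    centroids[label] = [sum(c) / len(c) for c in zip(*coords)]


def classify(face):
    z = represent(face)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))


new_face = synthetic_face(-1, rng)
print(len(new_face), "->", len(represent(new_face)), "->", classify(new_face))
```

The classifier never sees the 1024 raw pixels; it works only with the 2 learned coordinates, which is the point of learning the predictors first.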

Several Layers of Feature Representations: Deep Learning
From Szegedy et al. 2015. We don't really understand why it works; non-convex heuristic optimization is very hard to analyze.

Power of Representation Learning
Vision: ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al.
Language: Efficient estimation of word representations in vector space (2013), T. Mikolov et al.
Decision Making: Mastering the game of Go with deep neural networks and tree search (2016), D. Silver et al.
The Representation can be reused for different tasks: CNN features off-the-shelf: An astounding baseline for recognition (2014), A. Razavian et al.
Unsupervised: Unsupervised representation learning with deep convolutional generative adversarial networks (2015), A. Radford et al.
Art of Optimization: Training very deep networks (2015), R. Srivastava et al.

Obligatory Slide on "Big Data"
How many images do you think we have?
7 billion people, 3 billion people with smartphones, 1 picture a day = approximately 1 trillion pictures a year.
Some claim that more data was generated in the last 2 years than in the rest of the history of mankind.
In comparison: there are around 3 billion seconds in a 100-year lifetime.
Such deep representations can only be learned with such large data sets and massive computers (industry is outpacing academia).
If error = bias + variance, then we want a large and flexible class of functions so that bias is small, since large enough data can control variance.
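The two headline numbers on this slide can be checked with a line of arithmetic each. The block below assumes 365 days in a year for the picture count and 365.25 days for the lifetime; the final ratio is a derived comparison, not a figure from the talk.

```latex
\begin{align*}
\text{pictures per year} &\approx 3 \times 10^{9} \times 1 \times 365
  \approx 1.1 \times 10^{12}, \\
\text{seconds in 100 years} &\approx 100 \times 365.25 \times 24 \times 3600
  \approx 3.16 \times 10^{9}, \\
\frac{\text{pictures per year}}{\text{seconds in a lifetime}}
  &\approx \frac{1.1 \times 10^{12}}{3.16 \times 10^{9}} \approx 350.
\end{align*}
```

Under these assumptions, looking at one year of pictures would mean viewing roughly 350 images every second for an entire 100-year lifetime.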

Big Data and Mathematics
The major technological advance of the last half century is information technology.
The result is Big Data.
Today, big data provides an opportunity to create AI; understand life and the mind; lay new foundations for computational sciences.
For mathematicians, it is a chance to make discoveries on the order of the formulation of probability theory or calculus.

Have fun!