Machine Learning in Action. Heli Helskyaho Tulevaisuuden tuotekehitys 2018


Introduction, Heli: Graduated from the University of Helsinki (Master of Science, computer science); currently a doctoral student, researcher and lecturer at the University of Helsinki (databases, Big Data, multi-model databases, methods and tools for utilizing semi-structured data for decision making). Has worked in IT since 1990, always with data and databases. CEO of Miracle Finland Oy. Oracle ACE Director. Ambassador for EOUC (EMEA Oracle Users Group Community). Listed as one of the top 100 influencers in the IT sector in Finland (2015, 2016, 2017). Public speaker and author: author of the book "Oracle SQL Developer Data Modeler for Database Design Mastery" (Oracle Press, 2015) and co-author of "Real World SQL and PL/SQL: Advice from the Experts" (Oracle Press, 2016).

What is Machine Learning? Machine learning (ML) is an important part of Artificial Intelligence (AI). ML teaches computers to learn from experience (through algorithms): to learn from data and make predictions. It builds on mathematics and statistics. "Field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959). "A systematic study of algorithms and systems that improve their knowledge or performance with experience."

Why ML? Why now? Technology has improved, storage solutions have become cheaper, and there is finally an environment that needs ML and is able to really use it: Artificial Intelligence (AI) and Big Data.

What is Big Data? There is no size that makes data Big Data; it always depends on the capabilities available. Data is big when traditional processing with traditional tools is not possible because of the amount or the complexity of the data: for example, you cannot open an email attachment, or you cannot edit a photo.

The three V's: Volume, the size and scale of the data; Velocity, the speed of change and the analysis of streaming data; Variety, different formats of data sources and different forms of data (structured, semi-structured, unstructured).

The other V's: Veracity, the uncertainty of the data (data is worthless or even harmful if it is not accurate); Viability, validating a hypothesis before taking further action (and, in the process of determining the viability of a variable, we can expand our view to determine other variables); Value, the potential value of the data; Variability, data whose meaning is constantly changing, i.e. inconsistency of data, for example words and their context; Visualization, presenting the data in a manner that is readable and accessible.

Challenges in Big Data: more and more data (volume); different data models and formats (variety); loading in progress while data exploration is going on (velocity); not all data is reliable (veracity); we do not know what we are looking for (value, viability, variability); non-technical users (journalists, investors, politicians, …) must also be supported (visualization). All of this must be done efficiently, fast, and as much as possible by machines.

When to use ML? When you have data! ML cannot be performed without data. Use part of the data for finding the model and part of it for proving the model (do not use all of it for finding the model!). Use ML when rules and equations are complex (image recognition) or constantly changing (fraud detection), or when the nature of the data changes and the program must adapt (today's spam is tomorrow's ham; predicting shopping trends).
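Keeping part of the data back to prove the model can be sketched as a hold-out split. This is a minimal Python sketch; the function name, the 30% test fraction in the usage line and the fixed random seed are illustrative assumptions rather than a recommended setup.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows and hold back a fraction for testing.

    The training part is used to find the model; the held-out test part is
    only used afterwards, to check ("prove") how well the model works.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train, test = train_test_split(data, test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

The two parts never overlap, so the test score says something about data the model has not seen.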

The Task: a Task is the problem to be solved with ML. It is very important to define the Task well. Machine learning is not only a computational subject; the practical side is very important.

It's all about Algorithms: an Algorithm is the experience the computer learns with; it solves the learning problem. Humans learn with experience, machines learn with algorithms. It is not easy to find the right Algorithm for the Task: usually you try several algorithms to find the best one, so selecting an algorithm is a process of trial and error.
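The trial-and-error search for an algorithm can be sketched as: fit each candidate on training data, measure its error on held-out validation data, keep the winner. A minimal sketch assuming a one-dimensional regression task and three toy candidate algorithms (constant mean, 1-nearest neighbour, least-squares line); the data is invented.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (x, y)
Model = Callable[[float], float]


def fit_mean(train: List[Point]) -> Model:
    # Baseline "algorithm": always predict the average y.
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_nearest_neighbour(train: List[Point]) -> Model:
    # Predict the y of the closest training x (1-nearest neighbour).
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def fit_line(train: List[Point]) -> Model:
    # Ordinary least-squares straight line y = a*x + b.
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    b = my - a * mx
    return lambda x: a * x + b


def mse(model: Model, data: List[Point]) -> float:
    return mean((model(x) - y) ** 2 for x, y in data)


def pick_best(algorithms: Dict[str, Callable[[List[Point]], Model]],
              train: List[Point], validation: List[Point]) -> Tuple[str, Dict[str, float]]:
    # Trial and error: fit every candidate, score it on held-out data, keep the best.
    scores = {name: mse(fit(train), validation) for name, fit in algorithms.items()}
    return min(scores, key=scores.get), scores


train = [(float(x), 2.0 * x + 1.0) for x in range(0, 10, 2)]
validation = [(float(x), 2.0 * x + 1.0) for x in range(1, 10, 2)]
best, scores = pick_best(
    {"mean": fit_mean, "1-NN": fit_nearest_neighbour, "line": fit_line},
    train, validation,
)
print(best, scores)
```

On this perfectly linear toy data the straight line wins; on real data the ranking can come out differently, which is why several algorithms are tried.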

The Model: a Model is the output of ML, and the Task is addressed by Models. Different kinds of Models: a predictive model forecasts what might happen in the future; a descriptive model describes what happened; a prescriptive model recommends one or more courses of action and shows the likely outcome of each decision.

Features: a Feature (also called a dimension) is "an individual measurable property or characteristic of a phenomenon being observed" (Christopher Bishop (2006), Pattern Recognition and Machine Learning). Deriving features (feature engineering, feature extraction) is one of the most important parts of machine learning: it turns data into information that a machine learning algorithm can use. A Model is only as good as its Features.
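Feature engineering turns raw data into numbers an algorithm can use. A minimal sketch, assuming an email spam task; the four features (word count, exclamation marks, whether the word "free" appears, share of uppercase letters) are hypothetical choices made for illustration.

```python
import re
from typing import Dict


def email_features(text: str) -> Dict[str, float]:
    """Turn raw email text into numeric features an algorithm can use.

    The feature choices are illustrative assumptions, not a recommended set.
    """
    words = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    return {
        "n_words": float(len(words)),
        "n_exclamations": float(text.count("!")),
        "has_word_free": float(any(w.lower() == "free" for w in words)),
        # Share of letters that are uppercase (0.0 for text without letters).
        "upper_ratio": (sum(c.isupper() for c in letters) / len(letters)) if letters else 0.0,
    }


features = email_features("FREE money!!! Click now")
print(features)
```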

ML in short: use the right Features with the right Algorithms to build the right Models that achieve the right Tasks.

Two types of methods (techniques): unsupervised learning finds hidden patterns or intrinsic structures in input data; supervised learning trains a model on known input and output data to predict future outputs.

Unsupervised Learning: learning from unlabeled input data by finding hidden patterns or intrinsic structures in that data. It is typically used when you don't have a specific goal, when you are not sure what information the data contains, or when you want to reduce the features of your data as preprocessing for supervised learning.

Data for Unsupervised Learning

Clustering: clustering is the most common method for unsupervised learning. It is used for exploratory data analysis to find hidden patterns or groupings in data. There are two kinds of clustering algorithms: in hard clustering, each data point belongs to only one cluster; in soft clustering, each data point can belong to more than one cluster.
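Hard and soft clustering can be contrasted on the same data. The sketch below runs plain k-means on one-dimensional toy points (hard: each point gets exactly one cluster) and then computes soft memberships as a softmax over negative squared distances to the centroids. The softmax weighting, the temperature parameter and the fixed starting centroids are illustrative assumptions; soft clustering is more commonly done with methods such as Gaussian mixture models.

```python
import math
from typing import List


def kmeans_1d(points: List[float], centroids: List[float], iterations: int = 10) -> List[float]:
    """Plain k-means on 1-D data (hard clustering).

    Each point is assigned to exactly one cluster: the nearest centroid.
    Initial centroids are passed in so the result is reproducible.
    """
    centroids = list(centroids)
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def hard_assign(p: float, centroids: List[float]) -> int:
    # Hard clustering: the index of the single nearest centroid.
    return min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))


def soft_assign(p: float, centroids: List[float], temperature: float = 1.0) -> List[float]:
    # Soft clustering: a membership weight for every cluster, summing to 1.
    scores = [-((p - c) ** 2) / temperature for c in centroids]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centroids = kmeans_1d(points, centroids=[0.0, 6.0])
print(centroids)
print(hard_assign(2.0, centroids), soft_assign(2.0, centroids))
```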

Supervised Learning: learning from known, labelled data. A model is trained on known input and output data to predict future outputs (remember that uncertainty is always involved).

Data for Supervised Learning

A process of supervised learning 1/2
1. Train
   1. Load data
   2. Pre-process data
   3. Learn using a method and an algorithm
   4. Create a model
   Iterate until you find the best model.

A process of supervised learning 2/2
2. Predict (use the model with new data)
   1. New data
   2. Pre-process data
   3. Use the model
   4. Get predictions
   5. Integrate the models into applications
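The two phases above can be sketched end to end. A minimal sketch, assuming a tiny invented two-feature dataset, standardization as the pre-processing step, and a nearest-centroid classifier as the learning algorithm; the point it illustrates is that new data goes through the same pre-processing, with statistics learned from the training data, before the model is used.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Row = Sequence[float]


class Standardizer:
    """Pre-processing: scale every feature to mean 0 and standard deviation 1.

    The statistics are learned from the training data only and reused
    unchanged in the predict phase.
    """

    def fit(self, rows: List[Row]) -> "Standardizer":
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        self.stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0
        return self

    def transform(self, rows: List[Row]) -> List[List[float]]:
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)] for row in rows]


class NearestCentroid:
    """A deliberately simple learning algorithm: one average point per class."""

    def fit(self, rows: List[List[float]], labels: List[str]) -> "NearestCentroid":
        self.centroids: Dict[str, List[float]] = {}
        for label in set(labels):
            members = [r for r, y in zip(rows, labels) if y == label]
            self.centroids[label] = [mean(col) for col in zip(*members)]
        return self

    def predict(self, rows: List[List[float]]) -> List[str]:
        def dist2(a: List[float], b: List[float]) -> float:
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(r, self.centroids[c])) for r in rows]


# 1. Train: load (invented) data, pre-process, learn, create a model.
train_x = [[1.0, 200.0], [1.5, 220.0], [8.0, 900.0], [9.0, 950.0]]
train_y = ["small", "small", "large", "large"]
scaler = Standardizer().fit(train_x)
model = NearestCentroid().fit(scaler.transform(train_x), train_y)

# 2. Predict: new data goes through the SAME pre-processing, then the model.
new_x = [[1.2, 210.0], [8.5, 930.0]]
predictions = model.predict(scaler.transform(new_x))
print(predictions)  # ['small', 'large']
```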

Supervised Learning, methods/techniques: predictive models, built with classification or regression.

Supervised Learning, Classification: classification models are trained to classify data into categories. They predict discrete responses, for example whether an email is genuine or spam, whether a tumor is small, medium-sized or large, whether a tumor is cancerous or benign, or whether a person is creditworthy or not. Typical applications include medical imaging, speech recognition, and credit scoring.
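The spam-or-genuine example can be sketched with multinomial Naive Bayes, a classic text classification technique (the talk does not name a specific algorithm). The training messages are invented toy data and the whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter
from typing import Dict, List


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages: List[str], labels: List[str]) -> "NaiveBayesSpamFilter":
        self.classes = sorted(set(labels))
        # Log prior: how common each class is in the training data.
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_likelihood(self, word: str, c: str) -> float:
        counts = self.word_counts[c]
        return math.log((counts[word] + 1) / (sum(counts.values()) + len(self.vocab)))

    def predict(self, text: str) -> str:
        # Words never seen in training carry no information, so they are skipped.
        words = [w for w in text.lower().split() if w in self.vocab]
        scores = {c: self.priors[c] + sum(self._log_likelihood(w, c) for w in words)
                  for c in self.classes}
        return max(scores, key=scores.get)


train_messages = [
    "win free money now",
    "free prize click now",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = ["spam", "spam", "genuine", "genuine"]
spam_filter = NaiveBayesSpamFilter().fit(train_messages, train_labels)
print(spam_filter.predict("free money prize"))           # spam
print(spam_filter.predict("project meeting on monday"))  # genuine
```

The output is a discrete category, which is what separates classification from regression.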

Supervised Learning, Regression: regression models predict continuous responses, for example changes in temperature or fluctuations in electricity demand. Typical applications include forecasting stock prices, handwriting recognition, acoustic signal processing, failure prediction in hardware, and electricity load forecasting.
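Regression predicts a continuous number. A minimal sketch, assuming a single input and a straight-line model fitted by batch gradient descent on the mean squared error; the data points are invented (x could be a scaled temperature reading and y an electricity demand figure), and the learning rate and epoch count are chosen for this toy data only.

```python
from typing import List, Tuple


def fit_linear_regression(xs: List[float], ys: List[float],
                          learning_rate: float = 0.05,
                          epochs: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [100.0 - 2.0 * x for x in xs]  # invented, perfectly linear data
w, b = fit_linear_regression(xs, ys)
print(w, b, w * 5.0 + b)  # slope, intercept, prediction for a new x
```

Too large a learning rate makes gradient descent diverge, so in practice inputs are usually scaled and the rate is tuned.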

Educated guess! ML always gives an approximate answer. Some models are better than others, and some are useful; you can build several models and choose the best, but they are still all approximations. There is no correct answer.

Real-life use cases for ML: spam filters, log filters (and alarms), data analytics, image recognition, speech recognition, medical diagnosis, robotics, fraud protection.

A simple example: Chatbot demo

Real-life use cases for ML: online shopping (Amazon, search, recommendations); voice-to-text and smart personal assistants (mobile services: a recipe for bread, finding the nearest grocery store, …) such as Siri, Google Assistant, Alexa, Echo, Cortana, Facebook.

Example: Facebook. References:
1. K. Hazelwood, S. Bird, D. Brooks, S. Chintala, U. Diril, D. Dzhulgakov, M. Fawzy, B. Jia, Y. Jia, A. Kalro, J. Law, K. Lee, J. Lu, P. Noordhuis, M. Smelyanskiy, L. Xiong, and X. Wang, "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective," Facebook, Inc.
2. X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Quinonero Candela, "Practical Lessons from Predicting Clicks on Ads at Facebook," in Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ser. ADKDD '14. New York, NY, USA: ACM, 2014, pp. 5:1-5:9.
3. J. Dunn, "Introducing FBLearner Flow: Facebook's AI Backbone," May 2016, https://code.facebook.com/posts/1072626246134461/introducing-fblearner-flow-facebook-s-ai-backbone/

Facebook's mission: "Give people the power to build community and bring the world closer together." Facebook connects more than two billion people (as of December 2017), which could not be done without ML. The massive amount of data required by machine learning services presents challenges to Facebook's datacenters. Several techniques are used to feed data to the models efficiently, including decoupling of data feed and training, data/compute co-location, and networking optimizations. Disaster recovery planning is essential. Facebook is actively evaluating and prototyping new hardware solutions while remaining cognizant of game-changing algorithmic innovations.

Facebook, some use cases for ML (the products): News Feed ranking, Ads, Search, Sigma, Lumos, Facer, Language Translation, Speech Recognition.

News Feed: ML is used for ranking and personalizing News Feed stories, filtering out offensive content, highlighting trending topics, ranking search results, and much more. General models are trained to determine the various user and environmental factors that should ultimately determine the rank order of content. The model is used to generate, from thousands of candidates, a personalized set of the best posts, images, and other content to display, and the best ordering of this chosen content.
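Picking and ordering the best items from many candidates can be sketched as score-then-sort. The weighted sum below is a hypothetical stand-in for a trained ranking model; the feature names, weights and candidate items are invented for illustration and do not describe Facebook's actual News Feed model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    item_id: str
    features: Dict[str, float]  # hypothetical signals, each scaled to 0..1


# Hypothetical weights standing in for a trained model; real systems learn these.
WEIGHTS = {"author_affinity": 2.0, "recency": 1.0, "predicted_click": 3.0}


def score(c: Candidate) -> float:
    # Unknown feature names contribute nothing.
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in c.features.items())


def rank_feed(candidates: List[Candidate], k: int) -> List[str]:
    """Score every candidate, keep the best k, and return them best-first."""
    ranked = sorted(candidates, key=score, reverse=True)
    return [c.item_id for c in ranked[:k]]


candidates = [
    Candidate("post_a", {"author_affinity": 0.9, "recency": 0.2, "predicted_click": 0.1}),
    Candidate("photo_b", {"author_affinity": 0.1, "recency": 0.9, "predicted_click": 0.8}),
    Candidate("post_c", {"author_affinity": 0.3, "recency": 0.1, "predicted_click": 0.2}),
]
print(rank_feed(candidates, k=2))  # ['photo_b', 'post_a']
```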

Ads: online advertising allows advertisers to bid and pay only for measurable user responses, such as clicks on ads. As a consequence, click prediction systems are central to most online advertising systems. General Ads models are trained to learn how user traits, user context, previous interactions, and advertisement attributes can be most predictive of the likelihood of clicking on an ad, visiting a website, and/or purchasing a product. Inputs are run through a trained model to immediately determine which ads to display to a particular Facebook user.

Predicting the Clicks: the click prediction system needs to be robust and adaptive, and capable of learning from massive volumes of data. At Facebook they use a model that combines decision trees with logistic regression. Based on their experience, the most important thing is to have the right features (those capturing historical information about the user or ad dominate other types of features) and the right model. Measure: the accuracy of prediction.
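The tree-plus-logistic-regression idea from the click prediction reference can be sketched as: trees turn the raw features into leaf indices, and a logistic regression learns from the one-hot encoded leaves. This is a simplified sketch, not Facebook's system: instead of boosted decision trees it learns one decision stump (a depth-1 tree) per feature, and the features and click labels are invented toy data.

```python
import math
from typing import List, Tuple

Row = List[float]


def fit_stump(xs: List[float], ys: List[int]) -> float:
    """Learn one split threshold for one feature (a depth-1 decision tree).

    The smallest value is skipped so that every split has rows on both sides.
    """
    best_t, best_err = xs[0], len(ys) + 1
    for t in sorted(set(xs))[1:]:
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        err = min(err, len(ys) - err)  # either leaf may be the "click" leaf
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def leaf_features(row: Row, thresholds: List[float]) -> Row:
    """One-hot encode which leaf of each stump the row falls into."""
    out: Row = []
    for x, t in zip(row, thresholds):
        out += [1.0, 0.0] if x < t else [0.0, 1.0]
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows: List[Row], ys: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Tuple[Row, float]:
    """Logistic regression trained with stochastic gradient descent on log-loss."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for r, y in zip(rows, ys):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, r)]
            b -= lr * g
    return w, b


def predict_click(row: Row, thresholds: List[float], w: Row, b: float) -> float:
    z = sum(wi * xi for wi, xi in zip(w, leaf_features(row, thresholds))) + b
    return sigmoid(z)


# Invented toy data: [user's past click rate, ad relevance score]; 1 = clicked.
X = [[0.1, 0.2], [0.1, 0.9], [0.8, 0.1], [0.8, 0.9],
     [0.2, 0.8], [0.9, 0.7], [0.7, 0.3], [0.3, 0.3]]
y = [0, 0, 0, 1, 0, 1, 0, 0]

thresholds = [fit_stump([row[i] for row in X], y) for i in range(len(X[0]))]
w, b = fit_logistic([leaf_features(row, thresholds) for row in X], y)
print(thresholds, round(predict_click([0.85, 0.95], thresholds, w, b), 3))
```

In this toy data a click happens only when both signals are high; the leaf encoding lets a linear model capture that combination.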

Search: Search launches a series of distinct, specialized sub-searches to the various verticals, e.g., videos, photos, people, events. A classifier layer runs atop the search verticals to predict which of the many verticals to search (searching all possible verticals would be inefficient). The classifier and the search verticals each consist of an offline stage to train the models and an online stage to run the models and perform the classification and search.
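The classify-then-search pattern can be sketched as: score every vertical for the query, then run only the sub-searches that pass a threshold. The keyword-overlap scorer, the threshold, the vertical cap and the dummy searchers are hypothetical stand-ins for a trained classifier and real search backends.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained classifier: keyword hits per vertical.
VERTICAL_KEYWORDS = {
    "videos": {"video", "watch", "clip"},
    "photos": {"photo", "picture", "album"},
    "people": {"friend", "profile"},
    "events": {"concert", "event", "tonight"},
}


def vertical_scores(query: str) -> Dict[str, float]:
    # Fraction of each vertical's keywords that appear in the query.
    words = set(query.lower().split())
    return {v: len(words & kws) / len(kws) for v, kws in VERTICAL_KEYWORDS.items()}


def route(query: str, searchers: Dict[str, Callable[[str], List[str]]],
          threshold: float = 0.3, max_verticals: int = 2) -> Dict[str, List[str]]:
    """Classify first, then run only the sub-searches the classifier picked."""
    scores = vertical_scores(query)
    chosen = sorted((v for v, s in scores.items() if s >= threshold),
                    key=lambda v: scores[v], reverse=True)[:max_verticals]
    return {v: searchers[v](query) for v in chosen}


# Dummy searchers; v=v binds the current vertical name into each lambda.
searchers = {v: (lambda q, v=v: [f"{v} result for {q}"]) for v in VERTICAL_KEYWORDS}
print(route("concert video tonight", searchers))
```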

Sigma: a general classification and anomaly detection framework that is used for a variety of internal applications (site integrity, spam detection, payments, registration, unauthorized employee access, and event recommendations). Sigma includes hundreds of distinct models running in production every day; each model is trained to detect anomalies (e.g., to classify content).
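Anomaly detection in its simplest form flags values that are far from what was seen before. A minimal sketch, assuming a z-score rule over a history of hourly counts (for example registrations); the threshold of 3 standard deviations and the numbers are illustrative, and Sigma's actual models are not described in the talk.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Baseline = Tuple[float, float]  # (mean, population standard deviation)


def fit_baseline(history: List[float]) -> Baseline:
    """Learn what 'normal' looks like from past observations."""
    return mean(history), pstdev(history)


def is_anomaly(value: float, baseline: Baseline, z_threshold: float = 3.0) -> bool:
    """Flag values more than z_threshold standard deviations away from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # no variation seen: anything different is unusual
    return abs(value - mu) / sigma > z_threshold


# Invented hourly counts, e.g. new registrations per hour.
history = [100, 104, 98, 101, 97, 103, 99, 102, 96, 100]
baseline = fit_baseline(history)
print(is_anomaly(101, baseline), is_anomaly(140, baseline))  # False True
```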

Lumos: extracts high-level attributes and embeddings from an image and its content. That data can be used as input to other products and services, for example as if it were text.

Facer: Facebook's face detection and recognition framework. Given an image, it finds all of the faces in that image and runs a user-specific facial-recognition algorithm to determine the likelihood of each face belonging to one of your top-N friends who have enabled face recognition. This allows Facebook to suggest which of your friends you might want to tag within the photos you upload.
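The top-N friend step can be sketched as similarity search over face embeddings, restricted to friends who have enabled face recognition. A minimal sketch, assuming each face is already represented as an embedding vector; the three-number vectors, the names, cosine similarity and the 0.8 cut-off are invented for illustration, not Facer's actual algorithm.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def suggest_tags(face: Sequence[float], friend_embeddings: Dict[str, Sequence[float]],
                 opted_in: Set[str], top_n: int = 3,
                 min_similarity: float = 0.8) -> List[Tuple[str, float]]:
    """Rank opted-in friends by how similar their stored embedding is to the face."""
    scored = [(name, cosine(face, emb)) for name, emb in friend_embeddings.items()
              if name in opted_in]  # friends without face recognition are never considered
    likely = [(name, s) for name, s in scored if s >= min_similarity]
    return sorted(likely, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical 3-number embeddings; real face embeddings are much longer vectors.
friends = {"anna": [0.9, 0.1, 0.0], "ben": [0.0, 1.0, 0.0], "carl": [0.88, 0.12, 0.05]}
face = [1.0, 0.1, 0.0]
print(suggest_tags(face, friends, opted_in={"anna", "ben"}))
```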

Language Translation: a service that manages internationalization of Facebook content. It supports translations for more than 45 languages (as the source or target language), supports more than 2000 translation directions, and serves 4.5B translated post impressions every day. Each language-pair direction has its own model; multi-language models are being considered.

Speech Recognition: converts audio streams into text and provides automated captioning for video. Most streams are in English; other languages will be available in the future. Additionally, non-language audio events are detected with a similar system (using a simpler model).

How do they do all this at Facebook?

FBLearner Platform

The success factors: success is predicated on the availability of extensive, high-quality data. Complex preprocessing logic is applied to ensure that data is cleaned and normalized to allow efficient transfer and easy learning. The ability to rapidly process and feed these data to the training machines is important for fast and efficient offline training. This imposes very high resource requirements, especially on storage, network, and CPU. Facebook is actively evaluating and prototyping new hardware solutions while remaining cognizant of game-changing algorithmic innovations. You need to know what to measure in order to know what to improve.

Facebook: "we noticed that the largest improvements in accuracy often came from quick experiments, feature engineering, and model tuning rather than applying fundamentally different algorithms." An engineer may need to attempt hundreds of experiments before finding a successful new feature or set of hyperparameters.
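Model tuning through many quick experiments can be sketched as a small hyperparameter search: try each value, score it on validation data, keep the best. A minimal sketch, assuming a k-nearest-neighbours classifier whose hyperparameter is k, on invented two-dimensional data that includes one deliberately mislabelled training point.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], str]


def knn_predict(train: List[Example], x: Sequence[float], k: int) -> str:
    # Majority label among the k training points closest to x.
    nearest = sorted(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: List[Example], data: List[Example], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def tune_k(train: List[Example], validation: List[Example],
           candidates: List[int]) -> Tuple[int, Dict[int, float]]:
    """One quick experiment per hyperparameter value; keep the best on validation data."""
    results = {k: accuracy(train, validation, k) for k in candidates}
    best = max(results, key=results.get)
    return best, results


train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
         ((0.5, 0.5), "b"),  # deliberately mislabelled noise point
         ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b")]
validation = [((0.4, 0.4), "a"), ((0.6, 0.6), "a"), ((5.5, 5.5), "b")]
best_k, results = tune_k(train, validation, candidates=[1, 3, 5])
print(best_k, results)
```

With k = 1 the single mislabelled point near the origin misleads the model; k = 3 outvotes it. When several values tie, `max` keeps the first one tried.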

Oracle SQL Developer demo

Oracle SQL Developer, Data Miner: Oracle SQL Developer is a free tool from Oracle. It has an add-on called Data Miner. Advanced Analytics (which Data Miner uses) is a licensed product: it is licensed separately for the Enterprise Edition (EE) database, and in the Cloud, Database Service requires either the High Performance Package or the Extreme Performance Package. Oracle Data Miner GUI installation instructions: http://www.oracle.com/technetwork/database/options/advancedanalytics/odm/odmrinstallation-2080768.html Tutorial: http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/12c/bigdatadm/ODM12c-BDL4.html

Chapter 10

And so many more languages to learn: Python, C/C++, Java, JavaScript, Julia, Scala, Ruby, Octave, MATLAB, SAS. https://medium.com/towards-data-science/what-is-the-best-programming-language-for-machine-learning-a745c156d6b7

The future and now! AI and machine learning are here, and they are the future. These skills are valuable to both you and your business.

Conclusion: several V's are related to Big Data: Volume, Velocity, Variety, Veracity, Viability, Value, Variability, Visualization.

Conclusion: ML can be used everywhere: spam filters, log filters (and alarms), data analytics, image recognition, speech recognition, medical diagnosis, robotics, chatbots.

Conclusion: Facebook uses ML everywhere: News Feed ranking, Ads, Search, Sigma, Lumos, Facer, Language Translation, Speech Recognition.

Conclusion: you could use it everywhere.

THANK YOU! QUESTIONS? Email: heli@miracleoy.fi Twitter: @HeliFromFinland Blog: Helifromfinland.com