Testing a Moving Target: How Do We Test Machine Learning Systems?
Peter Varhol, Technology Strategy Research, USA

About Me
- International speaker and writer
- Degrees in math, computer science, and psychology
- Evangelist at Dynatrace
- Former university professor and tech journalist

Gerie Owen
- www.gerieowen.com
- gerie.owen@gerieowen.com
- Test manager and tester, and as such an experienced bug finder and bug misser
- Subject expert on testing for TechTarget's SearchSoftwareQuality.com
- International and domestic conference presenter
- Marathon runner and running coach
- Cat mom

What You Will Learn
- What kinds of systems produce nondeterministic results
- Why we can't test these systems using traditional techniques
- How we can assess, measure, and communicate quality in learning and adaptive systems

Agenda
- What are machine learning and adaptive systems?
- How are these systems evaluated?
- Challenges in testing these systems
- What constitutes a bug?
- Summary and conclusions

We Think We Know Testing
- We test deterministic systems
- For a given input, the output is always the same
- And we know what the output is supposed to be
- If the output is something else, we may have a bug
- We know nothing

Machine Learning and Adaptive Systems
- We are now building a different kind of software
- It doesn't always return the same result
- That doesn't make it wrong
- How can we assess the quality?
- How do we know if there is a bug?

How Does This Happen?
- The problem domain is ambiguous
- There is no single right answer
- Close enough is good
- We don't quite know why the software responds as it does
- We can't easily trace code paths

What Technologies Are Involved?
- Neural networks
- Genetic algorithms
- Rules engines
- Feedback mechanisms
- Sometimes hardware

Neural Networks
- A set of layered algorithms whose variables can be adjusted via a learning process
- The learning process involves training with known inputs and outputs
- The algorithms adjust coefficients to converge on the correct answer (or not)
- You freeze the algorithms and coefficients, and deploy
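The train-then-freeze cycle can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the network from the talk: the class `TinyNet`, its sizes (two inputs, four sigmoid hidden units, one linear output), the learning rate, and the synthetic training data are all hypothetical. Training is ordinary full-batch gradient descent with backpropagation on mean squared error; `freeze()` then blocks any further coefficient updates.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Hypothetical example: inputs -> sigmoid hidden layer -> linear output."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so runs are reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.frozen = False

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def loss(self, data: List[Sample]) -> float:
        """Mean of 0.5 * squared error over the data set."""
        return sum(0.5 * (self.predict(x) - y) ** 2 for x, y in data) / len(data)

    def train_epoch(self, data: List[Sample], lr: float) -> None:
        """One full-batch gradient-descent step computed by backpropagation."""
        if self.frozen:
            raise RuntimeError("model is frozen; coefficients can no longer change")
        g_w1 = [[0.0] * len(row) for row in self.w1]
        g_b1 = [0.0] * len(self.b1)
        g_w2 = [0.0] * len(self.w2)
        g_b2 = 0.0
        for x, y in data:
            hidden, output = self._forward(x)
            err = output - y  # derivative of 0.5 * err**2 with respect to the output
            g_b2 += err
            for j, h in enumerate(hidden):
                g_w2[j] += err * h
                delta = err * self.w2[j] * h * (1.0 - h)  # chain rule through the sigmoid
                g_b1[j] += delta
                for i, xi in enumerate(x):
                    g_w1[j][i] += delta * xi
        n = len(data)
        for j in range(len(self.w2)):
            self.w2[j] -= lr * g_w2[j] / n
            self.b1[j] -= lr * g_b1[j] / n
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * g_w1[j][i] / n
        self.b2 -= lr * g_b2 / n

    def freeze(self) -> None:
        """Stop learning: after this, predict() is a fixed function of its input."""
        self.frozen = True


# Synthetic "known inputs and outputs" (an assumption, not real measurements).
data: List[Sample] = [((a / 4, b / 4), 0.5 * (a / 4) + 0.3 * (b / 4) ** 2)
                      for a in range(5) for b in range(5)]
net = TinyNet(n_in=2, n_hidden=4)
loss_before = net.loss(data)
for _ in range(1000):
    net.train_epoch(data, lr=0.2)
loss_after = net.loss(data)
net.freeze()
```

Once frozen, the network is deterministic: the same input always produces the same output, so it can be checked against expected results within a tolerance. Whether training actually converged "on the correct answer (or not)" is visible only through measures such as the loss.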

A Sample Neural Network

Genetic Algorithms
- Use the principle of natural selection
- Create a range of possible solutions
- Try out each of them
- Choose and combine two of the better alternatives
- Rinse and repeat as necessary
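A minimal genetic algorithm following those steps, assuming a toy objective (OneMax: maximize the number of 1 bits in a bit string) in place of a real problem. The function names, population size, mutation rate, and the elitism rule (the best individual always survives) are illustrative choices, not part of the talk.

```python
import random
from typing import List

Genome = List[int]


def fitness(genome: Genome) -> int:
    """Toy objective (OneMax): count of 1 bits. Stands in for a real problem."""
    return sum(genome)


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Combine two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Flip each bit with a small probability."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(n_bits: int = 20, pop_size: int = 30, generations: int = 60,
           mutation_rate: float = 0.02, seed: int = 1) -> List[int]:
    """Return the best fitness seen in each generation."""
    rng = random.Random(seed)
    # Create a range of possible solutions.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        # Try out each of them: score and rank, best first.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Natural selection: the better half become parents.
        parents = population[: pop_size // 2]
        # Elitism: the current best survives unchanged.
        next_gen = [population[0]]
        while len(next_gen) < pop_size:
            # Choose and combine two of the better alternatives, then mutate.
            mom, dad = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(mom, dad, rng), mutation_rate, rng))
        population = next_gen
    return history


history = evolve()
```

Because of elitism, the best score never decreases from one generation to the next, but nothing guarantees reaching the optimum. With a different seed the run, and its answer, can differ, which is part of what makes such systems hard to test.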

Rules Engines
- Layers of if-then rules, with associated likelihoods
- With complex inputs, the results can be different
- Determining which rules or probabilities should be changed is almost impossible
- How do we measure quality?
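To show how weighted rules interact, here is a small sketch of a rules engine. It uses MYCIN-style certainty factors, where two positive factors combine as cf_a + cf_b * (1 - cf_a); that combination rule is a substitution for whatever "likelihoods" a given engine uses. The rules, fact names, and threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Rule:
    """IF `condition` is among the known facts THEN `conclusion`, with certainty `cf`."""
    condition: str
    conclusion: str
    cf: float  # certainty factor in (0, 1]


def combine(cf_a: float, cf_b: float) -> float:
    """MYCIN-style combination of two positive certainty factors (order-independent)."""
    return cf_a + cf_b * (1.0 - cf_a)


def infer(rules: List[Rule], facts: Set[str]) -> Dict[str, float]:
    """Fire every rule whose condition holds and combine evidence per conclusion."""
    belief: Dict[str, float] = {}
    for rule in rules:
        if rule.condition in facts:
            belief[rule.conclusion] = combine(belief.get(rule.conclusion, 0.0), rule.cf)
    return belief


# Hypothetical rule base for a retail recommendation engine.
RULES = [
    Rule("bought_running_shorts", "suggest_running_shoes", 0.5),
    Rule("searched_marathon", "suggest_running_shoes", 0.4),
    Rule("searched_ghost", "suggest_costume", 0.6),
]
THRESHOLD = 0.6  # hypothetical cut-off for showing a suggestion
```

A single extra fact moves `suggest_running_shoes` from 0.5 to 0.7, across the 0.6 threshold. With hundreds of interacting rules, working out which weight to change to fix one bad answer quickly becomes impractical.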

How Are These Systems Used?
- Transportation
  - Self-driving cars
  - Aircraft
- Ecommerce
  - Recommendation engines
- Finance
  - Stock trading systems

A Practical Example
- Electric wind sensor
- Determines wind speed and direction
- Based on the cooling of filaments
- Several hundred data points of known results
- Designed a three-layer neural network
- Then used the known data to train it

Another Practical Example
- Retail recommendation engines
- "Other people bought this"
- "You may also be interested in that"
- They don't have to be perfect
- But they can bring in additional revenue
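The "other people bought this" behavior is often built on item co-occurrence counts. A minimal sketch, assuming a small hypothetical order history; real engines add normalization, recency weighting, and much more, and this is not any particular retailer's method.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_cooccurrence(orders: Iterable[Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of items appears in the same order."""
    together: Dict[str, Counter] = {}
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            together.setdefault(a, Counter())[b] += 1
            together.setdefault(b, Counter())[a] += 1
    return together


def also_bought(together: Dict[str, Counter], item: str, k: int = 2) -> List[str]:
    """Top-k items most often bought with `item`; ties broken alphabetically."""
    counts = together.get(item, Counter())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical order history.
orders = [
    {"running shoes", "socks", "water bottle"},
    {"running shoes", "socks"},
    {"running shoes", "gps watch"},
    {"socks", "sandals"},
]
together = build_cooccurrence(orders)
```

Here `also_bought(together, "running shoes")` returns `["socks", "gps watch"]`: socks appear with running shoes in two orders, and the two one-time companions are tie-broken alphabetically. Nothing in the counts knows why items were bought together, which is why such suggestions "don't have to be perfect."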

Challenges to Validating Requirements
- What does it mean to be correct?
- The result may be different every time
- There is no one single right answer
- How will this really work in production?
- How do I test it at all?

Possible Answers
- Only look at outputs for given inputs, and set accuracy parameters
- Don't look at the outputs at all; focus on performance, usability, and other features
- We can't test accuracy, so throw up our hands and go home

Testing Machine Learning Systems
- Have objective acceptance criteria
- Test with new data
- Don't count on all results being accurate
- Understand the architecture of the network as part of the testing process
- Communicate the level of confidence you have in the results to management and users
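One way to make "objective acceptance criteria" and "communicate the level of confidence" concrete: score the model on new data it was not trained on, count a prediction as accurate if it lands within a tolerance of the known result, and report a confidence interval on the accuracy rather than a single number. This sketch uses the Wilson score interval; the function names, the tolerance, and the pass rule (the lower bound must meet the requirement) are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% confidence)."""
    if n == 0:
        raise ValueError("need at least one test case")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def acceptance_test(predict: Callable[[Sequence[float]], float],
                    new_data: List[Tuple[Sequence[float], float]],
                    tolerance: float,
                    required_accuracy: float) -> dict:
    """Score the model on data it never saw in training.

    A prediction counts as accurate if it is within `tolerance` of the known
    result. The model passes only if the lower confidence bound on accuracy
    meets `required_accuracy`, so a small test set cannot pass on luck.
    """
    hits = sum(1 for x, expected in new_data if abs(predict(x) - expected) <= tolerance)
    low, high = wilson_interval(hits, len(new_data))
    return {
        "accuracy": hits / len(new_data),
        "confidence_95": (low, high),
        "passed": low >= required_accuracy,
    }
```

With 9 accurate answers out of 10, the point estimate is 90%, yet the 95% interval runs from about 0.60 to 0.98. A requirement of 0.85 therefore fails: the test set is too small to support that claim, which is the kind of statement management and users need to hear.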

What About Adaptive Systems?
- Adaptive systems are very similar to machine learning systems
- The problems solved are slightly different
- Neural algorithms are used, and trained
- But the algorithms aren't frozen in production

Machine Learning and Adaptive Systems
- These are two different things
- Machine learning systems are trained, but are static after deployment
- Adaptive systems continue to adapt in production
- They dynamically optimize
- They require feedback
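The difference can be shown with a one-variable linear model. A minimal sketch: `StaticModel` keeps the coefficients it was deployed with, while `AdaptiveModel` takes one online gradient step on squared error each time production feedback arrives. Both classes and the learning rate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StaticModel:
    """A trained machine learning model: coefficients fixed at deployment."""
    weight: float
    bias: float

    def predict(self, x: float) -> float:
        return self.weight * x + self.bias


@dataclass
class AdaptiveModel(StaticModel):
    """Same model, but it keeps learning from feedback in production."""
    learning_rate: float = 0.1

    def feedback(self, x: float, actual: float) -> None:
        # One online gradient step on 0.5 * (prediction - actual)**2.
        error = self.predict(x) - actual
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


static = StaticModel(weight=1.0, bias=0.0)
adaptive = AdaptiveModel(weight=1.0, bias=0.0)
```

Given the same input, the static model always answers the same way, while the adaptive model's answer moves after each piece of feedback. An expected-output test that passed yesterday may not hold today, which is why adaptive systems are better judged against goals than against fixed outputs.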

Adaptive Systems
- Airline pricing
  - Ticket prices change three times a day based on demand
  - It can cost less to go farther
  - It can cost less later
- Ecommerce systems
  - Recommendations try to discern what else you might want
- Can I incentivize you to fill up the plane?
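Real airline revenue-management systems are far more elaborate, and the talk does not describe their internals. As a stand-in, here is an epsilon-greedy multi-armed bandit: it mostly charges the price with the best observed revenue, occasionally tries another, and updates its estimates from the feedback. The price points, the linear demand curve, and the function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def epsilon_greedy_pricing(prices: Sequence[float],
                           revenue_feedback: Callable[[float], float],
                           rounds: int, epsilon: float = 0.1,
                           seed: int = 0) -> Tuple[List[float], List[int]]:
    """Repeatedly pick a price, observe revenue, and update per-price estimates."""
    rng = random.Random(seed)
    counts = [0] * len(prices)
    mean_revenue = [0.0] * len(prices)
    for t in range(rounds):
        if t < len(prices):
            arm = t  # try every price once before exploiting
        elif rng.random() < epsilon:
            arm = rng.randrange(len(prices))  # explore
        else:
            arm = max(range(len(prices)), key=lambda i: mean_revenue[i])  # exploit
        revenue = revenue_feedback(prices[arm])  # feedback from "production"
        counts[arm] += 1
        mean_revenue[arm] += (revenue - mean_revenue[arm]) / counts[arm]  # running mean
    return mean_revenue, counts


def simulated_revenue(price: float) -> float:
    """Hypothetical demand curve: fewer seats sell as the price rises."""
    seats_sold = max(0.0, 200.0 - price)
    return price * seats_sold


prices = [80.0, 100.0, 120.0, 150.0]
means, counts = epsilon_greedy_pricing(prices, simulated_revenue, rounds=500)
```

With this fixed demand curve the loop settles on the 100.0 fare. In production the demand curve itself shifts, so the estimates keep moving, and a test can only check that revenue stays within an acceptable band, not that one particular price is chosen.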

Recommendation Engines Can Be Very Wrong
- Brooks Ghost running shoes versus ghost costumes
- We don't take context into account
- But do they make money? Well, probably

Considerations for Testing Adaptive Systems
- You need test scenarios: best case, average case, and worst case
- You will not reach mathematical optimization
- Determine what level of outcomes is acceptable for each scenario
- Defects will be reflected in the inability of the model to achieve goals
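Those considerations translate into a scenario table: each scenario carries its own acceptable outcome, and a defect shows up as a scenario where the system misses its goal. A sketch, assuming a hypothetical `toy_load_factor` function standing in for the adaptive system under test and made-up thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Scenario:
    name: str
    demand_multiplier: float  # hypothetical knob describing market conditions
    min_acceptable: float     # outcome level the team agreed is good enough


def check_scenarios(system: Callable[[float], float],
                    scenarios: List[Scenario]) -> Dict[str, float]:
    """Run the system under each scenario; return the ones that miss their goal."""
    shortfalls: Dict[str, float] = {}
    for s in scenarios:
        outcome = system(s.demand_multiplier)
        if outcome < s.min_acceptable:
            shortfalls[s.name] = outcome
    return shortfalls


def toy_load_factor(demand_multiplier: float) -> float:
    """Stand-in for the adaptive system under test: fraction of seats filled."""
    return min(1.0, 0.8 * demand_multiplier)


SCENARIOS = [
    Scenario("best case", demand_multiplier=1.5, min_acceptable=0.95),
    Scenario("average case", demand_multiplier=1.0, min_acceptable=0.75),
    Scenario("worst case", demand_multiplier=0.5, min_acceptable=0.5),
]
```

Here only the worst case fails (a load factor of 0.4 against an agreed minimum of 0.5). The thresholds are deliberately not optimal values; they are the agreed definition of good enough for each scenario.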

What Does Being Correct Mean?
- Are we making money?
- Is the adaptive system more efficient?
- Are recommendations being picked up?
- Is it worthwhile to test recommendations?
- How would you score that?

These Are Very Different Measures
- We have never tested these characteristics before
- Can we learn?
- How do we make quality recommendations?
- Consistency? Value?
- Does it matter?

Objections
- "I will never encounter this type of application!" You might be surprised.
- "I will do what I've always done." Um, no, you won't.
- "My goals will be defined by others." Unless they're not. You may be the one.

How Do We Test These Things?
- Multiple inputs at one time
- Inputs may be ambiguous or approximate
- The output may be different each time
- Testing accuracy is a fool's game
- Past data
  - We know how different pricing strategies turned out
  - We made recommendations in the past
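Past data can be replayed. A minimal offline-evaluation sketch, assuming a hypothetical session log that records what was in each cart and what the customer bought afterward: run the recommender on each historical cart and count how often its top-k suggestions include a later purchase. The function names and the log format are illustrative.

```python
from typing import Callable, List, Sequence, Set, Tuple

# One historical session: what was in the cart, and what the customer bought afterward.
Session = Tuple[Set[str], Set[str]]


def replay_hit_rate(recommend: Callable[[Set[str]], Sequence[str]],
                    history: List[Session], k: int = 3) -> float:
    """Fraction of past sessions where at least one of the top-k
    recommendations was something the customer actually bought later."""
    if not history:
        raise ValueError("no historical sessions to replay")
    hits = 0
    for cart, later_purchases in history:
        suggestions = list(recommend(cart))[:k]
        if any(item in later_purchases for item in suggestions):
            hits += 1
    return hits / len(history)


def naive_recommender(cart: Set[str]) -> List[str]:
    """Hypothetical recommender under test: ignores the cart, suggests the same items."""
    return ["socks", "water bottle", "ghost costume"]


past_sessions: List[Session] = [
    ({"running shoes"}, {"socks"}),
    ({"running shoes"}, {"gps watch"}),
    ({"tent"}, {"water bottle"}),
    ({"sandals"}, set()),
]
```

The naive recommender scores 0.5 at k = 3 (two of four sessions). Replay has a known blind spot: it cannot tell whether a customer would have bought something they were never shown, so it measures agreement with history rather than true accuracy.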

What Is a Bug?
- A mismatch between inputs and outputs? It's supposed to be that way!
- Not every recommendation will be a good one, but that doesn't mean it's a bug
- Too many wrong answers? Define "too many"
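"Define too many" can be written down as an explicit rule. One option, sketched here, is a sliding window over recent answers with a maximum wrong-answer rate; the class name, the window of 10, and the 20% limit are hypothetical values a team would have to agree on. How each answer is judged wrong is left to the caller.

```python
from collections import deque
from typing import Deque


class WrongAnswerMonitor:
    """Tracks the last `window` answers and flags when wrong answers exceed a limit.

    `window` and `max_wrong_rate` together are the explicit, agreed definition
    of "too many"; the values used below are hypothetical.
    """

    def __init__(self, window: int, max_wrong_rate: float) -> None:
        self.results: Deque[bool] = deque(maxlen=window)  # oldest entries fall off
        self.max_wrong_rate = max_wrong_rate

    def record(self, was_wrong: bool) -> None:
        self.results.append(was_wrong)

    def wrong_rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def is_bug(self) -> bool:
        # Only judge once the window is full, so a few early misses do not trip it.
        full = len(self.results) == self.results.maxlen
        return full and self.wrong_rate() > self.max_wrong_rate


monitor = WrongAnswerMonitor(window=10, max_wrong_rate=0.2)
for wrong in [False, True, False, False, False, False, True, False, False, False]:
    monitor.record(wrong)
```

Under this configuration, two misses in ten stay within the limit, while a third miss inside the window trips `is_bug()`. A single bad recommendation never does.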

We Found a Bug, Now What?
- The bug could be unrelated to the neural network; treat it as a normal bug
- If the neural network is involved:
  - Determine a definition of inaccurate
  - Determine the likelihood of an inaccurate answer
  - This may involve serious redevelopment

Conclusions
- We have little experience with learning and adaptive systems
- Requirements have to be very different
- We need to understand the difference between correct and accurate
- We need objective requirements
  - And the ability to measure them
  - And the ability to communicate what they mean

Thank You
Peter Varhol
Dynatrace LLC
peter.varhol@dynatrace.com