Machine Learning & Non-Parametric Methods for Cost Analysis


Machine Learning & Non-Parametric Methods for Cost Analysis
Karen Mourikas, Nile Hanov, Joe King, Denise Nelson
Engineering, Test & Technology, Boeing Research & Technology
ICEAA Workshop, June 2018

Machine Learning Approach to Cost Analysis
- Machine Learning in General
- ML* Algorithms for Cost Analysis
- ML Applications related to Cost
  - Random Forest Prediction
  - Latent Semantic Analysis
- Challenges
* ML = Machine Learning

Machine Learning Buzz Words
- Big Data
- Smart Manufacturing
- Deep Learning
- Predictive Analytics
- Neural Networks
- Autoencoders
- NLP (Natural Language Processing)
- IOT (Internet of Things)
- Feature Extraction

Machine Learning Vocabulary

What is Machine Learning?
Simply, when a machine mimics "cognitive" functions such as "learning" and "problem solving" *
Machine Learning (ML) is a method in which algorithms teach themselves to grow (i.e., learn) from data, without being explicitly programmed.
Types of Machine Learning
- Supervised (task driven): regression, classification
- Unsupervised (data driven): clustering
- Reinforced (reaction to environment): WarGames

Machine Learning is a type of Artificial Intelligence
* Russell, Stuart J.; Norvig, Peter; Artificial Intelligence: A Modern Approach, 2003 & 2009

What can Machine Learning do?
- Speech recognition
- Autonomous scheduling
- Financial forecasting
- Spam filtering
- Logistics planning
- VLSI layout
- Automatic assembly
- Information extraction
- Market share analysis
- Route finding
- Robotics: household, surgery, navigation
- Failure prediction
- Fraud detection
- Web search engines
- Autonomous cars
- Energy optimization
- Question answering systems
- Social network analysis
- Medical diagnosis, imaging
- Document summarization

Many applications for Machine Learning

Why is Machine Learning so popular now?
Machine Learning has been around for a long time, but has become more popular recently.
- Data explosion: much more data available for complex analyses
- Machine power: Moore's Law, faster and cheaper computers
- Accuracy of algorithms: reliable enough for usable products

The Future is Here

How does Machine Learning Work?
Typically consists of two stages:
- Training phase: Training Data -> Feature Extraction -> ML Algorithm -> Model
- Testing phase: Test Data -> Model (from training phase) -> Prediction

General Process
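The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the "ML algorithm" is a one-variable least-squares fit standing in for any learner, and feature extraction is trivial (bill weight is used directly). The names `fit_line` and `predict` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Training phase: ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Testing phase: apply the trained model to an unseen input."""
    intercept, slope = model
    return intercept + slope * x

# Synthetic (hypothetical) data: cost grows roughly linearly with weight.
rng = random.Random(0)
weights = [float(w) for w in range(1, 41)]
costs = [5.0 + 2.0 * w + rng.uniform(-1.0, 1.0) for w in weights]

# Shuffle, then hold out 10 of the 40 points as test data.
pairs = list(zip(weights, costs))
rng.shuffle(pairs)
train, test = pairs[:30], pairs[30:]

model = fit_line([w for w, _ in train], [c for _, c in train])
test_errors = [abs(predict(model, w) - c) for w, c in test]
mean_abs_error = sum(test_errors) / len(test_errors)
```

The key design point is that the test data never touches `fit_line`, so `mean_abs_error` estimates how the model behaves on new observations.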

Machine Learning for Cost Prediction & Analysis
Typical Cost Prediction Methods
- Analogies
- Engineering / Bottoms up
- Parametric Equations / Top down
Machine Learning
- Alternative to traditional cost estimating
- Age of Big Data & Messy Data
- Interactions and non-linear behavior
- Relationship not well understood nor apparent
- Relatively quick & easy to implement

Could we use Machine Learning techniques for cost prediction?

Supervised Algorithms
K-Nearest-Neighbors (KNN)
- Similarity-based approach
- Given new features, finds the nearest examples and returns their value
- Key features: regression and classification; fast classification; similarity detection
Support Vector Machines (SVM)
- Margin-based approach
- Finds the widest margin between classes (decision boundaries)
- Key features: able to separate non-linearly-separable regions; able to find optimal solutions
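As a rough illustration of the KNN idea for regression, the sketch below averages the targets of the k closest training examples. The data, the Euclidean distance metric, and the function name `knn_predict` are assumptions for illustration, not taken from the presentation.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical two-feature examples with a numeric target (e.g. cost).
train_x = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (8.0, 9.0), (9.0, 8.5), (8.5, 9.5)]
train_y = [10.0, 12.0, 11.0, 50.0, 52.0, 54.0]

low = knn_predict(train_x, train_y, (1.4, 1.0))   # near the first group
high = knn_predict(train_x, train_y, (8.6, 9.0))  # near the second group
```

With `k=1` the function returns the value of the single nearest example, which is the simplest form of the method.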

Supervised Algorithms
Neural Networks (NN)
- Multi-layer perceptron model
- Finds weights for inputs that optimize the cost function
- Key features: very complex shapes/decision boundaries; needs a lot of data; finds patterns in large amounts of data
Random Forest Prediction
- Decision tree ensemble
- Each tree is built from a random sample of features
- Key features: training set can be small; regression & classification; handles small n, large p problems

Unsupervised Algorithms
Natural Language Processing: Latent Semantic Analysis (LSA) / Latent Dirichlet Allocation (LDA)
- Document clustering
- Information retrieval in document groups
- Key features: automatic topic detection; key term discovery; word clustering; automatic document grouping

Trees and Forests
A Single Decision Tree
- Represents a set of decisions & outcomes
- Easily interpretable, but not a great predictor
An Ensemble of Trees
- Many trees (100s)
- Not as easy to interpret, but provides greater prediction accuracy & more stability
Random Forests
- Ensemble of decision trees, randomly constructed
- More accurate predictions and reduced error

Random Forests Prediction based on Decision Tree Theory
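To make the ensemble idea concrete, here is a minimal pure-Python sketch of a random forest for regression. It is not the authors' implementation: each tree is grown on a bootstrap sample with a random subset of features chosen once per tree (standard random forests usually redraw the feature subset at every split), and the data, feature layout, and function names are all hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, features, depth, min_size=2):
    """Grow a regression tree; a leaf is a float, a split is a 4-tuple."""
    if depth == 0 or len(rows) < 2 * min_size or len(set(targets)) == 1:
        return mean(targets)
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sse(left) + sse(right)  # lower is a better split
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:
        return mean(targets)
    _, f, threshold = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[f] > threshold]
    return (
        f,
        threshold,
        build_tree([rows[i] for i in left_idx], [targets[i] for i in left_idx],
                   features, depth - 1, min_size),
        build_tree([rows[i] for i in right_idx], [targets[i] for i in right_idx],
                   features, depth - 1, min_size),
    )

def tree_predict(node, row):
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def fit_forest(rows, targets, n_trees=25, n_features=2, depth=3, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], feats, depth))
    return forest

def forest_predict(forest, row):
    """The forest prediction is the average of the individual tree predictions."""
    return mean([tree_predict(tree, row) for tree in forest])

# Hypothetical shipments: (bill weight, distance, mode code); distance is pure noise here.
rng = random.Random(1)
rows, costs = [], []
for _ in range(80):
    weight = rng.randint(1, 50)
    distance = rng.randrange(0, 1000, 50)
    mode = rng.randint(0, 1)
    rows.append((weight, distance, mode))
    costs.append(3.0 * weight + 40.0 * mode)

forest = fit_forest(rows, costs)
heavy_air = forest_predict(forest, (45, 500, 1))
light_ground = forest_predict(forest, (5, 500, 0))
```

Individual trees see different rows and different features, so they disagree; averaging them is what gives the ensemble its stability relative to a single tree.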

Why use Random Forest Prediction?
Advantages
- Excellent predictors
- Useful if relationship between inputs and outputs is unclear
- Captures non-linear and interaction behavior
- Handles qualitative data as well as missing values
- Relatively stable due to diversity in trees
- Can handle small population size with large number of predictors
- Lower generalization error than other methods
- Runtime very fast; commercial/open source software available
Disadvantages
- Not so easily interpreted
- Predicts a numeric value (cost), not a parametric equation (CER)

Versatile Black-box Approach

Application: Logistics Transport Cost Prediction
Objective
- Predict the shipping cost of products to help determine the best locations to manufacture them
Analysis Approach
- 1000s of data points: messy, missing values, many potential predictors
- Initial plan: Multivariate Regression
  - Very cumbersome; required manual partitioning into suitable subsets
- Chosen method: Random Forest Prediction
  - Limited data prep; automatic partitioning / different perspectives
  - Very easy to implement, execute, and analyze

Random Forest Prediction facilitates logistics transport cost analysis

Logistics Transport Cost Prediction Model
Data Description
- Consists of 150K data points
- Automatically separated into two distinct data sets
  - Domestic, with ~100K data points
  - International, with ~50K data points
Potential Predictors
- Started with 20 potential predictors
- Reduced to 3 key predictors
  - Mode of transportation
  - Origin &/or destination (country/state)
  - Bill weight

Random Forest Prediction for Big, Messy Data

Analytical Results
Goodness of fit (predicted R²)
- International: 0.83
- Domestic: 0.88
Graphical Interpretations
- Quickly produce various charts via R Shiny web-based application
- Select: model, type of chart, predictor

Analysis made easy with R Shiny Package
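For reference, a goodness-of-fit figure like those above can be computed from predictions and actuals as follows. This sketch assumes the usual coefficient of determination evaluated on held-out data; the presentation does not say exactly how its predicted R² was calculated, and the numbers below are made up.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out costs and model predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
score = r_squared(actual, predicted)  # 1 - 18/500 = 0.964
```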

Next Steps: What to do about the Black Box?
Decision makers want to know what's inside. What can we do?
- Compare results to actuals
  - Using Excel? Be careful!
- Develop interpretation GUI
  - R Shiny to peek inside the black box
  - Visualize / automate standard statistical analyses
  - Ability to play with the model
- Build algorithm to create a CER
  - From all the trees, branches, values
  - Cost prediction f(tree_i), i = 1..n

Provide ability to peek into black box
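One way to read the "cost prediction as a function of the trees" idea: in standard random forest regression, the forest's prediction is the average of the n individual tree predictions. Assuming that is the form intended here (the slide does not spell it out), and writing f_i for the prediction of tree i and x for the vector of predictors:

```latex
\hat{c}(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)
```

Each f_i is a piecewise-constant function of the predictors, so the average is also piecewise constant; any CER derived from it would be an approximation of that step function rather than an exact closed form.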

Application: Analysis of Cost Saving Ideas
Objective
- Identify best cost savings ideas to apply to other products
Analysis Approach
- Collaborative workshops to generate ideas to optimize the product
- 1000s of ideas in free-form text from 100s of workshops
- Could any of these ideas be applicable to other products?
- Natural Language Processing to identify cost-savings ideas for reuse
Chosen Methods: Latent Semantic Analysis, Latent Dirichlet Allocation
- Powerful, well-proven, task-invariant algorithms
- Framework already in place
- Open source algorithms

Natural Language Processing Analyses highlight ideas for reuse

Generalize Cost Savings Ideas via Text Analytics
Collaborative Idea Generation
- Aggregate ideas from 100s of products: 1000s of unique ideas
- Review product detail
- Generate ideas: 10s
Machine Learning Analysis
- Identify key terms
- Group ideas into topics to generalize results
- Heat map aligns product ideas to topics
- Align products to key terms

Can we identify & apply ideas from one product to others?

Similarity Matrices to Align Ideas
- Unstructured text: 100s of documents, 1000s of freeform texts
- [Figure: similarity matrix with Products X, Y, and Z on both axes]
- Idea #1 from Product X is highly similar to Idea #9 from Product Z

Cluster similar ideas from unique products via similarity matrices
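A similarity matrix of this kind can be sketched with bag-of-words vectors and cosine similarity. This is an assumption about the mechanics, not the authors' pipeline (which works on LSA/LDA representations); the idea texts and identifiers below are invented for illustration.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical free-text ideas keyed by product and idea number.
ideas = {
    "X-1": "reduce number of retention clips for wire installation",
    "Y-4": "use lighter paint on exterior panels",
    "Z-9": "reduce retention clips used for installation of wires",
}

vectors = {key: bag_of_words(text) for key, text in ideas.items()}
matrix = {
    a: {b: cosine(vectors[a], vectors[b]) for b in vectors}
    for a in vectors
}
```

Here `matrix["X-1"]["Z-9"]` is high because the two ideas share most of their terms, while `matrix["X-1"]["Y-4"]` is zero; clustering then groups ideas whose pairwise similarity exceeds some chosen threshold.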

Text Analytics to Identify Reusable Ideas (1 of 2)
- Method: Latent Semantic Analysis
- Figure labels: topic cluster, key terms
- Main cost-savings idea: Reduce # of retention clips for installation of wires

Cluster similar ideas & identify key terms and main concept

Text Analytics to Identify Reusable Ideas (2 of 2)
- Main cost-savings idea: Reduce # of retention clips for installation of wires
- Methods: Latent Dirichlet Allocation & Term Frequency-Inverse Document Frequency
- Figure labels: LSA terms, one-term topic cluster, products aligned to term, frequency

Term frequency ~ importance of idea aligned with product
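The term-weighting step can be illustrated with a small TF-IDF sketch. Several TF-IDF variants exist; this one uses relative term frequency and an unsmoothed log inverse document frequency, which is an assumption rather than the formula used in the presentation. The documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical cost-saving ideas from three products.
docs = [
    "reduce retention clips",
    "reduce paint weight",
    "reduce fastener count",
]
weights = tf_idf(docs)
```

A term such as "reduce" that appears in every idea gets weight zero, while "clips" stands out for the first product; that is how term frequency can act as a proxy for how strongly an idea aligns with a product.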

Next Steps
- Validate model and verify results
- Modify & implement existing GUI framework
- Evaluate results: requires thinking!
- Scale to larger population
  - Hundreds more workshops & products
  - Thousands more ideas
- Capture and incorporate actuals
- Implement cost-saving ideas on other products

Challenges for Cost Analysis Community
- Machine Learning for cost analysis & estimating
  - Different from traditional methods
  - Will take time to catch on
- Black box method
  - Not so easy to interpret or follow input-to-output logic
- Regression algorithms
  - Predict a numeric value (cost), not a parametric equation (CER)
- ML algorithms
  - Require pre- and post-processing for reasonable results

Do Benefits outweigh Challenges?

Authors

Karen Mourikas is an Associate Technical Fellow at The Boeing Company specializing in Operations Analysis, Affordability, and Systems Optimization. Her current work includes Product Teardown & Should-cost analyses, and Production Systems modeling. Karen has MS degrees in Applied Math and in Operations Research Engineering from the University of Southern California. Karen is a lifetime member of ICEAA and has presented at several ICEAA & ISPA/SCEA conferences over the years.

Nile Hanov is a Data Scientist at Boeing Research & Technology, where he develops novel next-gen solutions for commercial and military platforms. In this role, he applies machine learning to event-driven data to help organizations better understand and predict failures on board an aircraft. Nile has four patents under review by the U.S. Patent Office, all of which focus on event forecasting and system improvement. He is also currently pursuing a Ph.D. in Computer Science (with a focus on Artificial Intelligence and Machine Learning) at the University of California, Irvine.

Joseph King is a data scientist at The Boeing Company with Boeing Commercial Airplane Analytics, utilizing data to build predictive models and provide analytical solutions. Joseph has contributed to areas such as sensor data analysis, text mining maintenance messages, and customer behavior modeling. Joseph's educational background includes an MS in Business Analytics from the University of Tennessee and a background in mathematics and operations research.

Denise Nelson is a Systems Analyst at The Boeing Company specializing in software estimating, cost-risk analysis, and parametric modeling. Currently, Denise supports Boeing Commercial Airplanes Product Development activities. Previous efforts include life-cycle cost analysis; reliability and maintainability analysis; and project management of immersive simulation modeling. Denise graduated from Cal Poly Pomona with an MS in Pure Math and a BS in Statistics.

karen.mourikas@boeing.com
Nile.Hanov@boeing.com
joseph.a.king3@boeing.com
Denise.J.Nelson@boeing.com

Abstract

Machine Learning & Non-Parametric Methods for Cost Analysis

The world of big data opens up new opportunities for ICEAA, such as machine learning and non-parametric methods. These methods are more flexible since they do not require explicit assumptions about the structure of the model. However, a large number of observations is needed in order to obtain accurate results. Hence, big data to the rescue! This presentation examines several non-parametric methods, with examples related to our community, and discusses opportunities and limitations going forward.

Questions?