Copyright 2016, Oracle and/or its affiliates. All rights reserved.


The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle. 2

Biscotti and Cannoli: An Initial Exploration into Machine Learning for the Purposes of Finding Bugs in Source Code. Tim Chappell*, Cristina Cifuentes, Paddy Krishnan, Shlomo Geva*. Queensland University of Technology*, Oracle Labs. November 15, 2016

Project Overview. Imagine if machine learning could detect bugs for us in software, with good precision, with good recall, with good performance, and beat Parfait and other static code analysis tools at finding bugs in software. This Friday Project is an investigation into what is feasible in this space. Project started in February 2016. 4

"Machine Learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959). Wikipedia. 5

Machine Learning Approaches Supervised Learning The learning algorithm is given example inputs and their desired outputs, with the goal to learn a general rule that maps inputs to outputs Unsupervised Learning The learning algorithm infers structure in its inputs to produce the outputs of interest 6

Machine Learning Approaches Supervised Learning The learning algorithm is given example inputs and their desired outputs, with the goal to learn a general rule that maps inputs to outputs Two tools Biscotti Cannoli Unsupervised Learning The learning algorithm infers structure in its inputs to produce the outputs of interest 7

Supervised Learning Classifiers and Decision Trees Diagram from: http://sebastianraschka.com/images/blog/2014/intro_supervised_learning/decision_tree_1.png 8

2D Decision Boundary http://statweb.stanford.edu/~jtaylo/courses/stats202/_images/trees_fig_03.png 9

Iris Dataset Example Made use of two petal features (length and width) Classified into three classes of Irises (setosa, versicolor, virginica) 10
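To make the Iris example above concrete, here is a minimal sketch of the kind of decision-tree classifier the slide describes, written with scikit-learn; the library choice, the train/test split, and the max_depth setting are illustrative assumptions, not details from the presentation.

```python
# Minimal sketch: a decision tree over the two petal features of the Iris
# dataset, classifying setosa / versicolor / virginica.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:4]          # petal length and petal width only
y = iris.target                # 0 = setosa, 1 = versicolor, 2 = virginica

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```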

Abstracting The Iris Dataset Example Features are inputs Classes are outputs Dataset needs to contain features and classes 11

Abstracting The Iris Dataset Example Features are inputs Classes are outputs Dataset needs to contain features and classes For bugs in source code Features == ? Classes == bug type 12

Biscotti 13

Biscotti's Feature Selection. Complexity of the code: cyclomatic complexity, def-use chains, # edges, # knots. Length of code: line count, nesting level, vocabulary, function start line, function end line. Text features: ! ( ) , 00 1 FILE Input Logged. Intermediate code instruction frequency: add alloca and ashr bitcast br call extractvalue fadd. 14

Biscotti's Feature Selection. Intermediate code 2-grams: alloca-alloca, store-store, store-br, br-load, load-icmp, icmp-br, br-br. Clang analyze output: Array-subscript-is-undefined, Bad-free, Dead-assignment, Dead-increment, Dereference-of-null-pointer, Double-free, Function-call-argument-is-an-uninitialized-value, Memory-leak, Out-of-bound-array-access. Output from other static code analysis tools: Parfait, Splint, UNO. 15
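As an illustration of the instruction-frequency and 2-gram text features listed above, the following sketch counts opcode unigrams and 2-grams for a single function. The ir_features helper and the example opcode sequence are hypothetical, not Biscotti's actual code.

```python
# Illustrative sketch only: counting intermediate-code instruction frequencies
# and opcode 2-grams for one function, in the spirit of Biscotti's text features.
from collections import Counter

def ir_features(opcodes):
    """Return unigram and 2-gram counts for a function's opcode sequence."""
    unigrams = Counter(opcodes)
    bigrams = Counter(f"{a}-{b}" for a, b in zip(opcodes, opcodes[1:]))
    return unigrams, bigrams

example = ["alloca", "alloca", "store", "store", "br", "load", "icmp", "br"]
unigrams, bigrams = ir_features(example)
print(unigrams)   # e.g. alloca: 2, store: 2, br: 2, ...
print(bigrams)    # e.g. alloca-alloca: 1, store-store: 1, store-br: 1, ...
```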

Feature Selection: Dimensionality Reduction. [Two charts of feature counts omitted.] 8,190 features reduced to 500. 16

Feature Selection: Dimensionality Reduction. LOONNE: leave-one-out nearest-neighbour error. It removes the least distinguishing feature at each step by minimising the global error. Given a feature set FS, GlobalError(FS) = sum of all misclassifications for FS. LOONNE removes feature f if, for all other features f', GlobalError(FS - {f}) ≤ GlobalError(FS - {f'}). 17
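A rough sketch of the LOONNE idea as defined above: greedily drop, at each step, the feature whose removal minimises the leave-one-out 1-nearest-neighbour misclassification count. The scikit-learn-based helper below is an assumption about one way to implement the definition, not the project's implementation.

```python
# Sketch of LOONNE-style backward elimination over a feature matrix X
# (samples x features) and labels y.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def global_error(X, y, feature_idx):
    """Leave-one-out 1-NN misclassification count over the given feature columns."""
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                             X[:, feature_idx], y, cv=LeaveOneOut())
    return int(np.sum(scores == 0))   # each fold score is 1 (correct) or 0 (wrong)

def loonne(X, y, target_size):
    features = list(range(X.shape[1]))
    while len(features) > target_size:
        # Drop the least distinguishing feature: the one whose removal
        # yields the smallest global error.
        errors = [(global_error(X, y, [g for g in features if g != f]), f)
                  for f in features]
        _, least_distinguishing = min(errors)
        features.remove(least_distinguishing)
    return features

# Example usage on the Iris data from the earlier sketch:
from sklearn.datasets import load_iris
iris = load_iris()
print(loonne(iris.data, iris.target, target_size=2))  # indices of the retained features
```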

Biscotti's Classification Algorithm: Random Forests. A forest of 100 randomly-seeded decision trees using random subsets of the feature set. The outcomes of the decision trees are combined to produce a single outcome for each result. Useful when there is no natural probabilistic distribution amongst the features. Granularity of analysis: function level; line-number level is too fine for initial experimentation. 18
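For concreteness, a minimal sketch of the classification step just described: a random forest of 100 decision trees over function-level feature vectors, scored with the 4-fold cross-validation mentioned on the next slide. scikit-learn is assumed, and the random placeholder data stands in for the real feature matrix and labels.

```python
# Sketch of a 100-tree random forest over function-level feature vectors,
# evaluated with 4-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((400, 500))        # placeholder: 400 functions x 500 features
y = rng.integers(0, 4, size=400)  # placeholder: bug class label per function

forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=4)
print("4-fold cross-validation accuracy:", scores.mean())
```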

Training and Test Datasets: BegBunch's Accuracy Suites. Bugs are marked up in the suites.

BegBunch Suite       | Type of Benchmark | Average Non-Commented Lines of Code | # Functions | # and Types of Bugs
Cigital              | Synthetic         | 15                                  | 50          |
Samate               | Synthetic         | 20                                  | 2,366       |
Iowa                 | Synthetic         | 31                                  | 1,686       |
OracleLabs-Accuracy* | Real              | 917                                 | 547         | Buffer overruns: 1,709; Memory leaks: 196; Uninitialised vars: 131

Trained with 4-fold cross-validation over test datasets. * These bug kernels were extracted from open source code, including relevant flow of control. 19

Results: ML (Biscotti) vs Static Code Analysis Tools.

Type of Bug            | Splint                   | Parfait              | Biscotti (500 features)
Buffer overrun         | 581/999 TP (58%), 343 FP | 885/999 (89%), 14 FP | 910/999 (91%), 262 FP
Memory leak            | -                        | 9/42 (21%), 10 FP    | 17/42 (40%), 3 FP
Uninitialised variable | 12/15 TP (80%), 54 FP    | 13/15 (87%), 11 FP   | 8/15 (53%), 0 FP

Evaluated using 4-fold cross-validation over the BegBunch dataset. 20

What Did Biscotti Learn? Top 10 features: [Parfait] buffer overflow; [Parfait] read outside array bounds; [Splint] fresh storage not released before return; [Text], [Complexity] function end line; [Parfait] uninitialised variable; [Splint] function exported but not used outside; [Splint] for body not block; [Text] contents. The training datasets have a high number of synthetic benchmarks; Biscotti learnt to rely on features that don't make sense (e.g., function end line), and none of the features are representative of a bug. 21

Results: ML (Biscotti) vs Static Code Analysis Tools.

Type of Bug            | Splint                   | Parfait              | Biscotti (500 features) | Biscotti (1- & 2-grams + complexity features, 553 features)
Buffer overrun         | 581/999 TP (58%), 343 FP | 885/999 (89%), 14 FP | 910/999 (91%), 262 FP   | 23/999 (2%), 5 FP
Memory leak            | -                        | 9/42 (21%), 10 FP    | 17/42 (40%), 3 FP       | 5/42 (12%), 0 FP
Uninitialised variable | 12/15 TP (80%), 54 FP    | 13/15 (87%), 11 FP   | 8/15 (53%), 0 FP        | 0/15 (0%), 0 FP

Evaluated using 4-fold cross-validation over the BegBunch dataset. 22

Biscotti Conclusions. Need more datasets of representative, marked-up bugs, i.e., not synthetic benchmarks. The crux of supervised learning is determining the right set of features: what features make a bug a bug? 23

Deep learning succeeds when it's difficult to figure out what features you want to use in your classifier. 24

Machine Learning Approaches Supervised Learning The learning algorithm is given example inputs and their desired outputs, with the goal to learn a general rule that maps inputs to outputs Unsupervised Learning The learning algorithm infers structure in its inputs to produce the outputs of interest Two tools Biscotti Cannoli 25

Supervised Learning Convolutional Neural Networks 3-layer neural network http://cs231n.github.io/assets/nn1/neural_net2.jpeg 26

Supervised Learning Convolutional Neural Networks Convolutional neural network http://cs231n.github.io/assets/cnn/cnn.jpeg 27
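As a generic illustration of the convolutional-network structure shown in the linked diagram, here is a small CNN classifier sketched in PyTorch. The presentation does not state which framework Cannoli used or its actual layer sizes, so every name and dimension below is assumed for illustration only.

```python
# Generic sketch of a small convolutional network: two conv/pool stages
# followed by a fully connected classifier over a 32x32 single-channel input.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution layer
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling layer
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 8 * 8, num_classes)  # fully connected

    def forward(self, x):
        x = self.features(x)                 # x: (batch, 1, 32, 32) "image"
        return self.classifier(x.flatten(1))

logits = SmallCNN()(torch.zeros(2, 1, 32, 32))  # dummy batch of two inputs
print(logits.shape)                              # torch.Size([2, 4])
```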

Cannoli 28

Cannoli's Architecture 29

Training Dataset: BegBunch's Scalability Suites. Bugs are not marked up in these suites.

BegBunch Suite         | Average Non-Commented Lines of Code | # Functions
Calysto                | 87,636                              | 11,214
OracleLabs-Scalability | 394,739                             | 53,448

30

Results: ML (Cannoli) vs Static Code Analysis Tools. Training on the Scalability Suite (50/50 split), testing on OpenSolaris ONNV b93* (no split).

Type of Bug    | Parfait v0.4.1 | Cannoli
Buffer overrun | 221 TP, 81 FP  | 213/221 TP, 56,095 FP
Memory leak    | 506 TP, 94 FP  | 497/506 TP, 47,414 FP

Training on Scalability Suites using Parfait v1.7.1.3 results as ground truth. * 168,666 functions. 31

Results: ML (Cannoli) vs Static Code Analysis Tools. Training on BegBunch's Accuracy Suites (no split), testing on OpenSolaris ONNV b93*.

Type of Bug            | Parfait v0.4.1 | Cannoli
Buffer overrun         | 221 TP, 81 FP  | 23/221 TP, 9,146 FP
Memory leak            | 506 TP, 94 FP  | 0/506 TP, 174 FP
Uninitialised variable | 30 TP, 16 FP   | 0/30 TP, 153 FP

Training on Scalability Suites using Parfait v1.7.1.3 results as ground truth. * 168,666 functions. 32

What Did Cannoli Learn?? 33

Cannoli Conclusions. Image recognition techniques are not ideal for source code analysis. Results from black-box techniques are not very useful for bug detection: no bug traces can be derived for developers to understand the results of the tool. 34

Summary Of The State Of The Art.

Paper            | Venue-Year | Summary
Brun, Ernst      | ICSE-04    | Properties inferred using both buggy and fixed code
Yamaguchi et al. | ACSAC-12   | Extrapolate vulnerabilities from known vulnerabilities using AST representations
ALETHEIA         | CCS-14     | Statistical analyses to predict rare vulnerabilities; tunable to focus on FP elimination/TP detection. Basic features (per Biscotti)
JSNice           | POPL-15    | Use program dependence graphs and statistical prediction to deobfuscate JavaScript code
Mou et al.       | AAAI-16    | Convolutional neural networks using AST representation to identify code similarities
Wang et al.      | ICSE-16    | Use deep belief networks and AST representation to detect within-project and cross-project defects
Grieco et al.    | CODASPY-16 | Use static and dynamic features (state of memory) to detect vulnerabilities

35

Summary. Two ML approaches were implemented to find bugs in C code. Biscotti: supervised learning using a random forest of decision trees and LOONNE. Cannoli: supervised learning using a convolutional neural network. Both learned something, but the results are tied to the datasets used; i.e., neither learns to find bugs in unseen code. Biscotti captures syntactic features of the program; we need to capture semantic features, and we need a lot more representative data. 36

Future Plans. 1. Create enough data for datasets: a representative proportion of buggy vs non-buggy code, a representative number of bugs for each bug type of interest, and a fixed version of each buggy example. 2. Explore different approaches to encode semantics: use of buggy vs fixed code to determine features of interest [Ernst 04], and use of a recurrent neural network with long short-term memory (LSTM). 37
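The second future-plans item mentions a recurrent neural network with LSTM; the following PyTorch sketch shows one plausible shape for that idea, classifying a function from a token or opcode sequence. The sizes, names, and tokenisation scheme are illustrative assumptions, not part of the project's plan.

```python
# Sketch of an LSTM-based sequence classifier: embed token IDs, run an LSTM,
# and classify from the final hidden state.
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    def __init__(self, vocab_size=64, embed_dim=32, hidden_dim=64, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)
        _, (h_n, _) = self.lstm(embedded)    # h_n: final hidden state
        return self.out(h_n[-1])

tokens = torch.randint(0, 64, (2, 50))       # dummy batch: 2 sequences of 50 tokens
print(SequenceClassifier()(tokens).shape)    # torch.Size([2, 4])
```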

Q&A 38
