Presented at SAT 2014, Vienna, Austria (*Won the best student paper award)


by Zack Newsham (1), Vijay Ganesh (1), Sebastian Fischmeister (1), Gilles Audemard (2), and Laurent Simon (3)
(1) University of Waterloo, (2) University of Artois, (3) University of Bordeaux

Software Engineering & SAT/SMT Solvers: An Indispensable Tactic for Any Strategy
[Diagram labels: Software Engineering; SAT/SMT Solvers; Formal Methods; Program Analysis/Synthesis; Automatic Testing; Programming Languages]

SAT/SMT Solver Research Story: A 1000x Improvement in the Last Few Years
- Solver-based programming languages (e.g., Scala with Z3)
- Rich type systems with constraints (e.g., Liquid Types and Liquid Haskell)
- Constraint-based DSL for analysis (e.g., Doop and muz)
[Chart labels: Concolic Testing*; Equivalence Checking; Auto Configuration; Bounded MC; Program Analysis; AI]

What is a SAT/SMT Solver? Automation of Logic
- Input: a logic formula (for example, a conjunction of clauses over Boolean variables p, q, r)
- Output of the solver: SAT or UNSAT
- Rich logics (modular arithmetic, arrays, strings, ...)
- The Boolean satisfiability problem is NP-complete, the quantified Boolean satisfiability problem is PSPACE-complete, ...
- Practical, scalable, usable, automatic
- Enable novel software reliability approaches

Modern CDCL SAT Solver Architecture: Key Steps and Data Structures
Key steps:
- Decide()
- Propagate() (Boolean constant propagation, BCP)
- Conflict analysis and learning() (CDCL)
- Backjump()
- Forget()
- Restart()
CDCL: Conflict-Driven Clause Learning
- Conflict analysis is a key step
- Results in learning a learnt clause
- Prunes the search space
Key data structures (solver state):
- Stack or trail of partial assignments (AT)
- Input clause database
- Conflict clause database
- Conflict graph
- Decision level (DL) of a variable
[Flowchart labels: Input SAT Instance; Propagate() (BCP); Conflict?; All Vars Assigned?; Decide(); Return SAT; Conflict Analysis(); TopLevel Conflict?; Return UNSAT; BackJump()]
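The Propagate() step can be illustrated with a small sketch. It is not taken from any of the solvers mentioned in the talk: the function name `propagate`, the DIMACS-style integer literals, and the dictionary-based assignment are all assumptions made for illustration, and real solvers use watched literals rather than rescanning every clause.

```python
def propagate(clauses, assignment):
    """Apply the unit rule until nothing changes.

    Assumed encoding (not from the slides): a clause is a list of non-zero
    integers, where literal v means variable v is true and -v means it is false.
    Returns (assignment, conflict_clause), with conflict_clause None if no
    clause was falsified.
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, wanted = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == wanted:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                # Every literal is false under the assignment: conflict.
                return assignment, clause
            if len(unassigned) == 1:
                # Unit clause: the remaining literal is forced.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, None
```

For example, `propagate([[1], [-1, 2], [-2, 3]], {})` forces variables 1, 2 and 3 to true in turn, while `propagate([[1], [-1]], {})` reports the clause `[-1]` as a conflict.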

Problem Statement: Why Are SAT Solvers Efficient for Industrial Instances?
- Conflict-driven clause learning (CDCL) Boolean SAT solvers are remarkably efficient for large industrial instances
- This is true for industrial instances from a diverse set of applications
- These instances may have tens of millions of variables and clauses
- This phenomenon is surprising, since Boolean satisfiability is an NP-complete problem believed to be intractable in general
- Why is this so?

Scientific Motivation to Understand Why SAT Works: The Laws of SAT Solving
- A scientific approach, as opposed to trial and error
- Leads to better and, more importantly, predictable solvers
- A predictive model that cheaply computes solver running time by analyzing the SAT input
- Complexity-theoretic understanding, à la smoothed analysis
- As yet unforeseen applications may benefit from a deeper understanding of SAT solving (more on this later)

The Laws of SAT Solving: Sub-Problems
We break the problem statement down into smaller subproblems:
1. On which class of instances do SAT solvers perform well? I.e., a precise mathematical characterization of instances on which solvers work well
2. An abstract algorithmic description of SAT solvers
3. A complexity-theoretic analysis that provides meaningful asymptotic bounds
In this talk, I focus on Question 1, and briefly touch upon some potential answers for Question 2.

A (Partial) Answer to Question 1
- A graph-theoretic characterization of SAT instances, as opposed to measuring the size of instances only in terms of number of variables and clauses
- Industrial SAT instances have good community structure (also confirmed by previous work by Jordi Levy et al.)
- Community structure of the graph of SAT instances strongly affects solver performance
- Result #1: Hard random instances have low Q (0.05 ≤ Q ≤ 0.13)
- Result #2: Number of communities and Q of SAT instances are more predictive of CDCL solver performance than other measures
- Result #3: Strong correlation between community structure and LBD (Literal Block Distance) in the Glucose solver
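The slides do not say which graph representation of a SAT instance is used. One common choice, assumed here purely for illustration, is the variable incidence graph: one node per variable and an edge between two variables whenever they occur together in a clause. The function name and the integer-literal clause encoding are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def variable_incidence_graph(clauses):
    """Build an edge-count map for a CNF formula.

    Assumption (not from the slides): clauses are lists of non-zero integers
    in DIMACS style, and the graph is the variable incidence graph. The result
    maps each variable pair (u, v) with u < v to the number of clauses in
    which both variables occur.
    """
    edge_counts = defaultdict(int)
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        for u, v in combinations(variables, 2):
            edge_counts[(u, v)] += 1
    return dict(edge_counts)
```

Polarity is deliberately ignored, so `x` and `not x` map to the same node; other representations (for example, clause-variable bipartite graphs) are equally plausible.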

SOURCE: mrpp example from the SAT 2013 competition, viewed using our SATGraf tool

Community structure [GN03, CNM04, OL13] of a graph is a measure of how separable or well-clustered the graph is
- It is characterized using a metric called Q (quality factor) that ranges from 0 to 1
- Informally, if a graph has lots of small clusters that are weakly connected to each other (easily separable), then the graph is said to have high Q
- If a graph looks like a giant hairy ball, then it has low Q
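As a rough illustration of the Q metric, the sketch below evaluates the standard Newman modularity of a given partition of an unweighted, undirected graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The function name and input format are assumptions; the slides do not spell out the formula.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition.

    edges: list of (u, v) pairs, undirected, no self-loops (assumed format).
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    inside = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)   # edge endpoints attached to the community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            inside[community_of[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Two triangles joined by a single edge, split into the two obvious communities, score noticeably higher than the same graph treated as one community, which scores 0.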


SOURCE: unif-k3-r4.267-v421-c1796-S4839562527790587617, a randomly generated example from the SAT 2013 competition

How to Compute Community Structure?
- The decision version of the Q maximization problem is NP-complete [Brandes et al., 2006]
- Many efficient approximate algorithms have been proposed, e.g., [CNM04] and [OL13]
- We use the above two algorithms for our experiments
- Our results with both algorithms are similar


Community Structure and Random Instances, Experiment #1: Hypothesis and Definitions
Hypothesis tested:
- Is there a range of Q values for randomly generated instances that are hard for CDCL solvers, regardless of the number of clauses/variables?
- Are randomly generated instances outside this range uniformly easy?

Community Structure and Random Instances, Experiment #1: Setup
- Randomly generated 550,000 SAT instances for the experiment
  - Varied the number of variables (N_V) between 500 and 2000 in increments of 100
  - Varied the number of clauses (N_cl) between 2000 and 10000 in increments of 1000
  - Varied target Q between 0 and 1 in increments of 0.01
  - Varied the number of communities between 20 and 400 in increments of 20
- Experiments using MiniSAT
  - Timeout of 900 seconds per run
  - Run solver on inputs in a random order
  - Average the running time over several runs

Community Structure and Random Instances, Experiments Performed (#1)
- Plotted Q against time
- Noticed a significant increase in execution time when 0.05 ≤ Q ≤ 0.13
- Also recomputed the results using a stratified sample
  - Used due to the high number of instances within the target range
  - Randomly sample the data, taking 250 results from each 0.1 range of Q between 0 and 0.9
  - Almost the same result: 0.05 ≤ Q ≤ 0.12
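The stratified sampling step might look like the following sketch. The record format `(q, time)`, the function name, the fixed seed, and the treatment of bin boundaries (half-open bins `[0.0, 0.1)`, ..., `[0.8, 0.9)`) are assumptions; the slides only state the bin width and the 250-per-bin sample size.

```python
import math
import random
from collections import defaultdict


def stratified_sample(results, per_bin=250, width=0.1, q_max=0.9, seed=0):
    """Draw up to per_bin results uniformly at random from each Q bin.

    results: list of (q, running_time) pairs (assumed format).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for q, running_time in results:
        if 0 <= q < q_max:
            # The small epsilon guards against 0.3 / 0.1 evaluating to 2.999...
            index = math.floor(q / width + 1e-9)
            bins[index].append((q, running_time))
    sample = []
    for index in sorted(bins):
        bucket = bins[index]
        sample.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sample
```

Sampling the same number of results from every bin keeps the densely populated range of Q from dominating the plot.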

Community Structure and Random Instances, Experiments Performed (#1)
Huge increase in running time of randomly generated instances when 0.05 ≤ Q ≤ 0.13

Community Structure and Industrial Instances, Experiment #2: Hypothesis and Definitions
Hypothesis tested:
- Are the community modularity and number of communities better correlated with the running time of CDCL solvers than traditional metrics?
- Is the correlation better for industrial instances than for randomly generated or hand-crafted ones?

Community Structure and Industrial Instances, Experiment #2: Hypothesis and Definitions
Instances used:
- Approximately 800 instances from the SAT 2013 competition. For the remaining instances we couldn't compute community structure due to resource constraints
- Used the OL algorithm to compute community structure for the 800 instances: much faster and more scalable
- All experimental results are for Minipure, obtained from the SAT 2013 competition website
- Used the statistical tool R to perform standard linear regression

Community Structure and Industrial Instances, Experiments Performed (#2)
- Performed linear regression on the solver running time twice
  - Once with community structure metrics (and variables/clauses)
  - Once without
- Compared the adjusted R² from both regressions
  - R² measures how well the model's predicted results match the actual results
  - Varies from 0 to 1
  - The higher the R² (the less variability left unexplained), the more predictive the model
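For reference, R² and its adjusted form can be computed as in the sketch below. These are the textbook definitions rather than anything specific to the talk (the authors used R's linear regression), and the function names are hypothetical.

```python
def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)
```

The adjustment matters here because the model with community metrics has more terms; plain R² never decreases when terms are added, whereas adjusted R² can.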

Community Structure and Industrial Instances, Experiments Performed (#2)
- Timeouts included
  - A large portion (approximately 60%) of the instances timed out
  - Not ideal, but without them there isn't enough data
- log(time) used
  - Wide distribution between instances that finished and those that timed out
- Data standardized to have mean = 0 and standard deviation = 1
  - Standard practice when regressors are on different scales
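A minimal sketch of the two preprocessing steps follows. How timeouts were encoded is not stated in the slides, so recording a timed-out run as `None` and charging it the timeout value is an assumption, as are the function names. The sample standard deviation is used for the scaling.

```python
import math
import statistics


def log_runtime(seconds, timeout):
    """Natural log of the running time.

    Assumption: a timed-out run is recorded as None and charged the timeout.
    """
    return math.log(timeout if seconds is None else seconds)


def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)
    return [(v - mean) / spread for v in values]
```

Standardizing each regressor separately puts counts such as the number of clauses and ratios such as Q on a comparable scale before fitting.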

Community Structure and Industrial Instances, Experiments Performed (#2)
Model #1 (R² ≈ 0.5):
  log(time) ~ CL * V * Q * CO * QCOR * CLVR
  where * denotes interaction terms between factors and
  CL = number of clauses
  V = number of variables
  CO = number of communities
  QCOR = ratio of Q to communities
  CLVR = ratio of clauses to variables
Model #2 (R² ≈ 0.33):
  log(time) ~ CL * V * CLVR

Community Structure and Industrial Instances, Experiment #2: Results and Interpretation
The regressions show that the model with the community structure metrics is a better predictor of running time than the model with only traditional metrics, i.e., number of clauses/variables.

Literal Block Distance (LBD) and Communities, Experiment #3: Hypothesis and Definitions
Hypothesis tested:
- The number of communities in a conflict clause correlates strongly with its LBD measure
What is LBD? (Glucose solver [AS09])
- The LBD measure M of a learnt clause C is a rank based on the number N of distinct decision levels the variables in C belong to
- The lower the value of N, the better the clause C is
- LBD is a powerful measure of the utility of a conflict clause
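The definition above amounts to counting distinct decision levels. A minimal sketch, assuming integer literals and a hypothetical `decision_level` map from variable to the level at which it was assigned:

```python
def lbd(clause, decision_level):
    """Number of distinct decision levels among the variables of a clause.

    clause: list of non-zero integer literals (assumed encoding).
    decision_level: dict mapping variable -> decision level.
    """
    return len({decision_level[abs(lit)] for lit in clause})
```

A clause whose four variables were assigned at levels 1, 1, 2 and 5 therefore has an LBD of 3.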

Literal Block Distance (LBD) and Communities, Experiment #3: Hypothesis and Definitions
LBD and clause deletion:
- Clause deletion is integral to the efficiency of modern solvers
- Without clause deletion, conflict clause production quickly consumes available memory
- LBD is useful in determining which clauses to delete
Which clauses to delete? LBD to the rescue:
- Periodically delete conflict clauses with bad LBD rank
- As we will see, clauses with bad LBD rank are shared by many communities
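One simple way to realize LBD-based deletion is to rank the learnt clauses by LBD and keep only the best fraction. The sketch below does exactly that; the keep fraction of one half, the function name, and the `(clause, lbd)` pair format are assumptions for illustration and are not claimed to match the actual policy of Glucose or any other solver.

```python
import math


def reduce_learnt_clauses(learnts, keep_fraction=0.5):
    """Keep the learnt clauses with the lowest (best) LBD.

    learnts: list of (clause, lbd) pairs (assumed format).
    keep_fraction: hypothetical tuning parameter, not taken from the slides.
    """
    ranked = sorted(learnts, key=lambda item: item[1])
    keep = math.ceil(len(ranked) * keep_fraction)
    return ranked[:keep]
```

Because Python's sort is stable, clauses with equal LBD keep their original relative order, so older clauses win ties in this sketch.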

Literal Block Distance (LBD) and Communities, Experiment #3: Intuition
The number of communities in a conflict clause:
- The number of communities N in a conflict clause C is the number of distinct communities the variables in C belong to
Intuition behind the hypothesis:
- High-quality conflict clauses tend to span very few communities, i.e., N is small
- High-quality conflict clauses are likely to cause more propagation per decision variable, and hence are likely to have low LBD
- LBD picks out high-quality conflict clauses

Literal Block Distance (LBD) and Communities, Experiment #3: Setup
Instances considered:
- 189 SAT 2013 industrial category instances out of 300
- We were only able to compute communities for these 189
- The rest caused memory-out errors
Step 1: For each of the 189 instances, compute:
- Community structure
- The number of communities a learnt clause spans
- LBD of every learnt clause (only for the first 20,000 due to resource constraints)

Literal Block Distance (LBD) and Communities, Experiments Performed (#3)
Step 2: The LBD of every learnt clause considered was correlated with the number of communities it spans
- Thousands of data points over the 189 instances
- Correlate LBD and number of communities using heatmaps
Heatmap of LBD and communities of learnt clauses:
- Difficult to correlate thousands of data points over hundreds of instances
- One heatmap per SAT instance
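The data behind one such heatmap can be sketched as a table of counts over (LBD, communities spanned) pairs, together with a Pearson correlation as a single-number summary. All names and input formats below are assumptions; the slides do not say how the heatmaps were produced or which correlation statistic, if any, was computed.

```python
import math
from collections import Counter


def communities_spanned(clause, community_of):
    """Number of distinct communities among the variables of a clause."""
    return len({community_of[abs(lit)] for lit in clause})


def heatmap_counts(lbd_values, community_counts):
    """Count how many learnt clauses fall in each (LBD, communities) cell."""
    return Counter(zip(lbd_values, community_counts))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Each cell count would then be mapped to a color intensity, giving one heatmap per SAT instance.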

Literal Block Distance (LBD) and Communities, Experiment #3: Results and Interpretation
Result: Most industrial instances have a very strong correlation between LBD and communities

Impact of Community Structure and Solver Running Time: Scope for Improvement
- Consider different regression techniques
  - The non-normality of the data stops us from estimating confidence intervals
- Try experiments on more solvers
  - Glucose, MiniSAT and Minipure were the solvers we considered so far
- Compare different random generation techniques, and different graph representations for SAT instances
- Make the community-structure-based model more robust by adding other features of SAT instances
- Compare against other proposed models based on backdoors and graph width
- Construct a predictive model

The Laws of SAT Solving: We Provided an Answer to Question 1
We break the problem statement down into smaller subproblems:
1. On which class of instances do SAT solvers perform well? I.e., a precise mathematical characterization of instances on which solvers work well
2. An abstract algorithmic description of SAT solvers
3. A complexity-theoretic analysis that provides meaningful asymptotic bounds
In this talk, I focus on Question 1, and briefly touch upon some potential answers for Question 2.

[Diagram labels: Input; Branching Heuristic and Propagation (Induction); Partial assignments (long conflict clause); Shorter conflict clauses; Conflict Detection and Analysis (Deduction); Output: SAT/UNSAT]

A (Partial) Answer to Question 1
- A graph-theoretic characterization of SAT instances, as opposed to measuring the size of instances only in terms of number of variables and clauses
- Industrial SAT instances have good community structure (also confirmed by previous work by Jordi Levy et al.)
- Community structure of the graph of SAT instances strongly affects solver performance
- Result #1: Hard random instances have low Q (0.05 ≤ Q ≤ 0.13)
- Result #2: Number of communities and Q of SAT instances are more predictive of CDCL solver performance than other measures (for the Minipure solver)
- Result #3: Strong correlation between community structure and LBD (Literal Block Distance) in the Glucose solver