Innovation Crossover Preliminary Research Report IT/Cyber Machine Learning/Artificial Intelligence

Context/Scope

This paper represents research conducted by OVO Innovation for the NSWC Crane Innovation Crossover event, October 12-13, 2016. This research is intended to provide more insight into key challenges that were identified within the four technology clusters (Advanced Manufacturing, Cyber/IT, Life Sciences and DoD Technologies) first documented in the Battelle report. OVO consultants interviewed subject matter experts (SMEs) from the private sector, academia and the government, identified by NSWC Crane, to gather insights into key challenges in each cluster. This report is meant to inform the participants of the Innovation Crossover event and identify new research and new technologies that might address the key challenges. This research was collected during August and September 2016. The reports were submitted by OVO to NSWC Crane in late September 2016.

Introductory Narrative

The Innovation Crossover event, scheduled for 12-13 October 2016 in Bloomington, is the culmination of months of planning and hard work. Some of this preparatory work involved the initial Battelle study, which identified key technology clusters (Advanced Manufacturing, Life Sciences, Cyber/IT and DoD Technologies) in southern Indiana. From these clusters, NSWC Crane and its contractor OVO Innovation conducted further, more detailed research to examine specific challenges and opportunities in each technology cluster. The attached reports document the research OVO conducted with subject matter experts identified by NSWC Crane in academia, industry and the government. The reports are meant to document specific challenges within each technology cluster that could become areas of joint research and cooperation across the three constituents in southern Indiana.

The reports are provided to you to help you prepare for your participation in the upcoming Innovation Crossover event and to frame both the challenges and the active research underway to address these challenges.

Problem Statement

Machine Learning/Artificial Intelligence Challenge: Crane Problem Definition

Context: A key challenge in moving deep learning forward is making it more computationally feasible, as the current state of the art requires significant amounts of training data and CPU cycles. There should also be discussion of where to draw the boundary on whether AI should or should not be used in certain decisions. A further challenge is applying new machine learning techniques and technologies to real-world applications where the system currently depends on continual operator input, database access, data downloads, etc.

Problem Context

Machine Learning/AI challenges divide into three themes:
- Inputs: the data collected and presented to a machine for learning
- Processing: challenges dealing with how the machine processes data to learn and produce useful results
- Outputs: confidence in the answers the machine produces

Problem Context: Input Challenges

Dealing with multiplicity of data sources and data types in big data and machine learning
- How do we integrate diverse modalities such as numerical, categorical, text, transactional, audio, video, etc. when building models and mining patterns?
- Where do we get the data? What open data is relevant?
- There has been an explosion of large, publicly available data sets (e.g., 10 years ago overhead imagery was very expensive; today you can get it from Google Maps). These data sets need to be tuned to the needs of the machine that will be learning.
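One common way to bring several modalities into a single model input is to encode each one numerically and concatenate the results. The sketch below is only an illustration of that idea, not a method proposed by the SMEs: the record fields (`amount`, `channel`, `note`), the category list and the vocabulary are all hypothetical.

```python
from collections import Counter

def encode_record(record, categories, vocabulary):
    """Turn one mixed-type record into a flat list of numbers."""
    # Numerical field: used as-is.
    features = [float(record["amount"])]
    # Categorical field: one-hot encoding over a fixed category list.
    features += [1.0 if record["channel"] == c else 0.0 for c in categories]
    # Text field: bag-of-words counts over a fixed vocabulary.
    counts = Counter(record["note"].lower().split())
    features += [float(counts[w]) for w in vocabulary]
    return features

# Hypothetical example data, invented for illustration.
categories = ["web", "store"]
vocabulary = ["late", "refund"]
record = {"amount": 12.5, "channel": "store", "note": "Refund requested late late"}
vector = encode_record(record, categories, vocabulary)
print(vector)  # [12.5, 0.0, 1.0, 2.0, 1.0]
```

Audio and video would need their own feature extractors before they could be appended in the same way; that step is outside this sketch.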

Problem Context: Input Challenges

Having enough data collected and labeled so it can be applied by machine learning
- Machine learning is trying to learn good from bad. This requires enough known (labeled) data to provide to the machine.
- How do we take data and put it in a format that we can use?
- Scale: Google, Facebook and Microsoft have billions of data points. However, if we have a high-dimensional space, we actually have very sparse data. We need to think of data at a different scale.
- Humans can't handle large, complex data sets, so we ask machines to look for patterns. Machines need many more data points, and that data has to be understood to make sure we provide appropriate data for the machine to learn.
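The point about sparsity in high-dimensional spaces can be made concrete with simple arithmetic. The sketch below assumes each dimension is split into 10 bins (an arbitrary choice for illustration) and computes an upper bound on the fraction of cells that a billion points could possibly occupy.

```python
def max_occupied_fraction(n_points, bins_per_dim, n_dims):
    """Upper bound on the fraction of grid cells that can contain a point."""
    n_cells = bins_per_dim ** n_dims
    # Each point falls in exactly one cell, so at most n_points cells are occupied.
    return min(1.0, n_points / n_cells)

for dims in (3, 10, 20):
    print(dims, max_occupied_fraction(10**9, 10, dims))
```

With 3 dimensions a billion points can cover every cell; with 20 dimensions they can touch at most about one cell in a hundred billion, which is why even "big" data looks sparse.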

Problem Context: Input Challenges

Generating data when we don't have enough; taking small data sets that we are confident about and adding data to them
- We need to rethink the vastness of the data that is required. Humans can handle small problems. Machine learning is used to discover patterns incomprehensible to humans, which means we need much more data.
- How can we take small data sets we can understand and add data to them that we are confident will work? (See bootstrapping under Process Challenges.)
- To generate data, we sometimes use simulation. How can we create good enough simulators to create data?

Problem Context: Input Challenges

Distributed data storage and I/O (this is both an input and an output challenge)
- Data is stored across many machines in the cloud. The data is not in the same location, and the hardware, latencies, storage and I/O all differ.
- Each operation shuffling data in and out of memory takes 8x the operation step. Shuffling data in and out of disks will be the next big challenge.
- Getting and storing data accounts for a huge portion of any computation effort and varies depending on the type of computation and domain.
- We need to design adaptive, intelligent (learning) strategies for optimal data placement and retrieval across distributed, duplicated storage devices, especially hybrid architectures that include conventional, flash, solid state, etc. drives, in order to store and access massive amounts of data for machine learning.

Problem Context: Process Challenges

Overview
- The systems that we are engineering are so complex that it's becoming impossible to pre-engineer them. Sitting down to define and program the algorithms is beyond human engineering capabilities. This is why we have to develop machines that learn.
- Humans were never programmed; we went to school. We weren't rewired; we were taught.
- The engineer is not programming a machine but teaching it. This is a major paradigm shift and presents new challenges for processing.

Problem Context: Process Challenges

Taking small certified data sets and combining them with other data that is relevant to the problem
- This relates to the input challenge of not having enough data: humans can handle small problems; machine learning is used to discover patterns incomprehensible to humans and requires vast data.
- How can we take small data sets about which we are confident and add data to them? This is sometimes called bootstrapping and is used when the machine learns from data that is relevant but not sensitive (e.g., not classified), and what it learns can later be applied.
- How can we make sure that we can add new data in a meaningful way to data that already gets us close?
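One possible reading of growing a small trusted data set is self-training (also called pseudo-labeling): a model fitted on the trusted data labels new points, and only the labels it is confident about are adopted. The sketch below illustrates that technique on one-dimensional toy data. The nearest-centroid model and the `margin` confidence rule are assumptions made for the example, not something the SMEs specified.

```python
def centroid(values):
    return sum(values) / len(values)

def self_train(labeled, unlabeled, margin=2.0):
    """Grow a small trusted 1-D data set with confidently pseudo-labeled points.

    labeled:   list of (x, label) pairs with labels 0 or 1 (the trusted set)
    unlabeled: list of x values with no label
    margin:    hypothetical confidence rule: a point is adopted only if it is
               at least `margin` closer to one class centroid than the other
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    changed = True
    while changed and remaining:
        changed = False
        c0 = centroid([x for x, y in labeled if y == 0])
        c1 = centroid([x for x, y in labeled if y == 1])
        still_unsure = []
        for x in remaining:
            d0, d1 = abs(x - c0), abs(x - c1)
            if abs(d0 - d1) >= margin:      # confident: adopt the pseudo-label
                labeled.append((x, 0 if d0 < d1 else 1))
                changed = True
            else:                           # ambiguous: leave it out for now
                still_unsure.append(x)
        remaining = still_unsure
    return labeled, remaining

# Toy data invented for illustration.
trusted = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
grown, unsure = self_train(trusted, [0.5, 3.0, 5.2, 7.5, 10.0])
print(len(grown), unsure)  # 8 [5.2]
```

The point near the class boundary (5.2) is never adopted, which is the behavior the slide asks for: new data is added only where it agrees with what the trusted set already supports.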

Problem Context: Process Challenges

Learning in the field
- Supervised learning in many domains is inadequate because we can't generate sufficient training data.
- The machine goes to school to learn on training data. Then it goes to the field, and there it doesn't get new data, so it stops learning.
- The challenge becomes how a machine can learn on the job while engaged in the field. This is sometimes called reinforcement learning.
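A very small instance of learning on the job is an epsilon-greedy agent that keeps a running estimate of how well each action has worked and updates it after every reward. The sketch below is a simplified illustration of that idea; the three-action environment and its payoffs are hypothetical, and real field systems would face noisy, delayed rewards.

```python
import random

def learn_on_the_job(reward_fn, n_actions, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection with running-average value estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * n_actions
    counts = [0] * n_actions
    for step in range(steps):
        if step < n_actions:
            action = step                                  # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)              # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = reward_fn(action)
        counts[action] += 1
        # Incremental mean: no stored history is needed to keep learning.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: action 1 pays more than actions 0 and 2.
payoffs = [0.2, 0.8, 0.5]
estimates, counts = learn_on_the_job(lambda a: payoffs[a], n_actions=3)
best = max(range(3), key=lambda a: estimates[a])
print(best)  # 1
```

The update uses only the latest reward and a counter, so the agent never needs to return to "school" with a stored training set.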

Problem Context: Process Challenges

Choosing the right machine learning algorithm
- How do we keep track of this rapidly changing field? There are many machine learning algorithms and more being developed. How do we pick the appropriate one?
- How do you explore the space and understand which machine learning algorithm would work best?
- How, as a community, do you develop and track the rapidly changing state of the art? How do you keep up to date and not reinvent?
- After selecting the algorithm, there are many different parameters that need to be set. How do you do that?
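A standard, if basic, way to compare candidate algorithms or parameter settings is k-fold cross-validation: each candidate is trained on part of the data and scored on the part it has not seen. The sketch below shows the mechanics on hand-made toy data with two deliberately simple candidates. Both candidates and the data are assumptions made for illustration only.

```python
from collections import Counter

def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def fit_threshold(train):
    """Predict 1 when x lies above the midpoint of the two class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    cut = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > cut else 0

def cross_val_accuracy(fit, data, k=5):
    """Average held-out accuracy over k folds (fold i = every k-th point from i)."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [p for j, p in enumerate(data) if j % k != i]
        predict = fit(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k

data = [(x, 1 if x >= 5 else 0) for x in range(10)]   # toy, hand-made data
candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # threshold
```

The same loop works for parameter settings: each setting becomes one more entry in `candidates`. It does not answer the community-level question of tracking the state of the art, only the local question of which option fits the data at hand.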

Problem Context: Process Challenges

Learning from the operator
- One goal for machine learning is to process data automatically. However, there are many cases where the machine could learn from the operator if it could query the operator in a way that lets the operator respond and help it learn.
- How can we make the machine smart enough to have a symbiotic relationship with the operator and ask the operator, "How do I do this? What do I do in this case?" How can the machine prompt the operator?

Reinforcement learning: merging sensing and control in complex physical systems
- This is, in some senses, the opposite of learning from the operator. How can we remove human biases about how to solve a problem?
- For example, in many robotic applications, we teach machines to find the edges before picking something up. But what if finding edges isn't important?
- How can we use sensing to help the machine learn to control? How do we present the problem in a way that is appropriate?
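The idea of a machine prompting the operator, described under "Learning from the operator", is close to what is usually called active learning with uncertainty sampling: the machine asks about the case it is least sure of. The sketch below assumes the model already produces a probability per item; the item ids and probabilities are invented for illustration.

```python
def pick_question_for_operator(predictions):
    """Return the item whose predicted probability is closest to 0.5.

    predictions: dict mapping an item id to the model's estimated probability
                 (0.0 to 1.0) that the item belongs to the positive class.
    """
    return min(predictions, key=lambda item: abs(predictions[item] - 0.5))

def record_answer(labeled, item, operator_label):
    """Store the operator's answer so it can be used in the next training round."""
    labeled[item] = operator_label
    return labeled

# Hypothetical model outputs for three unlabeled items.
predictions = {"contact-17": 0.97, "contact-18": 0.52, "contact-19": 0.08}
question = pick_question_for_operator(predictions)
labeled = record_answer({}, question, operator_label=1)
print(question)  # contact-18
```

Asking only about the most ambiguous case keeps the operator's workload low while giving the machine the label that is likely to teach it the most.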

Problem Context: Output Challenges

Explainability
- A huge challenge in machine learning/AI is explaining how the machine made the decision it made. The machine might be very successful at addressing a problem, but it can't explain how it did it or explain a new decision point.
- Most operators need a justification, not just a black-box decision, especially if the situation is new. Explanations help humans make better decisions.
- A recommendation needs to be scrutable, that is, capable of being understood. Otherwise humans cannot tell if there is a flaw in the explanation.
- How do we achieve greater explainability and predict the bounds within which the algorithm will work? What performance can we sacrifice for confident answers?
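One end of the accuracy-versus-explainability tradeoff is a model simple enough that its decision can be broken down feature by feature. The sketch below uses a linear score for that purpose. The feature names and weights are hypothetical, and this is an illustration of a scrutable recommendation, not a way of explaining a deep network.

```python
def decide_with_explanation(weights, features, threshold=0.0):
    """Score a case with a linear model and report each feature's contribution."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = sum(contributions.values())
    decision = "flag" if score > threshold else "pass"
    # Sort so the operator sees the most influential features first.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return decision, score, ranked

# Hypothetical weights and one hypothetical case.
weights = {"speed": 2.0, "altitude": -1.0, "transponder_off": 4.0}
features = {"speed": 3.0, "altitude": 5.0, "transponder_off": 1.0}
decision, score, ranked = decide_with_explanation(weights, features)
print(decision, score)  # flag 5.0
print(ranked)           # [('speed', 6.0), ('altitude', -5.0), ('transponder_off', 4.0)]
```

An operator can check each line of the ranking against what they know and spot a flawed input, which is exactly what a black-box decision does not allow.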

Technologies

- Graphics Processing Units (GPUs), originally developed for accelerating gaming (and, as a result, cost effective), are popular for machine learning.
- Manufacturers (e.g., Intel, NVIDIA) and companies (e.g., IBM, Facebook, Amazon) are investing in hardware to accelerate machine learning.
- Most researchers do not build custom machines for machine learning.
- Optimization in distributed data storage and I/O was identified as an input challenge.

Relevance

- Machine learning is very important because it will allow machines to make suggestions for decisions in highly complex environments and problems.
- Decisions with explanations allow humans to make better decisions, which can extend the expertise of humans because we not only have a decision for something complex but also know why. This means we can also teach humans to make better decisions.
- Machine learning can help in almost any domain: from medical decisions to commercial ones (e.g., shopping, recommendations), machine learning can help sift through vast amounts of data to make relevant decisions.
- Who benefits? The human race, through more lives saved, lower costs, greater efficiencies, etc.

Relevance

DoD perspective 1: We cannot afford to commit the same amount of manpower as our adversaries do. A game changer is the combination of humans and machines, and machine learning is one of the biggest drivers. How do we make machines intelligent enough to do work that used to require human expertise, and to do it better or faster than thousands of people? Doing so would make us safer and let us meet commitments with the resources we can apply.

DoD perspective 2: The only way to overcome anti-access/area denial (A2AD) is through a highly integrated sensing and weapons approach using many coordinated sensors and many coordinated weapons. We really can't engineer this; we need machine learning. So from a defense standpoint, if the US is to maintain its peacekeeper role for open shipping on the seas, we have to develop machine learning to do it.

Scope

- For the purpose of this challenge, the technical scope was primarily the inputs (data), outputs (decisions with explanation), and processing of machine learning, not hardware.
- Machine learning is applicable to a huge number of domains, from DoD to medical to energy to commercial: anywhere decisions must be made in complex environments, with complex data, and/or on complex problems.

Work/Research Underway

General: Research underway parallels the challenge areas of input, processing, and output
- How to get more and better data (e.g., data sets, labeling, open data initiatives)
- How to deal with data to make it more useful (e.g., bootstrapping, reinforcement learning)
- Changing the structure of deep learning models to allow explainability (including tradeoffs between accuracy and being able to explain)
- Deep learning: improving and creating new algorithms
- Human-machine interactions (including Natural Language Processing)
- Optimizing data storage and I/O

Work/Research Underway

Academic and consortium organizations identified by interviewed SMEs
- The Neural Information Processing Systems (NIPS) foundation provides a good survey of research. https://nips.cc/about
- The International Conference on Machine Learning is a leading conference on machine learning. http://icml.cc
- The iSchools organization is a consortium of Information Schools dedicated to advancing the information field; many universities doing machine learning research are members. http://ischools.org/
- Knowledge Discovery & Web Mining Lab at University of Louisville. http://webmining.spd.louisville.edu/
- Machine Learning Department at Carnegie Mellon University. http://www.ml.cmu.edu/
- Machine Learning at Berkeley. https://ml.berkeley.edu/
- Machine Learning at University of Washington. https://www.cs.washington.edu/research/ml
- Center for Machine Learning and Applications (CMLA) at The Pennsylvania State University. http://www.cse.psu.edu/research/cmla

Work/Research Underway

Commercial organizations identified by interviewed SMEs
- Amazon
- Google
- Facebook
- Microsoft
- IBM Watson
- Yahoo

Funding organizations identified by interviewed SMEs
- National Science Foundation
- Office of Naval Research
- Department of Defense
- Department of Energy
- DARPA. On August 10, 2016, DARPA released the BAA for the Explainable Artificial Intelligence Program. http://www.darpa.mil/program/explainable-artificial-intelligence

Summary

- Machine learning systems seek to understand vast amounts of data and provide decision making at a revolutionary level. Many people are already familiar with popular examples such as recommendation engines for movies and books, and IBM's Watson.
- The ramifications for defense, medical, information, safety, commercial, and many other applications are astounding.
- Machine learning has had a number of successes, yet faces many challenges.
- The subject matter experts identified by NSWC Crane described challenges in three major areas (see following slide).

Summary

The subject matter experts identified by NSWC Crane described challenges in three major areas.

Input Challenges
- Dealing with multiplicity of data sources and data types in big data and machine learning
- Having enough data collected and labeled so it can be applied by machine learning
- Generating data when we don't have enough; taking small data sets that we are confident about and adding data to them
- Distributed data storage and I/O

Process Challenges
- Choosing the right machine learning algorithm
- Taking small certified data sets and combining them with other data that is relevant to the problem
- Learning in the field
- Learning from the operator
- Reinforcement learning: merging sensing and control in complex physical systems

Output Challenges
- Explainability: explaining how the machine made the decision it made

Summary

Addressing these challenges has the potential to:
- Significantly enhance US defense
- Significantly advance decisions for medical, commercial, safety, and countless other domains
- Advance the human race

These challenges and their potential solutions have been widely recognized and funded by governments and commercial companies, and research abounds, including at universities around NSWC Crane.

Sources

Subject matter experts consulted / interviewed
- Dr. Robert Cruise, NSWC Crane
- Dr. Mark H. Linderman, AFRL
- Dr. Olfa Nasraoui, University of Louisville
- Dr. Lee Seversky, AFRL