Building security that thinks

Machine learning fundamentals for cybersecurity professionals
Christopher Morales, Head of Security Analytics, Vectra

What makes a machine intelligent?
- Artificial intelligence: programs with the ability to learn and reason like humans
- Machine learning: algorithms with the ability to learn without being explicitly programmed
- Deep learning: a subset of machine learning in which artificial neural networks adapt and learn from vast amounts of data

Types of machine learning (ML)
- Supervised (task driven): random forest, support vector machine
- Deep learning
- Unsupervised (data driven): clustering

Supervised machine learning
- Classification: predicting a label
- Regression: predicting a quantity
Decision tree example:
- Am I hungry? No: go to sleep.
- Am I hungry? Yes: do I have $25?
  - Yes: go to restaurant.
  - No: buy a hamburger.
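The two supervised tasks can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's code. The function names `decide` and `fit_line` are hypothetical. The branch mapping (not hungry leads to sleep, $25 leads to the restaurant) is an assumption read from the flattened slide. The least-squares line stands in for "predicting a quantity".

```python
from typing import List, Tuple


def decide(hungry: bool, dollars: float) -> str:
    """Classification: predict a label by walking a tiny decision tree."""
    if not hungry:
        return "go to sleep"
    # Assumed branch mapping: having $25 leads to the restaurant.
    if dollars >= 25:
        return "go to restaurant"
    return "buy a hamburger"


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    print(decide(hungry=True, dollars=30))
    print(fit_line([0, 1, 2], [1, 3, 5]))
```

In practice a decision tree's questions and thresholds are learned from labeled examples rather than written by hand. The hand-written version only shows the shape of the resulting model.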

Unsupervised machine learning
- Clustering: create groups (clusters) based on the similarities of the examples
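Clustering by similarity can be illustrated with a small k-means sketch. The slide does not name an algorithm, so k-means is an assumed choice. The helper names are hypothetical, and the first `k` points seed the centroids only to keep the example deterministic.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


def kmeans(points: List[Point], k: int, iterations: int = 10):
    # Assumption: seed with the first k points so the run is repeatable.
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
    print(kmeans(data, k=2))
```

No labels are supplied. The groups emerge only from how close the examples are to each other.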

Deep learning
- Transfer learning: a model trained for one task is reused as the starting point for a model on a second task

Comparing traditional machine learning to deep learning
- Traditional machine learning: input -> feature extraction -> classification -> output (car / not car)
- Deep learning: input -> feature extraction + classification -> output (car / not car)

The right tool for the job
Machine learning is about making decisions based on the amount and type of information you have. Each algorithm solves a different problem.

Applying machine learning to find the bad guys
Traditional signatures: short-lived reactive intelligence
- How the threat looks
- Find threats that you've seen before
- Snapshot in time
- No local context
Data science: long-lived predictive intelligence
- What the threat does
- Find what all threats have in common
- Learning over time
- Local learning and context
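The contrast between "how the threat looks" and "what the threat does" can be sketched as follows. Everything here is hypothetical and illustrative: the hash set, the payloads, and the beaconing heuristic (regular gaps between connections) are assumptions chosen to show the idea, not detections described in the talk.

```python
import hashlib
from statistics import mean, pstdev
from typing import Sequence

# Hypothetical signature database: hashes of samples seen before.
KNOWN_BAD_HASHES = {hashlib.sha256(b"malware-v1").hexdigest()}


def signature_match(payload: bytes) -> bool:
    """How the threat looks: exact match against known samples."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD_HASHES


def looks_like_beaconing(timestamps: Sequence[float], max_jitter: float = 0.1) -> bool:
    """What the threat does: flag near-constant gaps between connections.

    Assumed heuristic: the spread of the gaps divided by their average
    must not exceed max_jitter.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 3:
        return False  # too little evidence to judge regularity
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg <= max_jitter


if __name__ == "__main__":
    print(signature_match(b"malware-v1"), signature_match(b"malware-v2"))
    print(looks_like_beaconing([0, 60, 120, 181, 240]))
```

A one-byte change to the payload defeats the hash lookup, while the behavioral check does not depend on the payload at all.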

Combine data science with security research
Security research
- Identify, prioritize, and characterize fundamental attacker behaviors
- Validate models
Data science
- Determine best approach to identify behavior
- Develop and tune models
Attacker behavior models
- High-fidelity detection of things attackers must do
- No signatures: find known and unknown

Example: External remote access
- Detection: External Remote Access, a deep learning model that identifies targeted behavior even on unknown tools
- Security research: training set (tools named on the slide: JQSnicker, nopen)
- Data science: recurrent neural net (deep learning)

Example: Using a stolen admin credential
- Suspicious Kerberos Client
  Security research: authenticate using a stolen credential
  Data science: learn normal user, services, and domain controller for each host and identify mismatches
- Suspicious Admin
  Security research: administer a host using the stolen credential
  Data science: learn which systems each host administers, via which protocols, and identify abnormal administration
- Suspicious Remote Exec
  Security research: move laterally using the credential for remote execution (RPC)
  Data science: learn normal RPC usage (target, UUID, named pipe, account tuples) for each host and identify abnormal usage
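The "learn what is normal for each host, then identify mismatches" idea can be sketched with a per-host baseline of observed tuples. This is a simplified assumption about how such a model might work, not the product's implementation. The event layout `(host, user, service, domain_controller)` and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event layout: (host, user, service, domain_controller)
Event = Tuple[str, str, str, str]


def learn_baseline(events: Iterable[Event]) -> Dict[str, Set[Tuple[str, str, str]]]:
    """Record every (user, service, domain controller) tuple seen per host."""
    baseline: Dict[str, Set[Tuple[str, str, str]]] = defaultdict(set)
    for host, user, service, dc in events:
        baseline[host].add((user, service, dc))
    return baseline


def is_mismatch(baseline: Dict[str, Set[Tuple[str, str, str]]], event: Event) -> bool:
    """Flag an event whose tuple was never observed for that host.

    A host with no history counts as a mismatch in this sketch.
    """
    host, user, service, dc = event
    return (user, service, dc) not in baseline.get(host, set())


if __name__ == "__main__":
    history = [
        ("laptop-1", "alice", "cifs/fileserver", "dc-1"),
        ("laptop-1", "alice", "http/intranet", "dc-1"),
        ("laptop-2", "bob", "cifs/fileserver", "dc-2"),
    ]
    model = learn_baseline(history)
    print(is_mismatch(model, ("laptop-1", "admin", "cifs/fileserver", "dc-1")))
```

A real model would need a learning period and some tolerance for legitimate change. A strict set lookup like this one would flag every new but benign combination.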

Detecting mayhem based on probabilistic relationships
[Diagram: attack stages for opportunistic threats and targeted threats. Labels: initial infection, standard C&C, custom C&C, custom C&C & RAT, botnet monetization, internal recon, lateral movement, acquire data, exfiltrate data.]

Build vs. buy
If you're purchasing other people's ML:
- Ask about the data they use: where they get it from, how much of it they actively operate on, and how they ensure it isn't polluted.
If you're building your own ML:
- How good is your data science team?
- How do you ensure that the data acquisition process has integrity?
- Does the data include the right features to detect the use cases you care about?
How much heavy lifting is left for you?
- ML may find anomalies, but will your IR team be equipped to deal with them?

What it takes to build an algorithm
Security researchers:
- Collect advanced attack samples
- Come up with advanced attacks
- Abstract the behavior and form a theory
Security researchers + data scientists:
- Collect positive and negative samples
- Extract features out of the samples
- Work the theory on offline data
- Refine into detection model
- Deploy and test on live data
- Review results
- Improve and redeploy
Product designer:
- Design UI
Developers:
- Develop UI
- Put detection into production
After production:
- Check efficacy; improve where necessary
- Improve and redeploy
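The "work the theory on offline data" and "review results" steps usually come down to scoring a candidate detector against labeled positive and negative samples. The sketch below is a hypothetical example of that loop. The `failed_logins` feature, the threshold, and the sample values are invented for illustration and do not come from the talk.

```python
from typing import Dict, List, Sequence, Tuple


def detect(sample: Dict[str, int], threshold: int = 3) -> bool:
    """Candidate detector over one hypothetical feature."""
    return sample["failed_logins"] >= threshold


def precision_recall(
    labels: Sequence[bool], predictions: Sequence[bool]
) -> Tuple[float, float]:
    """Precision and recall over labeled positive and negative samples."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented offline data: True marks a known attack sample.
    samples: List[Dict[str, int]] = [
        {"failed_logins": 5},
        {"failed_logins": 0},
        {"failed_logins": 4},
        {"failed_logins": 1},
        {"failed_logins": 2},
    ]
    labels = [True, False, False, False, True]
    predictions = [detect(s) for s in samples]
    print(precision_recall(labels, predictions))
```

Low precision means analysts chase false alarms, and low recall means attacks slip through. Either result sends the model back to the "improve and redeploy" step.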

Five questions to ask cybersecurity AI vendors
1. What type of machine learning algorithms does your product use?
2. How many machine learning algorithms does your product have, and how are they categorized? How frequently do you update them and release new algorithms?
3. How long until machine learning algorithms can trigger detections in a new environment? How many algorithms require a learning period, and how long does that take?
4. How does your product prioritize critical and high-risk hosts that require immediate attention from an analyst?
5. What is the workload reduction your product provides for security analysts? What kind of efficiency increase can be expected?

Five key takeaways
1. Machines learn from much more data than humans can.
2. Machines can access open data from a host of tasks worldwide and access millions of data points in milliseconds.
3. Machines can multitask through many more actions than humans.
4. Machine learning is not subject to human biases.
5. Machines don't stop learning when they reach the best we can do.

Thank You