Building security that thinks: Machine learning fundamentals for cybersecurity professionals
Christopher Morales, Head of Security Analytics, Vectra
What makes a machine intelligent?
Artificial Intelligence: Programs with the ability to learn and reason like humans
Machine Learning: Algorithms with the ability to learn without being explicitly programmed
Deep Learning: Subset of machine learning in which artificial neural networks adapt and learn from vast amounts of data
Types of machine learning (ML)
Supervised (task driven): random forest, support vector machine
Deep learning
Unsupervised (data driven): clustering
Supervised Machine Learning
Classification: predicting a label
Regression: predicting a quantity
Decision tree example: Am I hungry? (Yes / No) Do I have $25? (Yes / No) Outcomes: Go to sleep. Go to restaurant. Buy a hamburger.
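A minimal sketch of how a supervised model could learn a question like "Do I have $25?" from labeled examples instead of having the threshold written by hand. The function name `fit_stump` and the toy samples are made up for illustration; a real classifier would use many features and far more data.

```python
def fit_stump(samples):
    """Learn the threshold on one numeric feature that best separates two labels.

    samples: list of (value, label) pairs, label 1 = "go to restaurant", 0 = otherwise.
    """
    values = sorted({v for v, _ in samples})
    # Candidate thresholds are the midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_correct = None, -1
    for t in candidates:
        # Predict label 1 when value >= t, then count correct predictions.
        correct = sum((v >= t) == bool(y) for v, y in samples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


# Hypothetical labeled examples: (dollars available, went to a restaurant?)
toy_samples = [(5, 0), (10, 0), (20, 0), (30, 1), (40, 1), (60, 1)]
threshold = fit_stump(toy_samples)
print("learned threshold:", threshold)
```

The learned threshold plays the role of one yes/no question in the tree; a full decision tree repeats this search on each branch.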
Unsupervised Machine Learning
Clustering: create groups (clusters) based on the similarities of the examples
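One common clustering algorithm is k-means, sketched below in plain Python. The slide does not name a specific algorithm, so k-means is an assumption here, as are the function name `kmeans`, the toy points, and the starting centroids.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old centroid if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up 2-D examples forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied anywhere: the groups come only from the distances between examples.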
Deep learning
Transfer learning: a model trained on one task is reused as the starting point for a model on a second task
Comparing traditional machine learning to deep learning
Traditional machine learning: input -> feature extraction -> classification -> output (car / not car)
Deep learning: input -> feature extraction + classification -> output (car / not car)
The right tool for the job
Machine learning is about making decisions based on the amount and type of information you have. Each algorithm solves a different problem.
Applying machine learning to find the bad guys
Traditional signatures: short-lived reactive intelligence. How the threat looks; find threats that you've seen before; snapshot in time; no local context.
Data science: long-lived predictive intelligence. What the threat does; find what all threats have in common; learning over time; local learning and context.
Combine data science with security research
Attacker behavior models: high-fidelity detection of things attackers must do. No signatures: find known and unknown.
Security research: identify, prioritize, and characterize fundamental attacker behaviors; validate models.
Data science: determine best approach to identify behavior; develop and tune models.
Example: External remote access
External Remote Access: deep learning model that identifies targeted behavior even on unknown tools
Security research: training set
Data science: recurrent neural net (deep learning)
Tools named on the slide: JQSnicker, nopen
Example: Using a stolen admin credential
Suspicious Kerberos Client. Security research: authenticate using a stolen credential. Data science: learn normal user, services, domain controller for each host and identify mismatches.
Suspicious Admin. Security research: administer a host using the stolen credential. Data science: learn which systems each host administers, via which protocols, and identify abnormal administration.
Suspicious Remote Exec. Security research: move laterally using credential for remote execution (RPC). Data science: learn normal RPC usage (target, UUID, named pipe, account tuples) for each host and identify abnormal usage.
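The "learn normal, then identify mismatches" idea for the Kerberos case can be sketched as a per-host baseline of observed tuples. This is a deliberately simplified set-membership check, not the vendor's actual model; the class name `KerberosBaseline`, its methods, and the host, user, and service names are all hypothetical.

```python
from collections import defaultdict


class KerberosBaseline:
    """Remember which (user, service, domain controller) tuples each host normally uses."""

    def __init__(self):
        self.seen = defaultdict(set)

    def learn(self, host, user, service, dc):
        # Learning period: record every tuple observed for this host.
        self.seen[host].add((user, service, dc))

    def is_mismatch(self, host, user, service, dc):
        # Only hosts with a learned baseline can be scored; unknown hosts return False.
        return host in self.seen and (user, service, dc) not in self.seen[host]


baseline = KerberosBaseline()
baseline.learn("ws-17", "alice", "cifs/fileserver", "dc1")
baseline.learn("ws-17", "alice", "http/intranet", "dc1")

# A different account authenticating from the same workstation is flagged.
print(baseline.is_mismatch("ws-17", "svc-admin", "cifs/fileserver", "dc1"))
# Previously observed behavior is not flagged.
print(baseline.is_mismatch("ws-17", "alice", "cifs/fileserver", "dc1"))
```

The same pattern would extend to the other two detections by changing the tuple, for example (target, UUID, named pipe, account) for RPC usage. A production model would also need to handle rare but legitimate tuples rather than flagging every first occurrence.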
Detecting mayhem based on probabilistic relationships
[Figure: attack progression for opportunistic threats and targeted threats. Stages shown: initial infection, standard C&C, custom C&C, custom C&C & RAT, botnet monetization, internal recon, lateral movement, acquire data, exfiltrate data.]
Build vs. buy
If you're purchasing other people's ML
Ask about the data they use: where they get it from, how much of it they actively operate on, and how they ensure it isn't polluted.
If you're building your own ML
How good is your data science team? How do you ensure that the data acquisition process has integrity? Does the data include the right features to detect the use cases you care about?
How much heavy lifting is left for you? ML may find anomalies, but will your IR team be equipped to deal with them?
What it takes to build an algorithm
Roles: Security Researchers; Security Researchers + Data Scientists; Product Designer; Developers
Steps: come up with advanced attacks and collect advanced attack samples; abstract the behavior and form a theory; collect positive and negative samples; extract features out of the samples; work the theory on offline data; refine into detection model; deploy and test on live data; review results; improve and redeploy; design UI (Product Designer); develop UI (Developers); put detection into production; check efficacy and improve where necessary; improve and redeploy.
Five questions to ask cybersecurity AI vendors
1. What type of machine learning algorithms does your product use?
2. How many machine learning algorithms does your product have, and how are they categorized? How frequently do you update them and release new algorithms?
3. How long until machine learning algorithms can trigger detections in a new environment? How many algorithms require a learning period, and how long does that take?
4. How does your product prioritize critical and high-risk hosts that require immediate attention from an analyst?
5. What is the workload reduction your product provides for security analysts? What kind of efficiency increase can be expected?
Five key takeaways
1. Machines learn from much more data than humans can.
2. Machines can access open data from a host of tasks worldwide and access millions of data points in milliseconds.
3. Machines can multitask through many more actions than humans.
4. Machine learning is not subject to human biases.
5. Machines don't stop learning when they reach the best we can do.
Thank You