Machine Learning and Privacy. Vitaly Shmatikov

Typical Task: Classification. Figure labels: Training Set; Query; Classification result (airplane, automobile, ship, truck). slide 4

Deep Neural Networks. Figure labels: input; output. slide 5

Deep Neural Networks. Figure: a neuron receives activation signals a_1, a_2, ..., a_k from neurons in the previous layer, weighted by parameters w_1, w_2, ..., w_k, plus a bias input 1 weighted by w_0. The activation function f(wa) gives the value passed to the next layer. Learn parameters using Stochastic Gradient Descent (SGD). slide 6
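
The single-neuron computation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the slide does not name the activation function f, so the sigmoid here is an assumption, and the function names and example numbers are hypothetical.

```python
import math

def sigmoid(z):
    """One common choice for the activation function f (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(activations, weights, bias_weight, f=sigmoid):
    """Compute f(w_0 * 1 + w_1 * a_1 + ... + w_k * a_k) for one neuron."""
    z = bias_weight * 1.0 + sum(w * a for w, a in zip(weights, activations))
    return f(z)

# Three incoming activation signals a_1..a_3 with parameters w_1..w_3 and bias weight w_0.
out = neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias_weight=0.1)
print(out)
```

The value returned by `neuron_output` is what the slide calls the signal passed on to the next layer.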

Parameter Training using SGD. Figure: a network whose connections carry parameters w_11, w_12, w_13, w_21, w_22, w_23, w_31, w_32, w_33, w_41, w_42, w_43 (in general w_1i, w_2j, w_3k, w_4n). Find parameters that minimize the classification error. slide 7

Parameter Training using SGD. 1) Feed-Forward. Figure labels: input; output. slide 8

Parameter Training using SGD. 2) Back-propagation. Figure label: error. slide 9

Parameter Training using SGD. 3) Gradient Descent. Figure label: error E. slide 10

Parameter Training using SGD. Parameter Update. Repeat for new batches of training data. slide 11
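
The three steps on slides 8 to 11 (feed-forward, back-propagation, gradient descent, repeated over batches) can be sketched as follows. This is a simplified illustration under stated assumptions: the "network" is a single sigmoid neuron, so back-propagation reduces to one chain-rule step, the error is cross-entropy, and the toy data, learning rate, and function names are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_train(data, lr=0.5, epochs=50, batch_size=4, seed=0):
    """Mini-batch SGD for one sigmoid neuron (a stand-in for a deep network)."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):  # repeat for new batches
            batch = data[start:start + batch_size]
            grad_w, grad_b = [0.0] * n_features, 0.0
            for x, y in batch:
                # 1) Feed-forward: compute the output for this input.
                p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
                # 2) Back-propagation: for cross-entropy with a sigmoid, dE/dz = p - y.
                delta = p - y
                for i, xi in enumerate(x):
                    grad_w[i] += delta * xi
                grad_b += delta
            # 3) Gradient descent: step against the batch-averaged gradient.
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b

def mean_loss(data, w, b):
    """Average cross-entropy error E over a dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

# Hypothetical toy data: label 1 only when both features are large.
toy = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1),
       ([0.9, 0.8], 1), ([0.1, 0.2], 0), ([0.2, 0.9], 0), ([0.8, 1.0], 1)]
w, b = sgd_train(toy)
print(mean_loss(toy, [0.0, 0.0], 0.0), mean_loss(toy, w, b))
```

Comparing the two printed values shows the error before and after training on the same data.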

2014. Figure labels: Users; data; Services. Threats: collection of sensitive personal data; anonymization and re-identification; inference attacks; side channels. slide 12

2018. Figure labels: Users; data; Machine learning; Services. Do trained models leak sensitive data? Is it possible to train a good model while respecting privacy of training data? Is it possible to keep the model itself private? slide 13

Model Inversion (Fredrikson et al.). Given an output of a machine learning model, infer something about the input. Figure label: unexpected attributes. slide 14

Model Inversion in Action. Model: given a patient's genome, determine the correct warfarin dosage. Privacy breach: given a patient's warfarin dosage, infer information about the patient's genome. What does this chart measure? slide 15

Does Inference Breach Privacy? Figure labels: Model; training set. slide 16

Recommended Reading. Frank McSherry, "Statistical inference considered harmful": https://github.com/frankmcsherry/blog/blob/master/posts/2016-06-14.md slide 17

Machine Learning as a Service. Figure labels: Model; Prediction API (input from users, apps; returns a classification); Training API (takes DATA). DATA is sensitive: transactions, preferences, online and offline behavior. slide 18

Exploiting Trained Models. Figure labels: Model; Prediction API; Training API; DATA. An input from the training set yields a classification, and an input not from the training set also yields a classification: recognize the difference. slide 19

ML Against ML. Figure labels: Model; Prediction API; Training API; DATA. Train a model to recognize the difference between the classification of an input from the training set and the classification of an input not from the training set, without knowing the specifics of the actual model! slide 20

Training Attack Model using Shadow Models. Figure: alongside the Target Model, Shadow Models 1, 2, ..., k each have their own training set (Train 1, ..., Train k) and test set (Test 1, ..., Test k). Classifications of a shadow model's training inputs are labeled IN; classifications of its test inputs are labeled OUT. Train the attack model to predict if an input was a member of the training set (in) or a non-member (out). slide 21
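
The shadow-model pipeline on this slide can be sketched end to end on toy data. Everything concrete here is an assumption made for illustration: the "models" are deliberately overfitted soft nearest-neighbour classifiers rather than neural networks, the attack "model" is a single learned threshold on the confidence assigned to the record's true class, and all names and parameters are hypothetical.

```python
import math
import random

def sample(rng, n):
    """Draw n labeled 1-D points from two overlapping Gaussians (toy data)."""
    out = []
    for _ in range(n):
        y = rng.randint(0, 1)
        out.append((rng.gauss(0.5 if y else -0.5, 1.0), y))
    return out

def train(records, tau=1e-4):
    """Return an overfitted stand-in model: x -> [confidence(y=0), confidence(y=1)]."""
    def predict(x):
        scores = [1e-12, 1e-12]  # smoothing: far-away queries get a uniform vector
        for xi, yi in records:
            scores[yi] += math.exp(-((x - xi) ** 2) / tau)
        total = sum(scores)
        return [s / total for s in scores]
    return predict

def attack_records(model, members, non_members):
    """Pair each record's true-class confidence with its IN (1) or OUT (0) label."""
    rows = [(model(x)[y], 1) for x, y in members]
    rows += [(model(x)[y], 0) for x, y in non_members]
    return rows

def fit_threshold(rows):
    """Attack 'model': the confidence threshold with the best IN/OUT accuracy."""
    best_t, best_acc = 0.5, 0.0
    for t in [i / 100 for i in range(1, 100)]:
        acc = sum((c >= t) == bool(m) for c, m in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

rng = random.Random(0)

# The attacker trains k = 5 shadow models on data it controls (Train i / Test i).
shadow_rows = []
for _ in range(5):
    shadow_train, shadow_test = sample(rng, 100), sample(rng, 100)
    shadow_rows += attack_records(train(shadow_train), shadow_train, shadow_test)
threshold = fit_threshold(shadow_rows)

# The target model is trained on data the attacker never sees directly.
target_train, target_out = sample(rng, 100), sample(rng, 100)
target_rows = attack_records(train(target_train), target_train, target_out)
attack_accuracy = sum((c >= threshold) == bool(m) for c, m in target_rows) / len(target_rows)
print(threshold, attack_accuracy)
```

The attack never inspects the target's parameters: the threshold is fitted only on shadow-model outputs and then applied to the target's outputs, which is the point of the "without knowing the specifics of the actual model" remark on the previous slide.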

Training Data for Shadow Models. Real: must be similar to training data of the target model (drawn from same distribution). Synthetic: sample feature values from (known) marginal distributions. Synthetic: exploit target model; sample from inputs classified by the target model with high confidence. Figure labels: confidence of target model's predictions; input space; target's training inputs. slide 22

Synthesizing Shadow Training Data. Hill-climb the space of possible inputs to find those classified by the target model with high confidence. Sample from these inputs to synthesize the training dataset for shadow models. If many candidate inputs are rejected by the target model, re-randomize some features and try again. slide 23
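
A rough sketch of the hill-climbing search described on this slide follows. It is a simplification under assumptions, not the procedure from the talk: `target_confidence` is a hypothetical stand-in for the target's prediction API with made-up parameters, records are 8 binary features, a candidate is accepted as soon as its confidence reaches `min_conf`, and after repeated rejections the number of re-randomized features is reduced.

```python
import math
import random

WEIGHTS = [1.5, -2.0, 1.0, 0.5, -1.0, 2.0, -0.5, 1.0]  # hypothetical target parameters

def target_confidence(x, cls):
    """Stand-in for the target's prediction API: confidence that x belongs to cls."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) - 2.0
    p1 = 1.0 / (1.0 + math.exp(-z))
    return p1 if cls == 1 else 1.0 - p1

def synthesize(cls, rng, n_features=8, k=3, min_conf=0.9, max_iters=1000, max_rejects=5):
    """Hill-climb toward an input the target classifies as cls with high confidence."""
    best = [rng.randint(0, 1) for _ in range(n_features)]
    best_conf = target_confidence(best, cls)
    rejects = 0
    for _ in range(max_iters):
        if best_conf >= min_conf:
            return best
        # Propose a candidate by re-randomizing k features of the best record so far.
        cand = list(best)
        for i in rng.sample(range(n_features), k):
            cand[i] = rng.randint(0, 1)
        conf = target_confidence(cand, cls)
        if conf > best_conf:
            best, best_conf, rejects = cand, conf, 0
        else:
            rejects += 1
            if rejects > max_rejects:
                # Many candidates rejected: change fewer features per proposal.
                k = max(1, k - 1)
                rejects = 0
    return best if best_conf >= min_conf else None

rng = random.Random(0)
synthetic = [synthesize(1, rng) for _ in range(5)]
print(synthetic)
```

Records collected this way would then be labeled by querying the target and split into training and test sets for the shadow models.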

Membership Inference Attack. Input (data); output (classes and confidence values): airplane, automobile, ship, truck. Was this image part of the training set? slide 24

Membership Inference Attack. Figure labels: Model; Prediction API; Training API; DATA; Training Set; target data record. Was this record in the training set? slide 25

Figure: Minimum Attack Accuracy on 75% of classes (values shown: 0.8, 0.9). Purchase Dataset: Classify Customers. slide 26

Next Step: Reconstruction. Figure labels: Model; Prediction API; Training API; DATA; partial record (?????). Auxiliary information, public databases, accidentally revealed data. INFER hidden parts of the customer record. Example: store purchases or mobile phone locations. slide 27

Why Do These Attacks Work? The model is overfitted! Figure labels: Model; Prediction API; Training API; DATA; Membership Inference; Reconstruction. slide 28

Attack Success vs. Test-Train Gap. Figure annotation: More overfitted. slide 29
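
One way to make the test-train gap concrete is sketched below. The slide does not define the quantity, so taking it as training accuracy minus test accuracy is an assumption, and the labels and predictions are made-up toy values.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def train_test_gap(train_preds, train_labels, test_preds, test_labels):
    """Training accuracy minus test accuracy; a larger gap means more overfitting."""
    return accuracy(train_preds, train_labels) - accuracy(test_preds, test_labels)

# Hypothetical outputs of an overfitted model: perfect on its training set,
# noticeably worse on held-out data.
train_labels = [0, 1, 1, 0, 1]
train_preds = [0, 1, 1, 0, 1]
test_labels = [1, 0, 1, 1, 0]
test_preds = [1, 1, 1, 0, 0]

gap = train_test_gap(train_preds, train_labels, test_preds, test_labels)
print(gap)
```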

Privacy: Does the model leak information about data in the training set? Learning: Does the model generalize to data outside the training set? Overfitting is the common enemy. Figure labels: Model; training set; data universe. slide 30

Does Inference Breach Privacy? Figure labels: Model; training set; SCIENCE!; PRIVACY BREACH! Privacy breach = risk of membership: the gap between what can be inferred from the model about a member of the training set and what can be inferred about an arbitrary input from the population. slide 31

Figure labels: Non-Members; Members of Training Set; Risk of membership; Baseline (use statistics). Purchase Dataset: Classify Customers (Google API). slide 32

Future. Modern machine learning is both a threat and an opportunity for data privacy. For once, privacy and utility are not in conflict: overfitting is the common enemy. Overfitted models leak training data. Overfitted models lack predictive power. Need generalizability and accuracy. slide 33

Privacy-preserving machine learning. Figure labels: Utility; Privacy. slide 34