Multivariate models and machine learning for fMRI

Methods and Models in fMRI, 15.11.2016. Jakob Heinzle (heinzle@biomed.ee.ethz.ch), Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering (IBT), University and ETH Zürich. Many thanks to Sudhir Raman and Kay Brodersen for material.

Overview: Motivation; Modelling; Terminology; Learning from data; Multivariate Bayes in SPM; Generative Embedding
fMRI Analysis and Classification

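Why multivariate? Univariate approaches are excellent for localizing activations in individual voxels. [Figure: responses of two voxels, v1 and v2, to reward vs. no reward; one voxel shows a significant difference (*), the other does not (n.s.).]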

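Why multivariate? Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels. [Figure: orange juice vs. apple juice; neither v1 nor v2 alone shows a significant difference (n.s., n.s.), but the two conditions can be distinguished when v1 and v2 are considered jointly.]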
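The juice example can be reproduced with a small simulation. The sketch below assumes two voxels that share a large common fluctuation plus a small condition effect of opposite sign in each voxel (all numbers are hypothetical). Each voxel on its own separates the conditions only weakly, whereas a linear discriminant (Fisher LDA here, one possible multivariate classifier) exploits the difference between the voxels.

```python
import numpy as np

def simulate(n_per_class, rng, shared_sd=3.0, private_sd=0.05, effect=0.15):
    """Two voxels: large shared fluctuation, small opposite-sign condition effect."""
    labels = np.repeat([0, 1], n_per_class)
    sign = np.where(labels == 1, 1.0, -1.0)
    shared = rng.normal(0.0, shared_sd, labels.size)  # common to both voxels
    v1 = shared + effect * sign + rng.normal(0.0, private_sd, labels.size)
    v2 = shared - effect * sign + rng.normal(0.0, private_sd, labels.size)
    return np.column_stack([v1, v2]), labels

def welch_t(a, b):
    """Univariate two-sample t statistic (unequal variances)."""
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)

def fit_lda(X, y):
    """Fisher LDA: w = pooled_cov^-1 (mu1 - mu0), threshold halfway between class means."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    centered = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    cov = centered.T @ centered / (len(X) - 2)
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2
    return w, b

rng = np.random.default_rng(0)
X_train, y_train = simulate(100, rng)
X_test, y_test = simulate(100, rng)

for j in range(2):
    t = welch_t(X_train[y_train == 1, j], X_train[y_train == 0, j])
    print(f"voxel v{j + 1}: univariate t = {t:.2f}")

w, b = fit_lda(X_train, y_train)
accuracy = np.mean((X_test @ w + b > 0) == (y_test == 1))
print(f"LDA accuracy on independent test data: {accuracy:.2f}")
```

The shared fluctuation dominates each voxel's variance, so the per-voxel t values stay small, while the LDA weight vector points along v1 minus v2, where the shared fluctuation cancels.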

A bit of history: multidimensional scaling. Two-dimensional projection of a similarity measure for both psychophysical ratings and fMRI responses [figure panels: psychophysical rating; fMRI]. Edelman et al., Psychobiology, 1998.
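Multidimensional scaling turns a matrix of pairwise dissimilarities into a low-dimensional map in which distances approximate those dissimilarities. A minimal sketch of classical (Torgerson) MDS, one standard variant, applied to hypothetical dissimilarities; the slide does not state which MDS variant Edelman et al. used.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n               # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                        # double-centred squared distances (Gram matrix)
    eigvals, eigvecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]     # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # negative eigenvalues are set to 0
    return eigvecs[:, top] * scale

def pairwise_distances(Y):
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical: dissimilarities between 5 stimuli that truly lie in a plane
rng = np.random.default_rng(1)
points = rng.normal(size=(5, 2))
D = pairwise_distances(points)
embedding = classical_mds(D)
print(np.round(embedding, 3))
```

For exactly Euclidean two-dimensional dissimilarities the embedding reproduces all distances (up to rotation and reflection); for similarity ratings or fMRI pattern dissimilarities it gives the best low-rank approximation in this sense.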

A bit of history: classification studies. Haxby et al., Science, 2001.

A bit of history: classification studies. Kamitani and Tong, Nat Neurosci, 2005.

Representational similarity analysis: The idea is to compare representations via the similarity (correlation) between the activation patterns evoked by different stimuli. This allows a comparison between monkey data (neural firing patterns) and human data (fMRI activation patterns). Kriegeskorte et al., Neuron, 2008.
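Because RSA compares stimulus-by-stimulus dissimilarity matrices (RDMs) rather than raw patterns, the two datasets may have different numbers of measurement channels. A minimal sketch, assuming 1 minus Pearson correlation as the dissimilarity and Spearman rank correlation between RDM upper triangles as the comparison (common choices in RSA); the "monkey" and "human" data are simulated and hypothetical.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation between
    activation patterns (rows = stimuli, columns = neurons or voxels)."""
    return 1.0 - np.corrcoef(patterns)

def rank(v):
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))  # assumes no ties (continuous values)
    return r

def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rank(rdm_a[iu]), rank(rdm_b[iu]))[0, 1]

rng = np.random.default_rng(2)
n_stimuli = 8
latent = rng.normal(size=(n_stimuli, 3))  # hypothetical stimulus structure shared by both species
monkey = latent @ rng.normal(size=(3, 50)) + 0.3 * rng.normal(size=(n_stimuli, 50))    # 50 neurons
human = latent @ rng.normal(size=(3, 200)) + 0.3 * rng.normal(size=(n_stimuli, 200))   # 200 voxels
print(f"RDM similarity (Spearman): {compare_rdms(rdm(monkey), rdm(human)):.2f}")
```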

Overview: Motivation; Modelling; Terminology; Learning from data; Multivariate Bayes in SPM; Generative Embedding

Analysis steps: feature extraction; modelling (classification, clustering, regression); inference (cross-validation, performance, prediction, model selection).

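Feature space: each data point S1, S2, ..., SN is described by P features F1, F2, ..., FP, giving an N × P matrix with one row per data point and one column per feature. Feature values can be discrete (e.g. 0 or 1) or continuous (e.g. 0.5, 5.7, 6.6).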

Feature selection for fMRI multivariate analysis: Different features answer different questions. Reducing the dimensionality may reduce noise, but it can also remove relevant information. Candidate features include raw data, mean values, correlations between regions, and model parameters (e.g. from DCM).
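One common way to reduce dimensionality is principal component analysis (PCA), shown here as an illustration rather than as the method the slide recommends. The fraction of variance kept makes the noise-vs-information trade-off explicit: dropping components discards variance, which may be noise or may be signal. A minimal sketch via the SVD on hypothetical data (40 scans × 500 voxels); in a decoding analysis, the components should be estimated on the training folds only.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples (rows) onto the leading principal components via SVD."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)                    # fraction of variance per component
    scores = Xc @ Vt[:n_components].T                      # reduced feature matrix
    reconstruction = scores @ Vt[:n_components] + mean     # back-projection to voxel space
    return scores, explained, reconstruction

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 500))  # hypothetical: 40 scans x 500 voxels
scores, explained, recon = pca_reduce(X, n_components=10)
print(scores.shape, f"variance kept by 10 components: {explained[:10].sum():.2f}")
```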

Model selection and generalizability: [Figure: model fit vs. model complexity.] Bishop (2006); Pitt & Myung (2002), TICS.
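The fit-complexity trade-off can be reproduced with polynomial curve fitting, the kind of example used in Bishop (2006). The data-generating function, noise level and degrees below are assumptions. Training error can only decrease as the degree grows (the models are nested), while error on new data eventually grows again.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(4)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, x_test.size)

train_mse, test_mse = [], []
for degree in range(10):                              # model complexity = polynomial degree
    p = Polynomial.fit(x_train, y_train, degree)      # least-squares fit
    train_mse.append(np.mean((p(x_train) - y_train) ** 2))
    test_mse.append(np.mean((p(x_test) - y_test) ** 2))
    print(f"degree {degree}: train MSE {train_mse[-1]:.3f}, test MSE {test_mse[-1]:.3f}")
```

With 10 training points, degree 9 interpolates the training data (training error near zero), which is exactly the situation where a good fit says nothing about generalization.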

Encoding and decoding models: An encoding model $g: X_t \to Y_t$ predicts the BOLD signal $Y_t \in \mathbb{R}^v$ from the context $X_t \in \mathbb{R}^d$ (cause or consequence, e.g. condition, stimulus, response, prediction error); a decoding model $h: Y_t \to X_t$ predicts the context from the BOLD signal.
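Both directions can be sketched with linear models on simulated data. The assumptions: the encoding model $g$ is ordinary least squares per voxel (as in a GLM), and the decoding model $h$ is ridge regression with a hypothetical penalty `lam`, since the number of voxels $v$ is often large relative to the number of time points. The decoding fit below is in-sample; a real analysis would cross-validate it.

```python
import numpy as np

rng = np.random.default_rng(5)
T, d, v = 200, 2, 30                        # time points, context dimensions, voxels
X = rng.normal(size=(T, d))                 # context X_t (e.g. stimulus, condition)
B_true = rng.normal(size=(d, v))            # hypothetical voxel responses to each context dimension
Y = X @ B_true + 0.5 * rng.normal(size=(T, v))  # BOLD signal Y_t

# Encoding model g: X_t -> Y_t (one least-squares regression per voxel)
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Decoding model h: Y_t -> X_t (ridge regression over all voxels)
lam = 1.0
W_hat = np.linalg.solve(Y.T @ Y + lam * np.eye(v), Y.T @ X)
X_decoded = Y @ W_hat

enc_err = np.abs(B_hat - B_true).max()
dec_corr = [np.corrcoef(X[:, j], X_decoded[:, j])[0, 1] for j in range(d)]
print(f"max encoding weight error: {enc_err:.3f}")
print("decoding correlations (in-sample):", ", ".join(f"{c:.2f}" for c in dec_corr))
```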

Modelling goals: prediction. A decoding model $h$ predicts $X$ from $Y$; its quality is assessed by the predictive density.

Modelling goals: model selection, e.g. comparing a sparse-coding with a distributed-coding hypothesis by their model evidence.

Overview: Motivation; Modelling concepts; Learning from data; Multivariate Bayes in SPM; Generative Embedding

Learning from data: supervised learning (labels for the training data are known), unsupervised learning (labels for the training data are not known), semi-supervised learning and reinforcement learning.

Supervised learning: a function f maps the independent variables X to a dependent variable Y, which can be continuous (regression) or categorical (classification).

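Classification with kernel methods, e.g. support vector machines: the function f maps X to Y after the data have been mapped by a feature map $\varphi$. The kernel function $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ gives inner products in that feature space, so the mapping never has to be computed explicitly. Shawe-Taylor & Cristianini, Kernel Methods for Pattern Analysis, 2004.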
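The identity $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ can be checked numerically for a small case. This sketch uses the homogeneous polynomial kernel of degree 2 on 2-D inputs, chosen only because its feature map $\varphi$ is short enough to write out; SVMs on fMRI data often use a linear kernel instead.

```python
import numpy as np

def phi(x):
    """Explicit feature map for 2-D inputs whose inner product equals (x . z)^2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, -1.0])
print(poly_kernel(x_i, x_j), float(phi(x_i) @ phi(x_j)))  # both equal 1.0 (up to rounding)
```

The kernel needs one 2-D dot product, while the explicit route builds 3-D feature vectors first; for higher degrees or Gaussian kernels the feature space becomes very large or infinite, which is why classifiers work with $K$ directly.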

Other popular classifiers: Gaussian processes (C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006) and deep belief networks (http://deeplearning.net/tutorial/dbn.html; G. E. Hinton, S. Osindero and Y. Teh, A fast learning algorithm for deep belief nets, Neural Computation, vol. 18, 2006).

Generative and discriminative classifiers: Generative classifiers learn the parameters of p(y) and p(x | y), e.g. the naïve Bayes classifier, and obtain p(y | x) via Bayes' rule. Discriminative classifiers learn the parameters of p(y | x) directly, e.g. logistic regression, SVM.
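The contrast can be made concrete by implementing one classifier of each kind from scratch. A minimal sketch, assuming Gaussian class-conditional densities with diagonal covariance for naïve Bayes and plain gradient ascent (hypothetical learning rate and iteration count) for logistic regression; the data are simulated.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Generative: estimate p(y) and a diagonal Gaussian p(x | y) for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    classes, priors, means, variances = model
    # log p(x | y) summed over features; adding log p(y) gives log p(y | x) up to a constant
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None]
                      + (X[:, None, :] - means[None]) ** 2 / variances[None]).sum(axis=2)
    return classes[np.argmax(np.log(priors)[None] + log_lik, axis=1)]

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Discriminative: p(y=1 | x) = sigmoid(w . x + b), fitted by gradient ascent on the log-likelihood."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w += lr * X.T @ (y - p) / len(y)
        b += lr * np.mean(y - p)
    return w, b

def predict_logistic(params, X):
    w, b = params
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-1, 1, (100, 3)), rng.normal(1, 1, (100, 3))])
y = np.repeat([0, 1], 100)
idx = rng.permutation(200)
train, test = idx[:150], idx[150:]

nb_model = fit_gaussian_nb(X[train], y[train])
lr_params = fit_logistic(X[train], y[train])
acc_nb = np.mean(predict_gaussian_nb(nb_model, X[test]) == y[test])
acc_lr = np.mean(predict_logistic(lr_params, X[test]) == y[test])
print(f"naive Bayes: {acc_nb:.2f}, logistic regression: {acc_lr:.2f}")
```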

Cross-validation: The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation: the examples (1 to 100) are split into two folds; in each fold, one half serves as training examples and the other half as test examples. Cross-validation is used both for model selection and for performance evaluation, e.g. with the balanced accuracy or the F1 score.
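A minimal sketch of k-fold cross-validation with the two performance measures named above, assuming a nearest-centroid classifier and simulated, deliberately unbalanced data. Balanced accuracy is the mean of the per-class recalls, so a classifier that always predicts the majority class scores 0.5 rather than the majority fraction; F1 is the harmonic mean of precision and recall for the positive class.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; insensitive to unequal class sizes."""
    return np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)])

def f1_score(y_true, y_pred, positive=1):
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0

def nearest_centroid(X_train, y_train, X_test):
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

def k_fold_predictions(X, y, k, rng):
    """Every example is predicted exactly once, by a classifier that never saw it."""
    folds = np.array_split(rng.permutation(len(y)), k)
    y_pred = np.empty_like(y)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        y_pred[test_idx] = nearest_centroid(X[train_idx], y[train_idx], X[test_idx])
    return y_pred

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (70, 5)), rng.normal(0.8, 1, (30, 5))])  # 100 examples, unbalanced
y = np.repeat([0, 1], [70, 30])
y_pred = k_fold_predictions(X, y, k=2, rng=rng)  # 2-fold, as on the slide
print(f"balanced accuracy {balanced_accuracy(y, y_pred):.2f}, F1 {f1_score(y, y_pred):.2f}")
```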

Cross-validation: Another commonly used variant is leave-one-out cross-validation, in which each of the 100 examples serves once as the single test example (100 folds) before the performance evaluation. In fMRI, one often uses leave-one-run-out cross-validation.
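Leave-one-run-out uses the run, not the individual sample, as the unit that is held out. A common rationale is that samples within an fMRI run are temporally correlated, so holding out whole runs keeps near-duplicate information out of the training set. A minimal sketch of the split, assuming a hypothetical vector `runs` that gives the run index of every sample.

```python
import numpy as np

def leave_one_run_out(runs):
    """Yield (train_idx, test_idx) pairs, holding out all samples of one run at a time."""
    runs = np.asarray(runs)
    for r in np.unique(runs):
        yield np.flatnonzero(runs != r), np.flatnonzero(runs == r)

runs = np.repeat([1, 2, 3, 4], 10)  # hypothetical: 4 runs x 10 samples
for train_idx, test_idx in leave_one_run_out(runs):
    print(f"test run {runs[test_idx[0]]}: {len(train_idx)} train / {len(test_idx)} test samples")
```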

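Performance in a single subject: binomial test. With k correct predictions out of n and chance level $\pi_0$, $p = P(X \ge k \mid H_0) = 1 - B(k - 1 \mid n, \pi_0)$, where $B(\cdot \mid n, \pi_0)$ is the binomial cumulative distribution function (Brodersen et al., 2013, NeuroImage; example on the slide: k = 30). However, cross-validated predictions are not independent, so they are not necessarily binomially distributed. Permutation tests are better.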
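The binomial p-value is a tail sum: the probability of k or more correct predictions if the classifier guesses at chance level $\pi_0$. A minimal sketch using only the standard library; n = 40 in the usage line is a hypothetical number of test trials, since the slide gives only k = 30. As noted above, this test assumes independent predictions, which cross-validation violates.

```python
from math import comb

def binomial_p_value(k, n, pi0=0.5):
    """P(X >= k | H0) for X ~ Binomial(n, pi0), i.e. 1 - CDF(k - 1)."""
    return sum(comb(n, j) * pi0 ** j * (1 - pi0) ** (n - j) for j in range(k, n + 1))

print(binomial_p_value(30, 40))  # k = 30 correct out of a hypothetical n = 40
```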

Performance across multiple subjects: random effects vs. fixed effects. Brodersen et al., 2013, NeuroImage; toolbox: http://www.translationalneuromodeling.org/tapas/

Confounds: GLM vs. MVPA. Todd et al., 2013, NeuroImage.

Second-level t-tests for accuracies? True β-values can be assumed to be normally distributed, but true accuracies are not normal and are truncated at chance level. A possible solution is given by Allefeld et al., NeuroImage, 2016.

Statistical testing with classification: Within subjects, use permutation statistics; parametric tests such as the binomial test or the t-test are not valid because their assumptions are not met (cf. Schreiber and Krekelberg, 2013). Across subjects, the assumptions of t-tests are not met either. Options are a full Bayesian model (Brodersen et al., 2013, although its assumptions are not met for cross-validation) or the prevalence statistic proposed by Allefeld et al., 2016.
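A within-subject permutation test reruns the entire cross-validation with shuffled labels to build a null distribution of accuracies, so it makes no assumption about how cross-validated accuracies are distributed. A minimal sketch with simulated data, a nearest-centroid classifier and 200 permutations (all assumptions). It shuffles labels freely; with real fMRI data, labels should only be permuted in ways that respect the dependence structure, e.g. within runs.

```python
import numpy as np

def cv_accuracy(X, y, k=5):
    """k-fold (interleaved folds) cross-validated accuracy of a nearest-centroid classifier."""
    idx = np.arange(len(y))
    correct = 0
    for f in range(k):
        test = idx[f::k]
        train = np.setdiff1d(idx, test)
        classes = np.unique(y[train])
        centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])
        dist = ((X[test][:, None, :] - centroids[None]) ** 2).sum(axis=2)
        correct += np.sum(classes[np.argmin(dist, axis=1)] == y[test])
    return correct / len(y)

def permutation_test(X, y, n_perm, rng):
    """p = fraction of label permutations whose CV accuracy reaches the observed one."""
    observed = cv_accuracy(X, y)
    null = np.array([cv_accuracy(X, rng.permutation(y)) for _ in range(n_perm)])
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)  # the observed labelling counts as one permutation
    return observed, p

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(1.5, 1, (20, 5))])
y = np.repeat([0, 1], 20)
observed, p = permutation_test(X, y, n_perm=200, rng=rng)
print(f"accuracy {observed:.2f}, permutation p = {p:.3f}")
```

Adding one to numerator and denominator keeps the p-value strictly positive, which is appropriate because only a finite sample of permutations is drawn.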

Research questions for classification: overall classification accuracy (e.g. left or right button? truth or lie? healthy or ill?); spatial deployment of discriminative regions; temporal evolution of discriminability, i.e. when within the trial the accuracy rises above chance relative to when the participant indicates the decision; model-based classification, e.g. assigning subjects to {group 1, group 2}. Pereira et al. (2009), NeuroImage; Brodersen et al. (2009), The New Collection.

Decoding «hidden» intentions with a searchlight approach. Haynes et al., Current Biology, 2007.
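The searchlight idea: move a small sphere through the volume, run a cross-validated classifier on the voxels inside each sphere, and store the accuracy at the sphere's centre, which yields a map of where information is present. A minimal sketch on a toy 6 × 6 × 6 volume with a nearest-centroid classifier and a radius of 1 voxel; these are illustrative choices, not those of the cited study, and the signal location is simulated.

```python
import numpy as np
from itertools import product

def cv_accuracy(X, y, k=5):
    """k-fold (interleaved folds) accuracy of a nearest-centroid classifier; labels must be 0/1."""
    idx = np.arange(len(y))
    correct = 0
    for f in range(k):
        test = idx[f::k]
        train = np.setdiff1d(idx, test)
        centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])
        dist = ((X[test][:, None, :] - centroids[None]) ** 2).sum(axis=2)
        correct += np.sum(np.argmin(dist, axis=1) == y[test])
    return correct / len(y)

def searchlight(data, y, radius=1.0):
    """data: (n_samples, nx, ny, nz). Returns an accuracy map with one value per voxel."""
    _, *shape = data.shape
    r = int(np.ceil(radius))
    # voxel offsets inside a sphere of the given radius (in voxels)
    offsets = [o for o in product(range(-r, r + 1), repeat=3) if np.dot(o, o) <= radius ** 2]
    acc = np.zeros(shape)
    for center in product(*(range(s) for s in shape)):
        sphere = [tuple(c + o for c, o in zip(center, off)) for off in offsets]
        sphere = [v for v in sphere if all(0 <= c < s for c, s in zip(v, shape))]  # clip at edges
        X = np.stack([data[(slice(None),) + v] for v in sphere], axis=1)  # samples x sphere voxels
        acc[center] = cv_accuracy(X, y)
    return acc

rng = np.random.default_rng(9)
y = np.repeat([0, 1], 20)
data = rng.normal(size=(40, 6, 6, 6))
data[:, :2] += np.where(y == 1, 1.0, -1.0)[:, None, None, None]  # signal only where x < 2
acc_map = searchlight(data, y)
print(f"corner in signal region: {acc_map[0, 0, 0]:.2f}, opposite corner: {acc_map[5, 5, 5]:.2f}")
```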

Decoding of free decisions: decoding of finger presses (red line). Participants freely chose the timing and the hand. The earliest information about the left/right choice appears long before execution. Free will? Soon et al., Nat Neurosci, 2008.

Decoding task preparation with connectivity-based decoding: an SV classifier applied to the connectivity graph (correlations), with the resulting discriminative maps. Heinzle et al., J Neurosci, 2012.
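In connectivity-based decoding, each example is represented not by activations but by the edges of a connectivity graph. A minimal sketch of one way to build such features, assuming Pearson correlations between regional time series and using the upper triangle of the correlation matrix as the feature vector; the time series are hypothetical, and whether this matches the cited study's exact pipeline is not stated here. The resulting matrix could then be passed to any classifier, e.g. an SVM.

```python
import numpy as np

def connectivity_features(timeseries):
    """timeseries: (n_timepoints, n_regions). Returns the upper triangle of the
    region-by-region correlation matrix (one value per edge of the graph)."""
    C = np.corrcoef(timeseries, rowvar=False)
    iu = np.triu_indices_from(C, k=1)
    return C[iu]

rng = np.random.default_rng(10)
n_regions = 6
examples = [rng.normal(size=(50, n_regions)) for _ in range(20)]  # hypothetical per-block time series
features = np.array([connectivity_features(ts) for ts in examples])
print(features.shape)  # one row per example, one column per region pair
```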

Unsupervised learning: building a representation of the data, e.g. by dimensionality reduction or by clustering (of, e.g., time series) with methods such as k-means or mixture models.

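K-means clustering: The cost function is $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2$, where $r_{nk} = 1$ if data point $x_n$ is assigned to cluster $k$ and 0 otherwise. Algorithm: 1. Initialize the centroids $\mu_k$. 2. Estimate the assignments. 3. Estimate the cluster centroids. 4. Repeat steps 2 and 3 until convergence. Bishop, PRML (2006).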
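The four algorithm steps map directly onto a short loop. A minimal sketch, assuming initialization with k randomly chosen data points and simulated 2-D data. Each assignment step and each centroid update can only lower the cost (sum of squared distances to the assigned centroids), so the recorded costs never increase.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    # 1. Initialize: k randomly chosen data points as centroids
    mu = X[rng.choice(len(X), size=k, replace=False)]
    costs = []
    for _ in range(n_iter):
        # 2. Assignments: each point goes to its nearest centroid
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(axis=2)
        assign = np.argmin(d2, axis=1)
        costs.append(d2[np.arange(len(X)), assign].sum())  # cost for current assignments
        # 3. Centroids: mean of the assigned points (an empty cluster keeps its centroid)
        new_mu = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else mu[j]
                           for j in range(k)])
        # 4. Repeat until the centroids stop moving
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return assign, mu, costs

rng = np.random.default_rng(14)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
assign, centroids, costs = kmeans(X, k=2, rng=rng)
print("cluster sizes:", np.bincount(assign), "cost per iteration:", np.round(costs, 1))
```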

Clustering with a mixture of Gaussians. Bishop, PRML (2006).
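A mixture of Gaussians is usually fitted with the expectation-maximization (EM) algorithm, a soft version of k-means: the E-step computes responsibilities (the posterior probability that each point belongs to each component), the M-step re-estimates mixing weights, means and variances from responsibility-weighted data. A minimal 1-D sketch, assuming quantile-based initialization and simulated data; the data log-likelihood never decreases across EM iterations.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_em(x, k, n_iter=200):
    pi = np.full(k, 1.0 / k)                         # mixing coefficients
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)    # initial means spread over the data
    var = np.full(k, x.var())
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities gamma_nk proportional to pi_k * N(x_n | mu_k, var_k)
        weighted = pi[None] * normal_pdf(x[:, None], mu[None], var[None])
        log_liks.append(np.log(weighted.sum(axis=1)).sum())
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data
        Nk = gamma.sum(axis=0)
        pi = Nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu[None]) ** 2).sum(axis=0) / Nk
    return pi, mu, var, log_liks

rng = np.random.default_rng(13)
x = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(3.0, 1.0, 100)])
pi, mu, var, log_liks = gmm_em(x, k=2)
print("weights", np.round(pi, 2), "means", np.round(mu, 2), "variances", np.round(var, 2))
```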

Interpretation: cluster parameters (cluster 1, cluster 2). Internal criterion: model evidence. External criterion: purity, i.e. the agreement between the inferred labels of the subjects and external labels.
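Purity assigns each inferred cluster its most frequent external label and counts the fraction of subjects that match that label. A minimal sketch with hypothetical labels for six subjects.

```python
import numpy as np

def purity(inferred, external):
    """Sum over clusters of the majority external-label count, divided by the number of subjects."""
    inferred, external = np.asarray(inferred), np.asarray(external)
    total = 0
    for c in np.unique(inferred):
        _, counts = np.unique(external[inferred == c], return_counts=True)
        total += counts.max()
    return total / len(inferred)

inferred = [0, 0, 0, 1, 1, 1]
external = ["control", "control", "patient", "patient", "patient", "patient"]
print(purity(inferred, external))  # 5 of 6 subjects match their cluster's majority label
```

Purity equals 1 when every cluster contains only one external label; note that it also reaches 1 trivially if every subject gets its own cluster, so it is best compared across solutions with the same number of clusters.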

Overview: Motivation; Modelling; Learning from data; Multivariate Bayes in SPM; Generative Embedding

Encoding vs. decoding models


Coding hypotheses: spatial vectors (sparse vectors, smooth vectors) and distributed vectors, e.g. singular vectors of the data, $U S V^T = R Y^T$, or support vectors, $U = R Y^T$.
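Each coding hypothesis acts as a prior on the voxel weights, and hypotheses are compared by their log model evidence. A minimal sketch for a linear Gaussian decoding model $x = Y w + e$ (Y: scans × voxels, x: target per scan) with $w \sim N(0, \Sigma_w)$ and fixed, hypothetical hyperparameters, for which the evidence has the closed form $x \sim N(0, Y \Sigma_w Y^T + \sigma^2 I)$. The two priors (independent weights, loosely analogous to a sparse $U = I$ pattern set, vs. a Gaussian smoothness kernel) are illustrations only; MVB in SPM uses its own pattern sets and optimizes its hyperparameters.

```python
import numpy as np

def log_evidence(Y, x, prior_cov, noise_var):
    """log p(x) for x = Y w + e, w ~ N(0, prior_cov), e ~ N(0, noise_var * I)."""
    n = len(x)
    C = Y @ prior_cov @ Y.T + noise_var * np.eye(n)  # marginal covariance of x
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(C, x))

rng = np.random.default_rng(11)
n_scans, n_vox = 60, 40
pos = np.arange(n_vox)                                                      # 1-D voxel positions
independent_prior = np.eye(n_vox)                                           # weights independent
smooth_prior = np.exp(-(pos[:, None] - pos[None]) ** 2 / (2 * 3.0 ** 2))    # neighbours correlated
Y = rng.normal(size=(n_scans, n_vox))
w_true = np.sin(pos / 4.0)                                                  # hypothetical smooth weights
x = Y @ w_true + rng.normal(size=n_scans)

for name, prior in [("independent", independent_prior), ("smooth", smooth_prior)]:
    print(f"{name}: log evidence = {log_evidence(Y, x, prior, noise_var=1.0):.1f}")
```

A difference in log evidence of about 3 corresponds to a Bayes factor of about 20, a conventional threshold for strong evidence in favour of one hypothesis.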

Coding hypotheses. Friston et al., 2008, NeuroImage.

Solved with variational Bayes. Friston et al., 2008, NeuroImage.

Example: decoding of motion. Attention-to-motion dataset (Büchel & Friston, 1999, Cerebral Cortex). Experimental factors: 1. photic, 2. motion, 3. attention. Friston et al., 2008, NeuroImage.

Friston et al., 2008, NeuroImage.

Results. Friston et al., 2008, NeuroImage.

Multivariate Bayes in SPM: [Figure: SPM results display for MVB_Motion (motion contrast, sparse prior) on the attention-to-motion GLM (SPM{T_338}, height threshold T = 4.874226, p < 0.05 FWE; maximum at [-36, -87, -3]). Panels: log-evidence across partitions (maximum p = 100.00%); distribution of voxel weights; posterior probabilities p(w > 0) and weights at maxima (e.g. p = 0.993, w = 0.0254 at (-39, -90, -3) mm); target vs. prediction across scans; predicted vs. observed contrast. 506 voxels, 360 scans.]

Laminar activity related to novelty and episodic memory. Maass et al., 2014, Nature Communications.

Overview: Motivation; Modelling principles; Learning from data; Multivariate Bayes in SPM; Generative Embedding

Classifying groups of subjects: each subject (subject 1, ..., subject N) can be described by voxel activity or by connectivity, e.g. estimated with a dynamic causal model (DCM). These features are then used for classification or clustering into groups (group 1, group 2). Challenges: high dimensionality, unusual cluster distributions, lack of interpretation.

Generative embedding. Brodersen et al., PLoS Computational Biology, 2011.
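Generative embedding replaces high-dimensional voxel features with the parameters of a generative model estimated separately for each subject, and then classifies subjects in that low-dimensional, interpretable parameter space. A minimal sketch of the pipeline, assuming a hypothetical two-region model in which region 2 follows region 1 with a one-scan lag (a toy stand-in for a DCM, not DCM itself), least-squares estimation of the coupling, and leave-one-subject-out nearest-centroid classification; group sizes and coupling values are invented.

```python
import numpy as np

def simulate_subject(coupling, rng, T=200):
    """Hypothetical two-region system: region 2 is driven by region 1 with a one-scan lag."""
    x1 = rng.normal(size=T)
    x2 = np.zeros(T)
    x2[1:] = coupling * x1[:-1] + rng.normal(size=T - 1)
    return np.column_stack([x1, x2])

def estimate_coupling(ts):
    """Model inversion (least squares): the subject's coupling parameter, used as its feature."""
    x1, x2 = ts[:-1, 0], ts[1:, 1]
    return np.dot(x1, x2) / np.dot(x1, x1)

rng = np.random.default_rng(12)
labels = np.repeat([0, 1], 15)                        # e.g. controls vs. patients
true_coupling = np.where(labels == 0, 0.2, 0.8)
features = np.array([estimate_coupling(simulate_subject(a, rng)) for a in true_coupling])

# Leave-one-subject-out nearest-centroid classification in the 1-D parameter space
correct = 0
for i in range(len(labels)):
    train = np.arange(len(labels)) != i
    centroids = [features[train & (labels == c)].mean() for c in (0, 1)]
    pred = int(np.argmin([abs(features[i] - m) for m in centroids]))
    correct += int(pred == labels[i])
accuracy = correct / len(labels)
print(f"leave-one-subject-out accuracy: {accuracy:.2f}")
```

Because the feature is a model parameter, a classifier weight or cluster centre can be read directly as a statement about coupling strength, which is the interpretability argument for generative embedding.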

DCM for speech processing

Working memory in schizophrenia: 41 patients with schizophrenia (DSM-IV, ICD-10) and 42 controls performed a visual numeric n-back working-memory task [figure: example digit sequence with 900 ms and 500 ms timing]. Deserno et al. (2012), The Journal of Neuroscience.

Model-based clustering. Brodersen et al., 2014, NeuroImage.

Results: healthy controls vs. patients with schizophrenia. Brodersen et al., 2014, NeuroImage.

Clustering within patients. Brodersen et al., 2014, NeuroImage.

Be aware: Interpreting decoding or classification results is difficult. The decoded information must be present in the data, but which features exactly carry it is often hard to determine.

Summary: modelling principles; learning from data; Multivariate Bayes in SPM; generative embedding.

Acknowledgments: Many thanks to K. E. Stephan, Sudhir S. Raman and K. Brodersen for sharing their teaching material.