Multivariate models and machine learning for fMRI. Methods and Models in fMRI, 15.11.2016. Jakob Heinzle (heinzle@biomed.ee.ethz.ch), Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering (IBT), University and ETH Zürich. Many thanks to Sudhir Raman and Kay Brodersen for material.
Overview: Motivation; Modelling Terminology; Learning from data; Multivariate Bayes in SPM; Generative Embedding. fMRI Analysis and Classification.
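Why multivariate? Univariate approaches are excellent for localizing activations in individual voxels. [Figure: responses of voxels v1 and v2 under reward vs. no reward; one comparison is marked significant (*), the other not significant (n.s.).]
Why multivariate? Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels. [Figure: responses of voxels v1 and v2 to orange juice vs. apple juice; each voxel alone is n.s.; scatter plot of v2 against v1.]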
A bit of history: Multidimensional scaling. Two-dimensional projection of similarity measure for both psychophysical rating and fMRI response. [Figure panels: psychophysical rating, fMRI.] Edelman et al., Psychobiology, 1998
A bit of history: Classification studies. Haxby et al., Science, 2001
A bit of history: Classification studies. Kamitani and Tong, Nat Neurosci, 2005
Representational similarity analysis. Idea: Compare the similarity of representations (correlation between activation patterns) between different stimuli. Allows for a comparison between monkey (neural firing pattern) and human (fMRI activation patterns). Kriegeskorte et al., Neuron, 2008
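A minimal sketch of the idea, assuming one activation pattern (a list of voxel values) per stimulus. The patterns below are made-up numbers, and using 1 minus the Pearson correlation as the dissimilarity is a common choice assumed here, not something stated on the slide.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long activation patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def dissimilarity_matrix(patterns):
    """Return 1 - correlation for every pair of stimuli."""
    return [[1.0 - pearson(p, q) for q in patterns] for p in patterns]

# Hypothetical activation patterns (one list of voxel values per stimulus).
patterns = [
    [1.0, 2.0, 3.0, 4.0],  # stimulus A
    [2.0, 4.0, 6.0, 8.5],  # stimulus B, similar to A
    [4.0, 3.0, 2.0, 1.0],  # stimulus C, reversed pattern
]
rdm = dissimilarity_matrix(patterns)
```

Because the matrix depends only on the relations between patterns, two such matrices (for example one from neural firing and one from fMRI) can be compared even when the underlying measurements have different dimensions.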
Overview: Motivation; Modelling Terminology; Learning from data; Multivariate Bayes in SPM; Generative Embedding.
Analysis steps: Feature Extraction; Classification; Clustering; Modelling; Regression; Inference; Cross-validation; Performance; Prediction; Model Selection.
Feature space: data points (rows S1 ... SN) by features (columns F1 ... FP). Features can be discrete (here F1) or continuous (here F2).

        F1     F2     ...   FP
  S1    1      0.5
  S2    0      5.7
  .     1      4
  .     1      5.3
  SN    1      6.6
Feature selection for fMRI multivariate analysis. Different features answer different questions. Reducing the dimensionality might reduce noise, but could also remove relevant information. Candidate features: raw data, mean values, correlations between regions, model parameters (e.g. DCM).
Model selection and generalizability: model fit vs. model complexity. Bishop (2006); Pitt & Myung (2002), TICS
Encoding and decoding models. Context (cause or consequence), e.g. condition, stimulus, response, prediction error: $X_t \in \mathbb{R}^d$. BOLD signal: $Y_t \in \mathbb{R}^v$. Encoding model $g: X_t \to Y_t$. Decoding model $h: Y_t \to X_t$.
Modelling goals: Prediction ($h: Y \to X$); Predictive Density.
Modelling goals: Model Selection (Sparse Coding vs. Distributed Coding); Model Evidence.
Overview: Motivation; Modelling Concepts; Learning From Data; Multivariate Bayes in SPM; Generative Embedding.
Learning from data: Supervised Learning (labels for training data are known!); Unsupervised Learning (labels for training data are NOT known!); Reinforcement Learning; Semi-supervised Learning.
Supervised learning: independent variables X → function f → dependent variable Y, which is either continuous or categorical.
Classification: X → function f → Y. Kernel methods, support vector machines. Feature map $\phi$; kernel function $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. Shawe-Taylor & Cristianini, Kernel Methods for Pattern Analysis, 2004
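To make the kernel identity concrete, here is a small sketch with a homogeneous quadratic kernel on two-dimensional inputs. The specific kernel, the feature map `phi` and the two example points are illustrative assumptions, not taken from the slide. The point is that the kernel value can be computed without ever building the feature vectors.

```python
import math

def dot(a, b):
    """Ordinary inner product of two equally long vectors."""
    return sum(u * v for u, v in zip(a, b))

def phi(x):
    """Explicit feature map of the homogeneous quadratic kernel in 2D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed directly in input space."""
    return dot(x, z) ** 2

# Two hypothetical data points.
x_i, x_j = (1.0, 2.0), (3.0, -1.0)
via_feature_map = dot(phi(x_i), phi(x_j))  # phi(x_i) . phi(x_j)
via_kernel = quadratic_kernel(x_i, x_j)    # same value, no feature map needed
```

Both routes give the same number, which is why a support vector machine only ever needs the kernel values between pairs of examples.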
Other popular classifiers. Gaussian Processes: C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. Deep Belief Networks: http://deeplearning.net/tutorial/dbn.html; G. E. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural Computation, vol. 18, 2006.
Generative and discriminative classifiers. Generative classifiers learn the parameters of p(y) and p(x | y), e.g. the naïve Bayes classifier. Discriminative classifiers learn the parameters of p(y | x), e.g. logistic regression, SVM.
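The two families are linked by Bayes' rule: a generative classifier turns its learned p(y) and p(x | y) into the posterior that a discriminative classifier models directly. The notation below is a standard sketch rather than the slide's own, with $x_j$ denoting the j-th of P features and the product form being the extra assumption made by naïve Bayes.

```latex
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')},
\qquad
p(x \mid y) = \prod_{j=1}^{P} p(x_j \mid y) \quad \text{(naive Bayes assumption)}
```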
Cross-validation. The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation. [Figure: examples 1 to 100 are split into training and test examples in each of folds 1 and 2. Cross-validation serves model selection and performance evaluation, e.g. with balanced accuracy or F1 score.]
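The procedure can be sketched in a few lines. Everything concrete here is an assumption for illustration: the simulated one-feature data set, the toy nearest-mean classifier, and the helper names. Only the overall scheme (split into folds, train on one part, test on the other, average a performance measure such as balanced accuracy) follows the slide.

```python
import random

def nearest_mean_fit(xs, ys):
    """Toy classifier: store the mean feature value of each class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means

def nearest_mean_predict(means, xs):
    """Predict the class whose stored mean is closest to each value."""
    return [min(means, key=lambda label: abs(x - means[label])) for x in xs]

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]

rng = random.Random(0)
# Hypothetical data: 100 examples, one feature, two classes.
ys = [i % 2 for i in range(100)]
xs = [rng.gauss(2.0 * y, 1.0) for y in ys]

scores = []
for test_idx in k_fold_indices(len(xs), 2, rng):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(xs)) if i not in held_out]
    model = nearest_mean_fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    pred = nearest_mean_predict(model, [xs[i] for i in test_idx])
    scores.append(balanced_accuracy([ys[i] for i in test_idx], pred))
cv_score = sum(scores) / len(scores)
```

The classifier never sees the test fold during fitting, which is what makes the averaged score an estimate of generalization rather than of fit.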
Cross-validation. Another commonly used variant is leave-one-out cross-validation. [Figure: examples 1 to 100; in each of folds 1 to 100 a single example is the test example and all others are training examples; performance evaluation across folds.] In fMRI, leave-one-run-out cross-validation is often used.
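A leave-one-run-out split can be generated from a list of run labels, one per trial. The function name and the example session below are hypothetical. A commonly given motivation, assumed here rather than stated on the slide, is that trials within a run are temporally correlated, so holding out whole runs keeps the test data more independent of the training data.

```python
def leave_one_run_out(run_labels):
    """Yield (held_out_run, train_indices, test_indices), one split per run."""
    for held_out in sorted(set(run_labels)):
        test_idx = [i for i, r in enumerate(run_labels) if r == held_out]
        train_idx = [i for i, r in enumerate(run_labels) if r != held_out]
        yield held_out, train_idx, test_idx

# Hypothetical session: 3 runs with 4 trials each.
runs = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
splits = list(leave_one_run_out(runs))
```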
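Performance, single subject: binomial test. $p = P(X \ge k \mid H_0) = 1 - B(k - 1 \mid n, \pi_0)$, where B is the binomial cumulative distribution function, k the number of correct predictions in n trials, and $\pi_0$ the chance level. Brodersen et al. 2013, NeuroImage. Figure annotation: k = 30. Cross-validated data are not necessarily binomially distributed. Permutation tests are better!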
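As a worked illustration of the binomial test, the sketch below computes the probability of observing at least k correct predictions under chance. It assumes B denotes the binomial cumulative distribution function; the function names are hypothetical, and n = 50 is an assumed trial count chosen only to go with the k = 30 that appears on the slide.

```python
from math import comb

def binomial_cdf(k, n, pi0):
    """B(k | n, pi0): probability of at most k successes in n trials."""
    return sum(comb(n, i) * pi0**i * (1 - pi0) ** (n - i) for i in range(0, k + 1))

def binomial_p_value(k, n, pi0=0.5):
    """p = P(X >= k | H0) = 1 - B(k - 1 | n, pi0)."""
    return 1.0 - binomial_cdf(k - 1, n, pi0)

# Hypothetical example: k = 30 correct out of n = 50 trials (n is an assumption).
p = binomial_p_value(30, 50)
```

With these assumed numbers the p-value is around 0.1, so 60% accuracy on 50 trials would not be significant at the usual 0.05 level. The caveat from the slide still applies: cross-validated accuracies need not follow a binomial distribution.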
Performance, multiple subjects: random effects vs. fixed effects. http://www.translationalneuromodeling.org/tapas/ Brodersen et al. 2013, NeuroImage
Confounds: GLM vs. MVPA. Todd et al. 2013, NeuroImage
Second-level t-tests for accuracies? True β-values are normally distributed. True accuracies are not normally distributed and are truncated at chance. A possible solution is given by Allefeld et al., NeuroImage, 2016.
Statistical testing with classification. Within subjects: use permutation statistics. Parametric tests, e.g. the binomial test or t-test, are not valid because their assumptions are not met (cf. Schreiber and Krekelberg, 2013). Across subjects: the assumptions for t-tests are not met. The full Bayesian model (Brodersen et al. 2013) is an option, but its assumptions are not met for cross-validation. Use the prevalence statistic proposed in Allefeld et al., 2016.
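A within-subject permutation test can be sketched as follows. The names and numbers are hypothetical, and the scoring function is a deliberately simplified stand-in: in a real analysis it would rerun the whole cross-validation with the shuffled labels, and the shuffling scheme would have to respect the structure of the experiment (for example, permuting within runs).

```python
import random

def permutation_p_value(score_fn, labels, n_perm=1000, seed=0):
    """Estimate p = P(score with shuffled labels >= observed score)."""
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (count + 1) / (n_perm + 1)

# Hypothetical stand-in for a full cross-validation: fixed predictions
# are scored against the (possibly shuffled) labels.
predictions = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
true_labels = [0] * 10 + [1] * 10

def accuracy(labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

p_perm = permutation_p_value(accuracy, true_labels)
```

The null distribution is built from the data themselves, so no binomial or normality assumption is needed.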
Research questions for classification: (1) Overall classification accuracy for a classification task, e.g. left or right button? truth or lie? healthy or ill? (2) Spatial deployment of discriminative regions. (3) Temporal evolution of discriminability: accuracy as a function of within-trial time, e.g. when accuracy rises above chance relative to when the participant indicates the decision. (4) Model-based classification, e.g. { group 1, group 2 }. Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection
Decoding «hidden» intentions: searchlight approach. Haynes et al., Current Biology, 2007
Decoding of free decisions. Decoding of finger presses (red line). Participants freely choose timing and hand. The earliest information about left vs. right is available long before execution. Free will? Soon et al., Nat Neurosci, 2008
Decoding task preparation: connectivity-based decoding. Support vector classifier on the connectivity graph (correlation). Discriminative maps. Heinzle et al., J Neurosci, 2012
Unsupervised learning: building a representation of data. Dimensionality Reduction; Clustering; Time series; K-means; Mixture models.
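K-means clustering. Cost function: $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2$, with binary assignments $r_{nk}$ and cluster centroids $\mu_k$. Algorithm:
1. Initialize
2. Estimate assignments
3. Estimate cluster centroids
4. Repeat 2 and 3 until convergence
Bishop PRML (2006)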
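The four steps above can be sketched in plain Python. This is an illustrative sketch, not the lecture's code: the function names, the random initialization from data points, and the toy data are assumptions, and the cost is taken to be the usual sum of squared distances between each point and its assigned centroid.

```python
import random

def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def k_means(points, k, n_iter=100, seed=0):
    """Return (centroids, assignments, cost) for a list of point tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # 1. initialize with k data points
    assignments = None
    for _ in range(n_iter):
        # 2. assign every point to its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:  # 4. stop once nothing changes
            break
        assignments = new_assignments
        # 3. move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    cost = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, cost

# Hypothetical data: two well separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments, cost = k_means(points, k=2)
```

Each pass through steps 2 and 3 can only lower or keep the cost, which is why the loop terminates; the solution found can still depend on the initialization.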
Clustering: Mixture of Gaussians. Bishop PRML (2006)
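As a reminder of what this model assumes, the density and the soft cluster assignments can be written as below. The notation follows the standard textbook form and is not defined on the slide itself: $\pi_k$ are mixing weights (unrelated to a chance level), $\mu_k$ and $\Sigma_k$ are the mean and covariance of component k, and $\gamma_{nk}$ is the responsibility of component k for data point $x_n$.

```latex
p(x_n) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k),
\qquad
\gamma_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
```

Unlike the hard assignments of K-means, the responsibilities are probabilities that sum to one over the K components.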
Interpretation. Cluster parameters (Cluster 1, Cluster 2). Internal criterion: model evidence. External criterion: purity, comparing the inferred labels of subjects with external labels.
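Purity can be computed by counting, within each inferred cluster, how many subjects carry the most frequent external label. The sketch below uses this common definition; the function name and the six example subjects are hypothetical.

```python
from collections import Counter

def purity(inferred, external):
    """Fraction of subjects that carry their cluster's majority external label."""
    clusters = {}
    for cluster, label in zip(inferred, external):
        clusters.setdefault(cluster, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in clusters.values()
    )
    return majority_total / len(inferred)

# Hypothetical example: six subjects, two inferred clusters.
inferred_labels = [0, 0, 0, 1, 1, 1]
external_labels = ["patient", "patient", "control", "control", "control", "control"]
score = purity(inferred_labels, external_labels)
```

Here five of six subjects sit in a cluster dominated by their own external label, so the purity is 5/6. Purity does not penalize using many clusters, which is one reason to report it alongside an internal criterion.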
Motivation; Modelling; Learning from Data; Multivariate Bayes in SPM; Generative Embedding.
Encoding vs. Decoding models
Coding hypotheses: sparse vectors; spatial vectors; smooth vectors; distributed vectors; singular vectors of data, $U D V^T = R Y^T$; support vectors, $U = R Y^T$.
Coding hypotheses. Friston et al. 2008, NeuroImage
Solved with variational Bayes. Friston et al. 2008, NeuroImage
Example: decoding of motion. Attention to motion dataset (Büchel & Friston 1999, Cerebral Cortex). Experimental factors: 1. Photic 2. Motion 3. Attention. Friston et al. 2008, NeuroImage
Friston et al. 2008, NeuroImage
Results. Friston et al. 2008, NeuroImage
Multivariate Bayes in SPM. [Figure: SPM results for the motion contrast, SPM{T338}, SPMmip at [-36, -87, -3], height threshold T = 4.874226 {p<0.05 (FWE)}, results directory .\SPM-practical\attention\GLM. Output panels for MVB_Motion (prior: sparse): log-evidence over partitions (maximum p = 100.00%); distribution of weights (frequency); PPM: MVB_Motion (Motion); target and prediction over scans (adjusted response); observed and predicted contrast, SNR (variance) 0.64; 506 voxels, 360 scans.]

Posterior probabilities of voxel weights at maxima:

  p(|w| > 0)   location (x,y,z)            weight (w)
  0.993        -39.0, -90.0,  -3.0 mm       0.0254
  0.983        -33.0, -99.0,  -3.0 mm      -0.0216
  0.983        -30.0, -99.0,   3.0 mm       0.0211
  0.982        -42.0, -90.0,   9.0 mm       0.0201
  0.980        -45.0, -75.0,  -3.0 mm       0.0168
  0.979        -30.0, -84.0,   6.0 mm      -0.0187
  0.977        -39.0, -87.0,   3.0 mm      -0.0196
  0.973        -30.0, -84.0,  -6.0 mm      -0.0204
  0.972        -39.0, -81.0, -15.0 mm       0.0166
  0.946        -36.0, -84.0,  12.0 mm      -0.0144
  0.933        -48.0, -84.0,  -3.0 mm      -0.0119
  0.929        -39.0, -75.0,   3.0 mm      -0.0160
Laminar activity related to novelty and episodic memory. Maass et al. 2014, Nature Communications
Motivation; Modelling Principles; Learning from Data; Multivariate Bayes in SPM; Generative Embedding.
Classifying groups of subjects. Candidate features for Subject 1, Subject 2, ..., Subject N: voxel activity, connectivity, or a dynamic causal model (DCM). Difficulties: high dimensionality, unusual cluster distributions, lack of interpretation. Subjects are assigned to Group 1 or Group 2 by classification or clustering.
Generative Embedding. Brodersen et al., PLoS Computational Biology, 2011.
DCM for speech processing
Working memory in schizophrenia. 41 schizophrenia patients (DSM-IV, ICD-10), 42 controls. Visual numeric n-back working memory task. [Figure: sequence of digits with timings of 900 ms and 500 ms.] Deserno et al. (2012), The Journal of Neuroscience
Model-based clustering. Brodersen et al. 2014, NeuroImage
Results: healthy vs. schizophrenia patients. Brodersen et al. 2014, NeuroImage
Within-patients clustering. Brodersen et al. 2014, NeuroImage
Be aware: Interpretation of decoding or classification results is difficult. The decoded information must be in the data, but it is often hard to find out exactly which features carry it.
Summary: Modelling Principles; Learning from Data; Multivariate Bayes in SPM; Generative Embedding.
Acknowledgments: Many thanks to K.E. Stephan, Sudhir S. Raman and K. Brodersen for sharing their teaching material.