Communities in Networks. Peter J. Mucha, UNC Chapel Hill

Outline & Acknowledgements
1. What is community detection and why is it useful?
2. How do you calculate communities?
   Descriptive: e.g., Modularity
   Generative: e.g., Stochastic Block Models
3. Where is community detection going in the future?
Skyler Cranmer, James Fowler, Jeff Henderson, Jim Moody, J.-P. Onnela, Mason Porter, Dani Bassett, Kaveri Chaturvedi, Saray Shai, Dane Taylor, Natalie Stanley, Mandi Traud, Andrew Waugh, James Wilson, Eric Kelsic, Kevin Macon, Thomas Richardson
JSMF, UCRF (UNC), ARO, CDC, NICHD, NIDDK, NIGMS, NSF
Apologies that this presentation will seriously err on the self-absorbed side. It's a big field, and I do not promise to cover even a small piece of it here.

Philosophical Disclaimer
Jim Moody (paraphrased): "I've been accused of turning everything into a network."
PJM (in response): "I'm accused of turning everything into a network and a graph partitioning problem."
[Figure: Structure, Function. Images by Aaron Clauset.]

Karate Club Example
This partition optimizes modularity, which measures the number of intra-community ties (relative to a random model).
If your method doesn't work on this network, then go home.

Karate Club Club Cris Moore (left) is the inaugural recipient of the Zachary Karate Club Club prize, awarded on behalf of the community by Aric Hagberg (right). (9 May 2013)

Community Detection Firehose Overview
Hard/rigid v. soft/overlapping clusters
cf. biclustering methods and mathematics of expander graphs
A community should describe a "cohesive group": varying formulations/algorithms
Linkage clustering (average, single), local clustering coefficients, betweenness (geodesic, random walk), spectral, conductance, ...
Classic approach in CS: Spectral Graph Partitioning (need to specify number of communities sought)
Conductance
MDL, Infomap, OSLOM, ... (many other things I've missed)
Stochastic Block Models: generative, with in/out probabilities between labeled groups
Modularity: a good partition has more total intra-community edge weight than one would expect at random (but according to what model?)
"Communities in Networks," M. A. Porter, J.-P. Onnela & P. J. Mucha, Notices of the American Mathematical Society 56, 1082-97 & 1164-6 (2009).
"Community Detection in Graphs," S. Fortunato, Physics Reports 486, 75-174 (2010).
"Community detection in networks: A user guide," S. Fortunato & D. Hric, Physics Reports 659, 1-44 (2016).
"Case studies in network community detection," S. Shai, N. Stanley, C. Granell, D. Taylor & P. J. Mucha, arXiv:1705.02305.
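To make the "generative with in/out probabilities" description of stochastic block models concrete, here is a minimal sketch of the simplest case (a planted partition with one within-group and one between-group probability). The function name `planted_partition` and its parameters are illustrative assumptions, not taken from the slides or from any particular library.

```python
import random
from itertools import combinations

def planted_partition(sizes, p_in, p_out, seed=0):
    """Sample an undirected graph from a simple stochastic block model.

    sizes : list of group sizes
    p_in  : edge probability for two nodes in the same group
    p_out : edge probability for two nodes in different groups
    Returns (labels, edges), where labels[i] is the group of node i.
    """
    rng = random.Random(seed)
    labels = [g for g, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for i, j in combinations(range(len(labels)), 2):
        p = p_in if labels[i] == labels[j] else p_out
        if rng.random() < p:
            edges.append((i, j))
    return labels, edges

labels, edges = planted_partition([10, 10], p_in=0.8, p_out=0.05)
within = sum(labels[i] == labels[j] for i, j in edges)
print(len(edges), "edges, of which", within, "are within groups")
```

Fitting such a model to data means inferring the labels and probabilities from an observed graph, which is the harder, inverse problem; the sketch only covers the forward (sampling) direction.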

Modularity (see Newman & Girvan and other Newman papers)
GOAL: Assign nodes to communities in order to maximize quality function Q
NP-Complete [Brandes et al. 2008] ~ enumerate possible partitions
Numerous packages developed/developing, e.g., igraph library (R, Python), NetworkX, Louvain
Need appropriate null model
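As a small illustration of "maximize Q" by enumerating partitions, the sketch below computes Newman-Girvan modularity for an unweighted, undirected toy graph (two triangles joined by one edge, chosen here only for illustration) and searches every partition exhaustively. This is feasible only for tiny graphs, which is why heuristics are used in practice. The helper names are hypothetical.

```python
def modularity(edges, labels, gamma=1.0):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    Q = sum over communities c of [ e_c / m - gamma * (d_c / (2m))**2 ],
    where e_c counts edges inside c, d_c is the total degree of c,
    and m is the number of edges.
    """
    m = len(edges)
    inside, degree = {}, {}
    for i, j in edges:
        degree[labels[i]] = degree.get(labels[i], 0) + 1
        degree[labels[j]] = degree.get(labels[j], 0) + 1
        if labels[i] == labels[j]:
            inside[labels[i]] = inside.get(labels[i], 0) + 1
    return sum(inside.get(c, 0) / m - gamma * (d / (2 * m)) ** 2
               for c, d in degree.items())

def partitions(n):
    """Yield each partition of n nodes exactly once, as a list of labels."""
    def extend(prefix, used):
        if len(prefix) == n:
            yield prefix
            return
        # A node may join any existing group or open one new group.
        for c in range(used + 1):
            yield from extend(prefix + [c], max(used, c + 1))
    return extend([], 0)

# Toy graph: triangle 0-1-2 and triangle 3-4-5, joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
best_q, best_labels = max((modularity(edges, p), p) for p in partitions(6))
print(best_labels, round(best_q, 4))
```

The number of partitions grows as the Bell numbers (203 for six nodes), so this approach stops being practical almost immediately.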

Modularity (see Newman & Girvan and other Newman papers)
The ER degree distribution (binomial/Poisson) is not a good model for many real-world data sets.
Independent edges, constrained so that the expected degree sequence is the same as observed.
Requires P_ij = f(k_i) f(k_j), quickly yielding P_ij = k_i k_j / (2m).
The resolution parameter γ is ad hoc (default = 1) [Reichardt & Bornholdt, PRE 2006; Lambiotte et al., 2008 & 2015].
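The "quickly yielding" step can be written out. The following is a short worked derivation under the slide's assumptions (independent edges, factorized probabilities, expected degrees equal to observed degrees); the symbols m for the number of edges and c_i for the community of node i are notation assumed here.

```latex
\begin{align}
\sum_j P_{ij} = k_i, \quad P_{ij} = f(k_i)\, f(k_j)
  &\;\Longrightarrow\; f(k_i) \sum_j f(k_j) = k_i
   \;\Longrightarrow\; f(k) = C k, \\
C^2 \, k_i \sum_j k_j = k_i, \quad \sum_j k_j = 2m
  &\;\Longrightarrow\; C = \frac{1}{\sqrt{2m}}, \qquad
   P_{ij} = \frac{k_i k_j}{2m}, \\
Q &= \frac{1}{2m} \sum_{ij}
     \left( A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right) \delta(c_i, c_j).
\end{align}
```

With γ = 1 this is the standard Newman-Girvan modularity; γ > 1 tends to favor smaller communities and γ < 1 larger ones.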

Null Models for Modularity Quality Functions
Erdős-Rényi (Bernoulli)
Newman-Girvan*
Leicht-Newman* (directed)
Barber* (bipartite)

Louvain Method (Blondel et al., Fast unfolding of communities in large networks, 2008)
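The sketch below illustrates only the first ("local moving") phase of a Louvain-style heuristic: each node is repeatedly moved to the neighboring community that most increases modularity, until no move helps. It is a deliberately naive illustration that recomputes Q from scratch for every candidate move and omits the aggregation phase of Blondel et al.; the function names and the toy graph are assumptions made here, not the authors' implementation.

```python
def modularity(adj, labels, gamma=1.0):
    """Modularity of an undirected, unweighted graph given as adjacency sets."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    inside, degree = {}, {}
    for i, nbrs in adj.items():
        c = labels[i]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Each internal edge is counted once from each endpoint.
        inside[c] = inside.get(c, 0) + sum(labels[j] == c for j in nbrs)
    return sum(inside[c] / two_m - gamma * (degree[c] / two_m) ** 2
               for c in degree)

def local_moving(adj, gamma=1.0):
    """Greedy node moves, starting from one community per node."""
    labels = {i: i for i in adj}
    best_q = modularity(adj, labels, gamma)
    improved = True
    while improved:
        improved = False
        for i in sorted(adj):
            original = labels[i]
            best_c = original
            # Only communities of neighbors are candidate destinations.
            for c in sorted({labels[j] for j in adj[i]}):
                labels[i] = c
                q = modularity(adj, labels, gamma)
                if q > best_q + 1e-12:  # require a strict improvement
                    best_q, best_c = q, c
            labels[i] = best_c
            if best_c != original:
                improved = True
    return labels, best_q

# Toy graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels, q = local_moving(adj)
print(labels, round(q, 4))
```

Real implementations compute the modularity gain of a move incrementally rather than recomputing Q, and then collapse each community into a single node and repeat; both steps are what make the method fast on large networks.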

Facebook
Traud et al., Comparing community structure to characteristics in online collegiate social networks (2011)
Traud et al., Social structure of Facebook networks (2012)
Caltech 2005: Colors indicate residential House affiliations (Purple = Not provided)

U.S. Congressional Roll Call as a similarity network
Waugh et al., Party polarization in Congress: a network science approach (2009)
[Figures: 85th Senate, 108th Senate]
Adjacency matrix of similarities is dense and weighted, cf. other typical networks (see committees: weighted but sparse)
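The slides do not spell out how the similarity network is built from roll-call votes. One plausible construction, assumed here purely for illustration, sets the weight between two legislators to the fraction of votes on which they agreed, among votes where both took part. The encoding (1 = yea, -1 = nay, 0 = absent) and the function name are hypothetical.

```python
def agreement_matrix(votes):
    """Dense, weighted similarity matrix from a legislators-by-bills vote table.

    votes[i][b] is 1 (yea), -1 (nay), or 0 (did not vote).
    A[i][j] is the fraction of bills, among those both i and j voted on,
    where they cast the same vote. The diagonal is left at 0.
    """
    n = len(votes)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            both = [(a, b) for a, b in zip(votes[i], votes[j])
                    if a != 0 and b != 0]
            if both:
                A[i][j] = A[j][i] = sum(a == b for a, b in both) / len(both)
    return A

votes = [
    [1, 1, -1, 1],    # legislator 0
    [1, 1, -1, -1],   # legislator 1
    [-1, 0, 1, -1],   # legislator 2 (absent for the second bill)
]
for row in agreement_matrix(votes):
    print([round(x, 2) for x in row])
```

Because almost every pair of legislators shares some votes, a matrix built this way is dense and weighted, which matches the slide's remark and is why weighted modularity is the natural quality function for this kind of data.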

Moody & Mucha, Portrait of political party polarization (2013)

Parker et al., Network Analysis Reveals Sex- and Antibiotic Resistance- Associated Antivirulence Targets in Clinical Uropathogens (2015)

Software Other great codes to know: http://www.mapequation.org/ https://graph-tool.skewed.de/

Self loops of weight r as a form of resolution parameter Arenas et al., Analysis of the structure of complex networks at different resolution levels (2008) (see also Shai et al., Case studies in network community detection, 2017)
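A minimal sketch of the self-loop idea follows, assuming the convention that a self-loop of weight r is added to every node (W_r = W + rI), so that each node's strength becomes k_i + r and the total weight becomes 2m + Nr. The function name and the toy graph are illustrative assumptions; consult Arenas et al. for the exact formulation.

```python
def modularity_with_self_loops(A, labels, r=0.0):
    """Modularity of the weighted matrix W = A + r*I for a given partition."""
    n = len(A)
    W = [[A[i][j] + (r if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    strength = [sum(row) for row in W]
    total = sum(strength)  # plays the role of 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += W[i][j] - strength[i] * strength[j] / total
    return q / total

# Toy graph: two triangles joined by the edge 2-3, as an adjacency matrix.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0.0] * 6 for _ in range(6)]
for i, j in pairs:
    A[i][j] = A[j][i] = 1.0

triangles = [0, 0, 0, 1, 1, 1]
singletons = [0, 1, 2, 3, 4, 5]
for r in (0.0, 10.0):
    print(r,
          round(modularity_with_self_loops(A, triangles, r), 4),
          round(modularity_with_self_loops(A, singletons, r), 4))
```

On this toy graph the two-triangle partition scores higher at r = 0, while a large positive r makes the all-singletons partition score higher, which is the sense in which r acts as a resolution parameter.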

Other good references on the slides that follow

Multilayer Networks
Mucha et al., Community structure in time-dependent, multiscale, and multiplex networks (2010)
[Figure panels: Ordered, Categorical]
Kivelä et al., Multilayer Networks (2014)

Multilayer Modularity
Mucha et al., Community structure in time-dependent, multiscale, and multiplex networks (2010)
Generalized the Lambiotte et al. (2008) connection between modularity and autocorrelation under Laplacian dynamics to re-derive null models for bipartite (Barber), directed (Leicht-Newman), and signed (Traag et al.) networks, specified in terms of one-step conditional probabilities.
[Equation annotations: intra-slice adjacency data and null; inter-slice identity arcs]
Same formalism works for more general multilayer networks, with sum over inter-layer connections within same community.
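The equation on this slide did not survive transcription. For reference, the multislice quality function in the cited Mucha et al. (2010) paper has the following form; the notation below (slices indexed by s and r, community assignment g_{is} of node i in slice s) follows that paper as best it can be reconstructed here and should be checked against the original.

```latex
\begin{equation}
Q_{\text{multislice}} = \frac{1}{2\mu} \sum_{ijsr}
\left[
  \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr}
  + \delta_{ij} C_{jsr}
\right] \delta(g_{is}, g_{jr}),
\end{equation}
% A_{ijs}: adjacency between nodes i and j within slice s
% k_{js} = \sum_i A_{ijs}, \quad 2 m_s = \sum_j k_{js}
% C_{jsr}: coupling of node j between slices s and r (identity arcs)
% 2\mu = \sum_{jr} \Big( k_{jr} + \sum_s C_{jrs} \Big)
```

The first term is the intra-slice adjacency data minus its null model, and the second term is the inter-slice coupling that rewards assigning the same node to the same community across slices.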

Bassett et al. Dynamic reconfiguration of human brain networks during learning (2011)

Cranmer et al., Kantian fractionalization predicts the conflict propensity of the international system (2015)
Identified communities of nation states in multiplex international relations of trade, IGOs, and democracies.
Granger causal relationship to total system-level conflict.
Negligible contribution from joint democracy layer.

Stanley et al., Clustering network layers with the strata multilayer stochastic block model (2016)

See mapequation.org Phys. Rev. X 6, 011036 (2016)

Taylor et al., Enhanced detectability of community structure in multilayer networks through layer aggregation (2016)

Outline & Summary
1. What is community detection and why is it useful?
2. How do you calculate communities?
   Descriptive: e.g., Modularity
   Generative: e.g., Stochastic Block Models
3. Where is community detection going in the future?
Networks appear in many disciplines.
Network representations provide a flexible framework for studying general data types, leveraging methods of social network analysis and network science.
Community detection is a powerful tool for exploring and understanding network structures, including multilayer networks.
Network structures identify essential features for modeling and understanding data in applications.

Special thanks to Mucha Research Group 2016-17