The origin of Indo-European languages


9/7/7

A new hybrid hypothesis for the origin and spread of the Indo-European languages
Russell Gray, Max Planck Institute for the Science of Human History, Jena

Theories of Indo-European Origin

The origin of Indo-European languages
"the most intensively studied, yet still most recalcitrant, problem of historical linguistics" (Diamond and Bellwood, Science, 2003)

Talk structure
1. The challenge(s)
2. Bayesian phylolinguistics made easy
3. The Indo-European debate goes Bayesian
4. Archaeogenetics to the rescue?
5. A new hypothesis for the origin of PIE

"the quest for the origins of the Indo-Europeans has all the fascination of an electric light in the open air on a summer night: it tends to attract every species of scholar or would-be savant who can take pen to hand." (Mallory, 1989)

So why do I care?

"The formation of different languages and of distinct species, and the proofs that both have been developed through a gradual process, are curiously parallel. We find in distinct languages striking homologies due to community of descent, and analogies due to a similar process of formation." (The Descent of Man, 1871)

Tree of life: Darwin's notebook, 1837
Tree of languages: Schleicher, 1865

Retentions vs innovations: Karl Brugmann, 1884
Symplesiomorphies vs synapomorphies: Willi Hennig, 1950/1966

Glottoclock: Morris Swadesh, 1952
  t = log c / (2 log r)
  c = % shared cognates
  r ≈ 81% (200-word list)
Molecular clock: Zuckerkandl & Pauling, 1962
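
A worked example of the glottoclock formula, as a rough Python sketch. The retention rate r = 0.81 per millennium is an assumption here (the conventional figure for the 200-word list), and the function name is made up for illustration. Under that assumption t comes out in millennia.

```python
import math

def glottoclock_time(shared_cognates, retention=0.81):
    """Separation time t = log c / (2 log r).

    shared_cognates: proportion c of shared cognates, in (0, 1].
    retention: assumed retention rate r per millennium (0.81 is an assumption).
    Returns t in millennia.
    """
    if not 0.0 < shared_cognates <= 1.0:
        raise ValueError("shared_cognates must be in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in (0, 1)")
    return math.log(shared_cognates) / (2.0 * math.log(retention))

# Two languages that each keep 81% of the list per millennium share
# 0.81 * 0.81 of it after one millennium apart, so t should be 1.
t_one_millennium = glottoclock_time(0.81 * 0.81)
```

The formula only holds if every language loses vocabulary at the same constant rate, which is the assumption the Bayesian approach in the rest of the talk avoids.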

Phylogenetic explosion in biology
[Figure: percentage of total publications matching the keyword "Phylogen*" in the Scopus database, by year, 1980 to 2010. Source: http://treetapper-dev.blogspot.com/]

Which tree is more likely?

Why use computers?

Languages   # rooted trees
3           3
4           15
5           105
6           945
7           10395
8           135135
9           2027025
10          34459425
20          8.2 x 10^21
50          2.7 x 10^76
100         3.4 x 10^184

Number of rooted trees for n languages: (2n-3)!!
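
The rooted-tree counts in the table follow the double factorial (2n-3)!! = 3 x 5 x ... x (2n-3). A short Python sketch (the function name is illustrative) reproduces the first rows and shows why exhaustive search is hopeless beyond a handful of languages.

```python
def rooted_tree_count(n_languages):
    """Number of rooted, bifurcating, labelled trees for n languages: (2n-3)!!."""
    if n_languages < 2:
        raise ValueError("need at least two languages")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3.
    for k in range(3, 2 * n_languages - 2, 2):
        count *= k
    return count

# Counts for 3 to 10 languages, matching the table rows.
small_counts = {n: rooted_tree_count(n) for n in range(3, 11)}
```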

Modern Bayesian Phylogenetic Inference
1. Data
2. Model
3. Priors
4. Tree search

What is the ancestral state?
Depends on the tree. And the model: assumptions about the relative probabilities of character state changes matter.

Likelihood calculation

Three models of lexical evolution
1. Equal probability of cognate gains and losses
2. Dollo (gains can only occur once, assumes stochastic clock)
3. Covarion
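
To make the likelihood calculation concrete, here is a minimal Python sketch for the first model only (equal probability of cognate gain and loss), using the standard pruning recursion on a three-language tree. The tree shape, branch lengths, rate and all names are assumptions chosen for illustration; this is not the MrBayes or BEAST implementation.

```python
import math

def transition_prob(start, end, branch_length, rate=1.0):
    """Symmetric two-state model: gain (0 to 1) and loss (1 to 0) are equally likely."""
    change = 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))
    return change if start != end else 1.0 - change

def partial_likelihoods(node, tip_states):
    """Return [L(state 0), L(state 1)] for the subtree below node.

    A node is either a language name (tip) or a tuple of
    (child, branch_length) pairs (internal node).
    """
    if isinstance(node, str):
        return [1.0 if tip_states[node] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        below = partial_likelihoods(child, tip_states)
        for s in (0, 1):
            result[s] *= sum(transition_prob(s, x, length) * below[x] for x in (0, 1))
    return result

def site_likelihood(tree, tip_states):
    """Likelihood of one cognate set (present = 1, absent = 0), equal root frequencies."""
    at_root = partial_likelihoods(tree, tip_states)
    return 0.5 * at_root[0] + 0.5 * at_root[1]

# Hypothetical tree ((English, German), Greek) with made-up branch lengths.
TREE = (((("English", 0.1), ("German", 0.1)), 0.2), ("Greek", 0.3))

shared_by_sisters = site_likelihood(TREE, {"English": 1, "German": 1, "Greek": 0})
split_across_tree = site_likelihood(TREE, {"English": 1, "German": 0, "Greek": 1})
```

On this tree a cognate set shared by the two sister languages gets a higher likelihood than one shared across the root, which is how the data end up favouring some trees over others.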

MCMC search (Markov chain Monte Carlo)
Convergence?

Bayesian MCMC inference
Posterior probability of the trees given the priors, the data and the model

No one tree to rule them all
Posterior sample of trees reveals uncertainty
[Figure: three alternative trees for A, B, C, D with posterior probabilities 20%, 50% and 30%]
DensiTree visualisation. Adapted from Cui et al. 2013.

What we are NOT doing
1. Counting cognates to get % shared cognates
2. Pairwise comparisons of languages
3. Assuming constant rates

What is to be gained?
Level playing field: no cherry picking, explicit optimality criterion to evaluate subgrouping hypotheses
Quantify uncertainty
Estimate dates
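
How a posterior sample of trees turns into support values for subgroups can be sketched in a few lines of Python. The three topologies below are hypothetical (the slide only gives the 20%, 50% and 30% weights for trees on A, B, C, D), and the function names are illustrative.

```python
from collections import Counter

def collect_clades(tree, found):
    """Add every clade (set of tips below an internal node) of tree to found."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(tips)
    return tips

def clade_support(tree_sample):
    """Proportion of sampled trees that contain each clade."""
    counts = Counter()
    for tree in tree_sample:
        found = set()
        all_tips = collect_clades(tree, found)
        found.discard(all_tips)  # the root clade is in every tree
        counts.update(found)
    return {clade: n / len(tree_sample) for clade, n in counts.items()}

# Hypothetical posterior sample of ten trees in the proportions 20% / 50% / 30%.
SAMPLE = (
    [((("A", "B"), "C"), "D")] * 2
    + [(("A", "B"), ("C", "D"))] * 5
    + [((("A", "C"), "B"), "D")] * 3
)

SUPPORT = clade_support(SAMPLE)
```

With this sample the subgroup {A, B} appears in the first two topologies, so its support is their combined weight rather than the weight of any single tree.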

Talk structure
1. The challenge(s)
2. Bayesian phylolinguistics made easy
3. The Indo-European debate goes Bayesian

Data
Dyen et al. (1992): 84 languages, 2449 cognate sets
Added three extinct languages
Swadesh list: 200 items of basic vocabulary (numerals, kinship terms, terms for body parts, and basic verbs)
Relatively resistant to borrowing
Recognized borrowing removed, e.g. English "mountain", borrowed from French "montagne", was not coded as cognate.

Binary coding
(the number after each word is its cognate class)

English     And 1   Big 1        Fire 1    Meat 1      Rub 1       Water 1
German      Und 1   Gross 2      Feuer 1   Fleisch 2   Reiben 1    Wasser 1
Dutch       En 1    Groot 2      Vuur 1    Vleesch 2   Wrijven 1   Water 1
Swedish     Och 2   Stor 3       Eld 2     Kott 3      Gnida 2     Vatten 1
Icelandic   Og 2    Stor 3       Eldr 2    Hold 3      Nua 3       Vatn 1
Danish      Og 2    Stor 3       Ild 2     Kod 4       Gnide 2     Vand 1
Greek       Ke 3    Meghalos 4   Fotia 3   Kreas 5     Trivo 4     Nero 2

Bayesian MCMC tree-estimation
MrBayes (v.2 & v.3), Huelsenbeck & Ronquist
10 independent runs with 4 chains, 1,300,000 generations
First 300,000 discarded as burn-in
Sample every 10,000
Majority-rule consensus tree of the sampled MCMC trees
[Figure: consensus tree. Clades: French/Iberian, Italic, Celtic; North Germanic, West Germanic, Germanic; Slavic, Baltic, Balto-Slavic; Indic, Iranian, Indo-Iranian; Albanian; Armenian; Greek; Tocharian; Hittite]

"linguists don't do dates" (April & Robert McMahon, 2006)
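
The binary coding step described above can be sketched in Python: each (meaning, cognate class) pair becomes one presence/absence column. The class numbers come from the slide's table, but the class-1 entries are restored by assumption because the digit is missing in the transcript. Function and variable names are illustrative.

```python
# Cognate class per meaning; class-1 entries are an assumption (digit lost in transcript).
COGNATE_CLASSES = {
    "English":   {"and": 1, "big": 1, "fire": 1, "meat": 1, "rub": 1, "water": 1},
    "German":    {"and": 1, "big": 2, "fire": 1, "meat": 2, "rub": 1, "water": 1},
    "Dutch":     {"and": 1, "big": 2, "fire": 1, "meat": 2, "rub": 1, "water": 1},
    "Swedish":   {"and": 2, "big": 3, "fire": 2, "meat": 3, "rub": 2, "water": 1},
    "Icelandic": {"and": 2, "big": 3, "fire": 2, "meat": 3, "rub": 3, "water": 1},
    "Danish":    {"and": 2, "big": 3, "fire": 2, "meat": 4, "rub": 2, "water": 1},
    "Greek":     {"and": 3, "big": 4, "fire": 3, "meat": 5, "rub": 4, "water": 2},
}

def binary_code(table):
    """Turn multistate cognate classes into a 0/1 matrix, one column per cognate set."""
    meanings = sorted({m for row in table.values() for m in row})
    columns = []
    for meaning in meanings:
        classes = sorted({row[meaning] for row in table.values() if meaning in row})
        columns.extend((meaning, c) for c in classes)
    matrix = {
        language: [1 if row.get(meaning) == c else 0 for meaning, c in columns]
        for language, row in table.items()
    }
    return columns, matrix

COLUMNS, MATRIX = binary_code(COGNATE_CLASSES)
```

Each language then has exactly one 1 per meaning, and languages sharing a 1 in a column share that cognate set. These columns are the binary characters that the gain/loss models operate on.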

Calibration points (ranges)
[Figure: consensus tree with calibration date ranges at the nodes. Clades: French/Iberian, West Germanic, North Germanic, Italic, Germanic, Celtic, Indic, Iranian, Indo-Iranian, Slavic, Baltic, Balto-Slavic, Albanian, Greek, Armenian, Tocharian, Hittite]
Date of Proto-Indo-European estimated using penalised maximum likelihood rate smoothing
Gray & Atkinson (2003) Nature, 426, 435-439.

Responses to Gray & Atkinson 2003
"It is a very good paper."
"X#@$?&!" (unrepeatable)

Criticisms of our method
1. Wrong answer: "linguistic paleontology rejects this time date" (Larry Trask, Sussex Univ.)
2. Reliance on lexical data: "Lexical data is the least reliable type of data" (Don Ringe, Univ. of Pennsylvania)
3. Model misspecification: "used techniques that are not appropriate for their data." (Tandy Warnow, Univ. of Texas)
4. Cognate sets not independent

The argument from the wheel
English        wheel
Hom. Greek     kuklos
OHGerman       wel
Sanskrit       cakra
OIcelandic     hjol
Proto-Indo-European word *kwekwlo- "wheel"

"How could a language that was last spoken around 10,000 years ago have words for things that were not invented until 4000 years later?" (Larry Trask)

The argument from the wheel: other explanations
Borrowing new technology
Semantic shift: *kwel- (to turn, rotate) ->
  kola "wheel" (Old Russian)
  kuklos "wheel" (Greek)
  cakra "wheel" (Sanskrit)

Should we believe the lexical data?
Constraining the tree to the Ringe et al. 2002 topology gives similar date estimates

Center of gravity / center of diversity

Austronesian expansion
Sequence, timing, pulses and pauses
Gray, Drummond & Greenhill (2009) Science, 323, 479-483.

Bayesian phylogeography
Gray, Drummond & Greenhill (2009) Science, 323, 479-483.
Lemey et al., MBE, 2010

The team
Alex Alekseyenko, Quentin Atkinson, Remco Bouckaert, Alexei Drummond, Michael Dunn, Russell Gray, Simon Greenhill, Philippe Lemey, Marc Suchard

Bayesian phylogeography
1. Data (basic vocab)
2. Location (language ranges)
3. Diffusion model
4. Calibration data to date language divergences
5. Bayesian MCMC inference of phylogeny in BEAST

New (improved) data
http://ielex.mpi.nl/ (Michael Dunn, MPI Nijmegen)
+ language location data
+ model of spatial diffusion
+ Bayesian inference of phylogeny in BEAST

Substitution models
Simple binary reversible model (0 <-> 1)
Binary covarion model: slow (0 <-> 1), fast (0 <-> 1)
Stochastic Dollo model (0 -> 1: one gain, many losses)

Clock
Uncorrelated lognormal relaxed clock (Drummond et al., 2006)

= test origin hypotheses

[Figure: dated Indo-European tree with posterior support values on the branches. Clades: Hittite, Celtic, Italic, Germanic, Balto-Slavic, Albanian, Greek, Armenian, Indo-Iranian, Tocharian. Bouckaert et al (2012) Science.]

Posterior distribution on root location
[Table: Bayes factors, Anatolian vs. steppe I and Anatolian vs. steppe II, for four phylogeographic analyses: RRW with all languages; RRW with ancient languages only; RRW with contemporary languages only; landscape-aware diffusion. Bouckaert et al (2012) Science.]

Bouckaert et al (2013) Science. Correction.
[Figure: corrected dated tree with posterior support values, time axis in years ago (8000 to 0). Clades: Celtic, Italic, Germanic, Balto-Slavic, Indo-Iranian, Anatolian, Tocharian, Armenian, Greek, Albanian]

RRW
Posterior distribution on root location
[Table: Bayes factors for the RRW analyses: all languages; constrained; ancient only; contemporary only*]

Big problem: inconsistent data coding (Swadesh policy of most commonly used lexeme not consistently followed)

Butterflies in the Chang et al data
[Figure: histogram of number of languages against number of records]

COBL: Ancient Greek

Better data, multi-state, better model that allows but does not enforce direct ancestry, new results.
Chang et al 2014

New Jena results
[Figure: dated tree with posterior support values. Tip labels: Umbrian, Latin, Gaulish, Romanian, Vlach, Sardinian_Logudoro, Sardinian_Cagliari, Sardinian_Nuoro, Italian, Ladin, Romansh, Friulian, Catalan, Portuguese, Spanish, Provencal, Walloon, French, Gothic, Old_Norse, Old_Swedish, Old_English, Old_High_German, Icelandic, Faroese, Norwegian_Riksmal, Swedish, Danish, English, Frisian, Dutch, Flemish, German, Luxembourgish]

Talk structure
1. The challenge(s)
2. Bayesian phylolinguistics made easy
3. The Indo-European debate goes Bayesian
4. Archaeogenetics to the rescue?

Haak et al 2015

Talk structure
1. The challenge(s)
2. Bayesian phylolinguistics made easy
3. The Indo-European debate goes Bayesian
4. Archaeogenetics to the rescue?
5. A new hypothesis for the origin of PIE

A hybrid model for the origin and spread of Indo-European
Haak, Heggarty, Krause & Gray

Thanks to colleagues
Cormac Anderson (Jena, MPG-SHH)
Quentin Atkinson (Auckland)
Remco Bouckaert (Auckland)
Alexei Drummond (Auckland)
Michael Dunn (Nijmegen)
Simon Greenhill (ANU, Jena)
Wolfgang Haak (Jena)
Paul Heggarty (Jena, MPG-SHH)
Johannes Krause (Jena)
Philippe Lemey (Leuven)
Marc Suchard (UCLA)