Towards a Human Language Project for Multilingual Europe
AI and Interpretation
Georg Rehm
German Research Center for Artificial Intelligence (DFKI) GmbH, Language Technology Lab, Berlin, Germany
META-NET, General Secretary
georg.rehm@dfki.de

SCIC Universities Conference (19/20 April 2018)

Artificial Intelligence
Data Intelligence
Current breakthroughs based on Machine Learning (Deep Learning)
Also still in use: symbolic, rule-based methods and systems
Huge data sets + powerful algorithms + extremely fast hardware
Enormous potential for disruptions in all sectors and areas

Translation and Interpretation
Since approx. 2015, with breakthroughs in neural technologies, Machine Translation has been getting better and better.
All areas of AI look for super-human performance, but language is fundamentally different and much more complex.
Neural AI approaches cannot understand language; they process it according to huge underlying data sets.
In many use cases, mistakes can be tolerated.
But: translation and interpretation are often mission-critical!
Mistakes can have serious consequences (politics, medicine).

Speech Translation
Example: Lecture Translator
  University lectures are automatically transcribed and translated, in near real time, into several languages
  Students can follow the translation through a web interface
Example: Presentation Translator
  Presenter can have the speech automatically translated
  Translations are displayed as subtitles
Example: Call Translator
  Internet telephony provider offers automatic voice translation

Issues and Limitations
The three example applications work surprisingly well for general-domain language and input.
But:
  They are far from being perfect.
  They aren't robust. They cannot cope with unforeseen situations.
  They cannot understand language as humans do.
  They are not (yet?) suited for conference interpretation.
Limitations as regards their fields of application.
Interpretation is often mission-critical.
Human interpreters won't be replaced anytime soon.

https://slator.com/features/ai-interpreter-fail-at-china-summit-sparks-debate-about-future-of-profession/

LT: Current Developments
LT in Europe: World class research, strong SME base, thousands of LSPs; immense fragmentation; need for coordination.
Need for High-Quality LT: translation, interpretation, MDSM etc.
The European Language Challenge cannot be (it must not be) abandoned or outsourced!
Need for Language Technology, made in Europe, for Europe!
STOA Workshop in the EP (January 2017): Language equality in the digital age: towards a Human Language Project

Human Language Project
Goal: Deep Natural Language Understanding by 2030
Vision: EU FET Flagship Project (10+ years)
Broad coverage, high quality, high precision
Create approaches, algorithms, data sets, resources
Across modalities: text, text types, speech, video etc.
Artificial Intelligence including cognition, perception, vision, cross-modal, cross-platform, cross-culture etc.
Linguistics, Language Technology, Machine Learning

Summary & Conclusions
AI is disrupting all industries including translation and, increasingly, also interpretation.
But: perfect, robust, precise language technologies (incl. written/spoken MT and interpretation) are still far away.
Linguists are increasingly needed; new profiles are emerging.
The machine will support human experts and help them become more efficient; it will not replace them.
The Human Language Project is still a vision. Its goal: develop new breakthroughs in Language Technology.

Recommendation
SCIC Speech Repository: 4,000 speeches (3,000 public + 1,000 private)
Extremely interesting data set and language resource for Language Technology researchers!
Many R&D groups currently work on TED talk data sets
Recommendation: establish bridges between SCIC and research groups for spoken language translation
Help build the next generation of AI tools for interpreters
AI tools that are tailored to the needs and wishes, topics and domains of conference interpreters in the EC/EP

Thank you!
Dr. Georg Rehm
DFKI Berlin
georg.rehm@dfki.de
http://de.linkedin.com/in/georgrehm
https://www.slideshare.net/georgrehm
Strategic Research and Innovation Agenda: Language Technologies for Multilingual Europe. Towards a Human Language Project. SRIA Editorial Team, Version 1.0, December 2017

Multilingualism is at the heart of the European idea
24 EU languages all have the same status
Dozens of regional and minority languages as well as languages of immigrants and trade partners
Many economic and social challenges:
  The Digital Single Market needs to be multilingual
  Cross-border, cross-lingual, cross-cultural communication

(published in 2013)
(31 volumes; published in 2012)
60 research centres in 34 countries (founded in 2010)
Chair of Executive Board: Jan Hajic (CUNI)
Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
General Secretary: Georg Rehm (DFKI)
Multilingual Europe Technology Alliance. 826 members in 67 countries
T4ME (META-NET), CESAR, META-NORD, METANET4U

Basque Bulgarian* Catalan Croatian* Czech* Danish* Dutch* English* Estonian* Finnish* French* Galician German* Greek* Hungarian* Icelandic Irish* Italian* Latvian* Lithuanian* Maltese* Norwegian Polish* Portuguese* Romanian* Serbian Slovak* Slovene* Spanish* Swedish* Welsh http://www.meta-net.eu/whitepapers * Official EU language
Language Technology support by area

Speech
  excellent: none
  good: English
  moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
  fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
  weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

MT
  excellent: none
  good: English
  moderate: French, Spanish
  fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
  weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Resources
  excellent: none
  good: English
  moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
  fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
  weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh

Text Analytics
  excellent: none
  good: English
  moderate: Dutch, French, German, Italian, Spanish
  fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
  weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors)
We carried out the study in 2010/2012. While support for many languages has improved in the meantime, the overall picture remains mostly the same.