The European Language Resources Coordination
  ELRC 2nd Conference
  Prof. Josef van Genabith, Dr. Andrea Lösch
  German Research Center for Artificial Intelligence (DFKI)
Translation
  How do we bridge language gaps?
  How do we make sure that information does not stay in silos?
  How do we make sure that nobody is discriminated against because of language?
  How do we treat all languages the same?
  Translation!
    Human translation (HT)
    Machine translation (MT): to support HT, and sometimes also on its own
Human languages are complex!
  Human languages are:
    Elegant
    Efficient
    Flexible
    Complex
      One word or sentence may mean many things
      Many ways of saying the same thing
      Meaning depends on context
      Literal and figurative language (metaphor)
      Language and culture (different ways of conceptualising the same thing)
Language is complex
  We cannot compute it exactly
  We tried: rule-based LT
  What do we do?
  Machine learning
    Learns from data
    Approximate solution, not perfect
    Robust
    Scalable
Data
Data for MT
Translation, the EC and Beyond
  Translation is everywhere: industry, culture, travel, education
  The EC is a prime producer and consumer of translation
    One of the largest translation operations on the planet
    Long-term and expert user of MT in public service
    State-of-the-art SMT based on EU-funded research: the Moses SMT system
Why ELRC?
  The EC has decided to expand translation to the needs of the public services in the member states
  Automated Translation platform of the Connecting Europe Facility (CEF AT), to facilitate multilingual communication and exchange of documents in key public administration scenarios:
    Consumer rights, health, public procurement, social security, culture, justice
    Public online services (DSIs of CEF): Open Data Portal, Europeana, Online Dispute Resolution, e-Justice etc.
Why ELRC?
  The EC has good data for its own needs: EU parliamentary debates, EU laws etc.
  It doesn't have the right kind of data for the needs of national public services and the DSIs
    Training on Harry Potter and translating weather reports!
  It needs the right kind of data!
  Who has the best data for their needs? The national public services of the member states!
  ELRC: working for the EC with the national public services to obtain this data, so that the EC can provide MT services back to national public services and CEF DSIs
Who is ELRC?
  The ELRC Consortium
    German Research Center for Artificial Intelligence (DFKI): Josef van Genabith, Andrea Lösch
    Evaluations and Language Resources Distribution Agency (ELDA): Khalid Choukri
    TILDE: Andrejs Vasiljevs
    Institute for Language and Speech Processing (ILSP): Stelios Piperidis
  PLUS:
    30 ELRC Technological NAPs (one per CEF-affiliated country)
    30 ELRC Public Services NAPs (one per CEF-affiliated country)
    Legal advisors (e.g. irights)
ELRC Workshops
  Past workshops:
    24.09.15 Greece
    29.09.15 Germany
    05.08.15 Latvia
    23.11.15 Hungary
    01.12.15 Cyprus
    08.12.15 Slovenia
    15.12.15 Czech Republic
    26.01.16 Spain
    28.01.16 Ireland
    11.02.16 Estonia
    19.02.16 Finland
    24.02.16 Lithuania
    26.02.16 Malta
    01.03.16 Portugal
    07.03.16 Copenhagen
    09.03.16 Poland
    10.03.16 Sweden
    15.03.16 Italy
    18.03.16 Bulgaria
    23.03.16 Romania
    14.04.16 Slovakia
    15.04.16 Austria
    19.04.16 Netherlands
    21.04.16 Croatia
    11.05.16 France
    08.06.16 Norway
    14.06.16 Luxembourg
ELRC Workshops
  Localized workshops in each of the 30 participating countries
  Target audience: national public service administrations
  Goals:
    To raise awareness about the importance of language data held by public administrations for public administrations
    To understand the needs of national public service administrations with regard to automated translation
    To jointly identify relevant sources of multilingual language resources
    To discuss any technical and legal issues involved in the use of data for automated translation
Contribute and Share your Data for a better Europe
ELRC Helpdesk
  Continuous support for data contributors
  Accessible through the ELRC website: http://www.lr-coordination.eu/helpdesk
    Phone, Skype, Email
  Services:
    Answering any technical questions related to the use, production, collection, processing, and sharing of language resources
    Answering any legal questions related to the use, production, collection, processing, and sharing of language resources
  Response times:
    24 hours (simple query)
    Up to 5 days (complex query)
Supporting our languages is supporting Europe, and supporting Europe is supporting our languages!