Automatic Machine Translation in Broadcast News Domain


Alexandre Gusmão
L²F/INESC-ID Lisboa
Rua Alves Redol, 9, Lisboa, Portugal
{ajag}@l2f.inesc-id.pt

Abstract. This paper describes the automatic translation system from Portuguese into English for the broadcast news domain, developed at L²F (Laboratório de Língua Falada), INESC-ID. It gives a brief introduction to the state of the art in automatic translation, describes the tools used to build the system, and reports all the experiments carried out. A table at the end of the paper summarizes these experiments and the evolution of the BLEU values obtained.

1 Introduction

The purpose of this work is the development of an automatic translation system from Portuguese into English, from speech to text, for broadcast news. Such a system has several advantages: chiefly, it allows foreign listeners to understand the news better, and it helps people with hearing impairments.

Several approaches to building a machine translation system were studied, such as IBM models, syntax-based models and phrase-based models. This paper focuses on systems based on statistical methods. Statistical methods require a significant quantity of parallel text in the subject domain of interest, and the lack of parallel texts in the broadcast news domain is one of the major problems addressed in this work.

This paper presents the experiments carried out towards building a Portuguese-to-English translation system for broadcast news, the problems faced, the solutions adopted, and the results obtained in each experiment.

2 State of the Art

Various approaches to machine translation have been proposed; initially, the rule-based ones gave the best results. This approach is not considered in detail here. The work of Brown revived interest in statistical methods, which are studied in more detail throughout this paper.

Rule-based translation systems[1] need a large amount of linguistic information, and it is very difficult to write rules that cover the whole language. These systems can be classified as direct, transfer and interlingua systems, as illustrated in Figure 1.

Fig. 1. Vauquois Triangle

Statistical translation systems[2] use probability distributions instead of rigorous linguistic rules. They offer some advantages: probabilities are easy to work with, since they can be added and multiplied depending on the task; algorithms exist that learn to estimate the probability values automatically, without human intervention; and, contrary to the non-statistical approach, these systems do not require the manual development of linguistic rules.

However, these systems also have disadvantages, such as the difficulty of adapting the same system to different subject domains and the fact that they do not take into account the syntactic information of the sentences.

In the statistical approach, the probability distribution P(F | E) over all possible pairs (F, E) is considered, and the translation selected is the one with the highest probability, that is,

$\hat{E} = \arg\max_{E} P(F \mid E)\, P(E)$ (1)

where P(F | E) is the translation model and P(E) is the fluency model.

The translation models assign a probability to each alignment between the input sentence and the output sentence. An alignment[3] is simply a set of connections between the source sentence and the target sentence, where each word of the target sentence is connected to only one word of the source sentence. The IBM Models[4] are among the models used for this purpose. There are five IBM Models. Model 1 considers all possible connections between the source sentence and the target sentence; in Model 2 the word order in the sentences also influences the probability value. Models 3, 4 and 5 consider aspects such as word fertility (the number of target words connected to a source word), the identity (translation) of words in the target language and, finally, the position occupied by each word in the target sentence.

In phrase-based systems[5] (where a phrase is a word sequence), translating a sentence involves segmenting it into phrases, translating each phrase into the target language, and reordering the translated phrases to form a sentence in that language. These systems are trained on aligned parallel texts, employing word-based translation models to align each sentence pair in the corpus at the word level. Due to the extreme difficulty of the search task, efficient algorithms are needed, such as A* or dynamic-programming beam search.
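The decision rule in equation (1) can be illustrated with a minimal Python sketch. The candidate sentences and their probabilities below are hypothetical toy values, not figures from this work, and a real decoder searches a far larger hypothesis space rather than a fixed list; the sketch only shows how the translation model and the fluency model are combined.

```python
import math

def best_translation(candidates, translation_logprob, fluency_logprob):
    """Return the candidate E that maximizes log P(F|E) + log P(E)."""
    return max(
        candidates,
        key=lambda e: translation_logprob[e] + fluency_logprob[e],
    )

# Hypothetical scores for a single source sentence F (illustrative only).
translation_logprob = {
    "the house is small": math.log(0.20),   # log P(F | E)
    "small the house is": math.log(0.25),
}
fluency_logprob = {
    "the house is small": math.log(0.010),  # log P(E)
    "small the house is": math.log(0.0001),
}

best = best_translation(
    list(translation_logprob), translation_logprob, fluency_logprob
)
print(best)
```

Working in log space turns the product in equation (1) into a sum, which avoids numerical underflow when many small probabilities are multiplied.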
Due to the good results these models have achieved, it was decided to build a phrase-based translation system.

Syntax-based systems[6] aim to introduce structural aspects of the language. For that purpose they employ operations such as word reordering, insertion of additional words and, finally, translation of the words. Figure 2 illustrates an example of applying these operations.

Fig. 2. Reordering, introduction and translation operations

A translation system can be evaluated by humans or by automatic metrics; the latter were chosen to determine the quality of the system developed in this work. Some of the best-known automatic evaluation metrics are WER (Word Error Rate)[7], PER (Position Independent Word Error Rate)[7], NIST (National Institute of Standards and Technology)[8] and BLEU (Bilingual Evaluation Understudy)[9]. BLEU is the metric mostly used here to evaluate the quality of the translation system. It compares the number of words shared between the candidate sentences and the reference sentences and is also based on n-gram matching: instead of checking only whether each word of the translated sentence is found in the reference sentence, it checks whether sequences of words (up to 4 words) are found in both.

A speech translation system is built from two systems in sequence, one for recognizing speech into text and another for translating the text into the desired language. The Laboratório de Língua Falada (L²F) has several speech recognition systems, namely Audimus, a recognizer for the Portuguese language that also targets broadcast news. Among the existing speech translation systems, the one developed by the European project TC-STAR stands out as the most similar to the system developed in this work, because it handles a large vocabulary (more than [...] words) and because its subject domain (European Parliament sessions) approximates broadcast news.

To develop a translation system for broadcast news, the Audimus speech recognition system mentioned above was used, and a phrase-based translation system was built, adapted as well as possible to the subject domain.
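The n-gram matching behind BLEU, described earlier in this section, can be sketched in Python as follows. This is a simplified, single-reference, sentence-level variant written for illustration; the function names are hypothetical, and the official metric is computed over a whole test corpus, so the scores reported in this paper were not produced by this code.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total

def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of 1..max_n precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_avg)

reference = "the cat sat on the mat".split()
print(sentence_bleu("the cat sat on the mat".split(), reference))
print(modified_precision("the the the".split(), "the cat".split(), 1))
```

Clipping the counts prevents a candidate from being rewarded for repeating a word more often than it appears in the reference, and the brevity penalty discourages translations that are too short.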

3 The Translation System

This section describes the main stages of building the translation system used in this paper: creation of the language model, normalization of the training corpus, training of the system, phrase-table filtering, tuning, and evaluation.

The language model is frequently used in speech recognition and translation, where it tries to predict the next word in a sequence of words. When a sentence is given to the translator, several possible translations of it are formed, and the job of the language model is to assign a weight to each of these translations.

The training corpus must be normalized so that the vocabulary used to train the system is compatible with the vocabulary coming from the speech recognizer. Typical normalization tasks are expanding abbreviations to their full form and converting Roman numerals, decimal numbers, dates, currency symbols, etc.

After the language model is built and the training corpus is normalized, the system is trained using, for example, the Moses tool. Alignments are obtained between the words of the two languages, and a probability is assigned to each of them. All these alignments are kept in a phrase-table.

Since the training vocabulary is based on the European Parliament sessions, the phrase-table must be filtered so that it contains only words belonging to the broadcast news domain. The system then requires less computation and is focused on the broadcast news domain.

In a later stage the system is tuned: different values are combined for the features used by the Moses tool over successive translation iterations, until the BLEU value obtained starts to converge. This is one of the most time-consuming stages of building the translation system, and also one of the most important. To perform this task it is essential to use a development corpus based exclusively on the subject domain of the task in question, in this case broadcast news.

Finally, once the translation system is built, it must be evaluated. Automatic evaluation metrics such as BLEU are used for this purpose. The higher the BLEU value, the better the quality of the translations produced by the system.

Some specific tools were used to build the translation system. They are discussed below, together with their purpose.
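The phrase-table filtering step described above can be sketched in Python. The sketch assumes the plain-text Moses phrase-table layout, in which fields are separated by ` ||| ` and the source phrase comes first; the entries, the score columns and the vocabulary are hypothetical examples, and the filtering script actually used in this work is not specified in the paper.

```python
def filter_phrase_table(lines, vocabulary):
    """Keep only entries whose source-side words all occur in the vocabulary."""
    kept = []
    for line in lines:
        fields = line.split(" ||| ")
        source_words = fields[0].split()
        if all(word in vocabulary for word in source_words):
            kept.append(line)
    return kept

# Hypothetical phrase-table entries: source ||| target ||| scores
table = [
    "o governo ||| the government ||| 0.8 0.7 0.9 0.6",
    "o parlamento europeu ||| the european parliament ||| 0.9 0.8 0.9 0.7",
]
# Hypothetical word list taken from an in-domain (broadcast news) corpus.
vocabulary = {"o", "governo", "anunciou"}

filtered = filter_phrase_table(table, vocabulary)
print(filtered)
```

Dropping entries that can never match the in-domain word list shrinks the table the decoder has to load and search, which is the speed benefit mentioned above.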

SRILM[10] is a tool for creating and applying statistical language models, mainly for use in speech recognition. It supports both training and testing of statistical language models. SRILM comprises a set of C++ libraries and a set of scripts that facilitate the execution of the tool's tasks.

The Moses tool[11] was used to train the system. Moses is a statistical machine translation system that allows translation models to be trained automatically for any language pair. Its components support the training of language models, the training of translation models, system tuning and the evaluation of translated sentences. One of the main components used by Moses is GIZA++[12], which is used to train the translation model and to obtain the alignments mentioned previously.

The corpora to be used in building the translation system were the most problematic aspect. Had the translator been based on the European Parliament sessions, there would have been no major problem, since these sessions are translated into various languages and a more than sufficient quantity of parallel text therefore exists for training a translation system in that context. However, the planned translation system targets broadcast news, and a study of corpora from that field showed that it contains various words that are not translated in the European Parliament corpora. The lack of broadcast news corpora for training the system is therefore one of the main problems to face. The adopted solution was to train the system with the European Parliament corpus, while the development and test sets were based on the broadcast news context.
The development and test corpora were built from the euronews website, from which a total of 914 sentences were obtained: 457 written in Portuguese and the other 457 being their English translations.

4 Text Translation and Evaluation

This section describes the experiments conducted to create a text-to-text translator for the Portuguese-English language pair in the broadcast news domain. The first translator developed was based on the European Parliament sessions; it obtained a BLEU score of 0.3531 with a tokenized, lowercased vocabulary (Condition A) and 0.3445 under the conditions of the Europarl system (Condition B). In this system the phrase-table was filtered to contain only words from the broadcast news domain. Due to this filtering, and to the fact that the test corpus used to evaluate the system was the one from the European Parliament domain, the BLEU value obtained was 0.2699 in Condition A and 0.2643 in Condition B.

The next step was the definition of a baseline system. First, the training corpus was normalized, and the phrase-table formed during model training was filtered so that it contained only phrases whose words were found in the word list of the development corpus. The training corpus still belonged to the European Parliament domain. The language model and the development and test sets used corpora from the broadcast news domain. The baseline system obtained a BLEU value of 0.1705 in Condition A and 0.1650 in Condition B.

4.1 Experiment 1

As a first adaptation of the baseline system, it was decided to change the development and test corpora, since in many cases the sentence pairs were comparable but were not direct translations. Only a few adjustments were made to the Portuguese sentences. The same system, with these changed corpora, obtained a BLEU value of 0.4776 in Condition A and 0.4722 in Condition B.

4.2 Experiment 2

In this experiment a language model was built using the interpolation technique. Two corpora from different domains were used, one from broadcast news and the other from newspapers. Perplexity was the metric used to evaluate the language models. For broadcast news, a model with [...] words was built, with a perplexity of 154.453. For newspapers, a model with [...] words was built, with a perplexity of 132.607. After interpolating both models, a final language model with [...] words and a perplexity of 112.[...] was obtained.

4.3 Experiment 3

In this experiment, the interpolated language model was used in the translation system developed so far, which obtained a BLEU value of 0.4861 in Condition A and 0.4799 in Condition B.
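The interpolation and perplexity evaluation used in Experiment 2 can be illustrated with a minimal Python sketch. It uses toy unigram models with hypothetical probabilities and a hypothetical mixing weight of 0.5; the models in this work were n-gram models built with SRILM, and the paper does not state the interpolation weight, so the sketch only shows the principle of linear interpolation.

```python
import math

def interpolate(p_a, p_b, lam):
    """Linear interpolation: lam * p_a(w) + (1 - lam) * p_b(w)."""
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0.0) + (1 - lam) * p_b.get(w, 0.0)
            for w in vocab}

def perplexity(model, tokens):
    """Perplexity of a unigram model on a token list (all words must be known)."""
    log_sum = sum(math.log2(model[w]) for w in tokens)
    return 2 ** (-log_sum / len(tokens))

# Hypothetical unigram distributions for the two domains.
broadcast_news = {"the": 0.5, "minister": 0.4, "goal": 0.1}
newspapers = {"the": 0.5, "minister": 0.1, "goal": 0.4}

mixed = interpolate(broadcast_news, newspapers, lam=0.5)
held_out = ["the", "minister", "the", "goal"]

for name, model in [("broadcast news", broadcast_news),
                    ("newspapers", newspapers),
                    ("interpolated", mixed)]:
    print(name, perplexity(model, held_out))
```

In this toy case the interpolated model has a lower perplexity on the held-out text than either source model, mirroring the trend reported in Experiment 2. That outcome is not guaranteed in general and depends on the data and the mixing weight.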
4.4 Experiment 4

In all experiments, the BLEU value decreased slightly when the sentences produced by the decoder were recased. It was therefore decided to refine the way the recasing tool was trained, by joining two corpora from different domains, one from broadcast news and the other from newspapers. Since newspaper texts only approximate broadcast news transcriptions and do not belong to the same domain, the BLEU value did not increase; on the contrary, it decreased to [...].

One more experiment was then made in an attempt to find a solution with positive results. The translation system was trained with a training corpus in which the English words were not all lowercased, so that the system would learn to capitalize the appropriate words. In the end, the BLEU value was not satisfactory: 0.[...].

4.5 Experiment 5

As a last attempt to obtain a better BLEU value, some post-translation processing was used. The set of 1000 hypotheses produced by the translation system for each sentence is rescored using some new features: (i) the difference in the number of words between the Portuguese sentence and the English hypothesis; and (ii) POS (part of speech), that is, rules for correspondence between Portuguese and English together with penalization patterns in English.

For the word-count feature, several experiments were made combining the number of words in the Portuguese sentence, the number of words in the English sentence and the difference between the two. All the combinations gave satisfactory BLEU values, but the difference in the number of words stood out, with the system obtaining a BLEU value of 0.5055.

The POS feature involves two concepts. The first is the calculation of similarities between the POS tags in both languages: the tags matched between the two languages are counted, and a score is assigned to each sentence according to the number of equivalences found.
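The rescoring with the word-count difference feature can be sketched in Python as follows. The decoder scores, the feature weight and the linear way of combining them are all assumptions made for illustration; the paper does not specify how the new features were weighted or combined with the decoder score.

```python
def rerank(source_tokens, hypotheses, length_weight=0.5):
    """Pick the hypothesis with the best score after a length-difference penalty.

    hypotheses: list of (english_text, decoder_score) pairs; higher is better.
    length_weight: hypothetical penalty per word of length difference.
    """
    def score(hypothesis):
        text, decoder_score = hypothesis
        length_diff = abs(len(source_tokens) - len(text.split()))
        return decoder_score - length_weight * length_diff
    return max(hypotheses, key=score)

# Hypothetical Portuguese source sentence and a two-entry n-best list.
source = "o governo anunciou novas medidas".split()
n_best = [
    ("the government announced", -4.0),
    ("the government announced new measures", -4.2),
]

best = rerank(source, n_best)
print(best[0])
```

Here the first hypothesis has the better decoder score but drops two source words, so the length penalty moves the complete translation to the top. In a real setting the weight would be tuned on the development corpus.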
The second concept is the calculation of penalization patterns: whenever the system comes upon a pattern classified as a penalization pattern in the English sentence, a penalty is assigned to that sentence. With the contribution of the POS feature alone, the translation system obtained a BLEU value of 0.4967. With both the POS feature and the difference in the number of words between the Portuguese and English sentences, the system obtained a final BLEU value of 0.[...].

4.6 Point of Comparison

To compare the translation system with another one, all the sentences of the test corpus were translated with the translation system provided by the Google search engine. The BLEU value obtained by that system was 0.4102, which is well below the value obtained by the system created in this work.

5 Conclusion and Future Work

To improve translation quality in future work, some approaches can be developed for OOV words (out-of-vocabulary words). For example, a dictionary could be used containing all such words and their translations. In this case, however, about [...] words in Portuguese and [...] in English do not exist in the European Parliament corpus, which makes it impractical to transcribe all these words and their translations. Another solution is to use a website where all the verb forms of the verbs contained in the training corpus can be obtained. Yet another alternative is to copy words that appear in both the Portuguese and the English training corpus, that is, words that are not translated, such as proper names.

The main conclusions drawn from this work concern the use of:

Training corpus belonging to the same domain as the target task (in this case it was not possible to use a broadcast news corpus);
Language model interpolated with another from a similar context (in this case, a model from the newspaper domain);
Clean development corpus, that is, with correctly translated sentences;
Some post-translation processing, using the features described in the previous section, which can maximize the system's BLEU value by choosing the best sentence among the N possible.

The combination of all these elements resulted in an automatic translation system for the broadcast news context with a BLEU value of 0.5088. The corpus used to train the translation system was always based on the European Parliament sessions, since sufficient resources are not available for the broadcast news context.
For the language model, an interpolation between two corpora from different domains was carried out, one from broadcast news[13] and another from newspapers[14]. The development and test corpora were always based on the broadcast news domain. However, these corpora were corrected so that the system could produce better-quality translations and consequently obtain a better BLEU value. Table 1 lists the types of corpora used and their descriptions for the broadcast news domain.

Type of corpus       Description
Language model       Set of sentences, based on broadcast news and newspapers, written only in the target language.
Training corpus      Parallel corpus, based on European Parliament.
Development corpus   Parallel corpus, based on broadcast news.
Test corpus          Parallel corpus, based on broadcast news.

Table 1. Corpus and description.

For a better understanding of the experiments carried out, Table 2 lists them all, with a brief description of each and the respective BLEU values obtained.

Experiment     BLEU (Condition A)   BLEU (Condition B)   Description
Experiment 1   [...]                [...]                Baseline system, language model based on broadcast news, training corpus based on European Parliament, tuning and test corpora changed
Experiment 2   -                    -                    Language model interpolation
Experiment 3   [...]                [...]                Translation system with an interpolated language model
Experiment 4   [...]                [...]                Enhancement of the training corpus of the recase system with newspaper texts
               [...]                [...]                Automatic capitalization system
Experiment 5   [...]                [...]                Reprocessing of the obtained translations (new features)

Table 2. Experiments.

References

1. Jurafsky, D., Martin, J.H.: Speech and Language Processing. Prentice Hall (2000)
2. Ney, H.: One decade of statistical machine translation: 1996-2005. In: Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen, Germany (2005)
3. Knight, K.: Translation with finite-state devices. CA (2006)
4. Brown, P.: The mathematics of machine translation: Parameter estimation. In: Computational Linguistics. Volume 19. (1993)
5. Koehn, P.: Introduction to statistical machine translation (2005)
6. Marcu, D.: SPMT: Statistical machine translation with syntactified target language phrases. Language Weaver Inc., 4640 Admiralty Way, Suite 1210, Marina del Rey, CA (2006)
7. Ueffing, N., Ney, H.: Bayes decision rules and confidence measures for statistical machine translation. Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University, Ahornstrasse 55, 52056 Aachen, Germany (2004)
8. Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. (2002)
9. Papineni, K.: Bleu: a method for automatic evaluation of machine translation. IBM Research Report (2001)
10. Stolcke, A.: SRILM - an extensible language modeling toolkit. In: Proc. International Conference on Spoken Language Processing. Volume 2., Denver, CO (September 2002)
11. Koehn, P.: Moses: Open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, Prague, Czech Republic, Association for Computational Linguistics (June 2007)
12. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Volume 29. (2003)
13. MacIntyre, R.: LDC catalog number LDC98T31. (1998)
14. Graff, D.: LDC catalog number LDC95T21, ISBN [...] (1995)


More information

Re-evaluating the Role of Bleu in Machine Translation Research

Re-evaluating the Role of Bleu in Machine Translation Research Re-evaluating the Role of Bleu in Machine Translation Research Chris Callison-Burch Miles Osborne Philipp Koehn School on Informatics University of Edinburgh 2 Buccleuch Place Edinburgh, EH8 9LW callison-burch@ed.ac.uk

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160

More information

Investigation on Mandarin Broadcast News Speech Recognition

Investigation on Mandarin Broadcast News Speech Recognition Investigation on Mandarin Broadcast News Speech Recognition Mei-Yuh Hwang 1, Xin Lei 1, Wen Wang 2, Takahiro Shinozaki 1 1 Univ. of Washington, Dept. of Electrical Engineering, Seattle, WA 98195 USA 2

More information

Deep Neural Network Language Models

Deep Neural Network Language Models Deep Neural Network Language Models Ebru Arısoy, Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran IBM T.J. Watson Research Center Yorktown Heights, NY, 10598, USA {earisoy, tsainath, bedk, bhuvana}@us.ibm.com

More information

Dublin City Schools Mathematics Graded Course of Study GRADE 4

Dublin City Schools Mathematics Graded Course of Study GRADE 4 I. Content Standard: Number, Number Sense and Operations Standard Students demonstrate number sense, including an understanding of number systems and reasonable estimates using paper and pencil, technology-supported

More information

The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach

The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach BILINGUAL LEARNERS DICTIONARIES The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach Mark VAN MOL, Leuven, Belgium Abstract This paper reports on the

More information

Mandarin Lexical Tone Recognition: The Gating Paradigm

Mandarin Lexical Tone Recognition: The Gating Paradigm Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Mandarin Lexical Tone Recognition: The Gating Paradigm Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017

The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017 The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017 Jan-Thorsten Peter, Andreas Guta, Tamer Alkhouli, Parnia Bahar, Jan Rosendahl, Nick Rossenbach, Miguel

More information

Content Language Objectives (CLOs) August 2012, H. Butts & G. De Anda

Content Language Objectives (CLOs) August 2012, H. Butts & G. De Anda Content Language Objectives (CLOs) Outcomes Identify the evolution of the CLO Identify the components of the CLO Understand how the CLO helps provide all students the opportunity to access the rigor of

More information

Common Core Standards Alignment Chart Grade 5

Common Core Standards Alignment Chart Grade 5 Common Core Standards Alignment Chart Grade 5 Units 5.OA.1 5.OA.2 5.OA.3 5.NBT.1 5.NBT.2 5.NBT.3 5.NBT.4 5.NBT.5 5.NBT.6 5.NBT.7 5.NF.1 5.NF.2 5.NF.3 5.NF.4 5.NF.5 5.NF.6 5.NF.7 5.MD.1 5.MD.2 5.MD.3 5.MD.4

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Training and evaluation of POS taggers on the French MULTITAG corpus

Training and evaluation of POS taggers on the French MULTITAG corpus Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction

More information

On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

More information

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282)

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282) B. PALTRIDGE, DISCOURSE ANALYSIS: AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC. 2012. PP. VI, 282) Review by Glenda Shopen _ This book is a revised edition of the author s 2006 introductory

More information

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds

More information

Providing student writers with pre-text feedback

Providing student writers with pre-text feedback Providing student writers with pre-text feedback Ana Frankenberg-Garcia This paper argues that the best moment for responding to student writing is before any draft is completed. It analyses ways in which

More information

Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models

Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models Jung-Tae Lee and Sang-Bum Kim and Young-In Song and Hae-Chang Rim Dept. of Computer &

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract

More information

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T abs - Research 180 Park Avenue, Florham Park,

More information

Radius STEM Readiness TM

Radius STEM Readiness TM Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and

More information

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za

More information

Regression for Sentence-Level MT Evaluation with Pseudo References

Regression for Sentence-Level MT Evaluation with Pseudo References Regression for Sentence-Level MT Evaluation with Pseudo References Joshua S. Albrecht and Rebecca Hwa Department of Computer Science University of Pittsburgh {jsa8,hwa}@cs.pitt.edu Abstract Many automatic

More information

Language Center. Course Catalog

Language Center. Course Catalog Language Center Course Catalog 2016-2017 Mastery of languages facilitates access to new and diverse opportunities, and IE University (IEU) considers knowledge of multiple languages a key element of its

More information

Cross-lingual Text Fragment Alignment using Divergence from Randomness

Cross-lingual Text Fragment Alignment using Divergence from Randomness Cross-lingual Text Fragment Alignment using Divergence from Randomness Sirvan Yahyaei, Marco Bonzanini, and Thomas Roelleke Queen Mary, University of London Mile End Road, E1 4NS London, UK {sirvan,marcob,thor}@eecs.qmul.ac.uk

More information

Unsupervised Acoustic Model Training for Simultaneous Lecture Translation in Incremental and Batch Mode

Unsupervised Acoustic Model Training for Simultaneous Lecture Translation in Incremental and Batch Mode Unsupervised Acoustic Model Training for Simultaneous Lecture Translation in Incremental and Batch Mode Diploma Thesis of Michael Heck At the Department of Informatics Karlsruhe Institute of Technology

More information

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval

Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval Jianqiang Wang and Douglas W. Oard College of Information Studies and UMIACS University of Maryland, College Park,

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

Corpus Linguistics (L615)

Corpus Linguistics (L615) (L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives

More information

Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries

Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries Mohsen Mobaraki Assistant Professor, University of Birjand, Iran mmobaraki@birjand.ac.ir *Amin Saed Lecturer,

More information

Listening and Speaking Skills of English Language of Adolescents of Government and Private Schools

Listening and Speaking Skills of English Language of Adolescents of Government and Private Schools Listening and Speaking Skills of English Language of Adolescents of Government and Private Schools Dr. Amardeep Kaur Professor, Babe Ke College of Education, Mudki, Ferozepur, Punjab Abstract The present

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment

Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment Akiko Sakamoto, Kazuhiko Abe, Kazuo Sumita and Satoshi Kamatani Knowledge Media Laboratory,

More information

Deploying Agile Practices in Organizations: A Case Study

Deploying Agile Practices in Organizations: A Case Study Copyright: EuroSPI 2005, Will be presented at 9-11 November, Budapest, Hungary Deploying Agile Practices in Organizations: A Case Study Minna Pikkarainen 1, Outi Salo 1, and Jari Still 2 1 VTT Technical

More information

Literature and the Language Arts Experiencing Literature

Literature and the Language Arts Experiencing Literature Correlation of Literature and the Language Arts Experiencing Literature Grade 9 2 nd edition to the Nebraska Reading/Writing Standards EMC/Paradigm Publishing 875 Montreal Way St. Paul, Minnesota 55102

More information

Project in the framework of the AIM-WEST project Annotation of MWEs for translation

Project in the framework of the AIM-WEST project Annotation of MWEs for translation Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

CS 446: Machine Learning

CS 446: Machine Learning CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt

More information

AGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016

AGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016 AGENDA Advanced Learning Theories Alejandra J. Magana, Ph.D. admagana@purdue.edu Introduction to Learning Theories Role of Learning Theories and Frameworks Learning Design Research Design Dual Coding Theory

More information

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

Life and career planning

Life and career planning Paper 30-1 PAPER 30 Life and career planning Bob Dick (1983) Life and career planning: a workbook exercise. Brisbane: Department of Psychology, University of Queensland. A workbook for class use. Introduction

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

Abbey Academies Trust. Every Child Matters

Abbey Academies Trust. Every Child Matters Abbey Academies Trust Every Child Matters Amended POLICY For Modern Foreign Languages (MFL) September 2005 September 2014 September 2008 September 2011 Every Child Matters within a loving and caring Christian

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

The A2iA Multi-lingual Text Recognition System at the second Maurdor Evaluation

The A2iA Multi-lingual Text Recognition System at the second Maurdor Evaluation 2014 14th International Conference on Frontiers in Handwriting Recognition The A2iA Multi-lingual Text Recognition System at the second Maurdor Evaluation Bastien Moysset,Théodore Bluche, Maxime Knibbe,

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand

Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand 1 Introduction Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand heidi.quinn@canterbury.ac.nz NWAV 33, Ann Arbor 1 October 24 This paper looks at

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

Module 9: Performing HIV Rapid Tests (Demo and Practice)

Module 9: Performing HIV Rapid Tests (Demo and Practice) Module 9: Performing HIV Rapid Tests (Demo and Practice) Purpose To provide the participants with necessary knowledge and skills to accurately perform 3 HIV rapid tests and to determine HIV status. Pre-requisite

More information

MISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES

MISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES MISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES Students will: 1. Recognize main idea in written, oral, and visual formats. Examples: Stories, informational

More information

A High-Quality Web Corpus of Czech

A High-Quality Web Corpus of Czech A High-Quality Web Corpus of Czech Johanka Spoustová, Miroslav Spousta Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics Charles University Prague, Czech Republic {johanka,spousta}@ufal.mff.cuni.cz

More information

Initial approaches on Cross-Lingual Information Retrieval using Statistical Machine Translation on User Queries

Initial approaches on Cross-Lingual Information Retrieval using Statistical Machine Translation on User Queries Initial approaches on Cross-Lingual Information Retrieval using Statistical Machine Translation on User Queries Marta R. Costa-jussà, Christian Paz-Trillo and Renata Wassermann 1 Computer Science Department

More information

A hybrid approach to translate Moroccan Arabic dialect

A hybrid approach to translate Moroccan Arabic dialect A hybrid approach to translate Moroccan Arabic dialect Ridouane Tachicart Mohammadia school of Engineers Mohamed Vth Agdal University, Rabat, Morocco tachicart@gmail.com Karim Bouzoubaa Mohammadia school

More information

Discriminative Learning of Beam-Search Heuristics for Planning

Discriminative Learning of Beam-Search Heuristics for Planning Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University

More information

Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade

Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade The third grade standards primarily address multiplication and division, which are covered in Math-U-See

More information

TIMSS Highlights from the Primary Grades

TIMSS Highlights from the Primary Grades TIMSS International Study Center June 1997 BOSTON COLLEGE TIMSS Highlights from the Primary Grades THIRD INTERNATIONAL MATHEMATICS AND SCIENCE STUDY Most Recent Publications International comparative results

More information

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.

More information