A Hackathon for Classical Tibetan
To cite this version: Orna Almogi, Lena Dankin, Nachum Dershowitz, Lior Wolf. A Hackathon for Classical Tibetan. Episciences.org, 2019, Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages. HAL Id: hal (version 3), submitted on 30 Dec 2018. Public Domain.
Orna Almogi 1, Lena Dankin 2*, Nachum Dershowitz 2,3, Lior Wolf 2
1 Universität Hamburg, Germany
2 Tel Aviv University, Israel
3 Institut d'Études Avancées de Paris, France
* Corresponding author: Lena Dankin, lenadank@tau.ac.il

Abstract
We describe the course of a hackathon dedicated to the development of linguistic tools for Tibetan Buddhist studies. Over a period of five days, a group of seventeen scholars, scientists, and students developed and compared algorithms for intertextual alignment and text classification, along with some basic language tools, including a stemmer and a word segmenter.

Keywords
Tibetan; Buddhist studies; hackathon; stemming; segmentation; intertextual alignment; text classification.

I INTRODUCTION
In February 2016, a group of four Tibetologists (from the University of Hamburg), one digital humanities scholar (from Europe), and twelve computer scientists (from Israel and Europe) got together at Kibbutz Lotan in the Arava region of Israel with the stated goal of developing algorithmic methods for advancing Tibetan Buddhist textual studies. Participants were either recruited by the organizers or responded to an announcement on several mailing lists (see Figure 1). Most of the computer scientists had a background in machine learning, and a few also had experience in natural language processing (NLP) research, but none had any prior experience with Tibetan texts. The computer-scientist organizers were quite familiar with programming workshops and contests and thought that the challenges presented by Tibetan texts would offer an ideal opportunity to explore the hackathon format. A hackathon is a short, intense event in which computer scientists collaborate to develop software. For that purpose, it was essential to recruit as many software developers as possible. Some of the recruited students had participated in other hackathons.
The plan was to have a focused hacking event with specific goals to work towards, goals that had been provided by the Tibetan scholars. The six-hour drive down from Tel Aviv (including a stop to admire the desert flora) afforded an opportunity for everyone to get to know each other. The back seats of the van were piled high with computer equipment, and the kibbutz was to provide the necessary fast internet connection. The isolation of the kibbutz created an intense working environment and encouraged long hours; the stark natural beauty of the location contributed to a shared sense of tranquility of purpose.

Several hackathons have been conducted for the development of digital humanities tools, both before and after ours. In June 2015, the eTRAP team organized a hackathon on text reuse, in which twenty-three participants from fifteen different institutes worked on detecting text reuse across data in different languages and genres, using the TRACER tool [Büchler, 2013; Büchler et al., 2014]. More recently, in May 2017, another hackathon took place in Helsinki, bringing together historians, linguists, psychologists, and computer scientists to work on four different tasks, including an analysis of written media and the political elite. In November 2017, the National Library of Israel hosted a 24-hour hackathon dedicated to developing state-of-the-art tools and applications for national cultural treasures, using an IIIF server for its large collection of images.

The two main tasks that confronted our group that week (February 14–18) were (1) to develop algorithms for finding intertextual parallels that are only approximately the same, and (2) to experiment with algorithmic classification methods for identifying authorship and style. In both cases, the focus was on language issues specific to Tibetan. After a quick lesson in Tibetan, the Buddhist canon, and modern Tibetan encoding conventions for the benefit of the less knowledgeable, the group split into four loose teams, devoted to the following goals: (A) dataset preparation; (B) language tool development; (C) intertextual alignment; and (D) text classification. We describe each of these efforts in turn in the sections that follow. Each team consisted of a few computer scientists, chosen according to their background, experience, and interests, plus a Tibetan scholar who provided annotated data sets and analyzed results. Twice a day we held synchronization round-ups, where each team briefed everyone on its progress, discussed its next steps, and raised problems it had encountered.
II PRELIMINARIES
Tibetan is a monosyllabic language (Tibetan morphemes normally consist of one syllable) belonging to the Tibeto-Burman branch of the Sino-Tibetan family. The language is ergative, with a plethora of (usually monosyllabic) grammatical particles, which are often omitted. Occasionally, the same syllable can be written in one of several orthographic variants, for example, sogs and stsogs. In the case of verbs, a syllable has various inflectional forms that are often homophones, which can result in variant readings due to scribal errors or lack of standardization. An example of such inflectional forms is sgrub, bsgrubs, bsgrub, sgrubs (present, past, future, and imperative, respectively), all of which are homophones. The intransitive verb adds further forms, ʼgrub and grub (present/future and past, respectively), that are homophonous with their transitive counterparts. See [Beyer, 1992] for details about the language.

The Tibetan Buddhist canon consists of two parts: the Kangyur (bkaʼ ʼgyur), which commonly comprises 108 volumes containing what tradition regards as the Word of the Buddha, texts that were mostly translated directly from the Sanskrit original (with some from other languages and others indirectly via Chinese); and the Tengyur (bstan ʼgyur), commonly comprising about 210 volumes of canonical commentaries, treatises, and various kinds of manuals that were written in the seventh to thirteenth centuries and likewise mostly translated from Sanskrit, with some works from other languages and a few originally written in Tibetan. Overall, this corpus contains 77 million occurrences (tokens) of 81,000 different syllable types. The average transcribed syllable length is 3.5 characters and the average number of syllables in a single document is
III HACKATHON TASKS

A. Dataset preparation
A prerequisite for the main goals of the hackathon was data to work with, that is, texts to compare and classify. For this, we took Tibetan Buddhist texts obtained from various sources. These included the Tibetan Buddhist canon in digital form (we used a modified form of the ACIP files of the Kangyur and Tengyur provided by Paul Hackett of Columbia University) and several sets of autochthonous Tibetan Buddhist texts by various authors (compiled by Eric Werner of Universität Hamburg). In addition, it was necessary to prepare test suites with manually prepared gold-standard answers, so that the performance of algorithms for finding parallel passages and for classifying texts could be measured. The passages were selected from various sources, particularly from (a) two doxographical texts (ʼgrub mthaʼ), the gzhung lugs rnam byed by Phywa pa Chos kyi seng ge and the ʼGrub mthaʼ mdzod by Klong chen pa Dri med ʼod zer, the latter including passages borrowed from the former [Werner, 2014], and (b) the collected writings of Rong zom Chos kyi bzang po (11th c.), which feature numerous cases of parallel passages.

These Tibetan works were provided in textual form, transcribed according to the Wylie convention [Wylie, 1959]. In this system, Tibetan is transliterated into Latin characters without diacritics; as a result, some Tibetan letters are represented by two or three Latin letters. The decision to work with transliterated texts was made partly because these were the texts available at the time, but also because the computer scientists could not read Tibetan script, so the transliteration allowed them to progress quickly without having to learn a new alphabet. The texts had to be cleaned by removing sigla and standardizing punctuation.

B. Language tools
Since syllables having the same base form may take many different surface forms, stemming is a crucial stage in almost every text-processing task one would like to perform on Tibetan, as for many other languages. So, to support present and future analysis of Tibetan texts, developing a stemmer was one of the first orders of business. In Indo-European and Semitic languages, stemming is usually performed at the word level. In Tibetan, however, multisyllabic words are not separated by spaces or other marks, so a syllable-based stemming mechanism is required, even for segmenting the text into lexical items. Stemming is not the same as (grammatical) lemmatization: the stemming process can result in a stem that is not itself a lexical entry in a dictionary. Moreover, unlike in Indo-European languages, stemming in Tibetan is mostly relevant to verbs and verbal nouns (which are common in the language). Despite being inaccurate in some cases, stemming (for Tibetan, as for other languages) can improve tasks such as word segmentation and the detection of intertextual parallels [Klein et al., 2014]. Even for Tibetan words consisting of more than one syllable, stemming each substantial syllable (i.e., excluding grammatical particles) makes sense, since all inflection is embedded at the syllable level. For instance, the words brtag dbyad (analysis) and brtags dpyad (analyzed) are stemmed to rtog dpyod (to analyze, analysis).
The stemmer we developed is a rule-based application that works as follows. First, the syllable is divided into a sequence of Tibetan letters. This stage is required because the Wylie transliteration scheme represents some Tibetan letters by more than one character (e.g., zh, tsh). Fortunately, there is no ambiguity in this segmentation into Tibetan letters: by design, whenever a sequence of two or three characters represents a single letter, the transliteration ensures that it cannot also be interpreted in context as a sequence of distinct Tibetan letters. For the analysis of the Tibetan syllable we used an octuple (8-component) scheme: each Tibetan syllable must contain one core letter and one vowel, while the other six positions (subscript, superscript, coda, prescript, postscript, and appended particle) are optional. Each position contains a single letter, except for the appended particle, which can be any of six syllables. We define the stem of a syllable as consisting of the core letter or stacked letter (which, in turn, consists of the core letter and a superscript, a subscript, or both), the vowel (syllabic contractions contain at most two vowels), and the coda (if present). Two syllables can be considered stemmically identical if these components agree, regardless of the addition or omission of a prescript and/or a postscript. The final stage of stemming is normalization, since there are groups of Tibetan letters that can be substituted for one another (in inflectional forms) without changing the basic meaning of the syllable. Since the goal is to map all syllables that are ultimately stemmically identical to one and the same stem, we normalized all tuples according to an elaborate set of rules. The stemmer, as described, extracts the information encoded in each Wylie-transliterated syllable and makes it explicit. An important task, given two syllables, is to evaluate their stemmic similarity.
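The letter-splitting and stem-extraction steps can be illustrated with a short Python sketch. It is not the hackathon stemmer: the names to_letters and stem, the greedy longest-match split, and the simplified rule for spotting a prescript are our own assumptions, and the sketch skips validation of superscript/subscript combinations, appended particles, and the normalization rules mentioned above.

```python
# Hypothetical sketch of Wylie-based syllable stemming; not the hackathon stemmer.

# Wylie graphemes for Tibetan consonants, sorted longest first so that greedy
# matching prefers "tsh" over "ts" over "t".
CONSONANTS = sorted(
    ["k", "kh", "g", "ng", "c", "ch", "j", "ny", "t", "th", "d", "n",
     "p", "ph", "b", "m", "ts", "tsh", "dz", "w", "zh", "z", "'", "y",
     "r", "l", "sh", "s", "h"],
    key=len, reverse=True)
VOWELS = {"a", "i", "u", "e", "o"}
PRESCRIPTS = {"g", "d", "b", "m", "'"}
SUBSCRIPTS = {"y", "r", "l", "w"}


def to_letters(syllable: str) -> list[str]:
    """Split one Wylie syllable into Tibetan letters (greedy longest match)."""
    syllable = syllable.replace("\u02bc", "'")  # accept ʼ as well as '
    letters, i = [], 0
    while i < len(syllable):
        if syllable[i] == "." or syllable[i] in VOWELS:
            # '.' marks an explicit prescript, as in g.yag
            letters.append(syllable[i])
            i += 1
            continue
        for cons in CONSONANTS:
            if syllable.startswith(cons, i):
                letters.append(cons)
                i += len(cons)
                break
        else:
            raise ValueError(f"unexpected character {syllable[i]!r} in {syllable!r}")
    return letters


def stem(syllable: str) -> str:
    """Keep the (stacked) core letter, the vowel, and the coda; drop the
    prescript, the postscript, and anything after the coda."""
    letters = to_letters(syllable)
    v = next((k for k, x in enumerate(letters) if x in VOWELS), None)
    if v is None:
        raise ValueError(f"no vowel in {syllable!r}")
    onset, after = letters[:v], letters[v + 1:]
    if "." in onset:
        onset = onset[onset.index(".") + 1:]
    elif (len(onset) >= 2 and onset[0] in PRESCRIPTS
          and not (len(onset) == 2 and onset[1] in SUBSCRIPTS)):
        onset = onset[1:]  # simplified rule: a leading g/d/b/m/' is a prescript
    coda = after[:1] if after and after[0] not in VOWELS else []
    return "".join(onset + [letters[v]] + coda)
```

Under these assumptions, stem() maps sgrub, bsgrubs, bsgrub, and sgrubs all to sgrub, and maps ʼgrub and grub to grub; bringing the intransitive forms together with the transitive ones would require the kind of normalization rules applied in the stemmer's final stage.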
Some substitutions can be considered silent or synonymous; others change the meaning completely; and there is a continuous spectrum in between. Metric learning algorithms were used to assess the relative importance of each substitution.

Another important language task is word segmentation, that is, grouping syllables into words (lexical units). Since no spaces or special characters are used to mark word boundaries, the reader has to rely on a language model to detect them. Unlike stemming, segmentation could draw on an annotated, word-segmented corpus, with which it was possible to train a supervised model. The training data, consisting of 37,000 sentences, was obtained from the Tibetan in Digital Communication project. The approach taken at the hackathon was based on a flavor of recurrent neural networks (RNNs) called long short-term memory (LSTM) [Hochreiter & Schmidhuber, 1997]. LSTMs have been used in the past for word segmentation of Chinese text [Chen et al., 2015]. The tuple representation of syllables was used for this purpose; see details in [Almogi et al., 2016]. Several LSTM setups were compared; the best configuration yielded an F1 score of In addition, a more traditional algorithm, the conditional random field (CRF), was applied to the data, yielding a lower F1 score of This technique was previously applied to Tibetan script in [Liu et al., 2011]. It bears noting that our efforts to train a word2vec model [Mikolov et al., 2013] to represent Tibetan syllables did not result in a solid representation, in the sense that pairs of vectors with high (cosine) similarity did not usually represent synonyms. For that reason, the vector representation developed for the stemmer was also essential for the word segmentation task.
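Both the LSTM and the CRF treat segmentation as labeling each syllable, in the spirit of the syllable tagging of [Liu et al., 2011]. The Python sketch below assumes a simple two-tag scheme (B for a syllable that begins a word, I for one that continues it); the tag set, the helper names to_tags, from_tags, and word_f1, and the example sentence are illustrative, not the hackathon's actual setup. It shows how a word-segmented training sentence becomes per-syllable targets, how predicted tags are turned back into words, and how a word-level F1 score can be computed.

```python
# Hypothetical helpers for word segmentation framed as syllable tagging.
# B = syllable begins a word, I = syllable continues the current word.


def to_tags(words: list[list[str]]) -> tuple[list[str], list[str]]:
    """Turn a segmented sentence (a list of words, each a list of syllables)
    into the flat syllable sequence and its B/I tags (the training targets)."""
    syllables, tags = [], []
    for word in words:
        for k, syl in enumerate(word):
            syllables.append(syl)
            tags.append("B" if k == 0 else "I")
    return syllables, tags


def from_tags(syllables: list[str], tags: list[str]) -> list[list[str]]:
    """Rebuild words from syllables and (predicted) B/I tags."""
    words: list[list[str]] = []
    for syl, tag in zip(syllables, tags):
        if tag == "B" or not words:  # a leading I is treated as B
            words.append([syl])
        else:
            words[-1].append(syl)
    return words


def spans(words: list[list[str]]) -> set[tuple[int, int]]:
    """Each word as a (start, end) range of syllable offsets."""
    out, pos = set(), 0
    for word in words:
        out.add((pos, pos + len(word)))
        pos += len(word)
    return out


def word_f1(gold: list[list[str]], pred: list[list[str]]) -> float:
    """Word-level F1: a predicted word counts only if both boundaries match."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)
```

For the gold segmentation chos | kyi | rgyal po, a prediction that splits rgyal po into two words has precision 2/4 and recall 2/3, hence F1 = 4/7 ≈ 0.57.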
Both the stemmer and the word segmenter have been made publicly available. Additional details may be found in [Almogi et al., 2016].

C. Intertextual alignment
The primary goal of the hackathon was to develop and compare tools for finding parallel passages between Tibetan texts that result from either acknowledged citation (with or without attribution) or borrowing (i.e., with no acknowledgement whatsoever). Generally, for determining the history of composition or the relative chronology of a text, passages need not match precisely. That is, not only are orthographical differences and the omission or addition of grammatical particles of no great significance, but cited or borrowed passages are often not reproduced verbatim; rather, they are slightly paraphrased, shortened, or both. For determining the identity of the persons involved in the composition of a text and its transmission, that is, the author, translator, scribe, or editor, the precision of the match is of greater significance, and even variation in orthography or the omission or addition of grammatical particles may be relevant. In this regard, however, textual scholars take into consideration that texts were often copied and edited, and that these processes could introduce changes into the text, either deliberately (particularly through the standardization of orthography and verb inflection, the employment of particles, and even the substitution of terminology in cases of archaism) or unintentionally.

Broadly speaking, there are two cases of interest: (a) an approximate alignment of what could be considered exactly the same text, that is, an alignment that allows variants considered accidental or non-substantial (variation in the omission/addition or the form of the same grammatical particles, orthography, inflectional forms in the case of verbs, archaism vs. standardization, and the like); and (b) an approximate alignment of passages that contain the same text in some modified form, that is, an alignment that allows substantial variants in addition to non-substantial ones (the omission/addition of a substantial syllable, the replacement of a substantial syllable by a completely different one, the omission/addition of a string of syllables, the occurrence of the same syllables in a different order, and the like). Even when a (more or less) exact citation or borrowing was intended, substantial variants may occur, whether introduced intentionally by the author or by scribes and editors during transmission, or crept in unintentionally during composition and copying; to address this, a limited number of substantial variants must be admitted even in this case.

Three algorithms competed with one another on this task during the hackathon.
1. One algorithm was TRACER [Büchler, 2013; Büchler et al., 2014], based on the bag-of-words representation. TRACER is a general text-reuse detection algorithm with a seven-level architecture. Each step is configurable and can be optimized for specific text-reuse tasks and corpora. The steps include preprocessing, featuring, selection, scoring, and post-processing. This approach is called feature-based linking: only text-reuse units that share features are compared, as opposed to comparing the full text of all passages against all others. Passages are compared by the words they contain, ignoring word order.
2. Another method was based on Agents for Actors (AfA) [Küster, 2013], a digital humanities framework of distributed microservices for text analysis. AfA was originally developed for the purpose of identifying allusions to Shakespearean passages in transcriptions of film dialogue (hence "Actors" in its name). This algorithm compares passages at both the letter and the word level, and therefore catches variation at the orthographic and formulation levels, respectively. While its primary use is to identify references and allusions in texts, at the hackathon it was tested to see how well it could also identify parallel passages in very different types of texts in an unrelated language.
3. The third approach was an adaptation to our problem of the method of [Barsky et al., 2008], designed for matching DNA subsequences, as described in [Klein et al., 2014]. This algorithm looks for all-against-all approximate matches (within some given threshold of difference between passages) by rephrasing the problem as finding maximal paths in a matching graph. The method was modified during the hackathon to work with syllable stems as the basic building block, rather than with individual characters as before. This change improved both the run time and the quality of the results. Since a syllable has 4 characters on average, the speedup was two orders of magnitude. As for the results, p@10 (precision at ten, the fraction of the top ten results that are relevant) increased from 0.67 to 1, and p@20 increased from 0.37 to The improvement was due to the fact that, with character-wise alignment, syllables can share many letters yet have no semantic similarity; see [Labenski et al., 2016; Labenski, 2016].

An infrastructure subteam, in addition to keeping everything up and running, parallelized the implementation of the third algorithm to run on a Sparc cluster of computers located at Tel Aviv University. Such parallelization is necessary for the ultimate goal, given the large size of the corpus.
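The kind of parallelization used for the third algorithm can be illustrated with a minimal Python sketch. It is not the code that ran on the Tel Aviv cluster: a thread pool stands in for the cluster only to show the structure, a toy exact n-gram matcher stands in for the approximate-matching algorithm, and the helper names, chunk size, and overlap are hypothetical. Text a is cut into overlapping windows, each window is matched against text b concurrently, and the per-window hits are shifted back to global offsets and merged.

```python
# Hypothetical sketch of chunk-and-merge parallel matching; a thread pool
# stands in for the cluster, and a toy exact n-gram matcher stands in for
# the approximate-matching algorithm.
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def ngram_hits(a: list[str], b: list[str], n: int) -> set[tuple[int, int]]:
    """Toy matcher: all (i, j) with a[i:i+n] == b[j:j+n]."""
    index: dict[tuple[str, ...], list[int]] = {}
    for j in range(len(b) - n + 1):
        index.setdefault(tuple(b[j:j + n]), []).append(j)
    return {(i, j)
            for i in range(len(a) - n + 1)
            for j in index.get(tuple(a[i:i + n]), [])}


def chunks(seq: list[str], size: int, overlap: int):
    """Yield (offset, window) pairs; consecutive windows share `overlap` items."""
    step = size - overlap
    for start in range(0, max(len(seq) - overlap, 1), step):
        yield start, seq[start:start + size]


def _match_chunk(item, b, n):
    offset, window = item
    # Shift window-local positions back to positions in the full text.
    return {(offset + i, j) for i, j in ngram_hits(window, b, n)}


def parallel_hits(a, b, n, size=1000, overlap=None, workers=4):
    """Match overlapping chunks of `a` against `b` concurrently and merge.
    With overlap >= n - 1, every n-gram of `a` lies wholly inside some chunk;
    hits found twice inside an overlap collapse in the set union."""
    overlap = n - 1 if overlap is None else overlap
    if not n - 1 <= overlap < size:
        raise ValueError("need n - 1 <= overlap < size")
    work = partial(_match_chunk, b=b, n=n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return set().union(*pool.map(work, chunks(a, size, overlap)))
```

The design choice that matters is the overlap: it must be at least as long as the longest match one wants to find whole, otherwise a match straddling every chunk boundary would be lost. For approximate passages of varying length, that means sizing the overlap to the longest expected parallel passage.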
The parallelization strategy is simple: divide the texts into overlapping chunks, run the original algorithm on all chunks in parallel, and finally piece all the results together.

All three algorithms were evaluated on a test set designed during the hackathon. The two doxographical texts mentioned above, known to contain many shared passages, were chosen, and 24 pairs of parallel passages were manually annotated. Out of the 24 pairs, the TRACER algorithm retrieved 13, the AfA algorithm retrieved 12, and the APBT algorithm (the third approach above) retrieved 16, that is, a recall of about 0.54, 0.50, and 0.67, respectively.

By finding cited or borrowed passages within the corpora of Indo-Tibetan (i.e., translated) and Tibetan (i.e., autochthonous) Buddhist literature, several research questions can be better addressed: determining the history of composition of individual texts; determining the relative chronology of groups of texts; determining the intellectual and scholarly milieu in which the texts emerged; and determining the intellectual history behind the texts (viz. terminology and concepts). After identifying parallel passages, one can assess the frequencies of letter/syllable/word replacements in the aligned passages of selected texts or text groups. This can help answer further research questions, such as determining editorial policies and processes (for example, the standardization of orthography and of the employment of grammatical particles, i.e., according to the so-called sandhi rules) and identifying processes of revision of translated texts.

D. Text classification
The second major task addressed at the hackathon was author profiling. While the extent to which authorship can be addressed in the case of translated texts is yet to be examined carefully, some general research questions related to authorship fall under the purview of machine classification. These include: (a) distinguishing between translated and autochthonous texts; (b) identifying the period in which a text was composed, viz. Old Tibetan (7th–11th c.), Classical Tibetan I (11th–14th c.), or Classical Tibetan II (15th–20th c.); (c) determining whether a translated canonical work belongs to the early period of translation (snga ʼgyur) or the new period (phyi ʼgyur); (d) in the case of autochthonous literature, differentiating between so-called revealed texts (texts portrayed as having been transmitted supernaturally) and composed texts; and (e) identifying an author's intellectual milieu (e.g., affiliation with a particular school of thought). A series of experiments was performed on scriptures and treatises, early and late, translated and autochthonous. We tried several methods, including bag-of-words features and a perceptron classifier trained with stochastic gradient descent on features similar to those of [Volansky et al., 2015], mainly: mean syllable length; mean sentence length; frequency of verbal prefixes and function words; frequency of foreign (Sanskrit) words; and type-to-token ratio. For authorship detection, we first applied an automatic word segmenter and then used n-gram frequencies and bag-of-words as features, a method shown to be useful in [Koppel et al., 2008]. We did not advance further on this task due to a shortage of time.
Both parts of the canon were employed as training data to determine features peculiar to the Kangyur, the corpus containing scriptures, on the one hand, and to the Tengyur, the corpus containing treatises, commentaries, manuals, and the like, on the other. Numerous autochthonous texts, including the entire collected writings of Rong zom Chos kyi bzang po, the entire collected writings of Shākya mchog ldan, several works by Sa skya paṇḍi ta Kun dgaʼ rgyal mtshan, and several texts by Tsong kha pa Blo bzang grags pa, were tested against the translated canonical texts in order to determine features of translated versus autochthonous works. In addition, selected individual texts were tested. For example, Sa skya paṇḍi ta's Tshad ma rigs gter was compared with Dharmakīrti's (7th c.) Pramāṇavārttika in Tibetan translation, which enabled a comparison of an autochthonous and a translated work on similar topics. The Mañjuśrīnāmasaṅgīti commentary ascribed to Rong zom pa (and at the same time included in the Tengyur as an Indian work in Tibetan translation) was compared with the canon in its entirety, with the Tengyur alone, and with other works by Rong zom pa and additional autochthonous works, thus comparing a work whose origin has been considered doubtful with both translated and autochthonous literature. The classification results are undergoing analysis by the Tibetan scholars.
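Several of the stylometric features listed above are simple to compute from transliterated text. The Python sketch below computes mean syllable length, mean sentence length (in syllables), the relative frequency of a few particles, and the type-to-token ratio, and trains a plain perceptron with per-example (stochastic) updates, which is stochastic gradient descent on the perceptron loss. It assumes space-separated Wylie syllables with the shad written as "/"; the particle list, the function names, and the toy data are hypothetical, and it is not the classifier used at the hackathon.

```python
# Hypothetical stylometric features for Wylie text plus a plain perceptron.
import random

# Illustrative subset of Tibetan grammatical particles (not an official list).
FUNCTION_WORDS = {"kyi", "gyi", "gi", "kyis", "gyis", "gis", "la", "dang", "ni", "ste"}


def features(text: str) -> list[float]:
    """Assumes syllables separated by spaces and sentences by the shad '/'."""
    sentences = [s.split() for s in text.split("/") if s.split()]
    tokens = [syl for sent in sentences for syl in sent]
    if not tokens:
        raise ValueError("empty text")
    n = len(tokens)
    return [
        sum(len(t) for t in tokens) / n,               # mean syllable length
        n / len(sentences),                            # mean sentence length
        sum(t in FUNCTION_WORDS for t in tokens) / n,  # particle frequency
        len(set(tokens)) / n,                          # type-to-token ratio
    ]


def train_perceptron(xs, ys, epochs=20, lr=1.0, seed=0):
    """Labels are +1/-1; weights are updated one example at a time, in a
    freshly shuffled order each epoch, whenever an example is misclassified."""
    rng = random.Random(seed)
    w, b = [0.0] * len(xs[0]), 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            if y * (sum(wj * xj for wj, xj in zip(w, x)) + b) <= 0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

For instance, features("chos kyi rgyal po / sangs rgyas kyi chos /") returns [3.875, 4.0, 0.25, 0.75]. In practice one would standardize the features before training, since they live on very different scales; the sketch omits that for brevity.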
Figure 1. Poster announcement of the hackathon.

IV CONCLUSION
The intense hackathon format proved to be quite exhilarating. Towards evening, each group reported on the day's accomplishments and vicissitudes. No single task was actually brought to completion on site, but the saplings were planted, and the ideas and prototype tools have continued to grow and develop in the ensuing weeks.
Based on our experience, we would recommend such a hackathon format for other well-defined interdisciplinary efforts in the computational humanities. It pays to come well prepared to the event, with clear goals and clean test data. And it is crucial to allocate resources for bringing the products and results of the hackathon to a stable and useful state after the event. In fact, the authors held a second hackathon one year later (February 2017) on a kibbutz in the Galilee, again for the development of tools for Tibetan Buddhist texts, but this time concentrating on manuscripts and computer-vision aspects.

Acknowledgements
We thank the staff at Kibbutz Lotan and all the hackathon participants (listed below). This research was supported in part by a grant (#I ) from the German-Israeli Foundation for Scientific Research and Development, and by the Khyentse Center for Tibetan Buddhist Textual Scholarship, Universität Hamburg, thanks to a grant by the Khyentse Foundation. N.D.'s research benefited from a fellowship at the Paris Institute for Advanced Studies (France), with the financial support of the French state, managed by the French National Research Agency's Investissements d'avenir program (ANR-11-LABX Labex RFIEA+).

Hackathon participants: Orna Almogi, Kfir Bar, Marco Büchler, Lena Dankin, Nachum Dershowitz, Daniel Hershcovich, Yair Hoffman, Marc W. Küster, Daniel Labenski, Peter Naftaliev, Dimitri Pauls, Elad Shaked, Nadav Steiner, Lior Uzan, Dorji Wangchuk, Eric Werner, and Lior Wolf.

Participating institutions: Tel Aviv University (School of Computer Science); Universität Hamburg (Khyentse Center for Tibetan Buddhist Textual Scholarship, Department for Indian and Tibetan Studies); Georg-August-Universität Göttingen (Göttingen Centre for Digital Humanities).
References
Orna Almogi, Lena Dankin, Nachum Dershowitz, Yair Hoffman, Dimitri Pauls, Dorji Wangchuk, and Lior Wolf, Stemming and segmentation for classical Tibetan, in: Revised Selected Papers of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Konya, Turkey (April 2016), Part I, A. Gelbukh, ed., Lecture Notes in Computer Science, vol. 9623, Springer-Verlag, Switzerland.
Marina Barsky, Ulrike Stege, Alex Thomo, and Chris Upton, A graph approach to the threshold all-against-all substring matching problem, ACM Journal of Experimental Algorithmics 12, Article 1.10, 2008.
Stephan V. Beyer, The Classical Tibetan Language, SUNY Press, Albany, NY, 1992.
Marco Büchler, Informationstechnische Aspekte des Historical Text Re-use, Ph.D. thesis, Fakultät für Mathematik und Informatik, Universität Leipzig, Germany, March 2013.
Marco Büchler, Greta Franzini, Emily Franzini, and Maria Moritz, Scaling historical text re-use, in: Proceedings of the IEEE International Conference on Big Data 2014 (IEEE BigData 2014), October 2014.
Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Pengfei Liu, and Xuanjing Huang, Long short-term memory neural networks for Chinese word segmentation, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, September 2015.
Sepp Hochreiter and Jürgen Schmidhuber, Long short-term memory, Neural Computation 9(8), November 1997.
Benjamin Klein, Nachum Dershowitz, Lior Wolf, Orna Almogi, and Dorji Wangchuk, Finding inexact quotations within a Tibetan Buddhist corpus, in: Digital Humanities (DH) 2014, Lausanne, Switzerland, July 2014.
Moshe Koppel, Jonathan Schler, and Eran Messeri, Authorship attribution in law enforcement scenarios, in: Security Informatics and Terrorism: Patrolling the Web, P. Cantor and B. Shapira (Eds.), IOS Press NATO Series, 2008.
Marc W. Küster, Agents for Actors: A Digital Humanities framework for distributed microservices for text linking and visualization, in: Digital Humanities (DH) 2013, University of Nebraska–Lincoln, July 2013.
Daniel Labenski, Finding Inter-textual Relations in Historical Texts, M.Sc. thesis, School of Computer Science, Tel Aviv University, Israel, 2016.
Daniel Labenski, Elad Shaked, Orna Almogi, Lena Dankin, Nachum Dershowitz, and Lior Wolf, Intertextuality in Tibetan texts (Abstract), in: Israeli Seminar on Computational Linguistics (ISCOL), Haifa, Israel, May 2016.
Huidan Liu, Minghua Nuo, Longlong Ma, Jian Wu, and Yeping He, Tibetan word segmentation as syllable tagging using conditional random field, in: Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 2011), 2011.
Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean, Efficient estimation of word representations in vector space, arXiv [cs.CL], 2013.
Vered Volansky, Noam Ordan, and Shuly Wintner, On the features of translationese, Digital Scholarship in the Humanities 30(1), April 2015.
Eric Werner, Phywa-pa Chos-kyi-seng-ge's depiction of Mahāyāna philosophy: A critical edition and annotated translation of the chapters on Yogācāra and Mādhyamaka philosophy from the gzhung lugs rnam byed, a doxography of the twelfth century, M.A. thesis, University of Hamburg, Germany, 2014.
Turrell V. Wylie, A standard system of Tibetan transcription, Harvard Journal of Asiatic Studies 22, December 1959.
NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science
More informationGeorgetown University at TREC 2017 Dynamic Domain Track
Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationMandarin Lexical Tone Recognition: The Gating Paradigm
Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Mandarin Lexical Tone Recognition: The Gating Paradigm Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationLiterature and the Language Arts Experiencing Literature
Correlation of Literature and the Language Arts Experiencing Literature Grade 9 2 nd edition to the Nebraska Reading/Writing Standards EMC/Paradigm Publishing 875 Montreal Way St. Paul, Minnesota 55102
More informationProduct Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments
Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationUser Profile Modelling for Digital Resource Management Systems
User Profile Modelling for Digital Resource Management Systems Daouda Sawadogo, Ronan Champagnat, Pascal Estraillier To cite this version: Daouda Sawadogo, Ronan Champagnat, Pascal Estraillier. User Profile
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationDickinson ISD ELAR Year at a Glance 3rd Grade- 1st Nine Weeks
3rd Grade- 1st Nine Weeks R3.8 understand, make inferences and draw conclusions about the structure and elements of fiction and provide evidence from text to support their understand R3.8A sequence and
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationThe College Board Redesigned SAT Grade 12
A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationP. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas
Exploiting Distance Learning Methods and Multimediaenhanced instructional content to support IT Curricula in Greek Technological Educational Institutes P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou,
More informationELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading
ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationQuickStroke: An Incremental On-line Chinese Handwriting Recognition System
QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents
More informationMETHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS
METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar
More informationCLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction
CLASSIFICATION OF PROGRAM Critical Elements Analysis 1 Program Name: Macmillan/McGraw Hill Reading 2003 Date of Publication: 2003 Publisher: Macmillan/McGraw Hill Reviewer Code: 1. X The program meets
More informationPOS tagging of Chinese Buddhist texts using Recurrent Neural Networks
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important
More informationSmart Grids Simulation with MECSYCO
Smart Grids Simulation with MECSYCO Julien Vaubourg, Yannick Presse, Benjamin Camus, Christine Bourjot, Laurent Ciarletta, Vincent Chevrier, Jean-Philippe Tavella, Hugo Morais, Boris Deneuville, Olivier
More informationStudents concept images of inverse functions
Students concept images of inverse functions Sinéad Breen, Niclas Larson, Ann O Shea, Kerstin Pettersson To cite this version: Sinéad Breen, Niclas Larson, Ann O Shea, Kerstin Pettersson. Students concept
More informationFinding Translations in Scanned Book Collections
Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University
More informationThe IDN Variant Issues Project: A Study of Issues Related to the Delegation of IDN Variant TLDs. 20 April 2011
The IDN Variant Issues Project: A Study of Issues Related to the Delegation of IDN Variant TLDs 20 April 2011 Project Proposal updated based on comments received during the Public Comment period held from
More informationThe taming of the data:
The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationSpecification of a multilevel model for an individualized didactic planning: case of learning to read
Specification of a multilevel model for an individualized didactic planning: case of learning to read Sofiane Aouag To cite this version: Sofiane Aouag. Specification of a multilevel model for an individualized
More informationStefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationStrategies for Solving Fraction Tasks and Their Link to Algebraic Thinking
Strategies for Solving Fraction Tasks and Their Link to Algebraic Thinking Catherine Pearn The University of Melbourne Max Stephens The University of Melbourne
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationParallel Evaluation in Stratal OT * Adam Baker University of Arizona
Parallel Evaluation in Stratal OT * Adam Baker University of Arizona tabaker@u.arizona.edu 1.0. Introduction The model of Stratal OT presented by Kiparsky (forthcoming), has not and will not prove uncontroversial
More informationPostprint.
http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,
More informationGERMAN STUDIES (GRMN)
Bucknell University 1 GERMAN STUDIES (GRMN) Faculty Professors: Katherine M. Faull, Peter Keitel (Director) Associate Professors: Bastian Heinsohn, Helen G. Morris-Keitel (Chair) German Studies provides
More informationA Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique
A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University
More informationPrentice Hall Literature: Timeless Voices, Timeless Themes, Platinum 2000 Correlated to Nebraska Reading/Writing Standards (Grade 10)
Prentice Hall Literature: Timeless Voices, Timeless Themes, Platinum 2000 Nebraska Reading/Writing Standards (Grade 10) 12.1 Reading The standards for grade 1 presume that basic skills in reading have
More informationData Fusion Models in WSNs: Comparison and Analysis
Proceedings of 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1) Data Fusion s in WSNs: Comparison and Analysis Marwah M Almasri, and Khaled M Elleithy, Senior Member,
More informationUniversiteit Leiden ICT in Business
Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:
More informationMaster Program: Strategic Management. Master s Thesis a roadmap to success. Innsbruck University School of Management
Master Program: Strategic Management Department of Strategic Management, Marketing & Tourism Innsbruck University School of Management Master s Thesis a roadmap to success Index Objectives... 1 Topics...
More informationPromoting open access to research results
Vol. 9, No 1, 2014 www.swiss-academies.ch Promoting open access to research results Position paper issued by the Swiss Academy of Medical Sciences Information on the preparation of this position paper
More informationProcedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing
Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 141 ( 2014 ) 124 128 WCLTA 2013 Using Corpus Linguistics in the Development of Writing Blanka Frydrychova
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationA Graph Based Authorship Identification Approach
A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico
More informationPrentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9)
Nebraska Reading/Writing Standards, (Grade 9) 12.1 Reading The standards for grade 1 presume that basic skills in reading have been taught before grade 4 and that students are independent readers. For
More informationLearning Methods for Fuzzy Systems
Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8
More informationApplications of memory-based natural language processing
Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal
More informationProgram Matrix - Reading English 6-12 (DOE Code 398) University of Florida. Reading
Program Requirements Competency 1: Foundations of Instruction 60 In-service Hours Teachers will develop substantive understanding of six components of reading as a process: comprehension, oral language,
More informationCS 100: Principles of Computing
CS 100: Principles of Computing Kevin Molloy August 29, 2017 1 Basic Course Information 1.1 Prerequisites: None 1.2 General Education Fulfills Mason Core requirement in Information Technology (ALL). 1.3
More informationDisambiguation of Thai Personal Name from Online News Articles
Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online
More informationArabic Orthography vs. Arabic OCR
Arabic Orthography vs. Arabic OCR Rich Heritage Challenging A Much Needed Technology Mohamed Attia Having consistently been spoken since more than 2000 years and on, Arabic is doubtlessly the oldest among
More informationFirms and Markets Saturdays Summer I 2014
PRELIMINARY DRAFT VERSION. SUBJECT TO CHANGE. Firms and Markets Saturdays Summer I 2014 Professor Thomas Pugel Office: Room 11-53 KMC E-mail: tpugel@stern.nyu.edu Tel: 212-998-0918 Fax: 212-995-4212 This
More informationPH.D. IN COMPUTER SCIENCE PROGRAM (POST M.S.)
PH.D. IN COMPUTER SCIENCE PROGRAM (POST M.S.) OVERVIEW ADMISSION REQUIREMENTS PROGRAM REQUIREMENTS OVERVIEW FOR THE PH.D. IN COMPUTER SCIENCE Overview The doctoral program is designed for those students
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationImproved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form
Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationBooks Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny
By the End of Year 8 All Essential words lists 1-7 290 words Commonly Misspelt Words-55 working out more complex, irregular, and/or ambiguous words by using strategies such as inferring the unknown from
More informationWhat the National Curriculum requires in reading at Y5 and Y6
What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the
More informationNCEO Technical Report 27
Home About Publications Special Topics Presentations State Policies Accommodations Bibliography Teleconferences Tools Related Sites Interpreting Trends in the Performance of Special Education Students
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationHISTORY COURSE WORK GUIDE 1. LECTURES, TUTORIALS AND ASSESSMENT 2. GRADES/MARKS SCHEDULE
HISTORY COURSE WORK GUIDE 1. LECTURES, TUTORIALS AND ASSESSMENT Lectures and Tutorials Students studying History learn by reading, listening, thinking, discussing and writing. Undergraduate courses normally
More informationGenerating Test Cases From Use Cases
1 of 13 1/10/2007 10:41 AM Generating Test Cases From Use Cases by Jim Heumann Requirements Management Evangelist Rational Software pdf (155 K) In many organizations, software testing accounts for 30 to
More informationLearning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries
Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries Mohsen Mobaraki Assistant Professor, University of Birjand, Iran mmobaraki@birjand.ac.ir *Amin Saed Lecturer,
More informationFlorida Reading Endorsement Alignment Matrix Competency 1
Florida Reading Endorsement Alignment Matrix Competency 1 Reading Endorsement Guiding Principle: Teachers will understand and teach reading as an ongoing strategic process resulting in students comprehending
More informationSchool Inspection in Hesse/Germany
Hessisches Kultusministerium School Inspection in Hesse/Germany Contents 1. Introduction...2 2. School inspection as a Procedure for Quality Assurance and Quality Enhancement...2 3. The Hessian framework
More informationSenior Stenographer / Senior Typist Series (including equivalent Secretary titles)
New York State Department of Civil Service Committed to Innovation, Quality, and Excellence A Guide to the Written Test for the Senior Stenographer / Senior Typist Series (including equivalent Secretary
More informationReading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5-
New York Grade 7 Core Performance Indicators Grades 7 8: common to all four ELA standards Throughout grades 7 and 8, students demonstrate the following core performance indicators in the key ideas of reading,
More informationAxiom 2013 Team Description Paper
Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationThe development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach
BILINGUAL LEARNERS DICTIONARIES The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach Mark VAN MOL, Leuven, Belgium Abstract This paper reports on the
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationHandbook for Graduate Students in TESL and Applied Linguistics Programs
Handbook for Graduate Students in TESL and Applied Linguistics Programs Section A Section B Section C Section D M.A. in Teaching English as a Second Language (MA-TESL) Ph.D. in Applied Linguistics (PhD
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationIntra-talker Variation: Audience Design Factors Affecting Lexical Selections
Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and
More informationModeling full form lexica for Arabic
Modeling full form lexica for Arabic Susanne Alt Amine Akrout Atilf-CNRS Laurent Romary Loria-CNRS Objectives Presentation of the current standardization activity in the domain of lexical data modeling
More informationPerformance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database
Journal of Computer and Communications, 2016, 4, 79-89 Published Online August 2016 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2016.410009 Performance Analysis of Optimized
More informationTHE WEB 2.0 AS A PLATFORM FOR THE ACQUISITION OF SKILLS, IMPROVE ACADEMIC PERFORMANCE AND DESIGNER CAREER PROMOTION IN THE UNIVERSITY
THE WEB 2.0 AS A PLATFORM FOR THE ACQUISITION OF SKILLS, IMPROVE ACADEMIC PERFORMANCE AND DESIGNER CAREER PROMOTION IN THE UNIVERSITY F. Felip Miralles, S. Martín Martín, Mª L. García Martínez, J.L. Navarro
More information