A Ruled-Based Part of Speech (RPOS) Tagger for Malay Text Articles


Rayner Alfred¹, Adam Mujat¹, and Joe Henry Obit²

¹ School of Engineering and Information Technology, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia
² Labuan School of Informatics Science, Universiti Malaysia Sabah, Labuan, Malaysia
ralfred@ums.edu.my, adammujat@gmail.com, joehenryobit@yahoo.com

Abstract. The Malay language is an Austronesian language spoken in most countries of the South East Asia region, including Malaysia, Indonesia, Singapore, Brunei and Thailand. Traditional linguistics is well developed for Malay, but very few resources and tools are available or accessible for the computational linguistic analysis of Malay. Assigning parts of speech (POS) to the running words of a sentence is one of the pipeline processes in Natural Language Processing (NLP) tasks. For Malay, this task has not been well investigated. This paper outlines an approach to POS tagging for Malay text articles. We apply a simple Rule-based Part of Speech (RPOS) tagger to perform the tagging. POS tagging is the task of automatically annotating each word in a text document with its syntactic category. A rule-based POS tagger generally uses a POS tag dictionary and a set of rules to identify the part of speech of each word. In this paper, we propose a framework with two components. Malay affixing rules identify the POS tag of a word, and the relations between words select the best POS tag for words that have two or more valid POS tags. The results show that the accuracy of the rule-based POS tagger is higher than that of a statistical POS tagger. This indicates that the proposed RPOS tagger is able to predict the POS of unknown words with promising accuracy.
Keywords: Rule-Based POS Tagger, Computational Linguistics, Malay Affixing Rules, Malay Word Relation.

A. Selamat et al. (Eds.): ACIIDS 2013, Part II, LNAI 7803, Springer-Verlag Berlin Heidelberg 2013

1 Introduction

In Malaysia, the Malay language is officially known as Bahasa Malaysia, which translates as "the Malaysian language". Standard Malay has about 18 million speakers. In addition, about 170 million people speak Indonesian, which is a form of Malay. Malay is the national language of Malaysia and Indonesia. It ranks fourth, after Spanish, among the most widely spoken languages on earth. Nevertheless, it is one of the least studied and least known languages, to the extent that it is even left out of rank orders of the world's major languages. Traditional linguistics is well developed for Malay, but very few resources and tools are available or accessible for the computational linguistic analysis of Malay. Part of speech (POS) tagging for Malay text articles is one example of such a scarce tool. POS tagging is the process of assigning each word in a text its corresponding part of speech tag. The tag is based on the word's definition and on its relation to other words.

A POS tagger for Malay has several end-product applications. First, it can be used in a grammar checker that identifies word relations from word classes, by checking the classes of the words before and after each word. Second, it can be used to classify questions by identifying the question focus [5]. For example, a noun and a verb that follow the interrogative word and keyword can be used to identify the question focus.

For English, a simple rule-based POS tagger was first introduced by Eric Brill [1]. He showed that a rule-based tagger for English can perform as well as taggers based on probabilistic or statistical models. Statistical tagging has been widely applied to English text using tagged corpora and various approaches. Among the early techniques was the Hidden Markov Model (HMM) [12], which achieved an accuracy of more than 96% on English text articles. For Malay, a statistical POS tagger using a trigram Hidden Markov Model has been designed, but it achieved an accuracy of only 67.9%. Statistical POS tagging efforts have mainly focused on European languages such as English, German and Spanish [7,8,9,10,11]. This progress has largely been made possible by the availability of language resources such as dictionaries and annotated corpora.
Less-resourced languages such as Malay still need more research to support the development of tools for their computational linguistic analysis. In this work, we outline a framework for rule-based POS tagging of Malay. A rule-based approach suits Malay because only a very limited POS-tagged corpus is accessible to Malay language researchers.

This paper is organized as follows. Section 2 explains the background of POS tagging for Malay. Section 3 outlines the rule-based POS tagger framework for Malay text articles. Section 4 describes the experimental setup and the experimental results. Section 5 discusses the results obtained from the experiments. Finally, Section 6 concludes the paper with future work.

2 Part of Speech Tagging for Malay Language

Part of Speech (POS) tagging is the process of assigning each word in a text its corresponding word class or part of speech. The class is based on the word's definition and on its relation to other words. A simple rule-based POS tagger for Malay applies a POS tag dictionary and affixing rules to identify the definition of a Malay word. The POS tag dictionary is manually extracted from the Malay thesaurus and stored in text format [2]. Fig. 1 shows a snapshot of the Malay POS tag dictionary, and Table 1 lists the POS tags for Malay words.
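The paper states that the dictionary is stored in text format but does not give its exact layout. Suppose, hypothetically, that each line holds a word followed by one or more tags, as the snapshot in Fig. 1 suggests. Under that assumption, the file can be read into a map from each word to its candidate tags. The minimal sketch below does this, merging repeated entries for the same word; `load_dictionary` is a hypothetical name.

```python
from collections import defaultdict


def load_dictionary(lines):
    """Build a word -> candidate-tags map from 'word TAG [TAG ...]' lines.

    The line layout is an assumption; the paper only says the dictionary
    is stored as text.
    """
    dictionary = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, tags = parts[0].lower(), parts[1:]
        for tag in tags:
            if tag not in dictionary[word]:  # keep each tag once, in order
                dictionary[word].append(tag)
    return dict(dictionary)


entries = ["adunan NN", "agak GUT", "suka JJ", "suka VB RB"]
print(load_dictionary(entries)["suka"])  # ['JJ', 'VB', 'RB']
```

With a real file, the same function can be called as `load_dictionary(open(path, encoding="utf-8"))`. The first-seen tag order is preserved, which matters if a later step falls back to the first candidate.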

adunan NN
agak GUT
agaknya PEN
mengagak-agak VB
teragak VB
teragak-agak VB
Fig. 1. Malay POS tag dictionary snapshot

All the affixing rules applied in the proposed approach were studied and manually extracted from Tatabahasa Dewan Edisi Ketiga [3]. The word relations are derived from word types, since some word types co-occur with words of other types (see Table 2). For instance, consider the Malay phrase saya suka makan ("I like to eat"):

saya (NN) suka (JJ/VB/RB) makan (VB)

Here saya is a noun, and it co-occurs with suka, which is classified as an adjective, a verb and an adverb. However, makan is a verb, and of these three candidates only an adverb is allowed to co-occur with makan. We therefore obtain the following word relations:

saya (NN) suka (RB) makan (VB)

Table 1. POS tag list for Malay

Word type (English) | Subtype (English) | Subtype (Malay) | Tag
Noun | - | - | NN
Noun | Proper noun | - | NNP
Verb | - | - | VB
Adjective | - | - | JJ
Function | Conjunction | Kata hubung | CC
Function | Interjection | Kata seru | UH
Function | Interrogative | Kata tanya | WP
Function | Command | Kata perintah | CO
Function | - | Kata pangkal ayat | PNG
Function | Auxiliary (Amplifier) | Kata bantu | AUX
Function | - | Kata penguat | GUT
Function | Particles | Kata penegas | RP
Function | Negation | Kata naïf | NEG
Function | - | Kata pemeri | MER
Function | Preposition | Kata sendi nama | IN
Function | - | Kata pembenar | BNR
Function | Direction | Kata arah | DR
Function | Cardinal number | Kata bilangan | CD
Function | - | Kata penekan | PEN
Function | - | Kata pembenda | BND
Adverb | Adverb | - | RB
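The disambiguation step in the saya suka makan example can be expressed as a filter over candidate tags. The minimal sketch below makes one assumption the paper does not state: that Table 2 lists the word types allowed to follow each word type. This is the reading under which the example comes out as described. `VALID_NEXT` and `choose_tags` are hypothetical names, and only four rows of Table 2 are included.

```python
# Hypothetical subset of Table 2, read as "tags that may follow a given tag".
# The reading direction is an assumption chosen to match the saya suka makan
# example in the text.
VALID_NEXT = {
    "NN": {"JJ", "RB", "VB", "NN", "IN"},
    "VB": {"AUX", "RB", "NN", "PEN", "BND"},
    "JJ": {"GUT", "IN"},
    "RB": {"VB", "IN", "JJ"},  # Table 2 also lists "noun (AUX)"; omitted here
}


def choose_tags(prev_tag, candidates, next_tag):
    """Keep the candidate tags that form valid relations with both neighbours.

    prev_tag and next_tag are the (already unambiguous) tags of the previous
    and next words, or None at a sentence boundary.
    """
    return [
        tag
        for tag in candidates
        if (prev_tag is None or tag in VALID_NEXT.get(prev_tag, set()))
        and (next_tag is None or next_tag in VALID_NEXT.get(tag, set()))
    ]


# saya (NN) suka (JJ/VB/RB) makan (VB)
print(choose_tags("NN", ["JJ", "VB", "RB"], "VB"))  # ['RB']
```

A noun may be followed by an adjective, a verb or an adverb, so saya alone does not settle the tag of suka. The constraint from makan removes JJ and VB, which leaves RB.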

Table 2. Word type relations

Word type | Valid sequences of word types
Noun (NN) | adjective (JJ), adverb (RB), verb (VB), noun (NN), preposition (IN)
Verb (VB) | auxiliary (AUX), adverb (RB), noun (NN), penekan (PEN), pembenda (BND)
Adjective (JJ) | penguat (GUT), preposition (IN)
Adverb (RB) | verb (VB), preposition (IN), adjective (JJ), noun (AUX)
Direction (DR) | noun (NN), preposition (IN)
Preposition (IN) | noun (NN), verb (VB), adjective (JJ)
Auxiliary (AUX) | adjective (JJ), verb (VB), preposition (IN)
Cardinal number (CD) | noun (NN)
Penekan (PEN) | adverb (RB), noun (NN), conjunction (CC)
Pembenda (BND) | conjunction (CC), noun (NN)
Conjunction (CC) | noun (NN), verb (VB), preposition (IN), adjective (JJ)
Penguat (GUT) | adjective (JJ)
Interrogative (WP) | noun (NN), verb (VB)
Pangkal ayat (PNG) | noun (NN)

Because resources such as POS-tagged corpora are unavailable, most Malay POS tagging systems rely on a POS tag dictionary and affixing rules (see Section 3).

2.1 Analysis of Affixed Words

Bali analyzes Malay affixed words by identifying them, segmenting them and finally interpreting them [4]. In Malay, a word can be simple or complex. Affixed words are complex words generated by a morphological process called affixation. Affixation includes prefixation, suffixation, circumfixation and infixation. Prefixation is the addition of a prefix to the left side of the base, and suffixation is the addition of a suffix to the right side of the base (see Fig. 2). Circumfixation is the simultaneous addition of a discontinuous morphological unit, called a circumfix, to the left and right sides of the base [4]. A circumfix is a combination of a prefix and a suffix treated as a single morphological unit. In Malay, infixation is the insertion of an infix just after the first consonant of the base.
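The four affixation processes can be illustrated with plain string operations. This sketch assumes simple concatenation. It deliberately ignores the sound changes that real Malay affixation involves; for example, the meN- prefix becomes mem- before p and the p drops, as in pukul → memukul. The function names are hypothetical.

```python
VOWELS = set("aeiou")


def prefixation(prefix, base):
    """Add a prefix to the left side of the base."""
    return prefix + base


def suffixation(base, suffix):
    """Add a suffix to the right side of the base."""
    return base + suffix


def circumfixation(prefix, suffix, base):
    """Add both parts of a circumfix around the base at the same time."""
    return prefix + base + suffix


def infixation(base, infix):
    """Insert an infix just after the first consonant of the base."""
    for i, ch in enumerate(base):
        if ch not in VOWELS:
            return base[: i + 1] + infix + base[i + 1 :]
    return base  # no consonant: nothing to attach the infix to


print(prefixation("ter", "jatuh"))         # terjatuh
print(suffixation("makan", "an"))          # makanan
print(circumfixation("ke", "an", "adil"))  # keadilan
print(infixation("gigi", "er"))            # gerigi
print(infixation("tunjuk", "el"))          # telunjuk
```

The infix examples (gerigi, telunjuk) show the "after the first consonant" rule at work. Recognizing affixed words means running these operations in reverse, which is what the affixing rules in Section 3 approximate.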
Fig. 2. Clitics, affixes and particles in a Malay word (diagram components: affixed word; proclitic (ku, kau); circumfix around the base, built from a prefix and a suffix, with an optional infix; enclitic (ku, kau, mu, nya); particle)

In Bali's work, she identified affixed words, clitics and particles, and the relations between them. A word containing clitics and particles cannot be affixed, but an affixed word may carry a clitic and a particle. As Fig. 2 shows, an affixed word can host one and only one clitic and/or one and only one particle. A clitic attached before the base is called a proclitic, and a clitic attached after the base is called an enclitic. Fig. 2 shows the structure of an affixed word in Malay with an added clitic (proclitic or enclitic) and particle [4]. Malay has two proclitics, four enclitics and three particles. The two proclitics, ku and kau, generate passive words. The four enclitics, ku, kau, mu and nya, function as the object pronoun of an active verb and as a possessive adjective. In addition, the enclitic nya also functions as the subject pronoun of a passive verb and as a definite article. Finally, the particles kah and tah mark questions, and the particle lah marks imperatives and predicates.

3 A Rule-Based Part of Speech (RPOS) Tagger for Malay Texts

The proposed rule-based POS tagger for Malay applies three general tagging conventions of the Penn Treebank [6]:
a) part of speech tags are defined by their syntactic distribution rather than their semantic function;
b) the tagger capitalizes words tagged as proper nouns; and
c) the tagger tags abbreviations and initials.

In addition, the proposed tagger uses POS tags that are not included in the Penn Treebank tag set [3]:
a) kata perintah, command (CO)
b) kata pangkal ayat (PNG)
c) kata bantu, auxiliary (AUX)
d) kata penguat (GUT)
e) kata naïf, negation (NEG)
f) kata pemeri (MER)
g) kata pembenar (BNR)
h) kata arah, direction (DR)
i) kata penekan (PEN)
j) kata pembenda (BND)

In this paper, we outline a simple rule-based POS tagger for Malay. Its rules comprise affixing rules and word relation rules [3].
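The word structure in Fig. 2 (Section 2.1) fixes an order: proclitic, base, enclitic, particle, with at most one of each. The minimal sketch below splits a word along that structure. It assumes a hypothetical lexicon of known bases to validate each candidate split; without one, a word such as kucing ("cat") would wrongly lose its initial ku. `analyse` and `LEXICON` are hypothetical names, and in practice the POS tag dictionary could play the lexicon's role.

```python
PROCLITICS = ("kau", "ku")          # longer forms first
ENCLITICS = ("nya", "kau", "mu", "ku")
PARTICLES = ("kah", "lah", "tah")

LEXICON = {"buku", "baca", "kucing", "pukul"}  # hypothetical known bases


def analyse(word, lexicon):
    """Split a word into (proclitic, base, enclitic, particle), or None.

    Empty strings mark absent parts. Simpler analyses (fewer clitics) are
    tried first, and a split is accepted only if its base is in the lexicon.
    """
    word = word.lower()
    for particle in ("",) + PARTICLES:
        if not word.endswith(particle):
            continue
        rest = word[: len(word) - len(particle)]
        for enclitic in ("",) + ENCLITICS:
            if not rest.endswith(enclitic):
                continue
            core = rest[: len(rest) - len(enclitic)]
            for proclitic in ("",) + PROCLITICS:
                if core.startswith(proclitic):
                    base = core[len(proclitic):]
                    if base in lexicon:
                        return proclitic, base, enclitic, particle
    return None


print(analyse("bukunya", LEXICON))     # ('', 'buku', 'nya', '')
print(analyse("kaubacalah", LEXICON))  # ('kau', 'baca', '', 'lah')
print(analyse("kucing", LEXICON))      # ('', 'kucing', '', '')
```

Trying the empty option first at every level means that a word already in the lexicon is never split. This is the safeguard that protects kucing.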
Malay affixes include prefixes, infixes, suffixes and their combinations. In this paper, only prefixes, suffixes and their combinations are considered. Infixation is excluded because it is not productive. It can also cause ambiguity in POS tagging, since the same infix may occur in nouns, verbs and adjectives. The affixing rules cover nouns (Table 3), proper nouns, adjectives (Table 4), verbs (Table 5), pembenda, penegas and penekan.

The penegas rule matches words ending in kah, lah or tah. The pembenda rule matches a non-noun root word ending in nya. Finally, the penekan rule matches a noun root word ending in nya.

In addition to the affixing rules, we also use word type relation rules. A word type relation rule selects the POS tag that represents a word when the word has more than one POS tag. It does this by checking that the word type relations with the words before and after it are valid, as explained in Section 2. The word type relation list in Table 2, extracted from Tatabahasa Dewan Edisi Ketiga [3], is not exhaustive.

Table 3. Noun affixing identification rules

Rule | Prefix | Next character | Sequence of characters | May end with | Suffix
1a | pe | ny, ng, r, l, w | a-z | an | -
1b | pem | b, p | a-z | an | -
1c | pen | d, c, j, sy, z | a-z | an | -
1d | peng | g, kh, h, k, vowel | a-z | an | -
1e | penge | - | a-z (3 to 4 characters) | an | -
1f | pel or ke | - | a-z | an | -
1g | juru, maha, tata, pra, swa, tuna, eka, dwi, tri, panca, pasca, pro, anti, poli, auto, sub, supra | - | a-z | - | -
1h | not starting with me, meng, mem, menge, ber, be, di, diper | - | a-z | - | an, at, in, wan, wati, isme, isasi, logi, tas, man, nita, ik, is, al

Table 4. Adjective affixing identification rules

Rule | Prefix | Next character | Sequence of characters | May end with | Suffix
2a | ter, se, bi | - | a-z | - | -
2b | ke | - | a-z | an | -
2c | not starting with di and men | - | a-z | - | in, at, ah, iah; a sequence of vowels followed by wi; a sequence of consonants ending with i
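Table 3 translates fairly directly into regular expressions. The sketch below is hypothetical and encodes the noun rules with Python's `re` module. It tries longer prefixes first, an ordering the paper does not specify. It also omits the optional "may end with" endings, because an optional ending can never cause a match to fail. Rule 1e is the exception, where the ending follows a length-limited sequence. `NOUN_RULES` and `noun_rule` are hypothetical names.

```python
import re

# Noun affixing rules from Table 3 (simplified). Longer prefixes are tried
# first so that, e.g., "peng-" words reach rule 1d rather than rule 1a.
NOUN_RULES = [
    ("1e", re.compile(r"penge[a-z]{3,4}(?:an)?")),
    ("1d", re.compile(r"peng(?:g|kh|h|k|[aeiou])[a-z]+")),
    ("1c", re.compile(r"pen(?:d|c|j|sy|z)[a-z]+")),
    ("1b", re.compile(r"pem[bp][a-z]+")),
    ("1a", re.compile(r"pe(?:ny|ng|r|l|w)[a-z]+")),
    ("1f", re.compile(r"(?:pel|ke)[a-z]+")),
    ("1g", re.compile(
        r"(?:juru|maha|tata|pra|swa|tuna|eka|dwi|tri|panca|pasca"
        r"|pro|anti|poli|auto|sub|supra)[a-z]+")),
    # 1h: the excluded prefixes me-, meng-, mem-, menge-, ber-, be-, di- and
    # diper- all begin with "me", "be" or "di", so one lookahead covers them.
    ("1h", re.compile(
        r"(?!me|be|di)[a-z]+"
        r"(?:an|at|in|wan|wati|isme|isasi|logi|tas|man|nita|ik|is|al)")),
]


def noun_rule(word):
    """Return the id of the first noun rule matching the word, or None."""
    word = word.lower()
    for rule_id, pattern in NOUN_RULES:
        if pattern.fullmatch(word):
            return rule_id
    return None


for w in ["pembaca", "pengajar", "penyanyi", "nasionalisme", "makan"]:
    print(w, noun_rule(w))
# pembaca 1b, pengajar 1d, penyanyi 1a, nasionalisme 1h, makan 1h
```

Rule 1h matches makan even though makan is a verb. This illustrates why the framework consults the POS tag dictionary before the affixing rules.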

Table 5. Verb affixing identification rules

Rule | Prefix | Next character | Sequence of characters | May end with | Suffix
3a | me | ny, ng, r, l, w, y, p, t, k, s | a-z | - | -
3b | mem | b, f, p, v | a-z | kan or i | -
3c | men | d, c, j, sy, z, t, s | a-z | kan or i | -
3d | meng | g, gh, kh, h, k, vowel | a-z | - | -
3e | menge | - | a-z (3 to … characters) | |
3f | memper or diper | - | a-z | kan or i | -
3g | ber | not r | a-z | kan or an | -
3h | bel | - | a-z | - | -
3i | ter | not r | a-z | - | -
3j | ke | - | a-z | - | an
3k | - | - | a-z | - | i or kan
3l | di or diper | - | a-z | kan or i | -

Fig. 3. The Rule-Based POS Tagger Framework for Malay Text Articles. The diagram shows four paths:
- Each word is first looked up in the POS tag dictionary.
- A word with a single POS tag is tagged directly.
- A word with more than one POS tag goes to the word relation rules.
- A word that does not exist in the dictionary goes to the affixing rules for noun, adjective, verb, penegas, pembenda and penekan.
All paths end in a tagged word.

Fig. 3 illustrates the framework of the proposed rule-based POS (RPOS) tagger for Malay text articles. It consists of a POS tag dictionary, a set of affixing rules and a set of word relation rules. The POS tag dictionary consists of Malay words and their POS tags. These words were extracted manually from Thesaurus Bahasa Melayu, which has more than 8,700 tagged words [2]. Malay affixation generates new words and meanings. In this paper, we use affixing characteristics to identify POS tags only for nouns, adjectives, verbs, penegas, pembenda and penekan.

The tagger first checks whether the word exists in the POS tag dictionary. If it does and has only one tag, tagging of that word is complete. If it exists in the dictionary with more than one possible tag, the valid word type relations are checked to select the proper tag. Otherwise, if the word does not exist in the POS tag dictionary, it is processed with the affixing rules before it re-enters the tagging process.

4 Experimental Setup and Evaluations

In this experiment, we extracted ten sets of news articles from Malay online news and ten sets of biomedical articles from the Malaysian Journal of Community Health (…). We performed rule-based POS tagging on these sets of news and biomedical articles using the affixing and word type relation rules. To evaluate the accuracy of the proposed algorithm, we tagged the words manually and compared the tagger's output with these actual tags. Table 6 shows the accuracy (%) of the rule-based POS tagger against the manual tagging for both the news and the biomedical articles. In Table 6, total tokens is the actual number of words found in each test set. Counted tokens is the number of words actually used for POS tagging.
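The tagging flow of Fig. 3, described at the end of Section 3, can be sketched end to end as follows. This is a minimal illustration, not the authors' implementation. Several parts are hypothetical:
- the toy dictionary;
- the reading of Table 2 as "tags that may follow";
- the small subset of rules from Tables 3 and 5;
- the function names and the placeholder tag UNK.

Following Table 1, the penegas rule yields RP, penekan yields PEN and pembenda yields BND.

```python
import re

# Hypothetical toy dictionary in the style of Fig. 1 (word -> candidate tags).
DICTIONARY = {
    "saya": ["NN"],
    "suka": ["JJ", "VB", "RB"],
    "makan": ["VB"],
    "nasi": ["NN"],
}

# Subset of Table 2, read as "tags that may follow a given tag" (assumption).
VALID_NEXT = {
    "NN": {"JJ", "RB", "VB", "NN", "IN"},
    "VB": {"AUX", "RB", "NN", "PEN", "BND"},
    "JJ": {"GUT", "IN"},
    "RB": {"VB", "IN", "JJ"},
}

# A few simplified verb rules (Table 5: 3b, 3c, 3f, 3l) and noun rules
# (Table 3: 1b, 1c, part of 1h). Full coverage would need every rule.
VERB_PATTERNS = [re.compile(p) for p in (
    r"mem[bfpv][a-z]+",
    r"men(?:d|c|j|sy|z|t|s)[a-z]+",
    r"(?:memper|diper)[a-z]+",
    r"di[a-z]+",
)]
NOUN_PATTERNS = [re.compile(p) for p in (
    r"pem[bp][a-z]+",
    r"pen(?:d|c|j|sy|z)[a-z]+",
    r"(?!me|be|di)[a-z]+(?:an|wan|wati|isme)",
)]


def affix_tag(word):
    """Guess a tag for a word missing from the dictionary, or return None."""
    if re.fullmatch(r"[a-z]+(?:kah|lah|tah)", word):
        return "RP"  # penegas rule
    root = word[:-3]
    if word.endswith("nya") and root in DICTIONARY:
        # penekan: noun root + nya; pembenda: non-noun root + nya
        return "PEN" if "NN" in DICTIONARY[root] else "BND"
    if any(p.fullmatch(word) for p in VERB_PATTERNS):
        return "VB"
    if any(p.fullmatch(word) for p in NOUN_PATTERNS):
        return "NN"
    return None


def tag_sentence(words):
    """Tag a tokenised Malay sentence following the flow of Fig. 3."""
    # Step 1: dictionary lookup, with the affixing rules for unknown words.
    candidates = []
    for w in words:
        tags = DICTIONARY.get(w.lower())
        if tags is None:
            guess = affix_tag(w.lower())
            tags = [guess] if guess else ["UNK"]  # "UNK" is a placeholder
        candidates.append(tags)
    # Step 2: word relation rules for words with more than one tag.
    result = []
    for i, tags in enumerate(candidates):
        if len(tags) > 1:
            prev_tag = result[i - 1] if i > 0 else None
            nxt = candidates[i + 1] if i + 1 < len(candidates) else None
            next_tag = nxt[0] if nxt is not None and len(nxt) == 1 else None
            valid = [
                t for t in tags
                if (prev_tag is None or t in VALID_NEXT.get(prev_tag, set()))
                and (next_tag is None or next_tag in VALID_NEXT.get(t, set()))
            ]
            tags = valid or tags  # fall back to the first candidate
        result.append(tags[0])
    return list(zip(words, result))


print(tag_sentence(["saya", "suka", "makan"]))
# [('saya', 'NN'), ('suka', 'RB'), ('makan', 'VB')]
print([affix_tag(w) for w in ["membaca", "pembaca", "nasinya", "bacalah"]])
# ['VB', 'NN', 'PEN', 'RP']
```

In this sketch, the affixing rules run only for words the dictionary lacks. Words whose dictionary entry is ambiguous go to the relation check. The fallback to the first candidate is a design choice of the sketch, not something the paper specifies.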
Table 6. Experiment results for rule-based POS tagging of Malay news and biomedical articles. For each test set, and on average, the table reports total tokens, counted tokens and accuracy (%) for the news articles and for the biomedical articles.

5 Discussion

The results show that the proposed rule-based Malay POS tagger achieves 89 percent accuracy for the Malay news articles and 86 percent accuracy for the Malay biomedical articles. The accuracy for the biomedical articles is lower because they contain words borrowed into Malay from English.

For the news articles, our experimental results also identified some words whose POS tags the tagger failed to identify: koperasi (NN), berniaga (VB), selepas (RB), waktu (NN/AUX) and bertugas (VB). For the biomedical articles, the tagger failed to identify the POS tags of some words borrowed from English:
- antropometri (anthropometry, a noun)
- dialysis (dialysis, a noun)
- inflamasi (inflammation, a noun)
- komplikasi (complication, a noun)
- vascular (vascular, a noun or adjective)
- nefropati (nephropathy, a noun)
- neuropati (neuropathy, a noun)
- retinopati (retinopathy, a noun)
- infarksi (infarction, a noun)
- myocardium (myocardium, a noun)
- amputasi (amputation, a noun)
- superfisial (superficial, an adjective)

6 Conclusion

In this paper, we have outlined the framework of a simple Rule-based Part of Speech (RPOS) tagger for Malay text articles. Based on our experimental results, the performance of the proposed rule-based POS tagger is acceptable compared to that of the statistical POS tagger reported earlier. This indicates that a rule-based POS tagger for Malay is able to predict the POS of unknown words with promising accuracy. Its performance can be improved by adding more word type relations and more tagged words to the POS tag dictionary. Improving the word type relations will allow more sentence formats to be handled.

References

1. Brill, E.: A simple rule-based part of speech tagger. In: HLT 1991: Proceedings of the Workshop on Speech and Natural Language. Association for Computational Linguistics, Morristown (1992)
2. Thesaurus Bahasa Melayu, New Edition. Dewan Bahasa dan Pustaka, Kuala Lumpur (2008) ISBN X
3. Karim, N.S., Onn, F.M., Musa, H.H., Mahmood, A.H.: Tatabahasa Dewan Edisi Ketiga. Dewan Bahasa dan Pustaka, Kuala Lumpur (2008)
4. Ranaivo-Malançon, B.: Computational Analysis of Affixed Words in Malay Language. In: The 8th International Symposium on Malay/Indonesian Linguistics (ISMIL8), Penang, Malaysia (2004)
5. Purwarianti, A.: Developing Cross Language Systems for Language Pair with Limited Resource: Indonesian-Japanese CLIR and CLQA. PhD thesis, Toyohashi University of Technology (2007)
6. Santorini, B.: Part-of-Speech Tagging Guidelines for the Penn Treebank Project, 3rd Revision, 2nd Printing (1990)

7. Merialdo, B.: Tagging English Text with a Probabilistic Model. Computational Linguistics 20(2) (1994)
8. Elworthy, D.: Does Baum-Welch Re-estimation Help Taggers? In: Proceedings of the 4th ACL Conference on Applied Natural Language Processing, ANLP (1994)
9. Banko, M., Moore, R.C.: Part of Speech Tagging in Context. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING (2004)
10. Wang, Q.I., Schuurmans, D.: Improved Estimation for Unsupervised Part-of-Speech Tagging. In: Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE (2005)
11. Biemann, C., Giuliano, C., Gliozzo, A.: Unsupervised Part-of-Speech Tagging Supporting Supervised Methods. In: Proceedings of RANLP 2007, Borovets, Bulgaria (2007)
12. Jurafsky, D., Martin, J.H.: Speech and Language Processing. Prentice Hall, New Jersey (2000)


ScienceDirect. Malayalam question answering system Available online at www.sciencedirect.com ScienceDirect Procedia Technology 24 (2016 ) 1388 1392 International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015) Malayalam

More information

ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly

ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly Inflected Languages Classical Approaches to Tagging The slides are posted on the web. The url is http://chss.montclair.edu/~feldmana/esslli10/.

More information

Distant Supervised Relation Extraction with Wikipedia and Freebase

Distant Supervised Relation Extraction with Wikipedia and Freebase Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational

More information

Indian Institute of Technology, Kanpur

Indian Institute of Technology, Kanpur Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar

More information

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of

More information

Reading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5-

Reading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5- New York Grade 7 Core Performance Indicators Grades 7 8: common to all four ELA standards Throughout grades 7 and 8, students demonstrate the following core performance indicators in the key ideas of reading,

More information

Houghton Mifflin Reading Correlation to the Common Core Standards for English Language Arts (Grade1)

Houghton Mifflin Reading Correlation to the Common Core Standards for English Language Arts (Grade1) Houghton Mifflin Reading Correlation to the Standards for English Language Arts (Grade1) 8.3 JOHNNY APPLESEED Biography TARGET SKILLS: 8.3 Johnny Appleseed Phonemic Awareness Phonics Comprehension Vocabulary

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

Development of the First LRs for Macedonian: Current Projects

Development of the First LRs for Macedonian: Current Projects Development of the First LRs for Macedonian: Current Projects Ruska Ivanovska-Naskova Faculty of Philology- University St. Cyril and Methodius Bul. Krste Petkov Misirkov bb, 1000 Skopje, Macedonia rivanovska@flf.ukim.edu.mk

More information

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically

More information

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.

More information

Using a Native Language Reference Grammar as a Language Learning Tool

Using a Native Language Reference Grammar as a Language Learning Tool Using a Native Language Reference Grammar as a Language Learning Tool Stacey I. Oberly University of Arizona & American Indian Language Development Institute Introduction This article is a case study in

More information

Books Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny

Books Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny By the End of Year 8 All Essential words lists 1-7 290 words Commonly Misspelt Words-55 working out more complex, irregular, and/or ambiguous words by using strategies such as inferring the unknown from

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Advanced Grammar in Use

Advanced Grammar in Use Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,

More information

Grade 7. Prentice Hall. Literature, The Penguin Edition, Grade Oregon English/Language Arts Grade-Level Standards. Grade 7

Grade 7. Prentice Hall. Literature, The Penguin Edition, Grade Oregon English/Language Arts Grade-Level Standards. Grade 7 Grade 7 Prentice Hall Literature, The Penguin Edition, Grade 7 2007 C O R R E L A T E D T O Grade 7 Read or demonstrate progress toward reading at an independent and instructional reading level appropriate

More information

A heuristic framework for pivot-based bilingual dictionary induction

A heuristic framework for pivot-based bilingual dictionary induction 2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,

More information

LTAG-spinal and the Treebank

LTAG-spinal and the Treebank LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

Sample Goals and Benchmarks

Sample Goals and Benchmarks Sample Goals and Benchmarks for Students with Hearing Loss In this document, you will find examples of potential goals and benchmarks for each area. Please note that these are just examples. You should

More information

Cross-Lingual Text Categorization

Cross-Lingual Text Categorization Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es

More information

Coast Academies Writing Framework Step 4. 1 of 7

Coast Academies Writing Framework Step 4. 1 of 7 1 KPI Spell further homophones. 2 3 Objective Spell words that are often misspelt (English Appendix 1) KPI Place the possessive apostrophe accurately in words with regular plurals: e.g. girls, boys and

More information

GERM 3040 GERMAN GRAMMAR AND COMPOSITION SPRING 2017

GERM 3040 GERMAN GRAMMAR AND COMPOSITION SPRING 2017 GERM 3040 GERMAN GRAMMAR AND COMPOSITION SPRING 2017 Instructor: Dr. Claudia Schwabe Class hours: TR 9:00-10:15 p.m. claudia.schwabe@usu.edu Class room: Old Main 301 Office: Old Main 002D Office hours:

More information

THE VERB ARGUMENT BROWSER

THE VERB ARGUMENT BROWSER THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW

More information

Test Blueprint. Grade 3 Reading English Standards of Learning

Test Blueprint. Grade 3 Reading English Standards of Learning Test Blueprint Grade 3 Reading 2010 English Standards of Learning This revised test blueprint will be effective beginning with the spring 2017 test administration. Notice to Reader In accordance with the

More information

A Graph Based Authorship Identification Approach

A Graph Based Authorship Identification Approach A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico

More information

A Syllable Based Word Recognition Model for Korean Noun Extraction

A Syllable Based Word Recognition Model for Korean Noun Extraction are used as the most important terms (features) that express the document in NLP applications such as information retrieval, document categorization, text summarization, information extraction, and etc.

More information

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za

More information

Training and evaluation of POS taggers on the French MULTITAG corpus

Training and evaluation of POS taggers on the French MULTITAG corpus Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction

More information

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

TABE 9&10. Revised 8/2013- with reference to College and Career Readiness Standards

TABE 9&10. Revised 8/2013- with reference to College and Career Readiness Standards TABE 9&10 Revised 8/2013- with reference to College and Career Readiness Standards LEVEL E Test 1: Reading Name Class E01- INTERPRET GRAPHIC INFORMATION Signs Maps Graphs Consumer Materials Forms Dictionary

More information

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja

More information

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

Written by: YULI AMRIA (RRA1B210085) ABSTRACT. Key words: ability, possessive pronouns, and possessive adjectives INTRODUCTION

Written by: YULI AMRIA (RRA1B210085) ABSTRACT. Key words: ability, possessive pronouns, and possessive adjectives INTRODUCTION STUDYING GRAMMAR OF ENGLISH AS A FOREIGN LANGUAGE: STUDENTS ABILITY IN USING POSSESSIVE PRONOUNS AND POSSESSIVE ADJECTIVES IN ONE JUNIOR HIGH SCHOOL IN JAMBI CITY Written by: YULI AMRIA (RRA1B210085) ABSTRACT

More information

Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems

Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems Hans van Halteren* TOSCA/Language & Speech, University of Nijmegen Jakub Zavrel t Textkernel BV, University

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

BASIC ENGLISH. Book GRAMMAR

BASIC ENGLISH. Book GRAMMAR BASIC ENGLISH Book 1 GRAMMAR Anne Seaton Y. H. Mew Book 1 Three Watson Irvine, CA 92618-2767 Web site: www.sdlback.com First published in the United States by Saddleback Educational Publishing, 3 Watson,

More information

The Ups and Downs of Preposition Error Detection in ESL Writing

The Ups and Downs of Preposition Error Detection in ESL Writing The Ups and Downs of Preposition Error Detection in ESL Writing Joel R. Tetreault Educational Testing Service 660 Rosedale Road Princeton, NJ, USA JTetreault@ets.org Martin Chodorow Hunter College of CUNY

More information

Prentice Hall Literature: Timeless Voices, Timeless Themes, Platinum 2000 Correlated to Nebraska Reading/Writing Standards (Grade 10)

Prentice Hall Literature: Timeless Voices, Timeless Themes, Platinum 2000 Correlated to Nebraska Reading/Writing Standards (Grade 10) Prentice Hall Literature: Timeless Voices, Timeless Themes, Platinum 2000 Nebraska Reading/Writing Standards (Grade 10) 12.1 Reading The standards for grade 1 presume that basic skills in reading have

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

Comprehension Recognize plot features of fairy tales, folk tales, fables, and myths.

Comprehension Recognize plot features of fairy tales, folk tales, fables, and myths. 4 th Grade Language Arts Scope and Sequence 1 st Nine Weeks Instructional Units Reading Unit 1 & 2 Language Arts Unit 1& 2 Assessments Placement Test Running Records DIBELS Reading Unit 1 Language Arts

More information

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up

More information

IS SABAH MALAY A REAL LANGUAGE? By: Jane Wong Kon Ling, Ph.D Centre for the Promotion of Knowledge and Language Learning Universiti Malaysia Sabah

IS SABAH MALAY A REAL LANGUAGE? By: Jane Wong Kon Ling, Ph.D Centre for the Promotion of Knowledge and Language Learning Universiti Malaysia Sabah IS SABAH MALAY A REAL LANGUAGE? By: Jane Wong Kon Ling, Ph.D Centre for the Promotion of Knowledge and Language Learning Universiti Malaysia Sabah INTRODUCTION The Main Question: Is Sabah Malay a Real

More information

Mercer County Schools

Mercer County Schools Mercer County Schools PRIORITIZED CURRICULUM Reading/English Language Arts Content Maps Fourth Grade Mercer County Schools PRIORITIZED CURRICULUM The Mercer County Schools Prioritized Curriculum is composed

More information

Part of Speech Template

Part of Speech Template Part of Speech Template (available at www.panl10n.net/wiki/partofspeech) (If any local language font is used in this document, please provide it with the document) Please fill the template for each part

More information

5 th Grade Language Arts Curriculum Map

5 th Grade Language Arts Curriculum Map 5 th Grade Language Arts Curriculum Map Quarter 1 Unit of Study: Launching Writer s Workshop 5.L.1 - Demonstrate command of the conventions of Standard English grammar and usage when writing or speaking.

More information

Universiteit Leiden ICT in Business

Universiteit Leiden ICT in Business Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:

More information

Character Stream Parsing of Mixed-lingual Text

Character Stream Parsing of Mixed-lingual Text Character Stream Parsing of Mixed-lingual Text Harald Romsdorfer and Beat Pfister Speech Processing Group Computer Engineering and Networks Laboratory ETH Zurich {romsdorfer,pfister}@tik.ee.ethz.ch Abstract

More information

Problems of the Arabic OCR: New Attitudes

Problems of the Arabic OCR: New Attitudes Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

cmp-lg/ Jan 1998

cmp-lg/ Jan 1998 Identifying Discourse Markers in Spoken Dialog Peter A. Heeman and Donna Byron and James F. Allen Computer Science and Engineering Department of Computer Science Oregon Graduate Institute University of

More information

knarrator: A Model For Authors To Simplify Authoring Process Using Natural Language Processing To Portuguese

knarrator: A Model For Authors To Simplify Authoring Process Using Natural Language Processing To Portuguese knarrator: A Model For Authors To Simplify Authoring Process Using Natural Language Processing To Portuguese Adriano Kerber Daniel Camozzato Rossana Queiroz Vinícius Cassol Universidade do Vale do Rio

More information

Grade 5: Module 3A: Overview

Grade 5: Module 3A: Overview Grade 5: Module 3A: Overview This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Exempt third-party content is indicated by the footer: (name of copyright

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Accurate Unlexicalized Parsing for Modern Hebrew

Accurate Unlexicalized Parsing for Modern Hebrew Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The

More information

Senior Stenographer / Senior Typist Series (including equivalent Secretary titles)

Senior Stenographer / Senior Typist Series (including equivalent Secretary titles) New York State Department of Civil Service Committed to Innovation, Quality, and Excellence A Guide to the Written Test for the Senior Stenographer / Senior Typist Series (including equivalent Secretary

More information

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction CLASSIFICATION OF PROGRAM Critical Elements Analysis 1 Program Name: Macmillan/McGraw Hill Reading 2003 Date of Publication: 2003 Publisher: Macmillan/McGraw Hill Reviewer Code: 1. X The program meets

More information

Prentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9)

Prentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9) Nebraska Reading/Writing Standards, (Grade 9) 12.1 Reading The standards for grade 1 presume that basic skills in reading have been taught before grade 4 and that students are independent readers. For

More information