Centrality Measures of Sentences in an English-Japanese Parallel Corpus
Masanori Oya, Mejiro University

Abstract

This study introduces a directed acyclic graph representation of the typed dependencies among the words in a sentence, and proposes a method for calculating the degree centrality and closeness centrality of such typed-dependency directed acyclic graphs. The method is applied to sentences in a section of an English-Japanese parallel corpus, and the differences in the results are discussed, along with suggestions for further study.

Keywords: Dependency grammar, Typed-dependency directed acyclic graphs, Graph centralities, English-Japanese parallel corpus

1. Introduction

In [1], Oya proposed that graph centrality measures [2, 3] can be applied to typed-dependency trees for sentences. Accordingly, these measures have been applied to English sentences from different genres of texts [4, 5, 6, 7] and to English-Japanese translation pairs in a small-scale corpus [5, 8, 9]. In [10], an English-Japanese parallel corpus larger than those used in [5, 8, 9] is used. This study follows these studies in exploring the possibility of automatic numerical analysis of the syntactic dependency structure of sentences in an English-Japanese parallel corpus, in terms of the centrality measures of the typed dependencies among the words in those sentences. The aim of this study is to represent the difference in structural settings between the two languages based not on speakers' subjective intuition, but on objective, numerical measures. Such objective measures of the structural difference between the two languages can be applied for various purposes, such as language teaching and machine translation.

The remainder of this paper is organized as follows. Section 2 summarizes the theoretical background of this study: dependency grammar and the graph centrality measures (degree centrality and closeness centrality), as well as the interpretation of the two centrality measures.
Section 3 reports the analysis of the English-Japanese parallel corpus in terms of the centrality measures of its sentences. Section 4 discusses the results of the analysis and the possibilities for further study of centrality measures of typed-dependency trees, and Section 5 concludes this study.

2. Theoretical background

2.1. Dependency grammar

Dependency grammar is a family of syntactic theories that focus on the dependency relationships among the words in a sentence. Since it was first proposed in [11], dependency grammar frameworks have been developed by a number of researchers, e.g., extensible dependency grammar [12, 13], word grammar [14], and Stanford dependencies [15, 16, 17].

2.2. Typed-dependency directed acyclic graphs

The dependency relationships among the words in a sentence can be represented by a typed-dependency directed acyclic graph (DAG) [1, 15, 16, 17]. For example, the sentence "I am studying dependency grammar" can be represented as follows:

[Figure 1. Typed-Dependency Directed Acyclic Graph for "I am studying dependency grammar"]

In Figure 1, each word is represented as one node. The dependency relationship between two words is represented by an arc with a label. The direction of the arc represents the direction of the dependency: an arc starts from the head node and ends at the tail node. For example, the word "studying" is the head of the words "I", "am", and "grammar", and the labels of these dependencies are nsubj, aux, and dobj, respectively. The dependency relationships among the words in a sentence are acyclic, meaning that no path from any node in the graph leads back to that same node.
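A DAG of this kind can be represented programmatically as a set of labeled arcs. The following sketch is our own illustration (the data structure and function names are not taken from any parser toolkit); it encodes the arcs of Figure 1 and checks the defining acyclicity property described above:

```python
# A minimal representation of the typed-dependency DAG in Figure 1
# (illustrative sketch; the data structure is our own choice).

# Each arc runs from a head word to a dependent, with a type label.
arcs = [
    ("nsubj", "studying", "I"),
    ("aux",   "studying", "am"),
    ("dobj",  "studying", "grammar"),
    ("nn",    "grammar",  "dependency"),
]

def dependents(head, arcs):
    """All (type, dependent) pairs whose head is the given word."""
    return [(t, d) for (t, h, d) in arcs if h == head]

def is_acyclic(arcs):
    """Check the defining property of a dependency DAG: no word is
    reachable from itself by following arcs head-to-dependent."""
    children = {}
    for _, h, d in arcs:
        children.setdefault(h, []).append(d)

    def reachable(start, node, seen):
        for nxt in children.get(node, []):
            if nxt == start or (nxt not in seen and reachable(start, nxt, seen | {nxt})):
                return True
        return False

    return not any(reachable(w, w, set()) for w in children)

print(dependents("studying", arcs))  # [('nsubj', 'I'), ('aux', 'am'), ('dobj', 'grammar')]
print(is_acyclic(arcs))              # True
```

Here "studying" has three dependents, matching the description of Figure 1 in the text.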
The output of the Stanford Parser [15, 17] concisely represents the dependency relationships among the words in a sentence. The output for the sentence "I am studying dependency grammar" is shown in (1):

(1) nsubj(studying-3, I-1)
    aux(studying-3, am-2)
    dobj(studying-3, grammar-5)
    nn(grammar-5, dependency-4)

The first line in (1) states that the dependency between the third and the first word in the input sentence is typed as nsubj (an abbreviation for "nominal subject"). The Stanford parser categorizes each dependency into one of 55 different types [17]. The parse output for a sentence is thus a typed-dependency DAG representation of that sentence, and we can calculate the structural characteristics of the output. Consider sentences (2) and (3); both of them have three words, yet the dependency relationships among those words, and hence the structural characteristics, are different:

(2) Write an article.
(3) David wrote it.

The Stanford parser outputs for (2) and (3) are (4) and (5), respectively (the nodes for the period are excluded; the root node is an abstract node postulated to be the head of the root of the dependency tree):

(4) root(ROOT-0, Write-1)
    det(article-3, an-2)
    dobj(Write-1, article-3)

(5) nsubj(wrote-2, David-1)
    root(ROOT-0, wrote-2)
    dobj(wrote-2, it-3)

The typed-dependency DAG representations for (2) and (3) are shown in Figure 2 and Figure 3, respectively:

[Figure 2. Typed-Dependency DAG Representation for "Write an article."]

[Figure 3. Typed-Dependency DAG Representation for "David wrote it."]

The dependency relationship in the former DAG is deeper than that in the latter: there are three arcs on the path from the root node to the deepest terminal node in the DAG in Figure 2, whereas there are two arcs from the root to the deepest terminal node in the DAG in Figure 3. On the other hand, the dependency relationship in the latter DAG is wider than that in the former.
The verb "wrote" is connected to two other words in the DAG in Figure 3, while the verb "Write" is connected to one other word in Figure 2. For sentences with the same word count, we can thus have typed-dependency DAGs with different widths and depths, as will be shown in Section 2.4. The width and the depth of a given DAG can be calculated as degree centrality and closeness centrality in a well-defined manner.

2.3. Graph centrality measures

The degree centrality of a node in a given graph is defined as the number of nodes connected to that node [2]. The degree centrality of a graph as a whole is the sum, over all the other nodes, of the maximum degree in the graph minus the degree of each node, divided by the largest value this sum can take in any graph with the same number of nodes [3]. Degree centrality increases in proportion to the flatness of a dependency DAG, and when a DAG is a star graph, its degree centrality is 1. For example, the degree centralities of sentences (2) and (3) are .333 and 1, respectively; these values indicate that sentence (3) is flatter than (2) in terms of the typed-dependency relationships among the words.

Closeness centrality is defined as the reciprocal of the average distance from a given node to the other nodes in a graph [2, 3]. The distance from one node to another is the number of arcs between them. In typed-dependency DAG representations, the most relevant distances are those from the root node to all the other nodes, since they represent the depth of the dependency DAG. The closeness centrality of a DAG is 1 when it contains only two nodes: the root node and one word dependent on it. Closeness centrality decreases in proportion to embeddedness, i.e., the distance from the root to the other nodes; smaller closeness centralities indicate more embedded typed-dependency DAGs. For example, the closeness centralities of sentences (2) and (3) are .5 and .6, respectively; these figures show that sentence (2) is more embedded than (3) in terms of the typed-dependency relationships among the words.

2.4. Interpretation of graph centrality measures

By observing the distribution of the sentences in a corpus in terms of these two centralities, we can gain objective insight into the structural characteristics of those sentences. For example, if sentences with a certain value of degree centrality or closeness centrality appear more frequently than sentences with other values of these centralities, those sentences can be argued to have structural characteristics typical of the corpus in which they are found.

2.5. Related work

The first study of syntactic networks based on data from dependency treebanks is [18], which showed that such networks fit the small-world model [19]. Quantitative network analysis (QNA) was introduced in [20] as a means of classifying complex networks according to their topology, and QNA was applied to the genealogical classification of languages in [21]. An approach to the automatic classification of 11 languages according to their dependency networks is introduced in [22]; it includes degree and closeness centralities among its typological network indices. The results match the genealogical similarities of these languages, yet the 11 languages do not include Japanese, a language genealogically and typologically different from them.

3. Analysis

In order to verify the idea in the previous section, a corpus-based analysis was carried out. In [5, 8, 9], an analysis of a small-scale parallel corpus was conducted for different purposes.
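The two centrality values quoted above for sentences (2) and (3) can be reproduced with a short script. The sketch below is our own illustration (the function names and the graph encoding are not from the study's Ruby script); it computes Freeman-style graph degree centralization and root-based closeness centrality over the arcs of examples (4) and (5), with the abstract root node included as in the figures:

```python
def degree_centralization(edges):
    # undirected degree of each node
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    n = len(deg)
    dmax = max(deg.values())
    observed = sum(dmax - d for d in deg.values())
    # Freeman's normalization: the maximum possible sum, attained by a star graph
    return observed / ((n - 1) * (n - 2))

def closeness_from_root(edges, root="ROOT"):
    # breadth-first distances (in arcs) from the root node
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {root: 0}
    frontier = [root]
    while frontier:
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    nxt.append(w)
        frontier = nxt
    others = [d for node, d in dist.items() if node != root]
    # reciprocal of the average root-to-node distance
    return len(others) / sum(others)

# (2) "Write an article."  — chain-like DAG
write_article = [("ROOT", "Write"), ("Write", "article"), ("article", "an")]
# (3) "David wrote it."    — star-like DAG
david_wrote = [("ROOT", "wrote"), ("wrote", "David"), ("wrote", "it")]

print(round(degree_centralization(write_article), 3))  # 0.333
print(degree_centralization(david_wrote))              # 1.0
print(closeness_from_root(write_article))              # 0.5
print(closeness_from_root(david_wrote))                # 0.6
```

The normalizing denominator (n-1)(n-2) is exactly the value the numerator attains on a star graph, which is why sentence (3) scores 1.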
In this study, a large-scale English-Japanese parallel corpus is used to obtain more reliable cross-linguistic results.

3.1. Procedure

The centrality measures of the typed-dependency DAGs of the sentences in a given corpus are calculated automatically. The English sentences in the parallel corpus were parsed by the Stanford typed-dependency parser, and their degree centralities and closeness centralities were calculated by a Ruby script used in [5]. The Japanese sentences in the corpus, on the other hand, were parsed by the Kurohashi-Nagao Parser (KNP) [23, 24, 25], and its output was transformed automatically into Stanford-Parser-style triples (the method of this automatic conversion of KNP output into triples is based on [26, 27]), from which their degree centralities and closeness centralities were calculated by the same script. KNP outputs the dependency relationships among syntactic units (bunsetsu in Japanese), each of which consists of one content word, followed by one particle when necessary to mark its case. In order to compare its output with that of the Stanford dependency parser, the output format of the Stanford dependency parser was set to "collapsed tree" [17], wherein prepositions are folded into the name of the dependency type. For example, the prepositional phrase "on Monday" in the sentence "I read this book on Monday." depends on the verb "read" with the dependency type prep_on.

3.2. Description of the text data

The corpus used in this study is the Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles ver. 2.1 (National Institute of Information and Communications Technology, 2011). This corpus contains about 500,000 English-Japanese pairs of manually translated sentences on topics related to Kyoto. The corpus is divided into 16 subcorpora according to their contents (such as religion, famous people, or famous buildings).
In this study, the sentences in the subcorpus on notable buildings in Kyoto (henceforth BLD) were chosen randomly for the calculation of their degree centralities and closeness centralities.

3.3. Results

The descriptive statistics of the degree centralities and closeness centralities of the English and Japanese sentences in BLD are as follows:

[Table 1: Descriptive statistics (mean and S.D. of the degree and closeness centralities; E: English; J: Japanese)]

Figures 4 and 5 show the frequencies of the degree centralities of the Japanese and English sentences in BLD, respectively. When we round these degree centralities off to two decimal places, the most frequent degree centrality among the Japanese sentences is 1 (approx. 7.6% of all the sentences). The second most frequent degree centrality among the Japanese sentences is .24 (approx. 6.2%). The most frequent degree centrality among the English sentences is .24 (approx. 4.5%), and the second most frequent value is .26 (approx. 4.3%). English sentences with degree centrality 1 are not as frequent as Japanese ones.

[Figure 4. Frequency of Degree Centralities of Japanese Sentences in BLD; x = degree centrality; y = frequency]

[Figure 5. Frequency of Degree Centralities of English Sentences in BLD; x = degree centrality; y = frequency; n = 1595]

Figures 6 and 7 show the frequencies of the sentences in terms of the values of the closeness centralities of the Japanese and English sentences in BLD, respectively. When we round these closeness centralities off to two decimal places, the most frequent value among the Japanese sentences is .4 (approx. 5.3%), and the second most frequent value is .5 (approx. 4.6%). The most frequent value among the English sentences is .33 (approx. 5.3%), and the second most frequent value is .32 (approx. 5.2%).

[Figure 6. Frequency of Closeness Centralities of Japanese Sentences in BLD; x = closeness centrality; y = frequency; n = 1595]
[Figure 7. Frequency of Closeness Centralities of English Sentences in BLD; x = closeness centrality; y = frequency]

4. Discussion

The distributions of the degree centralities of the English and Japanese sentences in BLD differ in that the variety of degree centralities among the English sentences in BLD is larger than that among the Japanese sentences. In other words, the degree centralities of the English sentences are not concentrated on particular values, compared to those of the Japanese sentences. This difference suggests that English has a larger variety of structural settings of typed-dependency DAGs than Japanese. In this context, special attention should be paid to the relatively high frequency of Japanese sentences with degree centrality 1, which means that their typed-dependency DAGs share the star-graph setting, in which one word is connected to all the other words (see Figure 3 for an example of a star graph). Japanese sentences of this type can be regarded as having structural characteristics typical of the language, as far as the flatness of their typed-dependency DAGs is concerned. In English, on the other hand, typed-dependency DAGs in the star-graph setting do not constitute such a typical setting in terms of their flatness.

The difference between the frequencies of the closeness centralities of the English and Japanese sentences in BLD is not as extensive as in the case of their degree centralities, yet in Japanese, certain closeness centralities are more frequent than others. This, too, can be considered to reflect the structural characteristics of Japanese, especially given that these more frequent closeness centralities are found in one-word sentences (whose closeness centrality is .75), two-word sentences (.67), and three-word sentences (.63).
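The frequency counts discussed in this section (rounding each sentence's centrality to two decimal places and then tallying the values) can be sketched as follows. This is our own illustration with made-up sample values, not the study's actual script or data:

```python
from collections import Counter

def centrality_distribution(values, places=2):
    """Round each sentence-level centrality and tally how often each value occurs,
    returning (value, relative frequency) pairs, most frequent first."""
    counts = Counter(round(v, places) for v in values)
    total = sum(counts.values())
    return [(value, count / total) for value, count in counts.most_common()]

# hypothetical degree centralities for a handful of sentences
sample = [1.0, 0.24, 1.0, 0.333, 0.24, 1.0, 0.26]
for value, share in centrality_distribution(sample):
    print(f"{value}: {share:.1%}")
```

Applied to the per-sentence centralities of a subcorpus, the first pair in the output corresponds to the "most frequent value" figures reported for BLD above.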
These tendencies in the distributions of degree centralities and closeness centralities can be applied in a variety of fields, alongside the typological classification of languages [21, 22]. For example, in the field of language teaching, it may be possible to estimate the naturalness of English essays written by Japanese learners of English, or vice versa, in terms of the flatness expressed by degree centrality or the embeddedness expressed by closeness centrality. A rough sketch of applying centrality measures to language teaching is as follows: if it is found that some Japanese learners write flat English sentences more frequently than English native speakers do, they need advice that leads them to write more embedded English sentences. Apart from language teaching, this estimation of the naturalness of sentences of a given language in terms of their flatness and embeddedness can also be applied to the output of sentence-generating applications, such as machine translation. The naturalness of sentences, whether generated by a system or written by humans, in terms of their centrality measures is one possible question for future research.

5. Conclusion

This study explored the possibility of automatic numerical analysis of the syntactic structure of sentences in an English-Japanese parallel corpus in terms of the centrality measures (degree centrality and closeness centrality) of the typed dependencies among the words in these sentences. The results suggest that centrality measures can reflect the difference in the structural settings of the two languages in terms of their flatness, and further research is required to apply the centrality measures of typed-dependency DAGs to the naturalness of sentences.

6. Acknowledgement

This work was supported by JSPS KAKENHI Grant Number

References

[1] M. Oya, Directed Acyclic Graph Representation of Grammatical Knowledge and its Application for Calculating Sentence Complexity, Proceedings of the
15th International Conference of Pan-Pacific Association of Applied Linguistics, 2010, pp.
[2] L. Freeman, Centrality in Social Networks, Social Networks, Vol. 1, 1979, pp.
[3] S. Wasserman and K. Faust, Social Network Analysis, Cambridge: Cambridge University Press.
[4] M. Oya, Degree Centralities, Closeness Centralities, and Dependency Distances of Different Genres of Texts, Selected Papers from the 17th International Conference of Pan-Pacific Association of Applied Linguistics, 2013, pp.
[5] M. Oya, A Study of Syntactic Typed-Dependency Trees for English and Japanese and Graph-Centrality Measures, Doctoral dissertation, Waseda University, 2014.
[6] M. Oya, Dependency-grammar analyses of different genres of English, Second Asia Pacific Corpus Linguistics Conference (APCLC 2014), The Hong Kong Polytechnic University, 2014.
[7] M. Oya, Extracting Structural Properties from Syntactic Dependency Corpus, The 4th Conference of Japan Association for English Corpus Studies, Kumamoto Gakuen University, 2014.
[8] M. Oya, Syntactic Dependency Structures of English and Japanese, Mejiro Journal of Humanities, Vol. 9, 2013, pp.
[9] M. Oya, Typed-dependency Tree Pairs of English and Japanese, Mejiro Journal of Humanities, Vol. 10, 2014, pp.
[10] M. Oya, An English-Japanese bilingual corpus-based comparison of their syntactic dependency structures, The 19th Conference of Pan-Pacific Association of Applied Linguistics, Waseda University, 2014.
[11] L. Tesnière, Éléments de syntaxe structurale, Paris: Klincksieck.
[12] R. Debusmann, Dependency Grammar as Graph Description, Prospects and Advances in the Syntax-Semantics Interface, Nancy, 2003, retrieved July 27, 2010, from Publications/documents/passi3.pdf
[13] R. Debusmann and M. Kuhlmann, Dependency Grammar: Classification and Exploration, Project report (CHORUS, SFB 378), 2007, retrieved July 3, 2010, from
[14] R. Hudson, An Introduction to Word Grammar, Cambridge University Press, 2010.
[15] M. C. de Marneffe, B. MacCartney, and C. D. Manning, Generating Typed Dependency Parses from Phrase Structure Parses, Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006.
[16] M. C. de Marneffe and C. D. Manning, The Stanford Typed Dependencies Representation, COLING Workshop on Cross-framework and Cross-domain Parser Evaluation, 2008.
[17] M. C. de Marneffe and C. D. Manning, Stanford Typed Dependency Manual, retrieved July 3, 2010, from manual.pdf.
[18] R. Ferrer i Cancho, R. Solé, and R. Köhler, Patterns in Syntactic Dependency Networks, Physical Review E, Vol. 69, No. 5, 051915, 2004.
[19] D. J. Watts and S. H. Strogatz, Collective Dynamics of Small-World Networks, Nature, 393, pp.
[20] A. Mehler, Structural Similarities of Complex Networks: A Computational Model by Text Representation, Applied Artificial Intelligence, Vol. 22, 2008, pp.
[21] A. Mehler, O. Pustylnikov, and N. Diewald, The Geography of Social Ontologies: The Sapir-Whorf Hypothesis Revisited, Computer Speech and Language, London: Academic Press, 2010.
[22] O. Abramov and A. Mehler, Automatic Language Classification by Means of Syntactic Dependency Networks, Journal of Quantitative Linguistics, Vol. 18, No. 4, 2011, pp.
[23] S. Kurohashi and M. Nagao, A Method for Analyzing Conjunctive Structures in Japanese, Journal of Information Processing Society of Japan, Vol. 33, No. 8, 1992, pp.
[24] S. Kurohashi and M. Nagao, A Syntactic Analysis Method of Long Japanese Sentences based on Coordinate Structure Detection, Journal of Natural Language Processing, Vol. 1, No. 1, 1994, pp.
[25] S. Kurohashi and M. Nagao, Building a Japanese Parsed Corpus while Improving the Parsing System, Proceedings of the 1st International Conference on Language Resources and Evaluation, 1998, pp.
[26] M. Oya, A Method of Automatic Acquisition of Typed-dependency Representation of Japanese Syntactic Structure, Proceedings of the 14th Conference of Pan-Pacific Association of Applied Linguistics, 2009, pp.
[27] M. Oya, Treebank-Based Automatic Acquisition of Wide Coverage, Deep Linguistic Resources for Japanese, M.Sc. thesis, School of Computing, Dublin City University, 2010.
Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)
More informationExtending Place Value with Whole Numbers to 1,000,000
Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit
More informationAssociation Between Categorical Variables
Student Outcomes Students use row relative frequencies or column relative frequencies to informally determine whether there is an association between two categorical variables. Lesson Notes In this lesson,
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More information1. Programme title and designation International Management N/A
PROGRAMME APPROVAL FORM SECTION 1 THE PROGRAMME SPECIFICATION 1. Programme title and designation International Management 2. Final award Award Title Credit value ECTS Any special criteria equivalent MSc
More informationAN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282)
B. PALTRIDGE, DISCOURSE ANALYSIS: AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC. 2012. PP. VI, 282) Review by Glenda Shopen _ This book is a revised edition of the author s 2006 introductory
More informationA Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique
A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University
More informationIntra-talker Variation: Audience Design Factors Affecting Lexical Selections
Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and
More informationAn Empirical and Computational Test of Linguistic Relativity
An Empirical and Computational Test of Linguistic Relativity Kathleen M. Eberhard* (eberhard.1@nd.edu) Matthias Scheutz** (mscheutz@cse.nd.edu) Michael Heilman** (mheilman@nd.edu) *Department of Psychology,
More informationObjectives. Chapter 2: The Representation of Knowledge. Expert Systems: Principles and Programming, Fourth Edition
Chapter 2: The Representation of Knowledge Expert Systems: Principles and Programming, Fourth Edition Objectives Introduce the study of logic Learn the difference between formal logic and informal logic
More informationMaximizing Learning Through Course Alignment and Experience with Different Types of Knowledge
Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February
More informationThe following information has been adapted from A guide to using AntConc.
1 7. Practical application of genre analysis in the classroom In this part of the workshop, we are going to analyse some of the texts from the discipline that you teach. Before we begin, we need to get
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationThe role of the first language in foreign language learning. Paul Nation. The role of the first language in foreign language learning
1 Article Title The role of the first language in foreign language learning Author Paul Nation Bio: Paul Nation teaches in the School of Linguistics and Applied Language Studies at Victoria University
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationReinforcement Learning by Comparing Immediate Reward
Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate
More informationLEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE
LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE Submitted in partial fulfillment of the requirements for the degree of Sarjana Sastra (S.S.)
More informationProcedia - Social and Behavioral Sciences 143 ( 2014 ) CY-ICER Teacher intervention in the process of L2 writing acquisition
Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 143 ( 2014 ) 238 242 CY-ICER 2014 Teacher intervention in the process of L2 writing acquisition Blanka
More informationSyntactic and Lexical Simplification: The Impact on EFL Listening Comprehension at Low and High Language Proficiency Levels
ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 5, No. 3, pp. 566-571, May 2014 Manufactured in Finland. doi:10.4304/jltr.5.3.566-571 Syntactic and Lexical Simplification: The Impact on
More informationHandbook for Graduate Students in TESL and Applied Linguistics Programs
Handbook for Graduate Students in TESL and Applied Linguistics Programs Section A Section B Section C Section D M.A. in Teaching English as a Second Language (MA-TESL) Ph.D. in Applied Linguistics (PhD
More informationPrediction of Maximal Projection for Semantic Role Labeling
Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba
More informationPage 1 of 11. Curriculum Map: Grade 4 Math Course: Math 4 Sub-topic: General. Grade(s): None specified
Curriculum Map: Grade 4 Math Course: Math 4 Sub-topic: General Grade(s): None specified Unit: Creating a Community of Mathematical Thinkers Timeline: Week 1 The purpose of the Establishing a Community
More informationLoughton School s curriculum evening. 28 th February 2017
Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationLinguistics. The School of Humanities
Linguistics The School of Humanities Ch a i r Nancy Niedzielski Pr o f e s s o r Masayoshi Shibatani Stephen A. Tyler Professors Emeriti James E. Copeland Philip W. Davis Sydney M. Lamb Associate Professors
More informationConcept mapping instrumental support for problem solving
40 Int. J. Cont. Engineering Education and Lifelong Learning, Vol. 18, No. 1, 2008 Concept mapping instrumental support for problem solving Slavi Stoyanov* Open University of the Netherlands, OTEC, P.O.
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationCorpus Linguistics (L615)
(L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives
More informationMinimalism is the name of the predominant approach in generative linguistics today. It was first
Minimalism Minimalism is the name of the predominant approach in generative linguistics today. It was first introduced by Chomsky in his work The Minimalist Program (1995) and has seen several developments
More informationProcedia - Social and Behavioral Sciences 154 ( 2014 )
Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 154 ( 2014 ) 263 267 THE XXV ANNUAL INTERNATIONAL ACADEMIC CONFERENCE, LANGUAGE AND CULTURE, 20-22 October
More informationLQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization
LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY
More informationThe Effect of Extensive Reading on Developing the Grammatical. Accuracy of the EFL Freshmen at Al Al-Bayt University
The Effect of Extensive Reading on Developing the Grammatical Accuracy of the EFL Freshmen at Al Al-Bayt University Kifah Rakan Alqadi Al Al-Bayt University Faculty of Arts Department of English Language
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationSecond Exam: Natural Language Parsing with Neural Networks
Second Exam: Natural Language Parsing with Neural Networks James Cross May 21, 2015 Abstract With the advent of deep learning, there has been a recent resurgence of interest in the use of artificial neural
More informationEvidence for Reliability, Validity and Learning Effectiveness
PEARSON EDUCATION Evidence for Reliability, Validity and Learning Effectiveness Introduction Pearson Knowledge Technologies has conducted a large number and wide variety of reliability and validity studies
More information1. Introduction. 2. The OMBI database editor
OMBI bilingual lexical resources: Arabic-Dutch / Dutch-Arabic Carole Tiberius, Anna Aalstein, Instituut voor Nederlandse Lexicologie Jan Hoogland, Nederlands Instituut in Marokko (NIMAR) In this paper
More informationAge Effects on Syntactic Control in. Second Language Learning
Age Effects on Syntactic Control in Second Language Learning Miriam Tullgren Loyola University Chicago Abstract 1 This paper explores the effects of age on second language acquisition in adolescents, ages
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationEdexcel GCSE. Statistics 1389 Paper 1H. June Mark Scheme. Statistics Edexcel GCSE
Edexcel GCSE Statistics 1389 Paper 1H June 2007 Mark Scheme Edexcel GCSE Statistics 1389 NOTES ON MARKING PRINCIPLES 1 Types of mark M marks: method marks A marks: accuracy marks B marks: unconditional
More informationTeaching ideas. AS and A-level English Language Spark their imaginations this year
Teaching ideas AS and A-level English Language Spark their imaginations this year We ve put together this handy set of teaching ideas so you can explore new ways to engage your AS and A-level English Language
More informationTHE ECONOMIC IMPACT OF THE UNIVERSITY OF EXETER
THE ECONOMIC IMPACT OF THE UNIVERSITY OF EXETER Report prepared by Viewforth Consulting Ltd www.viewforthconsulting.co.uk Table of Contents Executive Summary... 2 Background to the Study... 6 Data Sources
More informationThe Ups and Downs of Preposition Error Detection in ESL Writing
The Ups and Downs of Preposition Error Detection in ESL Writing Joel R. Tetreault Educational Testing Service 660 Rosedale Road Princeton, NJ, USA JTetreault@ets.org Martin Chodorow Hunter College of CUNY
More informationMASTER S THESIS GUIDE MASTER S PROGRAMME IN COMMUNICATION SCIENCE
MASTER S THESIS GUIDE MASTER S PROGRAMME IN COMMUNICATION SCIENCE University of Amsterdam Graduate School of Communication Kloveniersburgwal 48 1012 CX Amsterdam The Netherlands E-mail address: scripties-cw-fmg@uva.nl
More informationA Domain Ontology Development Environment Using a MRD and Text Corpus
A Domain Ontology Development Environment Using a MRD and Text Corpus Naomi Nakaya 1 and Masaki Kurematsu 2 and Takahira Yamaguchi 1 1 Faculty of Information, Shizuoka University 3-5-1 Johoku Hamamatsu
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationHyperedge Replacement and Nonprojective Dependency Structures
Hyperedge Replacement and Nonprojective Dependency Structures Daniel Bauer and Owen Rambow Columbia University New York, NY 10027, USA {bauer,rambow}@cs.columbia.edu Abstract Synchronous Hyperedge Replacement
More informationImproved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form
Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationReading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5-
New York Grade 7 Core Performance Indicators Grades 7 8: common to all four ELA standards Throughout grades 7 and 8, students demonstrate the following core performance indicators in the key ideas of reading,
More informationWhat the National Curriculum requires in reading at Y5 and Y6
What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the
More information