Segmenting Korean Compound Nouns using Statistical Information and a Preference Rule

Bo-Hyun Yun, Min-Jeung Cho, Hae-Chang Rim
Department of Computer Science, Korea University
1, 5-ka, Anam-dong, Seoul, Korea
ybh@nlp.korea.ac.kr, cmj@nlp.korea.ac.kr, rim@nlp.korea.ac.kr

Abstract

This paper presents a method of segmenting Korean compound nouns by using statistical information and a preference rule. The statistical information is represented by CFP (Compound noun Formation Probability), which is built from the frequencies of affixes and the frequencies of two-syllabled and three-syllabled nouns. The preference rule is MNPR (Minimal Noun Preference Rule), which prefers the structure pattern of a compound noun with the minimal number of unit nouns. Moreover, we apply three kinds of heuristics in order to segment compound nouns that include unknown unit nouns. Experimental results show that the precision of the proposed method is approximately 96% on average. Furthermore, the experiments show that the proposed method can segment compound nouns including unknown nouns and maintains a constant precision rate when segmenting compound nouns extracted from various domains.

1 Introduction

Segmenting a compound noun (CN) in a raw corpus is one of the crucial issues for natural language processing systems such as machine translation systems, information retrieval systems, and spelling checkers. Korean compound nouns must be segmented correctly in order to select the right target lexemes in machine translation, to increase the recall rate in information retrieval, and to correct spacing errors of compound nouns in spelling checking. However, segmentation is a difficult problem because a Korean compound noun consists of more than one unit noun written without blanks, and because a compound noun may have many ambiguous segmentations.
In segmenting Korean compound nouns in a raw corpus, we have to consider the following problems:

1) A raw corpus has various eojeols (see footnote 1), such as verbals and adjectivals, that must be eliminated.
2) An eojeol including a compound noun has several suffixes that must be removed.
3) There exist many ambiguous segmentations to be resolved in Korean compound nouns.
4) Because not all unit nouns (UNs) can be registered in a lexicon, there are many compound nouns including unknown unit nouns.

In this research, we have solved the first and the second problems by using a morphological analyzer[8] and a POS (Part-Of-Speech) tagger[6, 7, 9] (see footnote 2), and we suggest solutions only for the third and the fourth problems.

To analyze compound nouns in Japanese, Yosiyuki et al.[11] use collocation information and a thesaurus. The accuracy of this method is about 80%. For Chinese, Nie et al.[10] first segment a text by using a rule- and dictionary-based method. Then a hybrid approach is applied to locate candidates for the unknown words contained therein, and the segmentation process is run again. This method shows an accuracy of 96.51%.

For Korean compound nouns, several segmentation methods[3, 4, 12] have been proposed. Choi[4] applies structure patterns of compound nouns in order and then segments compound nouns, but this method cannot resolve ambiguous segmentations. Yun et al.[12] apply several structure patterns and resolve ambiguous segmentations by using the frequencies of head words and statistical preference rules. However, neither method can segment compound nouns including unknown unit nouns. Chang et al.[3] construct a trie to store corpus information, insert dummy nodes to mark the end of a noun in the learning phase, and analyze the compound noun by using the constructed trie in the application phase. But the performance of this method depends on specific domains. To solve these problems, we propose a method of segmenting compound nouns based on statistical information, CFP, and a preference rule, MNPR.

Footnote 1: An eojeol is the spacing unit in Korean, like a word in English. An eojeol consists of one or more morphemes. It sometimes corresponds to a word or a phrase in English.
Footnote 2: The morphological analyzer and POS tagger have been developed at the NLP Lab. of Korea University.

2 Statistical Information and Preference Rule

2.1 Statistical Information

To acquire statistical information, we assume that the structure of Korean compound nouns can be expressed as a binary tree. The binary tree consists of a specifier and a head. That is, the structure of Korean compound nouns corresponds to the Binary Branch Structure (BBS) based on X' theory in linguistics[5]. Figure 1 shows that the specifier and the head have the recursive property, as indicated by the symbol '+' in the structure of Korean compound nouns. The specifier and the head can also have a subspecifier and a subhead, respectively. In this research, to simplify the acquisition of statistical information, we define a unit noun between a specifier and a head as an intermediate.

Based on the above structure, the frequencies of two-syllabled and three-syllabled nouns are obtained from 81,276 compound nouns registered in the dictionary of Kumsung Publishing Company as follows:

- The first unit noun N_1 is counted as the specifier.
- The middle unit nouns N_2, ..., N_{n-1} are counted as intermediates.
- The last unit noun N_n is counted as the head.

As those compound nouns carry the mark '-', which stands for a correct segmentation, it is easy to distinguish the specifier, the intermediate, and the head.

\[ CFP(N_1, \ldots, N_n) = \log\Big( P(N_1 \mid S_{Pre}) \times \prod_{i=2}^{n-1} P(N_i \mid S_{In}) \times P(N_n \mid S_{Post}) \Big) \tag{1} \]

\[ P(N_1 \mid S_{Pre}) = \frac{C(N_1, S_{Pre})}{\sum_{i=1}^{n} C(N_i, S_{Pre})} \tag{2} \]

\[ P(N_i \mid S_{In}) = \frac{C(N_i, S_{In})}{\sum_{i=1}^{n} C(N_i, S_{In})} \tag{3} \]

\[ P(N_n \mid S_{Post}) = \frac{C(N_n, S_{Post})}{\sum_{i=1}^{n} C(N_i, S_{Post})} \tag{4} \]
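The counting scheme of Section 2.1 and the relative frequencies of equations (2) to (4) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `count_roles` and `relative_frequency` and the toy entries are hypothetical, each character of an entry is treated as one syllable (as holds for precomposed Hangul), and the denominator is taken per syllable length, which is one reading of the 2-syllable and 3-syllable columns of Table 1.

```python
from collections import Counter


def count_roles(entries, sizes=(2, 3)):
    """Count unit nouns as specifier, intermediate, or head.

    `entries` are dictionary compound nouns whose correct segmentation is
    marked with '-'. Only unit nouns whose syllable count is in `sizes`
    are counted (two- and three-syllabled nouns in the paper).
    """
    pre, mid, post = Counter(), Counter(), Counter()
    for entry in entries:
        units = entry.split("-")
        if len(units) < 2:
            continue  # a single unit noun is not a compound noun
        # First unit -> specifier, last unit -> head, the rest -> intermediate.
        roles = [pre] + [mid] * (len(units) - 2) + [post]
        for unit, counter in zip(units, roles):
            if len(unit) in sizes:
                counter[unit] += 1
    return pre, mid, post


def relative_frequency(unit, counter):
    """Relative frequency in the style of equations (2)-(4).

    Assumption: the total is taken over nouns of the same syllable length.
    """
    total = sum(c for u, c in counter.items() if len(u) == len(unit))
    return counter[unit] / total if total else 0.0


# Toy, made-up entries; each letter stands for one syllable.
toy_entries = ["ab-cd", "ab-efg", "ab-cd-efg", "xyz-cd", "gh-cd"]
pre, mid, post = count_roles(toy_entries)
print(pre["ab"], mid["cd"], post["cd"], relative_frequency("ab", pre))
```

Keeping one counter per role mirrors the three states S_Pre, S_In, and S_Post, so the same noun can have different frequencies depending on where it appears in a compound.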
Figure 1: The Structure of Compound Noun

The frequencies of one-syllabled affixes are acquired from 4,486 three-syllabled compound nouns with an N_1-N_2 form as follows:

- If N_1 is one syllable, N_1 is counted as a prefix.
- If N_2 is one syllable, N_2 is counted as a suffix.

By using the frequency data, we define CFP as in equation (1), where S_Pre, S_In, and S_Post are the states of a specifier, an intermediate, and a head, respectively. C(N_1, S_Pre), C(N_i, S_In), and C(N_n, S_Post) are the frequencies with which N_1, N_i, and N_n are used as a specifier, an intermediate, and a head, respectively. Equation (1) multiplies the probability that N_1 is used as a specifier, the probabilities that N_2, ..., N_{n-1} are used as intermediates, and the probability that N_n is used as a head[2]. In other words, CFP represents the capacity of the unit nouns to form a compound noun. Taking the logarithm maps the product of probabilities onto a value between 0 and −∞. In equation (2), P(N_1 | S_Pre) expresses the probability that N_1 is used as a specifier. Likewise, the probabilities in equations (3) and (4) are the probabilities that N_i is used as an intermediate and that N_n is used as a head.

2.2 Preference Rule

The preference rule, MNPR, was acquired through an empirical study. Its basic principle follows the MAP (Minimal Attachment Principle) applied in syntactic analysis[1]. The MAP is the principle that the parse tree with the fewest nodes is preferred in resolving structural ambiguity. Similarly, we define MNPR based on MAP as follows:

MNPR (Minimal Noun Preference Rule): If the number of unit nouns differs among ambiguous segmentations, we prefer the structure pattern with the minimal number of unit nouns.

3 Segmentation Algorithm

The algorithm for segmenting compound nouns is shown in Figure 2. First, we apply structure patterns of compound nouns by consulting a general noun dictionary with 50,518 entries. If exactly one result is generated, we regard it as the correct segmentation. If the given compound noun can be segmented ambiguously, we resolve the ambiguity by using CFP and MNPR, as explained in detail in Section 3.1. If a compound noun cannot be segmented, we regard it as a compound noun including unknown unit nouns and segment it by the method suggested in Section 3.2.

    Segment_CN(CN)
    {
        Apply structure patterns of compound nouns
        if ( one segmentation result )
            Print the segmentation result
        else if ( several segmentation results )
            Resolve_Ambiguity()
        else if ( no segmentation result )
            Segment_CN_including_Unknown_Word(CN)
    }

Figure 2: A Segmentation Algorithm

3.1 Resolving Ambiguous Segmentations

Ambiguous segmentations are resolved differently according to the number of unit nouns. If the number of segmented unit nouns is the same among the ambiguous segmentations, we apply the statistical information, CFP; otherwise, we apply the preference rule, MNPR.

First, if the number of unit nouns is the same among ambiguous segmentations, we apply CFP to segment the compound noun. Table 1 shows the total summations of the frequency data used as a specifier, an intermediate, and a head for the calculation of CFP.
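The control flow of Figure 2 and the rule ordering of Section 3.1 can be sketched as below. This is a minimal sketch, not the authors' code. The names `segmentations`, `cfp`, `segment`, `COUNTS`, `TOTAL`, and `prob` are hypothetical, nouns are written in romanized form, and the total of 1000 per role is a placeholder rather than a value from Table 1. Applying CFP to candidates that remain tied after MNPR is also an assumption, since the text treats the two cases separately.

```python
import math


def segmentations(word, nouns):
    """Enumerate every way to split `word` into dictionary unit nouns."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        first = word[:i]
        if first in nouns:
            results += [[first] + rest for rest in segmentations(word[i:], nouns)]
    return results


def cfp(units, prob):
    """Equation (1): log of the product of the role probabilities."""
    roles = ["pre"] + ["in"] * (len(units) - 2) + ["post"]
    product = 1.0
    for unit, role in zip(units, roles):
        product *= prob(unit, role)
    return math.log(product) if product > 0 else float("-inf")


def segment(word, nouns, prob):
    """Figure 2: one result, ambiguous results, or no result."""
    candidates = [c for c in segmentations(word, nouns) if len(c) >= 2]
    if not candidates:
        return None  # would be handed to the unknown-noun heuristics
    fewest = min(len(c) for c in candidates)
    candidates = [c for c in candidates if len(c) == fewest]  # MNPR
    return max(candidates, key=lambda c: cfp(c, prob))        # CFP


# Frequencies quoted in Section 3.1; TOTAL is a hypothetical denominator.
COUNTS = {("bujeonghap", "pre"): 1, ("gukja", "post"): 13,
          ("bujeong", "pre"): 87, ("hapgukja", "post"): 4}
TOTAL = 1000


def prob(unit, role):
    return COUNTS.get((unit, role), 0) / TOTAL


nouns = {"bujeong", "hapgukja", "bujeonghap", "gukja",
         "golf", "jangsa", "upja", "golfjang", "saupja"}
print(segment("bujeonghapgukja", nouns, prob))
print(segment("golfjangsaupja", nouns, prob))
```

The base of the logarithm does not affect which candidate wins, because the logarithm is monotonic. With real data the denominators would come from Table 1 and may differ by role and syllable length, so the scores produced here are not comparable to the values reported in the paper.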
Table 1: Summations of each Specifier, Intermediate, and Head

Type                            2-Syllable    3-Syllable
Sum_{i=1..n} C(N_i, S_Pre)
Sum_{i=1..n} C(N_i, S_In)
Sum_{i=1..n} C(N_i, S_Post)

For instance, the compound noun 'bujeonghapgukja' (an illegally successful candidate) can be segmented into both 'bujeonghap/gukja' (a disharmonious lattice) and 'bujeong/hapgukja' (an illegally successful candidate). The frequencies of the unit nouns are as follows:

C(bujeonghap, S_Pre) = 1
C(gukja, S_Post) = 13
C(bujeong, S_Pre) = 87
C(hapgukja, S_Post) = 4

By using the above frequencies, we calculate the CFPs of the two candidates as follows:

\[ CFP(\text{bujeonghap/gukja}) = \log\big( P(\text{bujeonghap} \mid S_{Pre}) \times P(\text{gukja} \mid S_{Post}) \big) = -7.9866 \]

\[ CFP(\text{bujeong/hapgukja}) = \log\big( P(\text{bujeong} \mid S_{Pre}) \times P(\text{hapgukja} \mid S_{Post}) \big) = -6.6507 \]

Because CFP(bujeong/hapgukja) is larger than CFP(bujeonghap/gukja), 'bujeonghapgukja' is segmented into 'bujeong/hapgukja'.

Second, if the number of unit nouns is different, we resolve the ambiguous segmentation by MNPR. For example, the compound noun 'golfjangsaupja' (a golf course businessman) can be segmented into both 'golf/jangsa/upja' (a golf trade businessman) and 'golfjang/saupja' (a golf course businessman). The number of unit nouns in 'golf/jangsa/upja' is 3 and the number of unit nouns in 'golfjang/saupja' is 2. By MNPR, we choose 'golfjang/saupja' as the correct segmentation because it has the smaller number of unit nouns.

3.2 Segmenting Compound Nouns including Unknown Nouns

In general, because not all unit nouns can be registered in a lexicon, many compound nouns include unknown unit nouns. Most of the unknown unit nouns are three-syllabled nouns, foreign nouns, and nouns of a specific area. In this research, we segment these compound nouns in three phases.

First, if a noun of three or more syllables at a specific position is a known noun, we apply the structure pattern itself. The structure patterns for each compound noun length are as follows (in the original, the unit nouns of the specific position are underlined):

6 syllable : 3/3, 4/2, 2/4
7 syllable : 2/3/2, 3/4, 4/3, 5/2, 2/5
8 syllable : 2/3/3, 3/3/2, 2/4/2, 3/5, 5/3, 6/2, 2/6
9 syllable : 3/3/3, 2/3/4, 2/4/3, 3/4/2, 4/3/2, 2/5/2, 3/6, 6/3, 2/7, 7/2
10 syllable : 2/4/4, 4/4/2, 2/4/3, 4/3/3, 3/4/3, 3/3/4, 3/5/2, 2/5/3

For example, the compound noun 'orengekaunti' (Orange County) has a known noun 'orenge' and an unknown noun 'kaunti'. By the structure pattern '3/3', the compound noun 'orengekaunti' is correctly segmented into 'orenge/kaunti'.

Second, if a two-syllabled noun is registered but the three-syllabled noun is not, we apply the frequencies of an affix. For instance, the compound noun 'gunchuksahuphoy' (an architect society) is first segmented into 'gunchuk/sa/huphoy' because 'gunchuk' and 'huphoy' are registered but 'gunchuksa' is not. Then, in order to decide whether the affix 'sa' is a prefix or a suffix, we use the frequencies of the prefix and the suffix. The affix 'sa' was used 29 times as a prefix and 111 times as a suffix. Because the suffix use is more frequent, 'sa' is attached to the preceding noun, and the compound noun 'gunchuksahuphoy' is correctly segmented into 'gunchuksa/huphoy'.

Third, we assume the following default patterns, which are the patterns most frequently observed in segmentation, and apply them:

4 syllable : 2/2
5 syllable : 2/3
6 syllable : 2/2/2
7 syllable : 2/2/3
8 syllable : 2/2/2/2
9 syllable : 2/2/2/3
10 syllable : 2/2/2/2/2

Figure 3: System Configuration

4 Experimental Results

The system configuration that implements the proposed algorithm is shown in Figure 3. A raw text is analyzed by a morphological analyzer and tagged by a POS tagger. Then, we extract N, N+N, N+N+N, and N+N+N+N forms from the POS-tagged corpus.
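The extraction step just described can be illustrated with a small sketch. The input representation is an assumption: the paper does not describe the tagger's output format or tag set, so `noun_runs`, the `(morpheme, tag)` pairs, and the tag names "N" and "J" are hypothetical. Discarding runs longer than four nouns is likewise one possible reading of the four listed forms.

```python
from itertools import groupby


def noun_runs(tagged, noun_tag="N", max_len=4):
    """Collect runs of consecutive noun morphemes (N, N+N, N+N+N, N+N+N+N).

    `tagged` is a list of (morpheme, tag) pairs in corpus order.
    Runs longer than `max_len` are skipped (an assumption).
    """
    runs = []
    for is_noun, group in groupby(tagged, key=lambda pair: pair[1] == noun_tag):
        if not is_noun:
            continue
        morphemes = [morpheme for morpheme, _ in group]
        if len(morphemes) <= max_len:
            runs.append(morphemes)
    return runs


# Hypothetical tagger output in romanized form; "J" marks a non-noun morpheme.
tagged = [("golfjang", "N"), ("saupja", "N"), ("ga", "J"),
          ("orenge", "N"), ("e", "J")]
print(noun_runs(tagged))
```

A single-noun run such as `["orenge"]` corresponds to the N form, which still has to be checked against the unit noun dictionary before it is treated as a compound noun.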
However, an N form may be either a unit noun or a compound noun, because of the recognition process of unknown nouns. Accordingly, we assume that every unit noun is registered in a unit noun dictionary and filter out the unit nouns among the N forms. As a result, the segmentation system receives only compound nouns as input and produces exactly one segmentation result for each.

We use three kinds of data to estimate the precision rate of the proposed algorithm. The first test set consists of 345 compound nouns, many of which include unknown unit nouns. The second test set consists of 1,200 compound nouns extracted from about 1,000 documents of KTSET 2.0, a test set for information retrieval. The KTSET 2.0 test collection consists of 44,400 documents and 50 queries, and includes the relevance judgment of each document with respect to each query. The third test set consists of 1,644 compound nouns extracted at random, in a balanced manner, from corpora. It is drawn from 19,613 compound nouns that the Korean morphological analyzer cannot analyze.

We define the criteria for evaluating the segmentation algorithm as follows:

The inclusion rate of unknown nouns : D/B × 100
The rate of ambiguous segmentations : E/B × 100
The precision rate : F/B × 100

where B, D, E, and F are shown in Table 2.

Table 2: Experimental Results

Type                                              data 1    data 2    data 3
# of CNs in the input (A)
# of CNs segmented by the system (B)
# of CNs including only known UNs (C)
# of CNs including at least one unknown UN (D)
# of ambiguously segmented CNs (E)
# of CNs correctly segmented (F)
Inclusion rate of unknown UNs                     28.6%     12%       24.1%
Rate of ambiguous segmentations                   35%       21.5%     24%
Precision rate                                    95.6%     96.8%     95.8%

From the results on the first and the third test sets, we can say that the proposed algorithm segments compound nouns including unknown nouns correctly. From the result on the second test set, we find that the proposed algorithm maintains a constant precision rate when segmenting compound nouns extracted from various domains.

Table 3 gives a data analysis of CFP, MNPR, and the heuristics, where B and F are as defined in Table 2. The table shows that CFP and MNPR are useful information for resolving ambiguous segmentations. But the heuristics for segmenting compound nouns including at least one unknown unit noun show a precision of 78%, which means that there is still plenty of room for improvement.

Table 3: Data Analysis of CFP, MNPR, Heuristics

Method        B    F    Precision
CFP
MNPR
Heuristics

Our proposed method is compared with other research in Table 4. In this table, 'Segmentation' means the segmentation of CNs including unknown nouns and 'Resolution' means the resolution of ambiguous segmentations. The table shows that the proposed method can segment compound nouns including unknown nouns and resolve ambiguous segmentations at a better precision rate. In Table 5, we compare our method with that of Chang separately from the other existing research.
The reason is that Yun[12] and Choi[4] use dictionary-based methods, whereas Chang[3] uses a corpus-based method. In Table 5, 'Trained' means the trained data used to construct a trie and acquire statistical information. 'Untrained' means the untrained data used to evaluate the precision rate, apart from the trained data. The result shows that the proposed method maintains a constant precision rate regardless of the specific area.

Table 4: Results of Comparison 1

Factor          Yun95    Choi96    Proposed
Segmentation    No       No        Yes
Resolution      Yes      No        Yes
Precision       82%      83%       95.6%

Table 5: Results of Comparison 2

Data         Chang96    Proposed
Trained      97.66%     98.0%
Untrained    87.75%     95.6%
KTSET                   96.8%

5 Conclusion

In this paper, we have presented four problems that must be addressed when segmenting Korean compound nouns in a raw corpus and suggested a method of segmenting Korean compound nouns into unit nouns. We applied structure patterns of compound nouns and resolved ambiguous segmentations by using statistical information, CFP, and a preference rule, MNPR. The experimental results have shown that the precision rate is about 96%. The experiments have also shown that the proposed method can segment compound nouns including unknown nouns and maintains a constant precision rate when segmenting compound nouns extracted from various domains.

In future work, we will try to improve the accuracy of segmenting compound nouns including unknown unit nouns. In addition, we will apply the segmentation method to compound noun indexing in order to improve the performance of an information retrieval system.

References

[1] J. Allen, Natural Language Understanding, The Benjamin/Cummings Publishing Company Inc.
[2] E. Charniak, C. Hendrickson, N. Jacobson, and M. Perkowitz, "Equations for Part-of-speech Tagging," Proc. of the Eleventh National Conference on Artificial Intelligence.
[3] D.H. Chang, S.H. Myaeng, "A Korean Compound Noun Analysis method for Effective indexing," Hangul and Korean Information Processing Conference. (in Korean)
[4] J.H. Choi, "A Division Method of Korean Compound Noun by number of syllable," Hangul and Korean Information Processing Conference. (in Korean)
[5] W.S. Jung, Word Formation Theory of Korean language, 1st Ed., p. 267, Hansin-Culture Publishing Company. (in Korean)
[6] J.D. Kim, A Korean Part-of-Speech Tagging Model Based on Morpheme-unit with Eojeol Context, M.S. Dissertation, Korea University. (in Korean)
[7] S.Z. Lee, Two-level Korean Part-of-Speech Tagging using HMM, M.S. Dissertation, Korea University. (in Korean)
[8] H.S. Lim, Korean Morphological Analyzer based on Classification of Ambiguity pattern, M.S. Dissertation, Korea University. (in Korean)
[9] H.S. Lim, J.D. Kim, H.C. Rim, "Improvement of Transformation Rule-Based Korean Part-Of-Speech Tagger," Hangul and Korean Information Processing Conference. (in Korean)
[10] J.Y. Nie, M.L. Hannan, W. Jin, "Combining Dictionary, Rules and Statistical Information in Segmentation of Chinese," Computer Processing of Chinese and Oriental Languages, Vol. 9, No. 2, 1995.
[11] K. Yosiyuki, T. Takenobu, T. Hozumi, "Analysis of Japanese Compound Nouns using Collocation information," Proc. of the 14th Conference on Computational Linguistics (COLING-94).
[12] B.H. Yun, H.S. Lim, H.C. Rim, "Analysis of Korean Compound Nouns using Statistical Information," Proc. of the 22nd Korea Information Science Society Spring Conference.
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationMandarin Lexical Tone Recognition: The Gating Paradigm
Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Mandarin Lexical Tone Recognition: The Gating Paradigm Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition
More informationExploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data
Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationSummarizing Text Documents: Carnegie Mellon University 4616 Henry Street
Summarizing Text Documents: Sentence Selection and Evaluation Metrics Jade Goldstein y Mark Kantrowitz Vibhu Mittal Jaime Carbonell y jade@cs.cmu.edu mkant@jprc.com mittal@jprc.com jgc@cs.cmu.edu y Language
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationTHE VERB ARGUMENT BROWSER
THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW
More informationDefragmenting Textual Data by Leveraging the Syntactic Structure of the English Language
Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationCharacter Stream Parsing of Mixed-lingual Text
Character Stream Parsing of Mixed-lingual Text Harald Romsdorfer and Beat Pfister Speech Processing Group Computer Engineering and Networks Laboratory ETH Zurich {romsdorfer,pfister}@tik.ee.ethz.ch Abstract
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationTrend Survey on Japanese Natural Language Processing Studies over the Last Decade
Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Masaki Murata, Koji Ichii, Qing Ma,, Tamotsu Shirado, Toshiyuki Kanamaru,, and Hitoshi Isahara National Institute of Information
More informationPerformance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database
Journal of Computer and Communications, 2016, 4, 79-89 Published Online August 2016 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2016.410009 Performance Analysis of Optimized
More informationSCHEMA ACTIVATION IN MEMORY FOR PROSE 1. Michael A. R. Townsend State University of New York at Albany
Journal of Reading Behavior 1980, Vol. II, No. 1 SCHEMA ACTIVATION IN MEMORY FOR PROSE 1 Michael A. R. Townsend State University of New York at Albany Abstract. Forty-eight college students listened to
More informationThe Ups and Downs of Preposition Error Detection in ESL Writing
The Ups and Downs of Preposition Error Detection in ESL Writing Joel R. Tetreault Educational Testing Service 660 Rosedale Road Princeton, NJ, USA JTetreault@ets.org Martin Chodorow Hunter College of CUNY
More informationCharacteristics of the Text Genre Realistic fi ction Text Structure
LESSON 14 TEACHER S GUIDE by Oscar Hagen Fountas-Pinnell Level A Realistic Fiction Selection Summary A boy and his mom visit a pond and see and count a bird, fish, turtles, and frogs. Number of Words:
More informationUsing Semantic Relations to Refine Coreference Decisions
Using Semantic Relations to Refine Coreference Decisions Heng Ji David Westbrook Ralph Grishman Department of Computer Science New York University New York, NY, 10003, USA hengji@cs.nyu.edu westbroo@cs.nyu.edu
More informationTowards a MWE-driven A* parsing with LTAGs [WG2,WG3]
Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general
More informationProblems of the Arabic OCR: New Attitudes
Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing
More informationHeuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger
Page 1 of 35 Heuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger Kaihong Liu, MD, MS, Wendy Chapman, PhD, Rebecca Hwa, PhD, and Rebecca S. Crowley, MD, MS
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationTour. English Discoveries Online
Techno-Ware Tour Of English Discoveries Online Online www.englishdiscoveries.com http://ed242us.engdis.com/technotms Guided Tour of English Discoveries Online Background: English Discoveries Online is
More informationPhysics 270: Experimental Physics
2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu
More informationA Domain Ontology Development Environment Using a MRD and Text Corpus
A Domain Ontology Development Environment Using a MRD and Text Corpus Naomi Nakaya 1 and Masaki Kurematsu 2 and Takahira Yamaguchi 1 1 Faculty of Information, Shizuoka University 3-5-1 Johoku Hamamatsu
More informationBridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models
Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models Jung-Tae Lee and Sang-Bum Kim and Young-In Song and Hae-Chang Rim Dept. of Computer &
More informationLoughton School s curriculum evening. 28 th February 2017
Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's
More information1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature
1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationOn document relevance and lexical cohesion between query terms
Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,
More informationTHE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING
SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,
More informationSome Principles of Automated Natural Language Information Extraction
Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract
More informationDerivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.
Final Exam (120 points) Click on the yellow balloons below to see the answers I. Short Answer (32pts) 1. (6) The sentence The kinder teachers made sure that the students comprehended the testable material
More informationAGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS
AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic
More informationCOMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR
COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationContext Free Grammars. Many slides from Michael Collins
Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures
More informationOn-Line Data Analytics
International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob
More informationTraining and evaluation of POS taggers on the French MULTITAG corpus
Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction
More informationDerivational and Inflectional Morphemes in Pak-Pak Language
Derivational and Inflectional Morphemes in Pak-Pak Language Agustina Situmorang and Tima Mariany Arifin ABSTRACT The objectives of this study are to find out the derivational and inflectional morphemes
More informationShort Text Understanding Through Lexical-Semantic Analysis
Short Text Understanding Through Lexical-Semantic Analysis Wen Hua #1, Zhongyuan Wang 2, Haixun Wang 3, Kai Zheng #4, Xiaofang Zhou #5 School of Information, Renmin University of China, Beijing, China
More informationELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading
ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix
More informationAlgebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview
Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best
More informationProduct Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments
Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &
More informationThe Karlsruhe Institute of Technology Translation Systems for the WMT 2011
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu
More informationTHE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial
More informationA heuristic framework for pivot-based bilingual dictionary induction
2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,
More information*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN
From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,
More informationThe development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach
BILINGUAL LEARNERS DICTIONARIES The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach Mark VAN MOL, Leuven, Belgium Abstract This paper reports on the
More information