Extended Similarity Test for the Evaluation of Semantic Similarity Functions
Maciej Piasecki (1), Stanisław Szpakowicz (2,3), Bartosz Broda (1)

(1) Institute of Applied Informatics, Wrocław University of Technology, Poland
{maciej.piasecki,bartosz.broda}@pwr.wroc.pl
(2) School of Information Technology and Engineering, University of Ottawa
szpak@site.uottawa.ca
(3) Institute of Computer Science, Polish Academy of Sciences

Abstract

We propose a more demanding version of the well-known WordNet-Based Synonymy Test. Our work with semantic similarity functions for Polish nouns has shown that test to be insufficiently stringent. We briefly present the background, explain the extension of WBST and report on experiments that contrast the old and the new evaluation tool.

1. Introduction

Many tasks in Natural Language Processing (Word Sense Disambiguation, Text Entailment and Text Classification, to name just a few) require a measure of semantic relatedness. Automatic acquisition of lexical semantic relations, in particular, can hardly be imagined without some form of a semantic similarity function (henceforth, SSF). A SSF maps pairs of lexical units into real numbers, and is usually normalized. A lexical unit (LU) is a word type or lexeme organized, especially in inflected languages, by the values of morphological categories such as number, gender and so on. Evaluation of the quality or effectiveness of a SSF is a non-trivial problem. Manual evaluation is barely feasible even on a small scale. Not only are SSFs required to work for any pair of LUs, but people are also notoriously bad at working with real numbers. A linear ordering of dozens of LUs is nearly impossible, and even comparing two terms requires a significantly complicated setup (Rubenstein and Goodenough, 1965). Given a small sample, people can easily distinguish a bad SSF from a good one; we, however, must distinguish good SSFs from those that are merely passable.
We note three forms of SSF evaluation (Budanitsky and Hirst, 2006; Zesch and Gurevych, 2006): mathematical analysis of formal properties (for example, the property of a metric distance (Lin, 1998b)), application-specific evaluation and comparison with human judgement. Mathematical analysis gives few clues about the future uses of a SSF. Evaluation via an application may make it difficult to separate the effect of a SSF from that of other elements of the application (Zesch and Gurevych, 2006). A direct comparison to a manually created resource seems the least troublesome. The construction of such resources, however, is labour-intensive even if it only labels LU pairs as similar (maybe just related (Zesch and Gurevych, 2006)) or not similar; this does not allow a fair assessment of the ordering of LUs on a continuous scale, as an SSF does. Indirect comparison with existing resources (Grefenstette, 1993) is another possibility. For example, one could compare a SSF constructed automatically with another based on the semantic similarity across WordNet's (Fellbaum, 1998) hypernymy structure. In (Lin, 1998a; Weeds and Weir, 2005) two lists of the k LUs most similar to the given one are transformed to rank numbers of the subsequent LUs on the lists, and compared by the cosine measure. The drawback of such an evaluation is that we know how close the two similarity functions are, but not how people perceive a SSF. Automatic differentiation between words synonymous and not synonymous with a LU is a natural application for a SSF. In Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997) the SSF constructed on the basis of a statistical analysis of a corpus was used to make decisions in a synonymy test, a component of the Test of English as a Foreign Language (TOEFL); this gave 64.4% hits. (Turney, 2001) reported 73.75% hits, and (Turney et al., 2003) 97.5% hits; this practically solved the TOEFL synonymy problem.
Next, (Freitag et al., 2005) proposed a WordNet-Based Synonymy Test (WBST), in which WordNet is used to generate a large set of questions identical in format to those in the TOEFL. Section 2 discusses WBST. The best reported result for nouns is 75.8% (Freitag et al., 2005). A slightly modified WBST was used to evaluate a SSF for Polish nouns (Piasecki et al., 2006), with the result of 86.09%. The evaluation of a SSF via a synonymy test shows the ability of the SSF to distinguish synonyms from non-synonyms. Since the SSF is the centrepiece of the application, the achieved results can be directly attributed to it. There was, however, a problem: WBST appeared to be too easy, as shown in Section 2, so it was no longer a useful tool in the assessment of more sophisticated SSFs for Polish nouns. In view of these findings, we have set out to design a more demanding automatic method of SSF assessment. We want its results to be easily interpreted by people and its feasibility tested on people. We also expect that it will pick the SSF that is a better tool for the recognition of lexical semantic relations between LUs.

2. WBST and the Similarity among Polish Nouns

The application of LSA to the TOEFL became unattractive as a method of comparing SSFs once the result of 97.5% hits had been achieved (Turney et al., 2003).
(Freitag et al., 2005) proposed a new test, WBST. It was seen as more difficult because it contained many more questions. An instance of the test is built thus: first, pair a LU q included in a wordnet (WordNet 2.0 in (Freitag et al., 2005)) with a randomly chosen synonym s; next, randomly draw from the wordnet three other LUs not in q's synset (the detractors) to complete an answer set A in the question-answer (QA) pair (q, A). During evaluation, the SSF generates values for the pairs (q, a_i), a_i ∈ A, that are expected to favour s. The WBST has been, amongst other applications, used to evaluate SSFs for Polish nouns. The underlying resource was Polish WordNet (plwn) (Derwojedowa et al., 2007a; Derwojedowa et al., 2007b), a lexical database now under construction (Piasecki et al., 2006). The test was slightly modified. In plwn, many synsets have only 1-3 LUs, in accordance with the definition of the semantic relations (Derwojedowa et al., 2007b). In order to get a better coverage of LUs by WBST questions, and not to leave LUs in singleton synsets untested, the direct hypernyms of LUs from singleton synsets were taken to form QA pairs in (Piasecki et al., 2006). We named this modification the WBST with Hypernyms (WBST+H). The inclusion of hypernyms in QA pairs did not make the test easier, as was shown in (Piasecki et al., 2006). In (Piasecki et al., 2006) a SSF was based on adjectival modification of nouns and on noun coordination; we also ran preliminary experiments with describing a noun via its association with verbs. That work lacked an in-depth analysis of all possible lexico-syntactic markers of Polish noun meaning. After the re-implementation of the approach of (Piasecki et al., 2006) and the addition of several lexico-syntactic features (such as modification by an adjectival participle; see Section 4), the result exceeded 90%. We could, however, observe little difference in the influence of the subsequent types of features. This was contrary to our expectations.
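The WBST construction described above is easy to sketch in code. In the toy example below the wordnet is reduced to a hand-made list of synsets; all data and names are invented for illustration and are not taken from WordNet or plwn.

```python
import random

# Toy "wordnet": a list of synsets, each a set of lexical units (LUs).
# All entries are invented illustration data.
SYNSETS = [
    {"car", "automobile"},
    {"dog", "hound"},
    {"house", "building"},
    {"river", "stream"},
    {"tree", "plant"},
]

def make_wbst_question(q, synsets, n_detractors=3, rng=random):
    """Build one WBST question-answer pair for the LU q: the answer set
    holds one randomly chosen synonym of q plus n_detractors LUs drawn
    at random from outside q's synset (the detractors)."""
    q_synset = next(s for s in synsets if q in s)
    synonym = rng.choice(sorted(q_synset - {q}))
    pool = sorted({lu for s in synsets for lu in s} - q_synset)
    answers = rng.sample(pool, n_detractors) + [synonym]
    rng.shuffle(answers)
    return q, answers, synonym

q, answers, correct = make_wbst_question("car", SYNSETS)
```

A SSF under evaluation then scores a hit when its value for the pair (q, synonym) exceeds its values for the three detractors.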
In the repeated WBST+H tests with several raters, we obtained results close to 100% (markedly more than the 89.29% reported in (Piasecki et al., 2006)). This may have happened because the tests were generated on the basis of a more recent, improved version of plwn. It is imperative that we construct a more difficult WBST-style test to facilitate further work on SSFs for Polish nouns.

3. Enhanced WBST

The WBST defined in (Freitag et al., 2005) stipulates that the elements of the answer set A not synonymous with q are chosen completely at random from the whole wordnet. This means that the difference in meaning between q and the detractors is in most cases obvious to test-takers. It also tends to be obvious to a good SSF. Our overall goal, however, is to construct automatically synsets of highly similar LUs, and to differentiate the LUs in a synset from all other LUs that are similar but not synonymous, among them co-hyponyms (Derwojedowa et al., 2007b). Any SSF must therefore distinguish closely related LUs, not only those with very different meanings. We need to construct the answer set A so that the non-synonyms are closer in meaning to the correct answer than is the case in WBST+H. Obviously, they cannot be synonyms of either s or q, but they ought to be related to both. We need to select the non-synonyms among LUs similar to s and to q. In order to achieve this, we have decided to leverage the structure of the wordnet in the determination of similarity. During the generation of the modified test, named Enhanced WBST (EWBST), non-synonyms are still selected randomly, but only from the set of LUs broadly similar to q and s. The acceptable values of SSF_WN(Q, x) are lower than a threshold sim_t; the synset Q contains q and s, and x is a detractor.
We tested several wordnet-based similarity functions (Agirre and Edmonds, 2006), here implemented on the basis of plwn's hypernymy structure, achieving the best result in a generated test with the following function:

    SSF_WN = p_min / (2d)    (1)

where p_min is the length of a minimal path between two LUs in plwn, and d = 9 is the maximal depth of the hypernymy hierarchy in the current version of plwn. The similarity threshold sim_t = 2 for this function has been established experimentally. The hypernymy structure of nouns in plwn (as of May 7, 2007) does not have a single root. Many methods of similarity computation require a root, so we have introduced one artificially and linked all trees in the hypernymy forest to it. We noticed that the random selection of LU detractors on the basis of any similarity measure tends to favour LUs in hypernymy subtrees other than that of q, if q is located near the root. The number of LUs linked by a short path across the root is much higher than the number of LUs from the subtree of q which are located at a close distance to q. The problem is especially visible for question LUs in small hypernymy subtrees with a limited number of hyponyms. As the problem appears with any similarity measure based on path length, we have heuristically modified the measure by adding the value 3 to any path going across the artificial root. Lower values gave no visible change, while higher values caused a large reduction of the number of QA pairs. To illustrate the difference in the level of difficulty between WBST+H and EWBST, we show an example problem generated by this method for the same QA pair: administracja (administration), zarząd (board). The EWBST built the following test:

Q: administracja (administration)
A: urząd (office, department), fundacja (charity, endowment), zarząd (board, management), ministerstwo (ministry).
And the test generated by WBST+H:

Q: administracja (administration)
A: poddasze (attic), repatriacja (repatriation), zarząd (board, management), zwolennik (follower, zealot).

An example EWBST test was given to 32 native speakers of Polish, all of them Computer Science students. (This bias in the group of raters should not influence the results, because the test was composed on the basis of plwn, which at present includes only general Polish vocabulary.) The test consisted of 99 QA pairs. All LUs in
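The detractor filter behind EWBST can be sketched as follows, assuming the literal reading of the path-based score p_min / (2d) from eq. (1) and the +3 penalty on paths crossing the artificial root. The tiny hypernymy forest below is invented; only d keeps the paper's value of 9.

```python
from collections import deque

D = 9             # maximal depth of the plwn hypernymy hierarchy
ROOT = "ROOT"     # artificial root joining the hypernymy forest
ROOT_PENALTY = 3  # extra length added to paths crossing the artificial root

# Invented toy hypernymy forest: two trees joined through the root.
EDGES = [(ROOT, "entity"), (ROOT, "act"),
         ("entity", "vehicle"), ("vehicle", "car"),
         ("act", "repatriation")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def penalized_path(u, v):
    """Shortest path length between two LUs by BFS; a path that passes
    through the artificial root is lengthened by ROOT_PENALTY."""
    queue, seen = deque([(u, 0, False)]), {u}
    while queue:
        node, dist, via_root = queue.popleft()
        if node == v:
            return dist + (ROOT_PENALTY if via_root else 0)
        for nxt in ADJ[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1, via_root or nxt == ROOT))
    return None

def ssf_wn(u, v):
    """Score from eq. (1): p_min / (2d)."""
    return penalized_path(u, v) / (2 * D)
```

During test generation, a randomly drawn candidate is kept as a detractor only when its score against the synset Q satisfies the sim_t threshold.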
the test were selected from 5706 single-word noun LUs included in plwn. In the set of question LUs, there were 42 LUs occurring more than 1000 times in the IPI PAN corpus (Przepiórkowski, 2004). This subset was distinguished in the test, because such LUs are also the basis of the comparison with the results achieved in (Freitag et al., 2005) and (Piasecki et al., 2006). For all QA pairs the average result was 70%, with the minimum 61.62%, the maximum 78.79% and the standard deviation from the mean σ = 4.07%. For the subset consisting of frequent LUs, the average result was 63.24%, with the minimum 52.38%, the maximum 73.81% and σ = 5.37%. The results, as expected, are much lower than those achieved in WBST+H tests. We were surprised that the raters' results for the frequent LUs were significantly lower than those for all LUs. It is likely that more frequent LUs are at the same time more polysemous, and that makes them more difficult to distinguish from other similar LUs. The results for frequent LUs are lower, but at a level similar to the results for all LUs. A situation like this was also observed in the application of the EWBST to SSFs, discussed in Section 5.

4. Similarity Functions for Polish Nouns

(Piasecki et al., 2006) proposed a SSF based on the frequency of modification of a noun by specific adjectives and on the frequency of coordination with specific nouns. Features based on verbs were also tested.
Following this approach, we have defined a set of noun-meaning markers (italicised on the list below) identifiable via shallow morpho-syntactic processing (full parsing, unfortunately, was not an option):

- modification by a specific adjective, from (Piasecki et al., 2006) (written A in Table 1),
- modification by a specific adjectival participle (Part in Table 1),
- coordination with a specific noun, from (Piasecki et al., 2006) (Nc),
- occurrence of a verb for which a given noun in a specific case can be an argument (V(case)),
- modification by a specific noun in the genitive (NMg),
- occurrence of a specific preposition with which a given noun in a specific case forms a prepositional phrase (Prep(case)).

An N × C matrix M is created from the IPI PAN corpus documents tagged by the TaKIPI tagger (Piasecki, 2006). C is the number of lexico-syntactic features used, N the number of nouns, and M[n, c] the number of occurrences of the n-th noun with the c-th feature (i.e., occurrences for which the constraint was satisfied). All features are based on the occurrences of certain lexical markers in the context (a sentence, as defined by TaKIPI) which satisfy certain morpho-syntactic constraints, such as the presence of a certain syntactic configuration between the noun and the marker. The constraints are expressed in the JOSKIPI language included in TaKIPI. One such constraint is shown below (partially, but enough to illustrate the variety of morpho-syntactic phenomena that we can test).

or(
  and(
    llook(-1,begin,$c,
      and(equal(pos[$c],{conj}),
          inter(base[$c],{"ani","albo","czy","i","lub","oraz"}))),
    only($c,-1,$oa,
      in(pos[$oa],{conj,adj,pact,ppas,num,numcol,adv,qub,pcon,pant})),
    llook($-1c,begin,$s,
      and(in(pos[$s],subst,ger,depr),
          inter(base[$s],"variable-n"))),
    inter(cas[$s],cas[0]),
    only($s,$c,$ob,
      in(pos[$ob],{conj,adj,pact,ppas,num,numcol,adv,qub,pcon,pant,subst,ger,depr}))
  ),
  and( ...
analogically to the right)
)

In this expression, the first operator llook looks for a conjunction to the left of the centre of the context (the position of the given noun). The operator only tests the units between that conjunction and the centre; the allowed types are conjunction, adjective, adjectival participle, numeral, adverb, etc. Next, we look for the potentially coordinated noun, defined in each instance of the constraint (variable-n) for a column of the matrix, and we test case agreement with inter. Next, we consider words to the right of the conjunction, using a symmetrically inverted constraint. The similarity between nouns is calculated on the basis of matrix rows according to the method proposed in (Piasecki et al., 2006). The central element of the method is the Rank Weight Function:

1. The values of the cells are recalculated using a weight function f_w: for each c, M[n_i, c] = f_w(M[n_i, c]).
2. The features in a row vector M[n_i, *] are sorted in ascending order of the weighted values.
3. The k highest-ranking features are selected (e.g. k = ...).
4. For each selected feature c_j: M[n_i, c_j] = k - rank(c_j).

As the weight function, we applied the t-score test tscore(n, c) (Manning and Schütze, 2001):

    tscore(n, c) = (M[n, c] - TF_n * TF_c / W) / sqrt(TF_n * TF_c / W)

where TF_n and TF_c are the total frequencies of the noun and of the satisfied constraint, and W is the number of words processed. Additionally, a threshold on the number of features common to both nouns, mcom > 1%, must be met for the similarity to be positive (otherwise it is set to 0).
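The Rank Weight Function steps can be sketched as follows. The matrix row, the marginal frequencies and k = 3 are invented toy values, and the t-score weight is assumed to take the form (observed minus expected) / sqrt(expected) with expected co-occurrence TF_n * TF_c / W.

```python
import math

def tscore(f_nc, tf_n, tf_c, W):
    """Assumed t-score weight: (observed - expected) / sqrt(expected),
    with the expected co-occurrence count TF_n * TF_c / W."""
    expected = tf_n * tf_c / W
    return (f_nc - expected) / math.sqrt(expected)

def rank_weight(row, tf_n, tf_feats, W, k=3):
    """Rank Weight Function sketch: weight each cell of the row, keep
    the k highest-weighted features, and replace each kept cell by
    k - rank (rank 0 = strongest feature); all other cells become 0."""
    weighted = [tscore(v, tf_n, tf_c, W) if v > 0 else float("-inf")
                for v, tf_c in zip(row, tf_feats)]
    top = sorted(range(len(row)), key=lambda c: weighted[c], reverse=True)[:k]
    out = [0] * len(row)
    for rank, c in enumerate(top):
        out[c] = k - rank
    return out

# Toy row: counts of one noun with four features, plus marginal frequencies.
new_row = rank_weight([10, 0, 3, 1], tf_n=14, tf_feats=[100, 50, 30, 200], W=10_000)
```

The transformed rows, not the raw counts, are what the cosine-style comparison between two nouns operates on.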
5. Experiments

In order to test the influence of the constraints, for each constraint we created a separate matrix on the basis of about 254 million words from the IPI PAN corpus. The SSFs calculated from the matrices by the rank method were next tested using WBST+H and EWBST, both generated according to the present state of plwn. Most of the tests were limited to nouns of plwn that occur more than 1000 times in the corpus (6105 nouns); this is the threshold used in (Freitag et al., 2005; Piasecki et al., 2006). We generated 3025 QA pairs for the frequent nouns. In EWBST tests run for all nouns of plwn, the results were similar to those for the frequent nouns only. This contrasts with the WBST+H test, in which there is a big difference between the accuracy for the frequent and the infrequent nouns. For example, in (Piasecki and Broda, 2007) the best result for the frequent nouns is 81.15%, while for all nouns it is only 64.03%. WBST+H is easier for the frequent nouns, well described by the frequent occurrences of features, because it is easier to distinguish them from completely randomly selected nouns. In EWBST all nouns are compared with those similar to them. The result for infrequent nouns, not so well described, stays approximately the same, but the result for the frequent nouns is worse, as the task becomes harder. From the point of view of the automatic construction of synsets, this behaviour of EWBST is advantageous: we perform only one test and yet we get a good description of the whole SSF.
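The test runs above all reduce to one accuracy loop over QA pairs: the tested SSF scores every candidate answer against the question LU, and a hit is recorded when the correct synonym receives the highest value. Both the QA pairs and the stand-in similarity function below are invented toys, not the SSFs of this paper.

```python
def evaluate_ssf(ssf, qa_pairs):
    """Fraction of WBST/EWBST-style questions on which the candidate
    with the highest SSF value is the correct answer."""
    hits = sum(max(answers, key=lambda a: ssf(q, a)) == correct
               for q, answers, correct in qa_pairs)
    return hits / len(qa_pairs)

def toy_ssf(u, v):
    """Invented stand-in SSF: number of position-wise matching characters."""
    return sum(a == b for a, b in zip(u, v))

# Invented QA pairs in the (question, answer set, correct answer) format.
QA_PAIRS = [
    ("administracja", ["poddasze", "repatriacja", "administrator", "zwolennik"],
     "administrator"),
    ("river", ["stream", "riverbed", "house", "tree"], "riverbed"),
]
accuracy = evaluate_ssf(toy_ssf, QA_PAIRS)
```

Swapping toy_ssf for a matrix-based SSF and the toy pairs for generated WBST+H or EWBST questions yields the percentages reported in Table 1.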
Features               W        E        E_A
A                      88.65%   51.51%   50.97%
Part                   78.79%   43.86%   37.94%
NMg                    72.43%   44.56%   41.16%
Nc                     76.85%   47.01%   44.70%
Prep(acc)              35.14%   22.20%   20.39%
Prep(all)              50.21%   30.00%   28.33%
V(acc)                 75.36%   41.78%   40.17%
V(dat)                 48.64%   30.04%   26.25%
V(all)                 75.94%   42.04%   40.12%
A+NMg                  86.66%   52.20%   52.75%
A+NMg+Prep(all)        86.74%   51.20%   52.24%
A+NMg+Prep(all)+Part   87.40%   52.27%   52.62%
A+NMg+Part             87.29%   52.86%   53.31%
A+Nc+Part              90.92%   53.32%   52.55%
A+Nc+NMg               88.65%   53.52%   54.25%
A+Nc+NMg+Part          88.57%   53.13%   54.25%

Table 1: Experiments with SSFs based on different constraints. Constraint names are defined in Section 4. W means WBST+H applied to frequent nouns (> 1000 occurrences); E means EWBST for frequent nouns and E_A EWBST for all nouns.

The rank method can select an appropriate set of features for a tested noun, so the individual results of the subsequent matrices do not directly influence the results of the joint matrix. This can be seen in Table 1. For example, the individual result of the constraint based on modification by participles is only 43.86%, but when we add the Part matrix to the A+NMg matrix, which has a higher accuracy, the result goes up to 52.86%. The best SSFs differ between WBST+H and EWBST. The best combination in WBST+H, namely A+Nc+Part, shows a relatively low result in EWBST. Coordination with other nouns is a good factor for identifying large semantic fields of related nouns, and in the WBST+H test it helps distinguish between a given noun and a completely unrelated noun. This is why the result is very good. In EWBST the situation is different: we test the ability to distinguish among semantically related nouns. Modification by a noun in the genitive is a medium-quality feature when taken alone (only 44.56% in EWBST), probably because this modification is polysemous (and vague as well); the constraint also overgenerates: it often signals nonexistent associations. There are no morpho-syntactic constraints for this type of modification.
No agreement is required, and without full parsing we can only rely on a very vague syntactic requirement of adjacency of the two words, like this:

and(
  rlook(1,5,$a,
    and(in(pos[$a],{subst,ger,depr}),
        equal(cas[$a],{gen}),
        inter(base[$a],{"variable-n"}))),
  only(1,$a,$ad,
    or(in(pos[$ad],{adv,qub,pcon,pant}),
       and(in(pos[$ad],{subst,ger,depr}),
           equal(cas[$ad],{gen})),
       and(in(pos[$ad],{adj,pact,ppas,num,numcol}),
           agrpp(0,$ad,{nmb,gnd,cas},3))))
)

agrpp is the operator of morpho-syntactic agreement on the selected attributes. In spite of the errors introduced by the constraint, the feature NMg delivers an additional source of properties expressed by the meaning of a noun, and a combination of the NMg matrix with the other matrices of properties, A+Nc+NMg+Part, results in the best score obtained in the EWBST test. This could not be observed with the former test, WBST+H.

6. Conclusions

We have proposed an extension of the WordNet-Based Synonymy Test which appears to be more discerning. Our research goal is the application of SSFs in the automatic construction of wordnet synsets. The operation of the proposed EWBST brings us closer to that goal. The EWBST allows us to observe the ability of a SSF to make fine-grained distinctions between semantically related lexical units. Its results can be easily interpreted. The test can be generated on a large scale, depending only on the size of the underlying wordnet. The EWBST is challenging for people and significantly difficult for SSFs. It leaves more
room for improvement and behaves in the same way for frequent and infrequent nouns. The drawback of EWBST is its dependency on the existence of a semantic similarity measure generated on the basis of manually created data (in its present version, on the existence of a hypernymy hierarchy). In many, if not all, wordnets the hypernymy hierarchy is rich only for nouns. The EWBST could work well for verbs or adjectives too, but a different similarity function would first have to be proposed for the generation of the answer sets in the tests.

Acknowledgement. Work financed by the Polish Ministry of Education and Science, project No. 3 T11C

References

Agirre, Eneko and Philip Edmonds (eds.), 2006. Word Sense Disambiguation: Algorithms and Applications. Text, Speech and Language Technology. Springer.

Budanitsky, Alexander and Graeme Hirst, 2006. Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32(1).

Derwojedowa, Magdalena, Maciej Piasecki, Stanisław Szpakowicz, and Magdalena Zawisławska, 2007a. plWordNet, the Polish wordnet. Online access to the database of plWordNet: wroc.pl.

Derwojedowa, Magdalena, Maciej Piasecki, Stanisław Szpakowicz, and Magdalena Zawisławska, 2007b. Polish WordNet on a shoestring. In Proceedings of the Biannual Conference of the Society for Computational Linguistics and Language Technology, Tübingen, April. Universität Tübingen.

Fellbaum, Christiane (ed.), 1998. WordNet: An Electronic Lexical Database. The MIT Press.

Freitag, Dayne, Matthias Blume, John Byrnes, Edmond Chow, Sadik Kapadia, Richard Rohwer, and Zhiqiang Wang, 2005. New experiments in distributional representations of synonymy. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005). Ann Arbor, Michigan: Association for Computational Linguistics.

Grefenstette, G., 1993. Evaluation techniques for automatic semantic extraction: Comparing syntactic and window based approaches.
In Proceedings of the Workshop on Acquisition of Lexical Knowledge from Text, Columbus, SIGLEX 93. ACL.

Landauer, T. and S. Dumais, 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition. Psychological Review, 104(2).

Lin, Dekang, 1998a. Automatic retrieval and clustering of similar words. In COLING-ACL. ACL.

Lin, Dekang, 1998b. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.

Manning, Christopher D. and Hinrich Schütze, 2001. Foundations of Statistical Natural Language Processing. The MIT Press.

Piasecki, Maciej, 2006. Handmade and automatic rules for Polish tagger. Lecture Notes in Artificial Intelligence. Springer.

Piasecki, Maciej and Bartosz Broda, 2007. Semantic similarity measure of Polish nouns based on linguistic features. In Witold Abramowicz (ed.), Business Information Systems: 10th International Conference, BIS 2007, Poznan, Poland, April 25-27, 2007, Proceedings, volume 4439 of Lecture Notes in Computer Science. Springer.

Piasecki, Maciej, Stanisław Szpakowicz, and Bartosz Broda, 2007. Automatic selection of heterogeneous syntactic features in semantic similarity of Polish nouns. In Proceedings of the Text, Speech and Dialogue 2007 Conference.

Przepiórkowski, Adam, 2004. The IPI PAN Corpus: Preliminary Version. Institute of Computer Science PAS.

Rubenstein, H. and J. B. Goodenough, 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10).

Turney, P.D., 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the Twelfth European Conference on Machine Learning. Berlin: Springer-Verlag.

Turney, P.D., M.L. Littman, J. Bigham, and V. Shnayder, 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03). Borovets, Bulgaria.
Weeds, Julie and David Weir, 2005. Co-occurrence retrieval: A flexible framework for lexical distributional similarity. Computational Linguistics, 31(4).

Zesch, Torsten and Iryna Gurevych, 2006. Automatically creating datasets for measures of semantic relatedness. In Proceedings of the Workshop on Linguistic Distances. Sydney, Australia: Association for Computational Linguistics.
More informationCROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2
1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationImproved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form
Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused
More informationWord Sense Disambiguation
Word Sense Disambiguation D. De Cao R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 May 21, 2009 Excerpt of the R. Mihalcea and T. Pedersen AAAI 2005 Tutorial, at: http://www.d.umn.edu/ tpederse/tutorials/advances-in-wsd-aaai-2005.ppt
More informationReinforcement Learning by Comparing Immediate Reward
Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationAssessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2
Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationShort Text Understanding Through Lexical-Semantic Analysis
Short Text Understanding Through Lexical-Semantic Analysis Wen Hua #1, Zhongyuan Wang 2, Haixun Wang 3, Kai Zheng #4, Xiaofang Zhou #5 School of Information, Renmin University of China, Beijing, China
More informationAxiom 2013 Team Description Paper
Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationEvidence for Reliability, Validity and Learning Effectiveness
PEARSON EDUCATION Evidence for Reliability, Validity and Learning Effectiveness Introduction Pearson Knowledge Technologies has conducted a large number and wide variety of reliability and validity studies
More informationTextGraphs: Graph-based algorithms for Natural Language Processing
HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006
More informationTHE VERB ARGUMENT BROWSER
THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationIntroduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.
to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about
More informationCS 598 Natural Language Processing
CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationProcedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing
Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 141 ( 2014 ) 124 128 WCLTA 2013 Using Corpus Linguistics in the Development of Writing Blanka Frydrychova
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationUniversiteit Leiden ICT in Business
Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:
More informationMaximizing Learning Through Course Alignment and Experience with Different Types of Knowledge
Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationNotes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1
Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial
More informationOn-the-Fly Customization of Automated Essay Scoring
Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,
More information*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN
From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,
More informationMeasuring the relative compositionality of verb-noun (V-N) collocations by integrating features
Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Sriram Venkatapathy Language Technologies Research Centre, International Institute of Information Technology
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationChapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard
Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.
More informationAccuracy (%) # features
Question Terminology and Representation for Question Type Classication Noriko Tomuro DePaul University School of Computer Science, Telecommunications and Information Systems 243 S. Wabash Ave. Chicago,
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationEvolution of Symbolisation in Chimpanzees and Neural Nets
Evolution of Symbolisation in Chimpanzees and Neural Nets Angelo Cangelosi Centre for Neural and Adaptive Systems University of Plymouth (UK) a.cangelosi@plymouth.ac.uk Introduction Animal communication
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationEvaluation of Usage Patterns for Web-based Educational Systems using Web Mining
Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl
More informationEvaluation of Usage Patterns for Web-based Educational Systems using Web Mining
Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl
More informationOutline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt
Outline Using Web Data for Linguistic Purposes NCLT, Dublin City University Outline Outline 1 Corpora as linguistic tools 2 Limitations of web data Strategies to enhance web data 3 Corpora as linguistic
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationLANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume 11 : 12 December 2011 ISSN
LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume ISSN 1930-2940 Managing Editor: M. S. Thirumalai, Ph.D. Editors: B. Mallikarjun, Ph.D. Sam Mohanlal, Ph.D. B. A. Sharada, Ph.D.
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationLanguage Independent Passage Retrieval for Question Answering
Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University
More informationProgram Matrix - Reading English 6-12 (DOE Code 398) University of Florida. Reading
Program Requirements Competency 1: Foundations of Instruction 60 In-service Hours Teachers will develop substantive understanding of six components of reading as a process: comprehension, oral language,
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationThe Effect of Discourse Markers on the Speaking Production of EFL Students. Iman Moradimanesh
The Effect of Discourse Markers on the Speaking Production of EFL Students Iman Moradimanesh Abstract The research aimed at investigating the relationship between discourse markers (DMs) and a special
More informationWriting a composition
A good composition has three elements: Writing a composition an introduction: A topic sentence which contains the main idea of the paragraph. a body : Supporting sentences that develop the main idea. a
More informationExploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data
Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationProblems of the Arabic OCR: New Attitudes
Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing
More informationCross-Lingual Text Categorization
Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es
More informationMultilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities
Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Soto Montalvo GAVAB Group URJC Raquel Martínez NLP&IR Group UNED Arantza Casillas Dpt. EE UPV-EHU Víctor Fresno GAVAB
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationAn Introduction to the Minimalist Program
An Introduction to the Minimalist Program Luke Smith University of Arizona Summer 2016 Some findings of traditional syntax Human languages vary greatly, but digging deeper, they all have distinct commonalities:
More informationProduct Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments
Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &
More informationDefragmenting Textual Data by Leveraging the Syntactic Structure of the English Language
Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu
More informationDerivational and Inflectional Morphemes in Pak-Pak Language
Derivational and Inflectional Morphemes in Pak-Pak Language Agustina Situmorang and Tima Mariany Arifin ABSTRACT The objectives of this study are to find out the derivational and inflectional morphemes
More informationTHE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING
SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,
More information1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature
1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationAdvanced Grammar in Use
Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,
More information