Statistical Methods. Allen's Chapter 7; J&M's Chapters 8 and 12


1 Statistical Methods. Allen's Chapter 7; J&M's Chapters 8 and 12

2 Statistical Methods Large data sets (corpora) of natural language allow statistical methods that were not possible before. The Brown Corpus includes about 1,000,000 words annotated with POS tags; the Penn Treebank contains full syntactic annotations.

3 Part of Speech Tagging Determining the most likely category of each word in a sentence containing ambiguous words. Example: finding the POS of words that can be either nouns or verbs. We need two random variables: 1. C, which ranges over the POS tags {N, V}; 2. W, which ranges over all possible words.

4 Part of Speech Tagging (Cont.) Example: W = flies. Problem: which is greater, P(C=N | W=flies) or P(C=V | W=flies)? That is, P(N | flies) or P(V | flies)? Since P(N | flies) = P(N & flies) / P(flies) and P(V | flies) = P(V & flies) / P(flies), the question reduces to comparing P(N & flies) with P(V & flies).

5 Part of Speech Tagging (Cont.) We don't have the true probabilities, but we can estimate them from large data sets. Suppose there is a corpus of N words containing 1000 uses of flies: 400 with a noun sense and 600 with a verb sense. Then P(flies) = 1000/N, P(flies & N) = 400/N, and P(flies & V) = 600/N, so P(V | flies) = P(V & flies) / P(flies) = 600/1000 = 0.6. So on 60% of occasions flies is a verb.

6 Estimating Probabilities We want to use probability to predict future events: using the information P(V | flies) = 0.6 to predict that the next occurrence of flies is more likely to be a verb. This is called Maximum Likelihood Estimation (MLE). Generally, the larger the data set we use, the more accuracy we get.

7 Estimating Probabilities (Cont.) Estimating the outcome probability of tossing a coin (i.e., 0.5). Acceptable margin of error: an estimate between 0.25 and 0.75. The more trials performed, the more accurate the estimate: 2 trials: 50% chance of reaching an acceptable result; 3 trials: 75%; 4 trials: 87.5%; 8 trials: 93%; 12 trials: 95%.

8 Estimating the outcome of tossing a coin (figure)

9 Estimating Probabilities (Cont.) So the larger the data set the better, but there is the problem of sparse data: the Brown Corpus contains about a million words, but there are roughly 50,000 different words, so one would expect each word to occur about 20 times; yet over 40,000 words occur fewer than 5 times.

10 Estimating Probabilities (Cont.) For a random variable X ranging over values x_i, let |x_i| be the number of times X = x_i was counted. Then P(X = x_i) ≈ V_i / Σ_i V_i. Maximum Likelihood Estimation (MLE) uses V_i = |x_i|. Expected Likelihood Estimation (ELE) uses V_i = |x_i| + 0.5.

11 MLE vs ELE Suppose a word w doesn't occur in the corpus, and we want to estimate the probability of w occurring in one of 40 classes L1 ... L40. We have a random variable X, where X = x_i only if w appears in word category L_i. By MLE, P(L_i | w) is undefined because the divisor is zero. By ELE, P(L_i | w) = 0.5 / 20 = 0.025. Now suppose w occurs 5 times (4 times as a noun and once as a verb). By MLE, P(N | w) = 4/5 = 0.8; by ELE, P(N | w) = 4.5/25 = 0.18.
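The two estimators are easy to compare in code. Below is a minimal Python sketch of MLE vs. ELE on the slide's 40-class example; the function name and data layout are mine, not Allen's:

```python
def estimate(counts, num_classes, add=0.0):
    """Estimate P(class); add=0.0 gives MLE, add=0.5 gives ELE."""
    total = sum(counts.get(c, 0) + add for c in range(num_classes))
    if total == 0:
        return None  # MLE is undefined when the word never occurs
    return {c: (counts.get(c, 0) + add) / total for c in range(num_classes)}

# Unseen word w over 40 categories: MLE undefined, ELE = 0.5/20 = 0.025
print(estimate({}, 40))              # None
print(estimate({}, 40, add=0.5)[0])  # 0.025

# w seen 5 times: 4 as noun (class 0), 1 as verb (class 1)
counts = {0: 4, 1: 1}
print(estimate(counts, 40)[0])           # MLE: 4/5 = 0.8
print(estimate(counts, 40, add=0.5)[0])  # ELE: 4.5/25 = 0.18
```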

12 Evaluation The data set is divided into a training set (80-90% of the data) and a test set (10-20%). Cross-validation: repeatedly remove different parts of the corpus as the test set, train on the remainder of the corpus, then evaluate on the new test set.

13 Part of speech tagging Simplest algorithm: choose the interpretation that occurs most frequently; flies in the sample corpus was a verb 60% of the time. This algorithm's success rate is about 90%, since over 50% of the words appearing in most corpora are unambiguous. To improve the success rate, use the tags before or after the word under examination: if flies is preceded by the word the, it is definitely a noun.

14 Part of speech tagging (Cont.) Bayes' rule: P(A | B) = P(A) * P(B | A) / P(B). Given a sequence of words w_1 ... w_t, find a sequence of lexical categories C_1 ... C_t such that
1. P(C_1 ... C_t | w_1 ... w_t) is maximized.
Using Bayes' rule, this equals
2. P(C_1 ... C_t) * P(w_1 ... w_t | C_1 ... C_t) / P(w_1 ... w_t).
Since the denominator is the same for every candidate sequence, the problem reduces to finding C_1 ... C_t such that
3. P(C_1 ... C_t) * P(w_1 ... w_t | C_1 ... C_t) is maximized.
These probabilities can be estimated by making some independence assumptions.

15 Part of speech tagging (Cont.) Use information about the previous word category (bigram), the two previous word categories (trigram), or, in general, the n-1 previous word categories (n-gram). Using the bigram model:
P(C_1 ... C_t) ≈ ∏_{i=1..t} P(C_i | C_{i-1}), e.g. P(Art N V N) = P(Art | ∅) * P(N | Art) * P(V | N) * P(N | V)
P(w_1 ... w_t | C_1 ... C_t) ≈ ∏_{i=1..t} P(w_i | C_i)
So we are looking for a sequence C_1 ... C_t such that ∏_{i=1..t} P(C_i | C_{i-1}) * P(w_i | C_i) is maximized.

16 Part of speech tagging (Cont.) The information needed by this new formula can be extracted from the corpus:
P(C_i = V | C_{i-1} = N) = Count(N at position i-1 and V at position i) / Count(N at position i-1) (Fig. 7.4)
P(the | Art) = Count(# times the is an Art) / Count(# times an Art occurs) (Fig. 7.6)
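As a concrete illustration, here is a minimal Python sketch of how both tables could be counted from a POS-tagged corpus. The toy corpus, the `<start>` pseudo-category, and the function names are illustrative assumptions, not Allen's actual data:

```python
from collections import Counter

# Toy tagged corpus: each sentence is a list of (word, category) pairs.
corpus = [
    [("the", "ART"), ("flies", "N"), ("like", "V"), ("a", "ART"), ("flower", "N")],
    [("flies", "V"), ("like", "P"), ("birds", "N")],
]

bigram, cat_count, lex = Counter(), Counter(), Counter()
for sent in corpus:
    prev = "<start>"  # sentence-initial pseudo-category
    for word, cat in sent:
        bigram[(prev, cat)] += 1
        cat_count[cat] += 1
        lex[(word, cat)] += 1
        prev = cat

def p_transition(c, prev, minimum=0.0001):
    """P(C_i = c | C_{i-1} = prev), with a 0.0001 floor for unseen bigrams."""
    prev_total = sum(n for (p, _), n in bigram.items() if p == prev)
    return max(bigram[(prev, c)] / prev_total, minimum) if prev_total else minimum

def p_word(word, cat, minimum=0.0001):
    """Lexical-generation probability P(word | cat)."""
    return max(lex[(word, cat)] / cat_count[cat], minimum) if cat_count[cat] else minimum

print(p_transition("N", "ART"))  # 1.0 in this toy corpus
print(p_word("flies", "N"))      # 1/3 = 0.333...
```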

17 Using an Artificial corpus An artificial corpus was generated with 300 sentences over the categories Art, N, V, P: 1998 words in all, comprising 833 nouns, 300 verbs, 558 articles, and 307 prepositions. To deal with the problem of sparse data, a minimum probability of 0.0001 is assumed for unseen events.

18 Bigram probabilities from the generated corpus (figure: bigram table, Fig. 7.4)

19 Word counts in the generated corpus (table: rows flies, fruit, like, a, the, flower, flowers, birds, others; columns N, V, ART, P, TOTAL; the counts themselves are in Fig. 7.5 — e.g. flies occurs 21 times as N out of 44 occurrences, and the 300 times as ART out of 303 occurrences)

20 Lexical-generation probabilities (Fig. 7.6)
PROB(the | ART) = .54     PROB(like | V) = .1
PROB(a | ART) = .360      PROB(like | P) = .068
PROB(flies | N) = .025    PROB(like | N) = .012
PROB(flies | V) = .076    PROB(flower | N) = .06
PROB(a | N) = .001        PROB(flower | V) = .05
PROB(birds | N) = .076

21 Part of speech tagging (Cont.) How do we find the sequence C_1 ... C_t that maximizes ∏_{i=1..t} P(C_i | C_{i-1}) * P(w_i | C_i)? Brute-force search means enumerating all possible sequences: with N categories and T words, there are N^T sequences. But using bigram probabilities, the probability of w_i being in category C_i depends only on C_{i-1}, so the process can be modeled by a special form of probabilistic finite state machine (Fig. 7.7).

22 Markov Chain The probability of a sequence of 4 words having the categories ART N V N is 0.71 * 1 * 0.43 * 0.35 = 0.107. The representation is accurate only if the probability of a category occurring depends only on the one category before it. This is called the Markov assumption, and the network is called a Markov chain.

23 Hidden Markov Model (HMM) The Markov network can be extended to include the lexical-generation probabilities, too: each node gets an output probability for every one of its possible outputs. These output probabilities are exactly the lexical-generation probabilities shown in Fig. 7.6. A Markov network with output probabilities is called a Hidden Markov Model (HMM).

24 Hidden Markov Model (HMM) The word hidden indicates that, for a specific sequence of words, it is not clear what state the Markov model is in. For instance, the word flies could be generated from state N with a probability of 0.025, or from state V with a probability of 0.076. So it is no longer trivial to compute the probability of a sequence of words from the network.

25 Hidden Markov Model (HMM) The probability that the sequence N V ART N generates the output Flies like a flower is computed as follows. The probability of the path N V ART N is 0.29 * 0.43 * 0.65 * 1 = 0.081. The probability of the output being Flies like a flower along that path is P(flies | N) * P(like | V) * P(a | ART) * P(flower | N) = 0.025 * 0.1 * 0.36 * 0.06 = 5.4 * 10^-5. The likelihood that the HMM would generate the sentence is therefore 0.081 * 5.4 * 10^-5 = 4.37 * 10^-6. In general, the probability of a sentence w_1 ... w_t given a sequence C_1 ... C_t is ∏_{i=1..t} P(C_i | C_{i-1}) * P(w_i | C_i).

26 Markov Chain (figure)

27 Viterbi Algorithm (figure)

28 Flies like a flower SEQSCORE(i, 1) = P(flies | L_i) * P(L_i | ∅)
P(flies/V) = 0.076 * 0.0001 = 7.6 * 10^-6
P(flies/N) = 0.025 * 0.29 = 0.00725
P(like/V) = max(P(flies/N) * P(V | N), P(flies/V) * P(V | V)) * P(like | V) = max(0.00725 * 0.43, 7.6 * 10^-6 * P(V | V)) * 0.1 = 3.12 * 10^-4
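The SEQSCORE recurrence above is exactly the Viterbi algorithm. Here is a self-contained Python sketch seeded with the probabilities quoted on these slides; the TRANS and LEX dictionaries are partial reconstructions of Figs. 7.4 and 7.6 (anything missing falls back to the 0.0001 minimum from slide 17), so treat the exact numbers as illustrative:

```python
MIN_P = 0.0001
CATS = ["N", "V", "ART", "P"]

# Partial bigram table, with "<s>" as the sentence-start pseudo-category.
TRANS = {("<s>", "N"): 0.29, ("<s>", "ART"): 0.71, ("N", "V"): 0.43,
         ("V", "N"): 0.35, ("V", "ART"): 0.65, ("ART", "N"): 1.0}
# Partial lexical-generation table.
LEX = {("flies", "N"): 0.025, ("flies", "V"): 0.076, ("like", "V"): 0.1,
       ("like", "P"): 0.068, ("like", "N"): 0.012, ("a", "ART"): 0.36,
       ("a", "N"): 0.001, ("flower", "N"): 0.06, ("the", "ART"): 0.54}

def viterbi(words):
    # score[c] = probability of the best category sequence ending in c
    score = {c: TRANS.get(("<s>", c), MIN_P) * LEX.get((words[0], c), MIN_P)
             for c in CATS}
    backptrs = []
    for w in words[1:]:
        new, ptr = {}, {}
        for c in CATS:
            best = max(CATS, key=lambda p: score[p] * TRANS.get((p, c), MIN_P))
            new[c] = score[best] * TRANS.get((best, c), MIN_P) * LEX.get((w, c), MIN_P)
            ptr[c] = best
        backptrs.append(ptr)
        score = new
    # Recover the best tag sequence by following the back-pointers.
    best_final = max(CATS, key=score.get)
    tags = [best_final]
    for ptr in reversed(backptrs):
        tags.append(ptr[tags[-1]])
    tags.reverse()
    return tags, score[best_final]

print(viterbi(["flies", "like", "a", "flower"]))
# (['N', 'V', 'ART', 'N'], ~4.4e-06) -- the path and value from slide 25
```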

29 Flies like a flower (figure: Viterbi trellis for the example)

30 Flies like a flower Brute-force search takes on the order of N^T steps; the Viterbi algorithm takes K * T * N^2 steps.

31 Getting Reliable Statistics (smoothing) Suppose we have 40 categories. To collect unigrams, at least 40 samples (one per category) are needed; for bigrams, 1600 samples; for trigrams, 64,000 samples; for 4-grams, 2,560,000 samples. A smoothed estimate interpolates the three:
P(C_i | C_{i-2} C_{i-1}) ≈ λ1 P(C_i) + λ2 P(C_i | C_{i-1}) + λ3 P(C_i | C_{i-2} C_{i-1}), with λ1 + λ2 + λ3 = 1
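A minimal sketch of this interpolation in Python; the lambda weights below are placeholders (in practice they are tuned on held-out data), and the three input estimates are assumed to be computed elsewhere:

```python
def interpolated(tri, bi, uni, lambdas=(0.1, 0.3, 0.6)):
    """P(Ci | Ci-2 Ci-1) ~ l1*P(Ci) + l2*P(Ci|Ci-1) + l3*P(Ci|Ci-2 Ci-1)."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9  # weights must sum to 1
    return l1 * uni + l2 * bi + l3 * tri

# An unseen trigram (0.0) is smoothed by the bigram and unigram estimates:
print(interpolated(tri=0.0, bi=0.43, uni=0.15))  # 0.1*0.15 + 0.3*0.43 = 0.144
```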

32 Statistical Parsing Corpus-based methods offer new ways to control parsers. We can use statistical methods to identify the common structures of a language, and choose the most likely interpretation when a sentence is ambiguous. This might lead to much more efficient parsers that are almost deterministic.

33 Statistical Parsing What is the input of a statistical parser? The input is the output of a POS-tagging algorithm. If the POS tags are accurate, lexical ambiguity has been removed; but if the tagging is wrong, the parser either cannot find the correct interpretation or may find a valid but implausible one. With 95% per-word accuracy, the chance of correctly tagging a whole sentence of 8 words is 0.95^8 ≈ 0.66, and of 15 words 0.95^15 ≈ 0.46.

34 Obtaining Lexical probabilities A better approach is: 1. compute the probability that each word appears in each of its possible lexical categories; 2. combine these probabilities with some method of assigning probabilities to rule use in the grammar. The context-independent probability that the lexical category of a word w is L_j can be estimated by: P(L_j | w) = count(L_j & w) / Σ_{i=1..N} count(L_i & w)

35 Context-independent lexical categories P(L_j | w) = count(L_j & w) / Σ_{i=1..N} count(L_i & w)
P(Art | the) = 300/303 = 0.99
P(N | flies) = 21/44 = 0.48
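Note the direction of the normalization: P(L_j | w) divides by the word's total count across categories, whereas the lexical-generation probability P(w | L) of Fig. 7.6 divides by the category's total. A short Python sketch using the counts quoted on these slides (the per-category breakdowns — the as ART 300, N 1, P 2; flies as N 21, V 23 — are inferred from the ratios given here and on slide 37):

```python
def p_cat_given_word(counts):
    """P(L_j | w): counts maps each category to count(L_j & w) for one word."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

print(p_cat_given_word({"ART": 300, "N": 1, "P": 2}))  # the: ART 0.99
print(p_cat_given_word({"N": 21, "V": 23}))            # flies: N 0.477, V 0.522
```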

36 Context-dependent lexical probabilities A better estimate can be obtained by computing how likely it is that category L_i occurs at position t over all sequences of the input w_1 ... w_t. Instead of just finding the sequence with the maximum probability, we add up the probabilities of all sequences that end in w_t/L_i. For example, the probability that flies is a noun in the sentence The flies like flowers is calculated by adding up the probabilities of all sequences that end with flies as a noun.

37 Context-dependent lexical probabilities Using the probabilities of Figs. 7.4 and 7.6, the sequences with nonzero values are:
P(The/Art flies/N) = P(the | ART) * P(ART | ∅) * P(N | ART) * P(flies | N) = 0.54 * 0.71 * 1.0 * 0.025 = 9.58 * 10^-3
P(The/N flies/N) = P(the | N) * P(N | ∅) * P(N | N) * P(flies | N) = 1/833 * 0.29 * 0.13 * 0.025 = 1.13 * 10^-6
P(The/P flies/N) = P(the | P) * P(P | ∅) * P(N | P) * P(flies | N) = 2/307 * 0.0001 * 0.26 * 0.025 = 4.55 * 10^-9
These add up to about 9.58 * 10^-3.

38 Context-dependent lexical probabilities Similarly, there are three nonzero sequences ending with flies as a V, with a total value of 1.13 * 10^-5. So:
P(The flies) = 9.58 * 10^-3 + 1.13 * 10^-5 = 9.59 * 10^-3
P(flies/N | The flies) = P(flies/N & The flies) / P(The flies) = 9.58 * 10^-3 / 9.59 * 10^-3 = 0.9988
P(flies/V | The flies) = P(flies/V & The flies) / P(The flies) = 1.13 * 10^-5 / 9.59 * 10^-3 = 0.0012

39 Forward Probabilities The probability of producing the words w_1 ... w_t and ending in state w_t/L_i is called the forward probability α_i(t), defined as: α_i(t) = P(w_t/L_i & w_1 ... w_t). In The flies like flowers, α_2(3) is the sum of the values computed for all sequences ending in a V (the second category) at position 3, for the input The flies like. Then P(w_t/L_i | w_1 ... w_t) = P(w_t/L_i & w_1 ... w_t) / P(w_1 ... w_t) ≈ α_i(t) / Σ_{j=1..N} α_j(t).
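Forward probabilities obey the same recurrence as the Viterbi scores, with the max over predecessors replaced by a sum. A self-contained Python sketch; the small TRANS/LEX tables are again partial, illustrative reconstructions with a 0.0001 floor, so the printed numbers only approximate the slide's:

```python
def forward(words, cats, trans, lex, min_p=0.0001):
    """alpha[t][c] = P(w_1..w_t & w_t is category c), as a list of dicts."""
    alpha = [{c: trans.get(("<s>", c), min_p) * lex.get((words[0], c), min_p)
              for c in cats}]
    for w in words[1:]:
        prev = alpha[-1]
        alpha.append({c: sum(prev[p] * trans.get((p, c), min_p) for p in cats)
                         * lex.get((w, c), min_p)
                      for c in cats})
    return alpha

CATS = ["N", "V", "ART", "P"]
TRANS = {("<s>", "ART"): 0.71, ("<s>", "N"): 0.29, ("ART", "N"): 1.0,
         ("N", "N"): 0.13, ("N", "V"): 0.43, ("V", "N"): 0.35}
LEX = {("the", "ART"): 0.54, ("the", "N"): 1 / 833, ("the", "P"): 2 / 307,
       ("flies", "N"): 0.025, ("flies", "V"): 0.076}

alpha = forward(["the", "flies"], CATS, TRANS, LEX)
total = sum(alpha[-1].values())
print({c: round(alpha[-1][c] / total, 4) for c in CATS})
# flies is almost certainly a noun here, as on slide 38
```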

40 Forward Probabilities (figure)

41 Context-dependent lexical probabilities (figure)

42 Context-dependent lexical probabilities (figure, cont.)

43 Backward Probability The backward probability, β_j(t), is the probability of producing the sequence w_t ... w_T beginning from the state w_t/L_j. Combining the two: P(w_t/L_i) ≈ (α_i(t) * β_i(t)) / Σ_{j=1..N} (α_j(t) * β_j(t))

44 Probabilistic Context-free Grammars CFGs can be generalized to PCFGs; we need some statistics on rule use. The simplest approach is to count the number of times each rule is used in a corpus of parsed sentences. If category C has rules R_1 ... R_m, then P(R_j | C) = count(# times R_j used) / Σ_{i=1..m} count(# times R_i used)
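This counting scheme is a few lines over a treebank. A Python sketch with a toy list of rule uses (the rules and counts are illustrative, not drawn from a real parsed corpus):

```python
from collections import Counter

# Each observed rule use in a parsed corpus: (LHS category, RHS tuple).
rule_uses = [("NP", ("ART", "N")), ("NP", ("ART", "N")), ("NP", ("N",)),
             ("S", ("NP", "VP")), ("VP", ("V", "NP"))]

rule_count = Counter(rule_uses)
lhs_count = Counter(lhs for lhs, _ in rule_uses)

def p_rule(lhs, rhs):
    """P(R_j | C): relative frequency of the rule among all rules expanding C."""
    return rule_count[(lhs, rhs)] / lhs_count[lhs]

print(p_rule("NP", ("ART", "N")))  # 2/3 = 0.666...
print(p_rule("NP", ("N",)))        # 1/3 = 0.333...
```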

45 Probabilistic Context-free Grammars (figure: grammar rules with probabilities)

46 Independence assumption One can develop an algorithm, similar to the Viterbi algorithm, that finds the most probable parse tree for an input, but certain independence assumptions must be made: the probability of a constituent being derived by a rule R_j is taken to be independent of how the constituent is used as a subconstituent. For example, the probabilities of the NP rules are the same whether the NP is the subject, the object of a verb, or the object of a preposition. This assumption is not valid: a subject NP is much more likely to be a pronoun than an object NP is.

47 Inside Probability The probability that a constituent C generates a sequence of words w_i, w_{i+1}, ..., w_j (written w_{i,j}) is called the inside probability and is denoted P(w_{i,j} | C). It is called an inside probability because it assigns a probability to the word sequence inside the constituent.

48 Inside Probabilities How do we derive inside probabilities? For lexical categories, they are the same as the lexical-generation probabilities: P(flower | N) is the inside probability that the constituent N is realized as the word flower (0.06 in Fig. 7.6). Using the lexical-generation probabilities, the inside probabilities of non-lexical constituents can then be computed.

49 Inside probability of an NP generating a flower The probability that an NP generates a flower is estimated as: P(a flower | NP) = P(Rule 8 | NP) * P(a | ART) * P(flower | N) + P(Rule 6 | NP) * P(a | N) * P(flower | N) = 0.55 * 0.36 * 0.06 + P(Rule 6 | NP) * 0.001 * 0.06 ≈ 0.012 (the second term is negligible).

50 Inside probability of an S generating a flower wilted These probabilities can then be used to compute the probabilities of larger constituents: P(a flower wilted | S) = P(Rule 1 | S) * P(a flower | NP) * P(wilted | VP) + P(Rule 1 | S) * P(a | NP) * P(flower wilted | VP)

51 Probabilistic chart parsing In parsing, we are interested in finding the most likely parse rather than the overall probability of a given sentence. We can use a chart parser for this purpose: when entering an entry E of category C built by rule i from n subconstituents corresponding to entries E_1 ... E_n, P(E) = P(Rule i | C) * P(E_1) * ... * P(E_n). For lexical categories, it is better to use forward probabilities rather than lexical-generation probabilities.
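A sketch of that scoring step in Python; the Entry record and the example numbers (taken from the a flower computation above) are illustrative stand-ins for real chart entries:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    category: str
    prob: float              # P(E)
    children: list = field(default_factory=list)

def enter(category, rule_prob, children):
    """P(E) = P(Rule i | C) * P(E_1) * ... * P(E_n)."""
    prob = rule_prob
    for child in children:
        prob *= child.prob
    return Entry(category, prob, children)

# NP -> ART N (P = 0.55) over "a" as ART and "flower" as N:
art = Entry("ART", 0.36)
n = Entry("N", 0.06)
np = enter("NP", 0.55, [art, n])
print(np.prob)  # 0.55 * 0.36 * 0.06 = 0.01188
```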

52 A flower (figure)

53 Probabilistic Parsing This technique identifies the correct parse only about 50% of the time. The reason is that the independence assumptions are too radical. One of the crucial issues is the handling of lexical items: a context-free model does not consider lexical preferences. The parser prefers a PP attached to the V rather than to the NP, and so fails to find the correct structure whenever the PP should be attached to the NP.

54 Best-First Parsing Explore higher-probability constituents first; much of the search space, containing lower-probability constituents, is then never explored. The chart parser's agenda is organized as a priority queue, and the arc extension algorithm needs to be modified.
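A sketch of the agenda change only: Python's heapq is a min-heap, so pushing negated probabilities makes the highest-probability constituent come off first. The string payloads below are stand-ins for real chart entries:

```python
import heapq

agenda = []

def add_to_agenda(prob, constituent):
    # Negate the probability: heapq always pops the smallest item first.
    heapq.heappush(agenda, (-prob, constituent))

def next_constituent():
    neg_prob, constituent = heapq.heappop(agenda)
    return -neg_prob, constituent

add_to_agenda(0.012, "NP[a flower]")
add_to_agenda(0.3834, "ART[the]")
add_to_agenda(0.00725, "N[flies]")
print(next_constituent())  # (0.3834, 'ART[the]') -- highest probability first
```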

55 New arc extension algorithm for the probabilistic chart parser (figure)

56 The man put a bird in the house The best-first parser finds the correct parse after generating 65 constituents. A standard bottom-up parser generates 158 constituents in total, and 106 constituents before finding its first answer. So best-first parsing is a significant improvement.

57 Best First Parsing It finds the most probable interpretation first: the probability of a constituent is always less than or equal to the probability of any of its subconstituents. If S2 with probability p2 is found after S1 with probability p1, then p2 cannot be higher than p1; otherwise the subconstituents of S2 would have probabilities higher than p1, would have been found before those of S1, and thus S2 would have been found sooner, too.

58 Problem of multiplication In practice, with large grammars, probabilities drop quickly because of the repeated multiplications. Other scoring functions can be used, e.g. Score(C) = MIN(Score(Rule C → C1 ... Cn), Score(C1), ..., Score(Cn)). But MIN leads to only 39% correct results.

59 Context-dependent probabilistic parsing The best-first algorithm improves efficiency but has no effect on accuracy. Computing rule probabilities from some context-dependent lexical information can improve accuracy. The first word of a constituent is often its head word, so compute the probability of each rule based on the first word of the constituent: P(R | C, w).

60 Context-dependent probabilistic parsing P(R | C, w) = Count(# times R is used for category C starting with w) / Count(# times category C starts with w). Singular nouns rarely occur alone as a noun phrase (NP → N); plural nouns rarely act as a modifying noun (NP → N N). Context-dependent rules also encode verb preferences for subcategorizations.

61 Rule probabilities based on the first word of constituents (figure)

62 Context-dependent parser accuracy (figure)

63 The man put the bird in the house P(VP → V NP PP | VP, put) = 0.93 * 0.99 * 0.76 * 0.76 = 0.54, while P(VP → V NP | VP, put) is much smaller, reflecting put's preference for the V NP PP subcategorization.

64 The man likes the bird in the house P(VP → V NP PP | VP, like) = 0.1; P(VP → V NP | VP, like) = 0.054.

65 Context-dependent rules The accuracy of the parser is still only about 66%. Improvements: make the rule probabilities relative to a larger fragment of the input (bigram, trigram, ...); use other important words, such as prepositions. The more selective the lexical categories, the more predictive the estimates can be (provided there is enough data). Other closed-class words, such as articles, quantifiers, and conjunctions, can also be used (i.e., treated individually). But what about open-class words such as verbs and nouns? (Cluster similar words.)

66 Handling Unknown Words An unknown word will disrupt the parse. Suppose we have a trigram model of the data: if w_3 in the sequence of words w_1 w_2 w_3 is unknown, and w_1 and w_2 are of categories C_1 and C_2, pick the category C for w_3 such that P(C | C_1 C_2) is maximized. For instance, if C_2 is ART, then C will probably be a NOUN (or an ADJECTIVE). Morphology can also help: unknown words ending in -ing are likely VERBs, and those ending in -ly are likely ADVERBs.
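A sketch combining the slide's two heuristics: pick the tag maximizing P(C | C1 C2) from a trigram table, with a couple of suffix rules taking precedence. The table, the suffix list, and the NOUN fallback are illustrative assumptions:

```python
# Toy trigram table: P(C | C1, C2) for the two preceding categories.
TRIGRAM = {("V", "ART"): {"N": 0.6, "ADJ": 0.3, "V": 0.1},
           ("N", "N"): {"V": 0.5, "P": 0.3, "N": 0.2}}

SUFFIX_RULES = [("ing", "V"), ("ly", "ADV")]  # morphology hints

def guess_category(word, c1, c2):
    for suffix, cat in SUFFIX_RULES:
        if word.endswith(suffix):
            return cat
    dist = TRIGRAM.get((c1, c2), {})
    # Fall back to NOUN if this context was never seen.
    return max(dist, key=dist.get) if dist else "N"

print(guess_category("blicket", "V", "ART"))   # 'N'   (after an article)
print(guess_category("blicking", "V", "ART"))  # 'V'   (-ing suffix)
print(guess_category("quickly", "N", "N"))     # 'ADV' (-ly suffix)
```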
