Link Learning with Wikipedia


Link Learning with Wikipedia (Milne and Witten, 2008b)
Dominikus Wetzel
Department of Computational Linguistics, Saarland University
December 4,

Outline
1. Semantic Relatedness
2. Word Sense Disambiguation (WSD)
3. Intermission
4. Link Detection
5. Evaluation

Yet another relatedness measure

Presented in (Milne and Witten, 2008a):

\[
\mathrm{relatedness}(a,b) = \frac{\log(\max(|A|,|B|)) - \log(|A \cap B|)}{\log(|W|) - \log(\min(|A|,|B|))}
\]

- a, b: Wikipedia articles
- A, B: sets of articles that link to a and to b, respectively
- W: set of all Wikipedia articles

Note: this can also be used for term relatedness; one simply takes the article that the term would link to.
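The formula above can be evaluated directly on sets of incoming links. The following is a minimal sketch, not the authors' implementation: the function name, the use of Python sets for A and B, and the handling of an empty overlap are assumptions made for illustration. Note that, as written, the expression yields 0 when A and B are identical and grows as the overlap shrinks.

```python
import math

def relatedness(links_a: set, links_b: set, total_articles: int) -> float:
    """Evaluate the slide's formula for two sets of incoming links.

    links_a, links_b: articles that link to a and to b (the sets A and B).
    total_articles:   |W|, the number of all Wikipedia articles.
    """
    overlap = len(links_a & links_b)
    if overlap == 0:
        # log(0) is undefined; returning infinity is an assumption of this sketch.
        return math.inf
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    numerator = math.log(larger) - math.log(overlap)
    denominator = math.log(total_articles) - math.log(smaller)
    return numerator / denominator

# Toy example with made-up link sets: |A| = 4, |B| = 2, |A ∩ B| = 2, |W| = 16.
value = relatedness({1, 2, 3, 4}, {3, 4}, 16)
```

The sketch assumes that |W| is strictly larger than min(|A|, |B|); otherwise the denominator would be zero.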

Wikipedia for Training a WSD Classifier
- a term in Wikipedia is linked to an article: manual and unambiguous annotation
- set of possible senses (i.e. the sense inventory): the set of documents that are linked to by the same term
- a sense corresponds to an article
- positive example: the actual target article of the term
- negative examples: all other target articles

Commonness and Relatedness of a Sense - Example
Figure: Commonness, relatedness, sense inventory and context (taken from (Milne and Witten, 2008b))

Commonness and Relatedness of a Sense

Commonness (of an article):
- the number of times c that an article is used as the target of a term,
- normalized by the sum of all counts c_i, where i ranges over all possible senses of the term
- common senses get a higher value, less common ones a lower value

Relatedness (of a sense to the context):
- how related is the sense to the context of the current article?
- context: all unambiguous terms that are links in the article
- unambiguous term: a term that is always linked to exactly one article
- Milne and Witten (2008b) claim: if the text is long enough, it contains unambiguous terms
- compare the relatedness of the sense to each article linked to by an unambiguous term
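Commonness as described above is a simple normalization of link-target counts. A minimal sketch follows; the function name and the example counts are hypothetical and only serve to illustrate the computation.

```python
from collections import Counter
from typing import Dict

def commonness(target_counts: Counter) -> Dict[str, float]:
    """Normalize how often each article is the link target of one term.

    target_counts maps each candidate sense (article) to the count c_i of
    links from the term to that article.
    """
    total = sum(target_counts.values())
    if total == 0:
        return {}
    return {article: count / total for article, count in target_counts.items()}

# Made-up counts for the term "tree" (not taken from Wikipedia).
tree_senses = commonness(Counter({
    "Tree": 90,
    "Tree (data structure)": 8,
    "Tree (set theory)": 2,
}))
```

With these invented counts, the dominant sense receives most of the probability mass and the rare senses receive small values.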

Relatedness in Depth

Weighting the sense-context relatedness:
- reason: certain context terms are more informative than others

Link probability:
- initially suggested in (Mihalcea and Csomai, 2007)

\[
P(\mathrm{isKeyword} \mid \mathrm{term}) \approx \frac{\mathrm{count}(D_{key})}{\mathrm{count}(D_W)}
\]

- D_key: documents containing the term as a link
- D_W: all documents containing the term (link + no link)

Average relatedness of a context term to all other context terms:
- some context terms are outliers

The weight:
- both measures are averaged together
- this gives a combined weight for each context term
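The weighting scheme above can be sketched in a few lines. This is an illustration under assumptions, not the paper's code: the function names are hypothetical, and `related` stands for any pairwise score in which higher values mean "more related".

```python
from typing import Callable, Dict

def link_probability(docs_with_link: int, docs_with_term: int) -> float:
    """count(D_key) / count(D_W); defined as 0 for unseen terms (assumption)."""
    if docs_with_term == 0:
        return 0.0
    return docs_with_link / docs_with_term

def context_weights(
    link_probs: Dict[str, float],
    related: Callable[[str, str], float],
) -> Dict[str, float]:
    """Average each context term's link probability with its mean
    relatedness to all other context terms."""
    terms = list(link_probs)
    weights = {}
    for term in terms:
        others = [t for t in terms if t != term]
        # A context with a single term has no "other" terms; use 0 (assumption).
        avg_rel = sum(related(term, o) for o in others) / len(others) if others else 0.0
        weights[term] = (link_probs[term] + avg_rel) / 2
    return weights

# Toy context: "a" and "b" are related to each other, "c" is an outlier.
toy_related = lambda x, y: 1.0 if {x, y} == {"a", "b"} else 0.0
toy_weights = context_weights({"a": 0.5, "b": 0.1, "c": 0.3}, toy_related)
```

In the toy context the outlier "c" ends up with the lowest weight even though its link probability is higher than that of "b".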

Another Feature: Context Quality
- if the context is mixed, commonness is more important
- if the context is invariant, relatedness is more important
- computed as the sum of the weights calculated beforehand (link probability and context term relatedness)

Final Features for the Classifier

The Features:
- commonness
- weighted relatedness
- context quality

The Classifier:
- produces a probability that a sense is valid
- the k best senses can be obtained

Training and Testing
- only articles with at least 50 links
- disambiguation pages and lists are excluded
- 500 out of 700 articles were used, containing more than 50,000 links
- configuration: consider only senses above a threshold of 2%
- algorithm: C4.5 gives the best performance (i.e. f-measure), as opposed to Naive Bayes and SVM

Configuration parameter: minimum sense probability
Figure: Discarding sense threshold (taken from (Milne and Witten, 2008b))

Brief facts about C4.5

Developed by and described in (Quinlan, 1993):
- a program that generates decision trees used for classification
- requirements on the data: representable as a fixed-size feature vector, classes must be predefined and discrete, sufficient amount of training data
- claim: supervised machine learning that produces human-readable models
- decision trees in C4.5: a leaf represents the class; all other nodes perform tests whose results are represented as n-ary branches (only one feature value at a time)
- pruning of trees to reduce size
- WEKA J ml/weka/

C4.5 Decision Tree
Figure: C4.5 decision tree (taken from (Quinlan, 1993))

Introduction

Task:
- identify link-worthy terms in raw text
- disambiguate terms in order to find the correct target article

Note:
- the exact position of the link is not determined; only possible candidate terms are identified
- the system works with sets of terms (that all point to the same sense) rather than individual terms

Step-by-Step
1. Extract all n-grams
2. Discard stop-words and nonsense terms below the threshold
3. Disambiguate the remaining terms: find the corresponding article with the WSD classifier

Notes:
- no POS tagging, stemming, etc. required
- quite robust to spelling mistakes and synonyms

(Milne and Witten, 2008b) is unspecific about n-grams:
- no upper limit for n is given
- what happens if n-grams overlap (partially or completely)?
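The first two steps above can be sketched as follows. This is a simplified illustration, not the system's code: whitespace tokenization, the cap `max_n` on n-gram length (the paper gives no upper limit), and the lookup table of link probabilities are assumptions; the 6.5% default is the threshold reported for the link detector.

```python
from typing import Dict, Iterator, List

def ngrams(tokens: List[str], max_n: int) -> Iterator[str]:
    """Yield every n-gram of 1..max_n tokens as a space-joined string."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])

def candidate_terms(
    text: str,
    link_prob: Dict[str, float],
    max_n: int = 3,           # assumed cap; not specified in the paper
    threshold: float = 0.065, # minimum link probability
) -> List[str]:
    """Keep n-grams whose link probability reaches the threshold.

    Stop-words and nonsense terms are rarely used as links, so their low
    link probability removes them without a separate stop-word list.
    """
    tokens = text.lower().split()
    kept = {g for g in ngrams(tokens, max_n) if link_prob.get(g, 0.0) >= threshold}
    return sorted(kept)

# Hypothetical link probabilities, invented for this example.
toy_probs = {"the": 0.0001, "is": 0.0, "decision tree": 0.4, "tree": 0.07}
candidates = candidate_terms("The decision tree is a tree", toy_probs)
```

Overlapping n-grams (here "decision tree" and "tree") are both kept in this sketch; how the actual system resolves such overlaps is the open question raised above.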

Disambiguated link-worthy terms - Example
Figure: taken from (Milne and Witten, 2008b)

Disambiguated link-worthy terms - Example
Figure: taken from (Milne and Witten, 2008b)

Linking as a Classifier

The Classifier:
- decides whether the set of terms (i.e. its identified article) is link-worthy or not
- features are extracted from the articles and the positions of the terms

The Features:
- Link Probability
- Relatedness
- Disambiguation Confidence
- Generality
- Location and Spread

Features in Detail (1)

Link Probability:

\[
P(\mathrm{isKeyword} \mid \mathrm{term}) \approx \frac{\mathrm{count}(D_{key})}{\mathrm{count}(D_W)}
\]

- D_key: documents containing the term as a link
- D_W: all documents containing the term (link + no link)
- average: over all terms that were disambiguated to the same article
- maximum: the term with the maximum link probability

Relatedness:
- weighted relatedness of the sense to the context: has been computed during feature extraction in the WSD component
- average relatedness between the sense and all the other possible senses identified by the WSD for one term:

\[
\frac{1}{N} \sum_{i=1}^{N} \mathrm{relatedness}(a, b_i), \quad b_i \in \text{set of possible senses of a term}
\]
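Because the link detector works on groups of terms that point to the same article, the link probability feature is aggregated per article. A minimal sketch of the average and maximum variants, assuming a hypothetical input format of (term, article, link probability) triples:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def link_probability_features(
    disambiguated: Iterable[Tuple[str, str, float]],
) -> Dict[str, Dict[str, float]]:
    """Group terms by the article they were disambiguated to and compute
    the average and maximum link probability of each group."""
    grouped = defaultdict(list)
    for _term, article, prob in disambiguated:
        grouped[article].append(prob)
    return {
        article: {"average": sum(probs) / len(probs), "maximum": max(probs)}
        for article, probs in grouped.items()
    }

# Invented values for illustration only.
features = link_probability_features([
    ("tree", "Tree", 0.07),
    ("trees", "Tree", 0.05),
    ("forest", "Forest", 0.20),
])
```

The same grouping would apply to any other per-term score that is summarized by its average and maximum over a group.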

Features in Detail (2)

Disambiguation Confidence:
- average: the averaged confidence probability of all terms that were disambiguated to the same article
- maximum: the maximal confidence probability of a term for the article

Generality:
- a general-topic article is less link-worthy than a specific-topic article
- measured as the minimum depth in Wikipedia's category hierarchy
- Note: why minimum?

Features in Detail (3)

Location and Spread:
- frequency: number of terms that were disambiguated to the same article in one document
- first occurrence: position of the first term of the same group in the raw text
- last occurrence: position of the last term of the same group in the raw text
- spread: distance between first and last occurrence
- the last three are normalized by the document length
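The four location and spread features can be computed from the positions at which a group's terms occur. The sketch below assumes positions and document length are measured in the same unit (for example tokens); the slides do not specify the unit, and the function name is hypothetical.

```python
from typing import Dict, List

def location_features(positions: List[int], doc_length: int) -> Dict[str, float]:
    """Compute frequency, first/last occurrence and spread for one group
    of terms that were all disambiguated to the same article."""
    if not positions or doc_length <= 0:
        raise ValueError("need at least one position and a positive document length")
    first, last = min(positions), max(positions)
    return {
        "frequency": len(positions),              # raw count, not normalized
        "first_occurrence": first / doc_length,   # normalized by document length
        "last_occurrence": last / doc_length,
        "spread": (last - first) / doc_length,
    }

# A group whose terms occur at positions 10, 40 and 90 of a 100-unit document.
group = location_features([10, 40, 90], 100)
```

A topic that is mentioned early and again near the end gets a large spread, which is the signal this feature is meant to capture.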

Classification Details

Parameter estimation and algorithm:
- threshold: only consider n-grams with a minimum link probability of 6.5%
- algorithm: C4.5 performs best

Figure: Nonsense term and stop-word removal (taken from (Milne and Witten, 2008b))

The Mechanical Turk

Mechanical Turk: "artificial artificial intelligence"
- requestor: a person who wants a certain task to be completed
- worker: a person who performs the task
- workers are paid when requestors accept their work
- it is possible to impose qualification constraints and to ensure a human worker

Tasks

Task I:
- given: a document with one link referring to the first paragraph of its article
- possible answers:
  - no (not a plausible link location)
  - no (plausible link location, but wrong article)
  - kind-of (plausible link location, but the article isn't helpful)
  - yes (plausible link location, helpful article)

Task II:
- given: the text with all generated links
- provide: any additional Wikipedia article
  - position of the link
  - URL to the article

Results

Task I:
- inter-annotator agreement: 57% all three, 40% two by one, 3% no agreement
- system precision: 76% correct, 24% incorrect

Task II:
- inter-annotator agreement: 4% all five, 13% each by four and by three, 17% by two, 53% no agreement
- system precision: 76% match, 24% no-match

Personal hands-on evaluation
Do it yourself:

The End
Thank you! May the discussion begin.

References
- Mihalcea, R. and Csomai, A. (2007). Wikify! Linking Documents to Encyclopedic Knowledge. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 2007), pages
- Milne, D. and Witten, I. H. (2008a). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the first AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI 08).
- Milne, D. and Witten, I. H. (2008b). Learning to Link with Wikipedia. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM 2008).
- Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning. Morgan Kaufmann, Amsterdam.
