Text Classification and Sentiment Analysis


1 Text Classification and Sentiment Analysis Muhammad Atif Qureshi National University of Ireland, Galway and University of Milano-Bicocca, Italy

2 Is this spam?

3 Who wrote which Federalist papers? Anonymous essays tried to convince New York to ratify the U.S. Constitution: Jay, Madison, Hamilton. Authorship of 12 of the letters in dispute. 1963: solved by Mosteller and Wallace using Bayesian methods. James Madison / Alexander Hamilton

4 Male or female author? 1. By 1925 present-day Vietnam was divided into three parts under French colonial rule. The southern region embracing Saigon and the Mekong delta was the colony of Cochin-China; the central area with its imperial capital at Hue was the protectorate of Annam... 2. Clara never failed to be astonished by the extraordinary felicity of her own name. She found it hard to trust herself to the mercy of fate, which had managed over the years to convert her greatest shame into one of her greatest assets... S. Argamon, M. Koppel, J. Fine, A. R. Shimoni. Gender, Genre, and Writing Style in Formal Written Texts. Text, volume 23, number 3, pp.

5 Positive or negative movie review? "unbelievably disappointing" / "Full of zany characters and richly applied satire, and some great plot twists" / "this is the greatest screwball comedy ever filmed" / "It was pathetic. The worst part about it was the boxing scenes."

6 What is the subject of this article? MEDLINE Article? MeSH Subject Category Hierarchy: Antagonists and Inhibitors, Blood Supply, Chemistry, Drug Therapy, Embryology, Epidemiology, ...

7 Text Classification: assigning subject categories, topics, or genres; spam detection; authorship identification; age/gender identification; language identification; sentiment analysis; ...

8 Text Classification: definition. Input: a document d and a fixed set of classes C = {c1, c2, ..., cJ}. Output: a predicted class c ∈ C

9 Classification Methods: Hand-coded rules. Rules based on combinations of words or other features (spam: black-list-address OR ("dollars" AND "have been selected")). Accuracy can be high if the rules are carefully refined by an expert, but building and maintaining these rules is expensive.

10 Classification Methods: Supervised Machine Learning. Input: a document d, a fixed set of classes C = {c1, c2, ..., cJ}, and a training set of m hand-labeled documents (d1,c1), ..., (dm,cm). Output: a learned classifier γ: d → c

11 Classification Methods: Supervised Machine Learning. Any kind of classifier: Naïve Bayes, logistic regression, support-vector machines, k-nearest neighbors, ...

12 Text Classification and Naïve Bayes The Task of Text Classification

13 Text Classification and Naïve Bayes Naïve Bayes (I)

14 Naïve Bayes Intuition. Simple ("naïve") classification method based on Bayes rule. Relies on a very simple representation of the document: bag of words.

15 The bag of words representation. I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun. It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet. γ(d) = c

16 The bag of words representation. I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun. It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet. γ(d) = c

17 The bag of words representation: using a subset of words. x love xxxxxxxxxxxxxxxx sweet xxxxxxx satirical xxxxxxxxxx xxxxxxxxxxx great xxxxxxx xxxxxxxxxxxxxxxxxxx fun xxxx xxxxxxxxxxxxx whimsical xxxx romantic xxxx laughing xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx recommend xxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx several xxxxxxxxxxxxxxxxx xxxxx happy xxxxxxxxx again xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx γ(d) = c

18 The bag of words representation as word counts:
great      2
love       2
recommend  1
laugh      1
happy      1
...        ...
γ(d) = c
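Not from the slides: a minimal Python sketch of the bag-of-words representation, using a naive lowercase/whitespace tokenizer (an assumption for illustration).

from collections import Counter

def bag_of_words(text):
    # Map a document to word counts, ignoring word order.
    tokens = text.lower().split()      # naive whitespace tokenizer (assumption)
    return Counter(tokens)

print(bag_of_words("great movie I love it great fun"))
# Counter({'great': 2, 'movie': 1, 'i': 1, 'love': 1, 'it': 1, 'fun': 1})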

19 Bag of words for document classification. Test document: parser, language, label, translation, ... Which class? Candidate classes and their characteristic words: Machine Learning: learning, training, algorithm, shrinkage, network, ...; NLP: parser, tag, training, translation, language, ...; Garbage Collection: garbage, collection, memory, optimization, region, ...; Planning: planning, temporal, reasoning, plan, language, ...; GUI: ...

20 Text Classification and Naïve Bayes Naïve Bayes (I)

21 Text Classification and Naïve Bayes Formalizing the Naïve Bayes Classifier

22 Bayes Rule Applied to Documents and Classes. For a document d and a class c: P(c|d) = P(d|c) P(c) / P(d)

23 Naïve Bayes Classifier (I). MAP is "maximum a posteriori" = most likely class:
c_MAP = argmax_{c∈C} P(c|d)
      = argmax_{c∈C} P(d|c) P(c) / P(d)    (Bayes rule)
      = argmax_{c∈C} P(d|c) P(c)           (dropping the denominator)

24 Naïve Bayes Classifier (II). Document d represented as features x1, ..., xn:
c_MAP = argmax_{c∈C} P(d|c) P(c) = argmax_{c∈C} P(x1, x2, ..., xn|c) P(c)

25 Naïve Bayes Classifier (IV). c_MAP = argmax_{c∈C} P(x1, x2, ..., xn|c) P(c). The likelihood term has O(|X|^n · |C|) parameters, which could only be estimated if a very, very large number of training examples was available. The prior asks: how often does this class occur? We can just count the relative frequencies in a corpus.

26 Multinomial Naïve Bayes Independence Assumptions. P(x1, x2, ..., xn|c). Bag of words assumption: assume position doesn't matter. Conditional independence: assume the feature probabilities P(xi|cj) are independent given the class c:
P(x1, ..., xn|c) = P(x1|c) · P(x2|c) · P(x3|c) · ... · P(xn|c)

27 Multinomial Naïve Bayes Classifier.
c_MAP = argmax_{c∈C} P(x1, x2, ..., xn|c) P(c)
c_NB = argmax_{cj∈C} P(cj) ∏_{x∈X} P(x|c)

28 Applying Multinomial Naive Bayes Classifiers to Text Classification. positions ← all word positions in the test document.
c_NB = argmax_{cj∈C} P(cj) ∏_{i∈positions} P(xi|cj)

29 Text Classification and Naïve Bayes Formalizing the Naïve Bayes Classifier

30 Text Classification and Naïve Bayes Naïve Bayes: Learning

31 Sec.13.3 Learning the Multinomial Naïve Bayes Model. First attempt: maximum likelihood estimates, simply using the frequencies in the data:
P̂(cj) = doccount(C = cj) / N_doc
P̂(wi|cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)

32 Parameter estimation. P̂(wi|cj) = count(wi, cj) / Σ_{w∈V} count(w, cj): the fraction of times word wi appears among all words in documents of topic cj. Create a mega-document for topic j by concatenating all docs in this topic; use the frequency of w in the mega-document.

33 Sec.13.3 Problem with Maximum Likelihood. What if we have seen no training documents with the word fantastic classified in the topic positive (thumbs-up)?
P̂("fantastic"|positive) = count("fantastic", positive) / Σ_{w∈V} count(w, positive) = 0
Zero probabilities cannot be conditioned away, no matter the other evidence!
c_MAP = argmax_c P̂(c) ∏_i P̂(xi|c)

34 Laplace (add-1) smoothing for Naïve Bayes.
P̂(wi|c) = (count(wi, c) + 1) / Σ_{w∈V} (count(w, c) + 1)
        = (count(wi, c) + 1) / (Σ_{w∈V} count(w, c) + |V|)

35 Multinomial Naïve Bayes: Learning.
From the training corpus, extract the Vocabulary.
Calculate P(cj) terms: for each cj in C do: docsj ← all docs with class = cj; P(cj) ← |docsj| / (total # documents).
Calculate P(wk|cj) terms: Textj ← single doc containing all docsj; for each word wk in Vocabulary: nk ← # of occurrences of wk in Textj; P(wk|cj) ← (nk + α) / (n + α · |Vocabulary|)
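A compact Python sketch of this training procedure (illustrative only; the input format and the add-alpha constant are assumptions, not part of the slide):

from collections import Counter, defaultdict
import math

def train_multinomial_nb(docs, alpha=1.0):
    # docs: list of (token_list, class_label) pairs.
    # Returns log priors, per-class log likelihoods, and the vocabulary.
    vocab = {w for tokens, _ in docs for w in tokens}
    class_docs = defaultdict(list)
    for tokens, c in docs:
        class_docs[c].append(tokens)
    logprior, loglikelihood = {}, {}
    for c, doc_list in class_docs.items():
        logprior[c] = math.log(len(doc_list) / len(docs))
        mega_doc = Counter(w for tokens in doc_list for w in tokens)   # concatenate all docs of class c
        denom = sum(mega_doc.values()) + alpha * len(vocab)            # add-alpha smoothed denominator
        loglikelihood[c] = {w: math.log((mega_doc[w] + alpha) / denom) for w in vocab}
    return logprior, loglikelihood, vocab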

36 Laplace (add-1) smoothing: unknown words. Add one extra word to the vocabulary, the "unknown word" wu:
P̂(wu|c) = (count(wu, c) + 1) / (Σ_{w∈V} count(w, c) + |V| + 1)
        = 1 / (Σ_{w∈V} count(w, c) + |V| + 1)

37 Text Classification and Naïve Bayes Naïve Bayes: Learning

38 Text Classification and Naïve Bayes Naïve Bayes: Relationship to Language Modeling

39 Generative Model for Multinomial Naïve Bayes. c = China; X1 = Shanghai, X2 = and, X3 = Shenzhen, X4 = issue, X5 = bonds

40 Naïve Bayes and Language Modeling. Naïve Bayes classifiers can use any sort of feature: URL, address, dictionaries, network features, ... But if, as in the previous slides, we use only word features and we use all of the words in the text (not a subset), then Naïve Bayes has an important similarity to language modeling.

41 Sec. Each class = a unigram language model. Assigning each word: P(word|c). Assigning each sentence: P(s|c) = ∏ P(word|c). Class pos: 0.1 I, 0.1 love, 0.01 this, 0.05 fun, 0.1 film. For "I love this fun film": P(s|pos) = 0.1 × 0.1 × 0.01 × 0.05 × 0.1 = 0.0000005

42 Sec. Naïve Bayes as a Language Model. Which class assigns the higher probability to s? Model pos: 0.1 I, 0.1 love, 0.01 this, 0.05 fun, 0.1 film. Model neg: 0.2 I, ... love, 0.01 this, ... fun, 0.1 film. For s = "I love this fun film": P(s|pos) > P(s|neg)

43 Text Classification and Naïve Bayes Naïve Bayes: Relationship to Language Modeling

44 Text Classification and Naïve Bayes Multinomial Naïve Bayes: A Worked Example

45 P̂(w|c) = (count(w, c) + 1) / (count(c) + |V|)    P̂(c) = Nc / N

          Doc  Words                                  Class
Training  1    Chinese Beijing Chinese                c
          2    Chinese Chinese Shanghai               c
          3    Chinese Macao                          c
          4    Tokyo Japan Chinese                    j
Test      5    Chinese Chinese Chinese Tokyo Japan    ?

Priors: P(c) = 3/4, P(j) = 1/4

Conditional probabilities:
P(Chinese|c) = (5+1) / (8+6) = 6/14 = 3/7
P(Tokyo|c)   = (0+1) / (8+6) = 1/14
P(Japan|c)   = (0+1) / (8+6) = 1/14
P(Chinese|j) = (1+1) / (3+6) = 2/9
P(Tokyo|j)   = (1+1) / (3+6) = 2/9
P(Japan|j)   = (1+1) / (3+6) = 2/9

Choosing a class:
P(c|d5) ∝ 3/4 × (3/7)^3 × 1/14 × 1/14 ≈ 0.0003
P(j|d5) ∝ 1/4 × (2/9)^3 × 2/9 × 2/9 ≈ 0.0001
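The same numbers can be reproduced with a few lines of arithmetic (a sketch; only the counts given above are used):

V = 6                      # vocabulary: Chinese, Beijing, Shanghai, Macao, Tokyo, Japan
n_c, n_j = 8, 3            # total training tokens in classes c and j

def p(count, n):           # add-1 smoothed P(w|class)
    return (count + 1) / (n + V)

# unnormalized P(class|d5) for the test doc "Chinese Chinese Chinese Tokyo Japan"
p_c = 3/4 * p(5, n_c)**3 * p(0, n_c) * p(0, n_c)
p_j = 1/4 * p(1, n_j)**3 * p(1, n_j) * p(1, n_j)
print(round(p_c, 4), round(p_j, 4))   # 0.0003 0.0001 -> choose class c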

46 Naïve Bayes in Spam Filtering. SpamAssassin features: mentions "Generic Viagra"; Online Pharmacy; mentions millions of (dollar) ((dollar) NN,NNN,NNN.NN); phrase: "impress ... girl"; From: starts with many numbers; Subject is all capitals; HTML has a low ratio of text to image area; "One hundred percent guaranteed"; claims you can be removed from the list; 'Prestigious Non-Accredited Universities'. http://spamassassin.apache.org/tests_3_3_x.html

47 Summary: Naive Bayes is Not So Naive. Very fast, low storage requirements. Robust to irrelevant features: irrelevant features cancel each other without affecting results. Very good in domains with many equally important features (decision trees suffer from fragmentation in such cases, especially with little data). Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes optimal classifier for the problem. A good dependable baseline for text classification, but we will see other classifiers that give better accuracy.

48 Text Classification and Naïve Bayes Multinomial Naïve Bayes: A Worked Example

49 Text Classification and Naïve Bayes Precision, Recall, and the F measure

50 The 2-by-2 contingency table
              correct   not correct
selected      tp        fp
not selected  fn        tn

51 Precision and recall. Precision: % of selected items that are correct. Recall: % of correct items that are selected.
              correct   not correct
selected      tp        fp
not selected  fn        tn

52 A combined measure: F. A combined measure that assesses the P/R tradeoff is the F measure (weighted harmonic mean):
F = 1 / (α/P + (1−α)/R) = (β² + 1) P R / (β² P + R)
The harmonic mean is a very conservative average; see IIR 8.3. People usually use the balanced F1 measure, i.e., with β = 1 (that is, α = ½): F = 2PR / (P + R)
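Not from the slides: a small Python sketch computing precision, recall, and F from contingency counts (the counts in the example call are hypothetical):

def precision_recall_f(tp, fp, fn, beta=1.0):
    # Compute precision, recall, and F_beta from contingency counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f = (b2 + 1) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

print(precision_recall_f(tp=90, fp=10, fn=10))   # (0.9, 0.9, 0.9)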

53 Text Classification and Naïve Bayes Precision, Recall, and the F measure

54 Text Classification and Naïve Bayes Text Classification: Evaluation

55 More Than Two Classes: Sets of binary classifiers. Sec.14.5. Dealing with any-of or multivalue classification: a document can belong to 0, 1, or >1 classes. For each class c ∈ C, build a classifier γc to distinguish c from all other classes c' ∈ C. Given test doc d, evaluate it for membership in each class using each γc. d belongs to any class for which γc returns true.

56 More Than Two Classes: Sets of binary classifiers. Sec.14.5. One-of or multinomial classification: classes are mutually exclusive; each document is in exactly one class. For each class c ∈ C, build a classifier γc to distinguish c from all other classes c' ∈ C. Given test doc d, evaluate it for membership in each class using each γc. d belongs to the one class with the maximum score.

57 Evaluation: Classic Reuters Data Set. Most (over)used data set: 21,578 docs (each 90 types, 200 tokens). 9603 training, 3299 test articles (ModApte/Lewis split). 118 categories: an article can be in more than one category; learn 118 binary category distinctions. Average document (with at least one category) has 1.24 classes. Only about 10 out of 118 categories are large. Common categories (#train, #test): Earn (2877, 1087), Acquisitions (1650, 179), Money-fx (538, 179), Grain (433, 149), Crude (389, 189), Trade (369, 119), Interest (347, 131), Ship (197, 89), Wheat (212, 71), Corn (182, 56).

58 Reuters Text Categorization data set (Reuters-21578) document.
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="12981" NEWID="798"> <DATE> 2-MAR :51:43.42</DATE> <TOPICS><D>livestock</D><D>hog</D></TOPICS> <TITLE>AMERICAN PORK CONGRESS KICKS OFF TOMORROW</TITLE> <DATELINE> CHICAGO, March 2 - </DATELINE><BODY>The American Pork Congress kicks off tomorrow, March 3, in Indianapolis with 160 of the nations pork producers from 44 member states determining industry positions on a number of issues, according to the National Pork Producers Council, NPPC. Delegates to the three day Congress will be considering 26 resolutions concerning various issues, including the future direction of farm policy and the tax law as it applies to the agriculture sector. The delegates will also debate whether to endorse concepts of a national PRV (pseudorabies virus) control and eradication program, the NPPC said. A large trade show, in conjunction with the congress, will feature the latest in technology in all areas of the industry, the NPPC added. Reuter </BODY></TEXT></REUTERS>

59 Confusion matrix c. For each pair of classes <c1, c2>, how many documents from c1 were incorrectly assigned to c2? E.g. c_{3,2}: 90 wheat documents incorrectly assigned to poultry. Rows: docs in the test set with true class UK, poultry, wheat, coffee, interest, trade; columns: assigned class UK, poultry, wheat, coffee, interest, trade.

60 Sec. Per class evaluation measures.
Recall: fraction of docs in class i classified correctly: c_ii / Σ_j c_ij
Precision: fraction of docs assigned class i that are actually about class i: c_ii / Σ_j c_ji
Accuracy (1 − error rate): fraction of docs classified correctly: Σ_i c_ii / Σ_i Σ_j c_ij

61 Sec. Micro- vs. Macro-Averaging. If we have more than one class, how do we combine multiple performance measures into one quantity? Macroaveraging: compute performance for each class, then average. Microaveraging: collect decisions for all classes, compute the contingency table, evaluate.

62 Sec. Micro- vs. Macro-Averaging: Example. Per-class contingency tables for Class 1 and Class 2 (Truth: yes/no vs. Classifier: yes/no) are pooled into a micro-average table. Macroaveraged precision: (precision on class 1 + precision on class 2) / 2 = 0.7. Microaveraged precision: 100/120 = .83. The microaveraged score is dominated by the score on common classes.
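A sketch of the two averaging schemes over per-class (tp, fp) counts; the example counts below are hypothetical but chosen to reproduce the 0.7 and 100/120 figures above:

def macro_precision(per_class):
    # per_class: list of (tp, fp) pairs, one per class.
    return sum(tp / (tp + fp) for tp, fp in per_class) / len(per_class)

def micro_precision(per_class):
    tp = sum(t for t, _ in per_class)
    fp = sum(f for _, f in per_class)
    return tp / (tp + fp)

classes = [(10, 10), (90, 10)]          # (tp, fp) for class 1 and class 2 (hypothetical)
print(macro_precision(classes))         # (0.5 + 0.9) / 2 = 0.7
print(micro_precision(classes))         # 100 / 120 ≈ 0.83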

63 Development Test Sets and Cross-validation. Split: Training set / Development Test Set / Test Set. Metric: P/R/F1 or Accuracy. Unseen test set: avoid overfitting ("tuning to the test set"); more conservative estimate of performance. Cross-validation over multiple splits: handle sampling errors from different datasets; pool results over each split; compute pooled dev set performance.

64 Text Classification and Naïve Bayes Text Classification: Evaluation

65 Text Classification and Naïve Bayes Text Classification: Practical Issues

66 Sec. The Real World. "Gee, I'm building a text classifier for real, now!" What should I do?

67 Sec. No training data? Manually written rules: If (wheat or grain) and not (whole or bread) then categorize as grain. Needs careful crafting: human tuning on development data; time-consuming: 2 days per class.

68 Sec. Very little data? Use Naïve Bayes: Naïve Bayes is a "high-bias" algorithm (Ng and Jordan 2002 NIPS). Get more labeled data: find clever ways to get humans to label data for you. Try semi-supervised training methods: bootstrapping, EM over unlabeled documents, ...

69 Sec. A reasonable amount of data? Perfect for all the clever classifiers: SVM, regularized logistic regression. You can even use user-interpretable decision trees: users like to hack; management likes quick fixes.

70 Sec. A huge amount of data? Can achieve high accuracy! At a cost: SVMs (train time) or kNN (test time) can be too slow; regularized logistic regression can be somewhat better. So Naïve Bayes can come back into its own again!

71 Sec. Accuracy as a function of data size. With enough data, the classifier may not matter. Brill and Banko on spelling correction.

72 Real-world systems generally combine: automatic classification; manual review of uncertain/difficult/"new" cases.

73 Underflow Prevention: log space. Multiplying lots of probabilities can result in floating-point underflow. Since log(xy) = log(x) + log(y), it is better to sum logs of probabilities instead of multiplying probabilities. The class with the highest un-normalized log probability score is still the most probable:
c_NB = argmax_{cj∈C} [ log P(cj) + Σ_{i∈positions} log P(xi|cj) ]
The model is now just a max of a sum of weights.
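A matching classification sketch in log space (it assumes the logprior/loglikelihood dictionaries produced by a trainer like the one sketched earlier; skipping out-of-vocabulary words is one of several reasonable choices):

def classify_nb(tokens, logprior, loglikelihood):
    # Return the class with the highest sum of log probabilities.
    best_class, best_score = None, float("-inf")
    for c in logprior:
        score = logprior[c]
        for w in tokens:
            if w in loglikelihood[c]:          # skip unknown words (assumption)
                score += loglikelihood[c][w]
        if score > best_score:
            best_class, best_score = c, score
    return best_class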

74 Sec. How to tweak performance. Domain-specific features and weights: very important in real performance. Sometimes need to collapse terms: part numbers, chemical formulas, ...; but stemming generally doesn't help. Upweighting: counting a word as if it occurred twice: title words (Cohen & Singer 1996); first sentence of each paragraph (Murata, 1999); words in sentences that contain title words (Ko et al., 2002).

75 Text Classification and Naïve Bayes Text Classification: Practical Issues

76 Positive or negative movie review? "unbelievably disappointing" / "Full of zany characters and richly applied satire, and some great plot twists" / "this is the greatest screwball comedy ever filmed" / "It was pathetic. The worst part about it was the boxing scenes."

77 Google Product Search

78 Bing Shopping

79 Twitter sentiment versus Gallup Poll of Consumer Confidence. Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010.

80 Twitter sentiment: Johan Bollen, Huina Mao, Xiaojun Zeng. Twitter mood predicts the stock market. Journal of Computational Science 2:1.

81 Bollen et al. (2011): CALM predicts DJIA 3 days later. At least one current hedge fund uses this algorithm. [Figure: Dow Jones vs. CALM time series]

82 Target Sentiment on Twitter. Twitter Sentiment App. Alec Go, Richa Bhayani, Lei Huang. Twitter Sentiment Classification using Distant Supervision.

83 Sentiment analysis has many other names: opinion extraction, opinion mining, sentiment mining, subjectivity analysis.

84 Why sentiment analysis? Movie: is this review positive or negative? Products: what do people think about the new iPhone? Public sentiment: how is consumer confidence? Is despair increasing? Politics: what do people think about this candidate or issue? Prediction: predict election outcomes or market trends from sentiment.

85 Scherer Typology of Affective States. Emotion: brief organically synchronized evaluation of a major event (angry, sad, joyful, fearful, ashamed, proud, elated). Mood: diffuse non-caused low-intensity long-duration change in subjective feeling (cheerful, gloomy, irritable, listless, depressed, buoyant). Interpersonal stances: affective stance toward another person in a specific interaction (friendly, flirtatious, distant, cold, warm, supportive, contemptuous). Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons (liking, loving, hating, valuing, desiring). Personality traits: stable personality dispositions and typical behavior tendencies (nervous, anxious, reckless, morose, hostile, jealous).

86 Scherer Typology of Affective States. Emotion: brief organically synchronized evaluation of a major event (angry, sad, joyful, fearful, ashamed, proud, elated). Mood: diffuse non-caused low-intensity long-duration change in subjective feeling (cheerful, gloomy, irritable, listless, depressed, buoyant). Interpersonal stances: affective stance toward another person in a specific interaction (friendly, flirtatious, distant, cold, warm, supportive, contemptuous). Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons (liking, loving, hating, valuing, desiring). Personality traits: stable personality dispositions and typical behavior tendencies (nervous, anxious, reckless, morose, hostile, jealous).

87 Sentiment Analysis. Sentiment analysis is the detection of attitudes: "enduring, affectively colored beliefs, dispositions towards objects or persons." 1. Holder (source) of attitude. 2. Target (aspect) of attitude. 3. Type of attitude: from a set of types (like, love, hate, value, desire, etc.), or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength. 4. Text containing the attitude: sentence or entire document.

88 Sentiment Analysis. Simplest task: is the attitude of this text positive or negative? More complex: rank the attitude of this text from 1 to 5. Advanced: detect the target, source, or complex attitude types.

89 Sentiment Analysis. Simplest task: is the attitude of this text positive or negative? More complex: rank the attitude of this text from 1 to 5. Advanced: detect the target, source, or complex attitude types.

90 Sentiment Analysis What is Sentiment Analysis?

91 Sentiment Analysis A Baseline Algorithm

92 Sentiment Classification in Movie Reviews. Polarity detection: is an IMDB movie review positive or negative? Data: Polarity Data 2.0. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002. Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL. http:// ... review-data

93 IMDB data in the Pang and Lee database. "when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. [...] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image that of a single white dot, traveling horizontally across the night sky. [...]" / "snake eyes is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it's not just because this is a brian depalma film, and since he's a great director and one who's films are always greeted with at least some fanfare. and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents."

94 Baseline Algorithm (adapted from Pang and Lee). Tokenization. Feature extraction. Classification using different classifiers: Naïve Bayes, MaxEnt, SVM.

95 Sentiment Tokenization Issues. Deal with HTML and XML markup. Twitter mark-up (names, hash tags). Capitalization (preserve for words in all caps). Phone numbers, dates. Emoticons. Useful code: Christopher Potts sentiment tokenizer; Brendan O'Connor twitter tokenizer. Potts emoticons:
[<>]?                        # optional hat/brow
[:;=8]                       # eyes
[\-o\*\']?                   # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
|                            #### reverse orientation
[\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
[\-o\*\']?                   # optional nose
[:;=8]                       # eyes
[<>]?                        # optional hat/brow
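The pattern above, lightly cleaned up, compiles directly in Python (a sketch; the character classes are reproduced from the fragment as faithfully as possible):

import re

EMOTICON = re.compile(r"""
    [<>]?                         # optional hat/brow
    [:;=8]                        # eyes
    [\-o\*\']?                    # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]    # mouth
    |                             # reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]    # mouth
    [\-o\*\']?                    # optional nose
    [:;=8]                        # eyes
    [<>]?                         # optional hat/brow
    """, re.VERBOSE)

print(EMOTICON.findall("great movie :-) but the ending... :("))   # [':-)', ':(']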

96 Extracting Features for Sentiment Classification. How to handle negation: "I didn't like this movie" vs "I really like this movie". Which words to use? Only adjectives, or all words. All words turns out to work better, at least on this data.

97 Negation. Das, Sanjiv and Mike Chen. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA). Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002. Add NOT_ to every word between a negation and the following punctuation:
didn't like this movie, but I  →  didn't NOT_like NOT_this NOT_movie but I
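Not from the papers above: a minimal sketch of this NOT_ transformation; the negation word list and punctuation set are assumptions.

import re

NEGATION = re.compile(r"^(?:not|no|never|n't|didn't|don't|can't|won't|isn't)$", re.IGNORECASE)
PUNCT = set(".,;:!?")

def mark_negation(tokens):
    # Prefix NOT_ to every token between a negation word and the next punctuation mark.
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCT:
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION.match(tok):
                negating = True
    return out

print(mark_negation("didn't like this movie , but I".split()))
# ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']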

98 Reminder: Naïve Bayes.
c_NB = argmax_{cj∈C} P(cj) ∏_{i∈positions} P(wi|cj)
P̂(w|c) = (count(w, c) + 1) / (count(c) + |V|)

99 Binarized (Boolean feature) Multinomial Naïve Bayes. Intuition: for sentiment (and probably for other text classification domains), word occurrence may matter more than word frequency. The occurrence of the word fantastic tells us a lot; the fact that it occurs 5 times may not tell us much more. Boolean Multinomial Naïve Bayes clips all the word counts in each document at 1.

100 Boolean Multinomial Naïve Bayes: Learning.
From the training corpus, extract the Vocabulary.
Calculate P(cj) terms: for each cj in C do: docsj ← all docs with class = cj; P(cj) ← |docsj| / (total # documents).
Calculate P(wk|cj) terms: remove duplicates in each doc: for each word type w in docj, retain only a single instance of w; Textj ← single doc containing all docsj; for each word wk in Vocabulary: nk ← # of occurrences of wk in Textj; P(wk|cj) ← (nk + α) / (n + α · |Vocabulary|)

101 Boolean Multinomial Naïve Bayes on a test document d. First remove all duplicate words from d. Then compute NB using the same equation:
c_NB = argmax_{cj∈C} P(cj) ∏_{i∈positions} P(wi|cj)
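A sketch of the clipping/deduplication step (plain Python; the function name is illustrative):

def binarize(tokens):
    # Keep a single instance of each word type, discarding repeat counts.
    seen, out = set(), []
    for w in tokens:
        if w not in seen:
            seen.add(w)
            out.append(w)
    return out

print(binarize("Chinese Chinese Chinese Tokyo Japan".split()))   # ['Chinese', 'Tokyo', 'Japan']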

102 Normal vs. Boolean Multinomial NB.
Normal:
          Doc  Words                                  Class
Training  1    Chinese Beijing Chinese                c
          2    Chinese Chinese Shanghai               c
          3    Chinese Macao                          c
          4    Tokyo Japan Chinese                    j
Test      5    Chinese Chinese Chinese Tokyo Japan    ?
Boolean:
          Doc  Words                  Class
Training  1    Chinese Beijing        c
          2    Chinese Shanghai       c
          3    Chinese Macao          c
          4    Tokyo Japan Chinese    j
Test      5    Chinese Tokyo Japan    ?

103 Binarized (Boolean feature) Multinomial Naïve Bayes. B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002. V. Metsis, I. Androutsopoulos, G. Paliouras. Spam Filtering with Naive Bayes - Which Naive Bayes? CEAS, Third Conference on Email and Anti-Spam. K.-M. Schneider. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP. JD Rennie, L Shih, J Teevan. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003. Binary seems to work better than full word counts. This is not the same as Multivariate Bernoulli Naïve Bayes: MBNB doesn't work well for sentiment or other text tasks. Other possibility: log(freq(w)).

104 Cross-Validation. Break up the data into 10 folds (equal positive and negative inside each fold?). For each fold: choose the fold as a temporary test set; train on the other 9 folds, compute performance on the test fold. Report average performance of the 10 runs. [Diagram: each iteration rotates which fold is the test set.]
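A sketch of the fold loop described above (plain Python; train and evaluate are user-supplied placeholder callables):

import random

def cross_validate(data, train, evaluate, k=10, seed=0):
    # data: list of labeled examples; returns the average score over k held-out folds.
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]              # k roughly equal folds
    scores = []
    for i in range(k):
        test_fold = folds[i]
        train_data = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(train_data)
        scores.append(evaluate(model, test_fold))
    return sum(scores) / k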

105 Other issues in Classification. MaxEnt and SVM tend to do better than Naïve Bayes.

106 Problems: What makes reviews hard to classify? Subtlety. Perfume review in Perfumes: the Guide: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." Dorothy Parker on Katharine Hepburn: "She runs the gamut of emotions from A to B."

107 Thwarted Expectations and Ordering Effects. "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up." "Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised."

108 Sentiment Analysis A Baseline Algorithm

109 Sentiment Analysis Sentiment Lexicons

110 The General Inquirer. Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie. The General Inquirer: A Computer Approach to Content Analysis. MIT Press. Home page: http://... List of Categories: http://... Spreadsheet: http://... Categories: Positiv (1915 words) and Negativ (2291 words); Strong vs Weak, Active vs Passive, Overstated versus Understated; Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc. Free for research use.

111 LIWC (Linguistic Inquiry and Word Count). Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC. Austin, TX. Home page: http://... ... words, >70 classes. Affective Processes: negative emotion (bad, weird, hate, problem, tough); positive emotion (love, nice, sweet). Cognitive Processes: tentative (maybe, perhaps, guess), inhibition (block, constraint). Pronouns, negation (no, never), quantifiers (few, many). $30 or $90 fee.

112 MPQA Subjectivity Cues Lexicon. Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP 2005. Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP 2003. Home page: http://... ... words from 8221 lemmas: 2718 positive, 4912 negative. Each word annotated for intensity (strong, weak). GNU GPL.

113 Bing Liu Opinion Lexicon. Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD 2004. Bing Liu's Page on Opinion Mining: http://... opinion-lexicon-English.rar. 6786 words: ... positive, 4783 negative.

114 SentiWordNet. Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC. Home page: http://sentiwordnet.isti.cnr.it/. All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness. [estimable(J,3)] "may be computed or estimated": Pos 0, Neg 0, Obj 1. [estimable(J,1)] "deserving of respect or high regard": Pos .75, Neg 0, Obj .25.

115 Disagreements between polarity lexicons (Christopher Potts, Sentiment Tutorial, 2011)
                   Opinion Lexicon   General Inquirer   SentiWordNet      LIWC
MPQA               33/5402 (0.6%)    49/2867 (2%)       1127/4214 (27%)   12/363 (3%)
Opinion Lexicon                      32/2411 (1%)       1004/3994 (25%)   9/403 (2%)
General Inquirer                                        520/2306 (23%)    1/204 (0.5%)
SentiWordNet                                                              174/694 (25%)

116 Analyzing the polarity of each word in IMDB. Potts, Christopher. On the negativity of negation. SALT 20. How likely is each word to appear in each sentiment class? Count("bad") in 1-star, 2-star, 3-star, etc. But we can't use raw counts; instead, likelihood: P(w|c) = f(w,c) / Σ_{w∈c} f(w,c). To make them comparable between words, use the scaled likelihood: P(w|c) / P(w).
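Not from the slides: a sketch of the scaled-likelihood computation from raw counts per rating class (the toy counts are hypothetical, not the IMDB numbers):

def scaled_likelihood(word, counts_by_class):
    # counts_by_class: {rating_class: {word: count}}.
    # Returns {rating_class: P(word|class) / P(word)}.
    class_totals = {c: sum(ws.values()) for c, ws in counts_by_class.items()}
    total_tokens = sum(class_totals.values())
    p_w = sum(ws.get(word, 0) for ws in counts_by_class.values()) / total_tokens
    return {c: (counts_by_class[c].get(word, 0) / class_totals[c]) / p_w
            for c in counts_by_class}

counts = {1: {"bad": 30, "good": 5, "film": 65},      # 1-star reviews (toy counts)
          5: {"bad": 5, "good": 40, "film": 55}}      # 5-star reviews (toy counts)
print(scaled_likelihood("bad", counts))               # larger for the 1-star class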

117 Analyzing the polarity of each word in IMDB. Potts, Christopher. On the negativity of negation. SALT 20. [Figure: scaled likelihood P(w|c)/P(w) across rating categories. POS panels: good (883,417 tokens), amazing (103,509 tokens), great (648,110 tokens), awesome (47,142 tokens). NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]

118 Other sentiment feature: Logical negation. Potts, Christopher. On the negativity of negation. SALT 20. Is logical negation (no, not) associated with negative sentiment? Potts experiment: count negation (not, n't, no, never) in online reviews; regress against the review rating.

119 Potts 2011 Results: More negation in negative sentiment. [Figure: scaled likelihood P(w|c)/P(w) of negation across rating categories.]

120 Sentiment Analysis Sentiment Lexicons

121 Sentiment Analysis Learning Sentiment Lexicons

122 Semi-supervised learning of lexicons. Use a small amount of information (a few labeled examples, a few hand-built patterns) to bootstrap a lexicon.

123 Hatzivassiloglou and McKeown intuition for identifying word polarity. Vasileios Hatzivassiloglou and Kathleen R. McKeown. Predicting the Semantic Orientation of Adjectives. ACL. Adjectives conjoined by "and" have the same polarity: fair and legitimate, corrupt and brutal; *fair and brutal, *corrupt and legitimate. Adjectives conjoined by "but" do not: fair but brutal.

124 Hatzivassiloglou & McKeown 1997: Step 1. Label a seed set of 1336 adjectives (all >20 in a 21 million word WSJ corpus). Positive: adequate, central, clever, famous, intelligent, remarkable, reputed, sensitive, slender, thriving, ... 679 negative: contagious, drunken, ignorant, lanky, listless, primitive, strident, troublesome, unresolved, unsuspecting, ...

125 Hatzivassiloglou & McKeown 1997: Step 2. Expand the seed set to conjoined adjectives: "nice, helpful"; "nice, classy".

126 Hatzivassiloglou & McKeown 1997: Step 3. A supervised classifier assigns a "polarity similarity" to each word pair, resulting in a graph over words such as helpful, nice, fair, classy, brutal, corrupt, irrational.

127 Hatzivassiloglou & McKeown 1997: Step 4. Clustering partitions the graph into two sets: + {helpful, nice, fair, classy}; − {brutal, corrupt, irrational}.

128 Output polarity lexicon. Positive: bold, decisive, disturbing, generous, good, honest, important, large, mature, patient, peaceful, positive, proud, sound, stimulating, straightforward, strange, talented, vigorous, witty, ... Negative: ambiguous, cautious, cynical, evasive, harmful, hypocritical, inefficient, insecure, irrational, irresponsible, minor, outspoken, pleasant, reckless, risky, selfish, tedious, unsupported, vulnerable, wasteful, ...

129 Output polarity lexicon. Positive: bold, decisive, disturbing, generous, good, honest, important, large, mature, patient, peaceful, positive, proud, sound, stimulating, straightforward, strange, talented, vigorous, witty, ... Negative: ambiguous, cautious, cynical, evasive, harmful, hypocritical, inefficient, insecure, irrational, irresponsible, minor, outspoken, pleasant, reckless, risky, selfish, tedious, unsupported, vulnerable, wasteful, ...

130 Turney Algorithm. Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. 1. Extract a phrasal lexicon from reviews. 2. Learn the polarity of each phrase. 3. Rate a review by the average polarity of its phrases.

131 Extract two-word phrases with adjectives
First Word           Second Word            Third Word (not extracted)
JJ                   NN or NNS              anything
RB, RBR, RBS         JJ                     not NN nor NNS
JJ                   JJ                     not NN nor NNS
NN or NNS            JJ                     not NN nor NNS
RB, RBR, or RBS      VB, VBD, VBN, VBG      anything

132 How to measure the polarity of a phrase? Positive phrases co-occur more with "excellent". Negative phrases co-occur more with "poor". But how to measure co-occurrence?

133 Pointwise Mutual Information. Mutual information between two random variables X and Y:
I(X, Y) = Σ_x Σ_y P(x, y) log2 [ P(x, y) / (P(x) P(y)) ]
Pointwise mutual information: how much more do events x and y co-occur than if they were independent?
PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ]

134 Pointwise Mutual Information. Pointwise mutual information: how much more do events x and y co-occur than if they were independent? PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ]. PMI between two words: how much more do two words co-occur than if they were independent?
PMI(word1, word2) = log2 [ P(word1, word2) / (P(word1) P(word2)) ]

135 How to Estimate Pointwise Mutual Information. Query a search engine (AltaVista). P(word) is estimated by hits(word)/N; P(word1, word2) by hits(word1 NEAR word2)/N². The N terms cancel, giving:
PMI(word1, word2) = log2 [ hits(word1 NEAR word2) / (hits(word1) hits(word2)) ]

136 Does a phrase appear more with "poor" or "excellent"?
Polarity(phrase) = PMI(phrase, "excellent") − PMI(phrase, "poor")
  = log2 [ hits(phrase NEAR "excellent") / (hits(phrase) hits("excellent")) ]
    − log2 [ hits(phrase NEAR "poor") / (hits(phrase) hits("poor")) ]
  = log2 [ hits(phrase NEAR "excellent") hits("poor") / (hits(phrase NEAR "poor") hits("excellent")) ]
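A sketch of the polarity score from hit counts (the hit counts in the example are hypothetical; the small additive constant guards against zero counts and is an assumption here):

import math

def polarity(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor, eps=0.01):
    # PMI(phrase, "excellent") - PMI(phrase, "poor"), written as a single log ratio.
    return math.log2(((hits_near_excellent + eps) * hits_poor) /
                     ((hits_near_poor + eps) * hits_excellent))

print(polarity(hits_near_excellent=120, hits_near_poor=40,
               hits_excellent=10000, hits_poor=8000))   # ~1.26 -> leans positive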

137 Phrases from a thumbs-up review
Phrase                   POS tags   Polarity
online service           JJ NN       2.8
online experience        JJ NN       2.3
direct deposit           JJ NN       1.3
local branch             JJ NN       0.42
low fees                 JJ NNS      0.33
true service             JJ NN      -0.73
other bank               JJ NN      -0.85
inconveniently located   JJ NN      -1.5
Average                              0.32

138 Phrases from a thumbs-down review
Phrase                POS tags   Polarity
direct deposits       JJ NNS      5.8
online web            JJ NN       1.9
very handy            RB JJ       1.4
virtual monopoly      JJ NN      -2.0
lesser evil           RBR JJ     -2.3
other problems        JJ NNS     -2.8
low funds             JJ NNS     -6.8
unethical practices   JJ NNS     -8.5
Average                          -1.2

139 Results of Turney algorithm. 410 reviews from Epinions: 170 (41%) negative, 240 (59%) positive. Majority class baseline: 59%. Turney algorithm: 74%. Uses phrases rather than words; learns domain-specific information.

140 Using WordNet to learn polarity. S.M. Kim and E. Hovy. Determining the sentiment of opinions. COLING 2004. M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004. WordNet: online thesaurus (covered in a later lecture). Create positive ("good") and negative ("terrible") seed-words. Find synonyms and antonyms. Positive set: add synonyms of positive words ("well") and antonyms of negative words. Negative set: add synonyms of negative words ("awful") and antonyms of positive words ("evil"). Repeat, following chains of synonyms. Filter.

141 Summary on Learning Lexicons. Advantages: can be domain-specific; can be more robust (more words). Intuition: start with a seed set of words ("good", "poor"); find other words that have similar polarity: using "and" and "but"; using words that occur nearby in the same document; using WordNet synonyms and antonyms.

142 Sentiment Analysis Learning Sentiment Lexicons

143 Sentiment Analysis Other Sentiment Tasks

144 Finding sentiment of a sentence. Important for finding aspects or attributes (the target of sentiment): "The food was great but the service was awful."

145 Finding the aspect/attribute/target of sentiment. M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD. S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop. Frequent phrases + rules: find all highly frequent phrases across reviews ("fish tacos"); filter by rules like "occurs right after a sentiment word" ("great fish tacos" means fish tacos is a likely aspect).
Casino: casino, buffet, pool, resort, beds
Children's Barber: haircut, job, experience, kids
Greek Restaurant: food, wine, service, appetizer, lamb
Department Store: selection, department, sales, shop, clothing

146 Finding the aspect/attribute/target of sentiment. The aspect name may not be in the sentence. For restaurants/hotels, aspects are well-understood. Supervised classification: hand-label a small corpus of restaurant review sentences with an aspect (food, décor, service, value, NONE); train a classifier to assign an aspect to a sentence ("Given this sentence, is the aspect food, décor, service, value, or NONE?").

147 Putting it all together: Finding sentiment for aspects. S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop. Pipeline: Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier → Aspect Extractor → Aggregator → Final Summary.

148 Results of the Blair-Goldensohn et al. method.
Rooms (3/5 stars, 41 comments): (+) The room was clean and everything worked fine even the water pressure... (+) We went because of the free room and was pleasantly pleased... (-) the worst hotel I had ever stayed at...
Service (3/5 stars, 31 comments): (+) Upon checking out another couple was checking early due to a problem... (+) Every single hotel staff member treated us great and answered every... (-) The food is cold and the service gives new meaning to SLOW.
Dining (3/5 stars, 18 comments): (+) our favorite place to stay in biloxi. the food is great also the service... (+) Offer of free buffet for joining the Play...

149 Baseline methods assume classes have equal frequencies! If classes are not balanced (common in the real world): can't use accuracy as an evaluation, need to use F-scores. Severe imbalance can also degrade classifier performance. Two common solutions: 1. Resampling in training: random undersampling. 2. Cost-sensitive learning: penalize the SVM more for misclassification of the rare class.
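Not from the slides: a sketch of random undersampling of the majority class in pure Python (label values and the seed are placeholders):

import random
from collections import defaultdict

def undersample(examples, seed=0):
    # examples: list of (features, label). Downsample every class to the size of the rarest class.
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    n_min = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n_min))
    rng.shuffle(balanced)
    return balanced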

150 How to deal with 7 stars? Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL. 1. Map to binary. 2. Use linear or ordinal regression, or specialized models like metric labeling.

151 Summary on Sentiment. Generally modeled as a classification or regression task: predict a binary or ordinal label. Features: negation is important; using all words (in naïve Bayes) works well for some tasks; finding subsets of words may help in other tasks; hand-built polarity lexicons; use seeds and semi-supervised learning to induce lexicons.

152 Scherer Typology of Affective States. Emotion: brief organically synchronized evaluation of a major event (angry, sad, joyful, fearful, ashamed, proud, elated). Mood: diffuse non-caused low-intensity long-duration change in subjective feeling (cheerful, gloomy, irritable, listless, depressed, buoyant). Interpersonal stances: affective stance toward another person in a specific interaction (friendly, flirtatious, distant, cold, warm, supportive, contemptuous). Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons (liking, loving, hating, valuing, desiring). Personality traits: stable personality dispositions and typical behavior tendencies (nervous, anxious, reckless, morose, hostile, jealous).

153 Computational work on other affective states. Emotion: detecting annoyed callers to a dialogue system; detecting confused/frustrated versus confident students. Mood: finding traumatized or depressed writers. Interpersonal stances: detection of flirtation or friendliness in conversations. Personality traits: detection of extroverts.

154 Detection of Friendliness (Ranganath, Jurafsky, McFarland). Friendly speakers use a collaborative conversational style: laughter; less use of negative emotional words; more sympathy ("That's too bad", "I'm sorry to hear that"); more agreement ("I think so too"); fewer hedges ("kind of", "sort of", "a little").

155 Sentiment Analysis Other Sentiment Tasks



More information

Writing Research Articles

Writing Research Articles Marek J. Druzdzel with minor additions from Peter Brusilovsky University of Pittsburgh School of Information Sciences and Intelligent Systems Program marek@sis.pitt.edu http://www.pitt.edu/~druzdzel Overview

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape

Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,

More information

Emotions from text: machine learning for text-based emotion prediction

Emotions from text: machine learning for text-based emotion prediction Emotions from text: machine learning for text-based emotion prediction Cecilia Ovesdotter Alm Dept. of Linguistics UIUC Illinois, USA ebbaalm@uiuc.edu Dan Roth Dept. of Computer Science UIUC Illinois,

More information

Std: III rd. Subject: Morals cw.

Std: III rd. Subject: Morals cw. MORALS - CW Std: I rd. Subject: Morals cw. Sl. No Topic Peg No. 1. Being Brave. 2 2. Love of books. 3-4 3. Love hobby. 4 4. Love your Elders. 5 5. Kindness. 5-6 6. Love Mother India. 7 7. Nature loves

More information

What is Teaching? JOHN A. LOTT Professor Emeritus in Pathology College of Medicine

What is Teaching? JOHN A. LOTT Professor Emeritus in Pathology College of Medicine What is Teaching? JOHN A. LOTT Professor Emeritus in Pathology College of Medicine What is teaching? As I started putting this essay together, I realized that most of my remarks were aimed at students

More information

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

More information

Exploration. CS : Deep Reinforcement Learning Sergey Levine

Exploration. CS : Deep Reinforcement Learning Sergey Levine Exploration CS 294-112: Deep Reinforcement Learning Sergey Levine Class Notes 1. Homework 4 due on Wednesday 2. Project proposal feedback sent Today s Lecture 1. What is exploration? Why is it a problem?

More information

Active Learning. Yingyu Liang Computer Sciences 760 Fall

Active Learning. Yingyu Liang Computer Sciences 760 Fall Active Learning Yingyu Liang Computer Sciences 760 Fall 2017 http://pages.cs.wisc.edu/~yliang/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven,

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Leadership Orange November 18, 2016

Leadership Orange November 18, 2016 Leadership Orange November 18, 2016 1 Curriculum & Instruc8on Understanding the Standards 2 Your child s experiences in school today are probably very different than what you experienced as a student.

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

A non-profit educational institution dedicated to making the world a better place to live

A non-profit educational institution dedicated to making the world a better place to live NAPOLEON HILL FOUNDATION A non-profit educational institution dedicated to making the world a better place to live YOUR SUCCESS PROFILE QUESTIONNAIRE You must answer these 75 questions honestly if you

More information

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Introduction to Moodle

Introduction to Moodle Center for Excellence in Teaching and Learning Mr. Philip Daoud Introduction to Moodle Beginner s guide Center for Excellence in Teaching and Learning / Teaching Resource This manual is part of a serious

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

Economics Unit: Beatrice s Goat Teacher: David Suits

Economics Unit: Beatrice s Goat Teacher: David Suits Economics Unit: Beatrice s Goat Teacher: David Suits Overview: Beatrice s Goat by Page McBrier tells the story of how the gift of a goat changed a young Ugandan s life. This story is used to introduce

More information

West s Paralegal Today The Legal Team at Work Third Edition

West s Paralegal Today The Legal Team at Work Third Edition Study Guide to accompany West s Paralegal Today The Legal Team at Work Third Edition Roger LeRoy Miller Institute for University Studies Mary Meinzinger Urisko Madonna University Prepared by Bradene L.

More information

Extracting Verb Expressions Implying Negative Opinions

Extracting Verb Expressions Implying Negative Opinions Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Extracting Verb Expressions Implying Negative Opinions Huayi Li, Arjun Mukherjee, Jianfeng Si, Bing Liu Department of Computer

More information

high writing writing high contests. school students student

high writing writing high contests. school students student Writing contests for high school students. It provides exercisesto practiset he stagesi ndividually (Appendix. In high cases, writing, you writing be asked to school on a high For or to Tsudents For contests..

More information

Truth Inference in Crowdsourcing: Is the Problem Solved?

Truth Inference in Crowdsourcing: Is the Problem Solved? Truth Inference in Crowdsourcing: Is the Problem Solved? Yudian Zheng, Guoliang Li #, Yuanbing Li #, Caihua Shan, Reynold Cheng # Department of Computer Science, Tsinghua University Department of Computer

More information

Distant Supervised Relation Extraction with Wikipedia and Freebase

Distant Supervised Relation Extraction with Wikipedia and Freebase Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational

More information

Using Hashtags to Capture Fine Emotion Categories from Tweets

Using Hashtags to Capture Fine Emotion Categories from Tweets Submitted to the Special issue on Semantic Analysis in Social Media, Computational Intelligence. Guest editors: Atefeh Farzindar (farzindaratnlptechnologiesdotca), Diana Inkpen (dianaateecsdotuottawadotca)

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

Loughton School s curriculum evening. 28 th February 2017

Loughton School s curriculum evening. 28 th February 2017 Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's

More information

TextGraphs: Graph-based algorithms for Natural Language Processing

TextGraphs: Graph-based algorithms for Natural Language Processing HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006

More information

Critical Thinking in Everyday Life: 9 Strategies

Critical Thinking in Everyday Life: 9 Strategies Critical Thinking in Everyday Life: 9 Strategies Most of us are not what we could be. We are less. We have great capacity. But most of it is dormant; most is undeveloped. Improvement in thinking is like

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

Genevieve L. Hartman, Ph.D.

Genevieve L. Hartman, Ph.D. Curriculum Development and the Teaching-Learning Process: The Development of Mathematical Thinking for all children Genevieve L. Hartman, Ph.D. Topics for today Part 1: Background and rationale Current

More information

Getting Started with Deliberate Practice

Getting Started with Deliberate Practice Getting Started with Deliberate Practice Most of the implementation guides so far in Learning on Steroids have focused on conceptual skills. Things like being able to form mental images, remembering facts

More information

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion

More information

Lecture 1: Basic Concepts of Machine Learning

Lecture 1: Basic Concepts of Machine Learning Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010

More information

Machine Learning and Development Policy

Machine Learning and Development Policy Machine Learning and Development Policy Sendhil Mullainathan (joint papers with Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, Ziad Obermeyer) Magic? Hard not to be wowed But what makes

More information

ScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques

ScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 98 (2016 ) 368 373 The 6th International Conference on Current and Future Trends of Information and Communication Technologies

More information

Introduction to Questionnaire Design

Introduction to Questionnaire Design Introduction to Questionnaire Design Why this seminar is necessary! Bad questions are everywhere! Don t let them happen to you! Fall 2012 Seminar Series University of Illinois www.srl.uic.edu The first

More information

Modeling user preferences and norms in context-aware systems

Modeling user preferences and norms in context-aware systems Modeling user preferences and norms in context-aware systems Jonas Nilsson, Cecilia Lindmark Jonas Nilsson, Cecilia Lindmark VT 2016 Bachelor's thesis for Computer Science, 15 hp Supervisor: Juan Carlos

More information

Fostering Success Coaching: Effective partnering with students from foster care. Maddy Day, MSW Jamie Crandell, MSW Courtney Maher

Fostering Success Coaching: Effective partnering with students from foster care. Maddy Day, MSW Jamie Crandell, MSW Courtney Maher Fostering Success Coaching: Effective partnering with students from foster care Maddy Day, MSW Jamie Crandell, MSW Courtney Maher Graphic courtesy of Foster Care Alumni of America. Fostercarealumni.org

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

Disambiguation of Thai Personal Name from Online News Articles

Disambiguation of Thai Personal Name from Online News Articles Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online

More information

Artificial Neural Networks written examination

Artificial Neural Networks written examination 1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14

More information

Theatre Arts Record Book

Theatre Arts Record Book Theatre Arts Record Book For use by New Jersey 4H Members in a Theatre Arts Project Written by Ellen Tillson Parker Somerset County 4H Member Name: Birthdate: Town: Grade: 4H County: Years in Project:

More information

Let's Learn English Lesson Plan

Let's Learn English Lesson Plan Let's Learn English Lesson Plan Introduction: Let's Learn English lesson plans are based on the CALLA approach. See the end of each lesson for more information and resources on teaching with the CALLA

More information