Lecture 2: Mixing Compositional Semantics and Machine Learning


1 Lecture 2: Mixing Compositional Semantics and Machine Learning Kyle Richardson April 14, 2016

2 Plan
main paper: Liang and Potts 2015 (conceptual basis of the class)
secondary: Mooney 2007 (semantic parsing big ideas), Domingos 2012 (remarks about ML)

3 Classical Semantics vs. Statistical Semantics (caricature)
Logical Semantics: logic, algebra, set theory; compositional analysis, beyond words, inference, brittle.
Statistical Semantics: optimization, algorithms, geometry; distributional analysis, word-based, grounded, shallow.
Both types of approaches share the long-term vision of achieving deep natural language understanding.

4-5 Montague-style Compositional Semantics
Principle of Compositionality: The meaning of a complex expression is a function of the meaning of its parts and the rules that combine them.
Example: John studies.
Lexical meanings: John -> john, studies -> (λx.(study x))
Composition: (λx.(study x))(john) = (study john), which denotes a value in {True, False}.

6-8 A mini functional interpreter (python)
Principle of Compositionality: The meaning of a complex expression is a function of the meaning of its parts and the rules that combine them.
Example: John studies.
Lexical meanings: John -> john, studies -> (λx.(study x))
>>> students_studying = set(['john', 'mary'])
>>> study = lambda x: x in students_studying
>>> fun_application = lambda fun, val: fun(val)
>>> fun_application(study, 'bill')  ## What will we get?
False

9-10 A mini functional interpreter (python)
Example: John studies.
>>> fun_application(study, 'mary')  ## What will we get?
True

11-12 Montague-style Compositional Semantics
Principle of Compositionality: The meaning of a complex expression is a function of the meaning of its parts and the rules that combine them.
Example: Bill does not study.
Lexical meanings: Bill -> bill, study -> (λx.(study x)), does not -> (λf.λx.(not (f x)))
Composition: (λf.λx.(not (f x)))((λx.(study x))) = (λx.(not (study x))); applied to bill: (λx.(not (study x)))(bill) = (not (study bill)).

13-17 A mini functional interpreter (python)
Example: Bill does not study.
Lexical meanings: Bill -> bill, study -> (λx.(study x)), does not -> (λf.λx.(not (f x)))
>>> students_studying = set(['john', 'mary'])
>>> study = lambda x: x in students_studying
>>> fun_application = lambda fun, val: fun(val)
>>> neg = lambda F: (lambda x: not F(x))
>>> neg(study)('bill')  # True
>>> fun_application(neg, study)('bill')
>>> fun_application(fun_application(neg, study), 'bill')
>>> neg(neg(sleep))('bill')
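To run the fragment above end to end, here is a minimal self-contained sketch; it assumes nothing beyond the toy lexicon on the slides, and the variable names simply mirror the slide code.

# Minimal, self-contained version of the slide fragment (toy lexicon only).
students_studying = {'john', 'mary'}

study = lambda x: x in students_studying         # unary predicate: does x study?
neg = lambda f: (lambda x: not f(x))             # "does not": negates a predicate
fun_application = lambda fun, val: fun(val)      # the single composition rule

# "Bill does not study": compose bottom-up, then apply to the individual.
does_not_study = fun_application(neg, study)     # (λx.(not (study x)))
print(does_not_study('bill'))                    # True: 'bill' is not in the set
print(fun_application(does_not_study, 'mary'))   # False: 'mary' does study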

18 Montague-style Compositional Semantics: What's needed
Principle of Compositionality: The meaning of a complex expression is a function of the meaning of its parts and the rules that combine them.
Example: Bill does not study.
What's needed:
Grammar rules for building syntactic structure.
Interpretation rules for composing meaning.
A decoding algorithm for generating structures.

19-22 Montague-style Compositional Semantics: Issues
Principle of Compositionality: The meaning of a complex expression is a function of the meaning of its parts and the rules that combine them.
Example: Bill does not study.
Features and (computational) issues:
Compositional, provides a full analysis; supports further inferencing.
issue: Does not provide an analysis of words (not grounded).
issue: Is brittle, cannot handle uncertainty.
issue: Says nothing about how the translation to logic works.

23-24 Statistical Approaches to Semantics
Statistical semantics hypothesis: Statistical patterns of human word usage can be used to figure out what people mean. Turney et al. (2010)
corpus:
The furry dog is walking outside...
The shiny car is driving...
A furry cat is walking around...
A shiny bike is driving...
word-context matrix (co-occurrence counts from the corpus above):
        furry  walking  shiny  driving
dog       1       1       0       0
cat       1       1       0       0
car       0       0       1       1
bike      0       0       1       1
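As a rough illustration of how such a matrix is built, the sketch below counts co-occurrences over the toy corpus above and compares rows with cosine similarity; the choice of target and context words is an assumption made for this example, not part of Turney et al.'s setup.

# A minimal sketch of the word-context matrix idea on the toy corpus above.
from math import sqrt

corpus = [
    "the furry dog is walking outside",
    "the shiny car is driving",
    "a furry cat is walking around",
    "a shiny bike is driving",
]
targets = ["dog", "cat", "car", "bike"]
contexts = ["furry", "walking", "shiny", "driving"]

# Count how often each target co-occurs with each context word in a sentence.
counts = {t: {c: 0 for c in contexts} for t in targets}
for sentence in corpus:
    words = sentence.split()
    for t in targets:
        if t in words:
            for c in contexts:
                counts[t][c] += words.count(c)

def cosine(u, v):
    dot = sum(u[c] * v[c] for c in contexts)
    norm = lambda x: sqrt(sum(x[c] ** 2 for c in contexts))
    return dot / (norm(u) * norm(v))

print(cosine(counts["dog"], counts["cat"]))   # 1.0: identical toy contexts
print(cosine(counts["dog"], counts["car"]))   # 0.0: no shared contexts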

25 Example Tasks and Applications: Turney et al. (2010)
Statistical semantic models are often used in downstream classification or clustering tasks/applications.
Term-document matrices: document retrieval/clustering/classification; question answering and retrieval; essay scoring.
Word-context matrices: word similarity/clustering/classification; word-sense disambiguation; automatic thesaurus generation/paraphrasing.
Pair-pattern matrices: relational similarity/clustering/classification; analogy comparison.

26-28 Statistical Approaches to Semantics
Statistical semantics hypothesis: Statistical patterns of human word usage can be used to figure out what people mean. Turney et al. (2010)
Features and issues (caricature):
Robust, requires little manual effort, grounded.
Can provide a rich analysis of content words.
issue: Hard to scale beyond words.
issue: In general, hard to model logical operations; shallow.

29-33 Mixing compositional and statistical semantics
Desiderata: We want a model of semantics that is robust, reflects real-world usage, and is learnable, but one that is also compositional.
Generalization:
Logical semantics: generalizes using composition and abstract recursive structures.
Machine learning (classification): learns generalizations from real-world examples (e.g. target input-output pairs).
Bridge: get our learning to target compositional structures.

34 A simple model: Liang and Potts
Model: a simple discriminative learning framework.
compositional model: a (semantic) context-free grammar.
learning model: linear classification and first-order optimization.

35-39 Compositional Model: Linguistic Objects: <u, s, d>
u: utterance
s: semantic representation (logical form) of u
d: denotation of s (symbolized as ⟦s⟧)
Examples:
<seven minus five, (- 7 5), 2>
<two minus two times two, (* (- 2 2) 2), 0>
semantic parsing: u -> s
interpretation: s -> d

40 Computational Modeling: The full picture
Standard processing pipeline:
input: List samples that contain every major element
Semantic Parsing -> sem: (FOR EVERY X / MAJORELT : T; (FOR EVERY Y / SAMPLE : (CONTAINS Y X); (PRINTOUT Y)))
Interpretation against a knowledge representation of the world -> ⟦sem⟧world = {S10019, S10059, ...}
Lunar QA system (Woods (1973))
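Schematically, the pipeline is two functions composed: a semantic parser from utterances to logical forms and an interpreter from logical forms to denotations. The sketch below illustrates this shape on the toy arithmetic domain; the hard-coded parses and the little s-expression evaluator are illustrative stand-ins, not the Lunar system.

# Schematic pipeline: utterance -> semantic representation -> denotation.
def semantic_parse(utterance: str) -> str:
    """Map an utterance to a logical form (here: a hard-coded toy lexicon)."""
    toy_parses = {
        "seven minus five": "(- 7 5)",
        "two minus two times two": "(* (- 2 2) 2)",
    }
    return toy_parses[utterance]

def interpret(logical_form: str):
    """Map a logical form to its denotation by evaluating it against the 'world'."""
    ops = {"-": lambda x, y: x - y, "*": lambda x, y: x * y}
    tokens = logical_form.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            op, pos = tokens[pos + 1], pos + 2
            args = []
            while tokens[pos] != ")":
                arg, pos = read(pos)
                args.append(arg)
            return ops[op](*args), pos + 1
        return int(tokens[pos]), pos + 1

    return read(0)[0]

print(interpret(semantic_parse("seven minus five")))         # 2
print(interpret(semantic_parse("two minus two times two")))  # 0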

41-46 Compositional Model: Context-free grammar
The context-free grammar provides the background grammar and interpretation rules.
example: u = two times two plus three
derivation: [N:(plus (mult 2 2) 3) [N:(mult 2 2) [N:2 two] [R:mult times] [N:2 two]] [R:plus plus] [N:3 three]]
>>> plus = lambda x, y: x + y
>>> mult = lambda x, y: x * y
>>> plus(2, 2)  # 4
>>> plus(plus(2, 3), 2)  # 7
>>> plus(mult(2, 2), 3)  # 7

47-52 Compositional Model: Components
Components:
Grammar rules for building syntactic structure.
Interpretation rules for composing meaning.
Decoding algorithm for generating structures (later lecture).
Rule extraction (later lecture).
Issues:
example: u = two times two plus three
The same utterance can be assigned multiple derivations, e.g.
N:(plus (mult 2 2) 3), with 'times' mapped to R:mult, vs. N:(plus (plus 2 2) 3), with 'times' mapped to R:plus.

53 Compositional Model: Components
Issues (continued): another competing derivation for u = two times two plus three:
N:(plus (mult 2 2) 3) vs. N:(mult (plus 2 2) 3), where the rules for 'times' and 'plus' are swapped.

54 Compositional Model: Components
Issues (continued): structural ambiguity for u = two times two plus three:
N:(plus (mult 2 2) 3), grouping 'two times two' first, vs. N:(mult 2 (plus 2 3)), grouping 'two plus three' first.
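The competing derivations matter because they correspond to different logical forms and, in general, different denotations. A quick check with the lambdas from the earlier slides:

plus = lambda x, y: x + y
mult = lambda x, y: x * y

print(plus(mult(2, 2), 3))   # 7  -- (plus (mult 2 2) 3), the intended reading
print(mult(plus(2, 2), 3))   # 12 -- (mult (plus 2 2) 3)
print(mult(2, plus(2, 3)))   # 10 -- (mult 2 (plus 2 3))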

55-61 Learning Model
Goal: Helps us learn the correct derivations and handle uncertainty (word mappings, composition).
Classifier: a system that inputs a vector of discrete and/or continuous feature values and outputs a single discrete value, the class. Domingos (2012)
Components:
training data D = {(x_i, y_i)}, i = 1..n
feature representation of the data
scoring and objective function
optimization procedure

62-65 Training data
Goal: Find the correct derivations and output using our compositional model.
Logical forms (more information): (u = two minus two times two, s = (* (- 2 2) 2))
Denotations (less information): (u = two minus two times two, r = 0)
Weakly supervised: in both cases, details are still hidden from the learner.
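Written out as Python training pairs (toy data for illustration), the two supervision settings look like this:

# Learning from logical forms: (utterance, semantic representation) pairs
D_logical_forms = [("two minus two times two", "(* (- 2 2) 2)")]

# Learning from denotations: (utterance, answer) pairs
D_denotations = [("two minus two times two", 0)]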

66-67 Learning from Semantic Representations
example: (two times two plus three, (plus (mult 2 2) 3))
Candidate derivations: N:(plus (mult 2 2) 3), with 'times' mapped to R:mult (correct), vs. N:(plus (plus 2 2) 3), with 'times' mapped to R:plus.
Trade-off: more information (good) but more annotation (bad).

68-69 Learning from Denotations
example: (two times two plus three, 7)
Candidate derivations: N:(plus (mult 2 2) 3) vs. N:(plus (plus 2 2) 3); both denote 7, so the denotation alone cannot distinguish them.
Trade-off: less annotation (good) but less information (maybe bad).

70 Weak Supervision
Goal: Find the correct derivations and output using our compositional model.
Logical forms (more information): (u = two minus two times two, s = (* (- 2 2) 2))
Denotations (less information): (u = two minus two times two, r = 0)
"Current learning methods for NLP require annotating large corpora with supervisory information...[e.g. pos tags, syntactic parse trees, semantic role labels]... Building such corpora is an expensive, arduous task. As one moves towards deeper semantic analysis the annotation task becomes increasingly more difficult and complex." Mooney (2008)

71 Feature Representations: General Remark
"At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." Domingos (2012)

72 Feature selection and overfitting
"What if the knowledge and data we have are not sufficient to completely determine the correct classifier? Then we run the risk of just hallucinating a classifier (or parts of it) that is not grounded in reality... This problem is called overfitting." Domingos (2012)
Bias: tendency to consistently learn the same wrong thing.
Variance: tendency to learn random things irrespective of the real signal.

73 Good vs. Bad Feature Selection

74 Feature Extraction Example
input: x = two times two plus three
y1 = N:(plus (mult 2 2) 3), with 'times' mapped to R:mult; y2 = N:(plus (plus 2 2) 3), with 'times' mapped to R:plus.
φ(x, y1) = { R:mult['times']: 1, R:plus['plus']: 1, top[R:plus]: 1, ... }
φ(x, y2) = { R:plus['times']: 1, R:plus['plus']: 1, top[R:plus]: 1, ... }
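A minimal sketch of a feature function phi over derivation trees is given below; the tuple encoding of trees and the particular feature templates (lexical-rule counts plus a top-rule feature) are illustrative assumptions, roughly in the spirit of the slide.

# Trees are nested tuples (category, label, children); leaves carry the surface word.
from collections import Counter

def phi(tree, features=None, is_root=True):
    """Count lexical-rule features like ('R:mult', 'times') plus a top-rule feature."""
    if features is None:
        features = Counter()
    category, label, children = tree
    if is_root:
        features[('top', label)] += 1
    if not children:                     # leaf: label is the surface word
        features[(category, label)] += 1
    for child in children:
        phi(child, features, is_root=False)
    return features

# y2 from the slide: "two times two plus three" with 'times' analysed as R:plus
y2 = ('N', 'plus',
      [('N', 'plus', [('N', '2', []), ('R:plus', 'times', []), ('N', '2', [])]),
       ('R:plus', 'plus', []),
       ('N', '3', [])])

print(phi(y2))  # includes ('R:plus', 'times'): 1, ('R:plus', 'plus'): 1, ('top', 'plus'): 1, ('N', '2'): 2, ('N', '3'): 1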

75-77 Scoring Function (Linear)
Score function: Score_w(x, y) = w · φ(x, y) = Σ_{j=1}^{d} w_j φ_j(x, y)
weight vector: w = [w_1 = 0.1, w_2 = 0.2, w_3 = ...]
φ(x, y2) = { w_1: R:plus['times']: 1, w_2: R:plus['plus']: 1, w_3: top[R:plus]: 1, ... }
score_w(x, y2) = w · φ(x, y2) = (w_1 · 1) + (w_2 · 1) + (w_3 · 1) + ...

78 Scoring Function (Linear)
Score function: Score_w(x, y) = w · φ(x, y) = Σ_{j=1}^{d} w_j φ_j(x, y)
weight vector: w = [w_1 = 0.1, w_2 = 0.2, w_3 = ...]
prediction: arg max_{y ∈ Y} Score_w(x, y)
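A sparse-dictionary sketch of the linear score and the arg-max prediction rule; the weights and candidate feature vectors are made-up toy values.

# Linear scoring over sparse feature dictionaries, plus arg-max prediction.
def score(w, features):
    """Score_w(x, y) = sum_j w_j * phi_j(x, y)."""
    return sum(w.get(f, 0.0) * value for f, value in features.items())

def predict(w, candidates):
    """Return the candidate derivation with the highest score."""
    return max(candidates, key=lambda y: score(w, phi_cache[y]))

# Toy candidates for x = "two times two plus three".
phi_cache = {
    'y1': {('R:mult', 'times'): 1, ('R:plus', 'plus'): 1, ('top', 'R:plus'): 1},
    'y2': {('R:plus', 'times'): 1, ('R:plus', 'plus'): 1, ('top', 'R:plus'): 1},
}
w = {('R:mult', 'times'): 0.5, ('R:plus', 'times'): -0.3, ('R:plus', 'plus'): 0.2}

print(score(w, phi_cache['y1']))   # 0.7
print(predict(w, ['y1', 'y2']))    # 'y1'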

79 Objectives: What do we want to learn? (informal)
General Idea: we want to learn a model (or weight vector) that can distinguish correct and incorrect derivations.
y1 = N:(plus (mult 2 2) 3), y2 = N:(plus (plus 2 2) 3)
φ(x, y1) = { R:mult['times']: 1, R:plus['plus']: 1, top[R:plus]: 1, ... }
φ(x, y2) = { R:plus['times']: 1, R:plus['plus']: 1, top[R:plus]: 1, ... }

80 Objectives: What do we want to learn? (informal)
General Idea: we want to learn a model (or weight vector) that can distinguish correct and incorrect derivations.
y1 = N:(plus (mult 2 2) 3), y2 = N:(mult 2 (plus 2 3))
φ(x, y1) = { R:mult['times']: 1, R:plus['plus']: 1, plus[R:mult]: 1, ... }
φ(x, y2) = { R:plus['times']: 1, R:plus['plus']: 1, mult[R:plus]: 1, ... }

81-83 Objectives: What do we want to learn? (formal)
hinge loss (learning from logical forms):
min_{w ∈ R^d} Σ_{(x,y) ∈ D} [ max_{y' ∈ Y} ( Score_w(x, y') + c(y, y') ) - Score_w(x, y) ]
example pair: (two minus two times two, s = (* (- 2 2) 2))
In English: select the parameters that minimize the cumulative loss over the training data.
Missing: a decoding algorithm for generating Y (not trivial; Y might be very large).

84 Optimization: How do I achieve this objective?
Stochastic gradient descent: an online learning and optimization algorithm (more about this in future lectures).
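For concreteness, one stochastic (sub)gradient step on the hinge objective above can be sketched as follows; the feature dictionaries, cost function, and learning rate are toy choices, not a reference implementation.

# One SGD update on the structured hinge loss, with sparse feature dictionaries.
def hinge_sgd_step(w, candidates, phi, gold, cost, eta=0.1):
    """Move weights toward the gold derivation's features and away from the
    highest-scoring (cost-augmented) competitor."""
    def score(y):
        return sum(w.get(f, 0.0) * v for f, v in phi[y].items())

    # cost-augmented decoding: arg-max over Score_w(x, y') + c(y, y')
    violator = max(candidates, key=lambda y: score(y) + cost(y, gold))
    if violator != gold:
        for f, v in phi[gold].items():
            w[f] = w.get(f, 0.0) + eta * v
        for f, v in phi[violator].items():
            w[f] = w.get(f, 0.0) - eta * v
    return w

phi = {
    'y_gold': {('R:mult', 'times'): 1, ('R:plus', 'plus'): 1},
    'y_bad':  {('R:plus', 'times'): 1, ('R:plus', 'plus'): 1},
}
cost = lambda y, gold: 0.0 if y == gold else 1.0
w = hinge_sgd_step({}, ['y_gold', 'y_bad'], phi, 'y_gold', cost)
print(w)   # weights shifted toward ('R:mult','times') and away from ('R:plus','times')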

85 Optimization: Illustration

86-90 Learning Model
Components:
training data: D = {(x_i, y_i)}, i = 1..n
feature representation of the data
scoring and objective function
optimization procedure
Important Ideas:
What kind of data do we learn from? (differs quite a bit)
What kind of features do we need?

91-92 Experimentation and Evaluation
Training set: a portion of the data to train the model on.
Test set: an unseen portion of the data to evaluate on.
Dev set (optional): an unseen portion of the data for analysis, tuning hyperparameters, ...
Evaluation 1: Given unseen examples, how often does my model produce the correct output semantic representation?
Evaluation 2: Given unseen examples, how often does my model produce the correct output answer?
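The two evaluations can be sketched as exact-match accuracy at the logical-form level and at the answer level; the predictions and gold labels below are toy values, chosen so that the two scores differ.

# Exact-match accuracy for the two evaluation modes (toy data).
def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Evaluation 1: correct semantic representation (exact match on logical forms)
pred_lfs = ["(- 7 5)", "(* (- 2 2) 2)"]
gold_lfs = ["(- 7 5)", "(* 2 (- 2 2))"]
print(accuracy(pred_lfs, gold_lfs))        # 0.5: second logical form differs

# Evaluation 2: correct answer (denotation match)
pred_answers = [2, 0]
gold_answers = [2, 0]
print(accuracy(pred_answers, gold_answers))  # 1.0: both denotations match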

93-95 Conclusions and Take-Aways
Presented a simple model that mixes machine learning and compositional semantics.
Conceptually describes most of the work in this class; technically describes many of the models we will use.
Fundamental problem: Which semantic representations do we use, and what do we learn from?
Question: Does this particular model actually work?
Yes! Liang et al. (2011) (lecture 5), Berant et al. (2013); Berant and Liang (2014) (presentation papers)

96 Roadmap
Lecture 2: rule extraction, decoding (parsing perspective)
Lecture 3: rule extraction, decoding (MT perspective)
Lecture 4: structured classification and prediction
Lecture 5: grounded learning (might skip)

97 References I
Berant, J., Chou, A., Frostig, R., and Liang, P. (2013). Semantic parsing on Freebase from question-answer pairs. In Proceedings of EMNLP-2013.
Berant, J. and Liang, P. (2014). Semantic parsing via paraphrasing. In Proceedings of ACL-2014.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10).
Liang, P., Jordan, M. I., and Klein, D. (2011). Learning dependency-based compositional semantics. In Proceedings of ACL-11.
Mooney, R. (2008). Learning to connect language and perception. In Proceedings of AAAI-2008.
Turney, P. D., Pantel, P., et al. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1).
Woods, W. A. (1973). Progress in natural language understanding: an application to lunar geology. In Proceedings of the June 4-8, 1973, National Computer Conference and Exposition.


More information

An investigation of imitation learning algorithms for structured prediction

An investigation of imitation learning algorithms for structured prediction JMLR: Workshop and Conference Proceedings 24:143 153, 2012 10th European Workshop on Reinforcement Learning An investigation of imitation learning algorithms for structured prediction Andreas Vlachos Computer

More information

Distant Supervised Relation Extraction with Wikipedia and Freebase

Distant Supervised Relation Extraction with Wikipedia and Freebase Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email Marilyn A. Walker Jeanne C. Fromer Shrikanth Narayanan walker@research.att.com jeannie@ai.mit.edu shri@research.att.com

More information

B.S/M.A in Mathematics

B.S/M.A in Mathematics B.S/M.A in Mathematics The dual Bachelor of Science/Master of Arts in Mathematics program provides an opportunity for individuals to pursue advanced study in mathematics and to develop skills that can

More information

CAAP. Content Analysis Report. Sample College. Institution Code: 9011 Institution Type: 4-Year Subgroup: none Test Date: Spring 2011

CAAP. Content Analysis Report. Sample College. Institution Code: 9011 Institution Type: 4-Year Subgroup: none Test Date: Spring 2011 CAAP Content Analysis Report Institution Code: 911 Institution Type: 4-Year Normative Group: 4-year Colleges Introduction This report provides information intended to help postsecondary institutions better

More information

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Writing Research Articles

Writing Research Articles Marek J. Druzdzel with minor additions from Peter Brusilovsky University of Pittsburgh School of Information Sciences and Intelligent Systems Program marek@sis.pitt.edu http://www.pitt.edu/~druzdzel Overview

More information

Learning Methods for Fuzzy Systems

Learning Methods for Fuzzy Systems Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

More information

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together

More information

Technical Manual Supplement

Technical Manual Supplement VERSION 1.0 Technical Manual Supplement The ACT Contents Preface....................................................................... iii Introduction....................................................................

More information

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Indian Institute of Technology, Kanpur

Indian Institute of Technology, Kanpur Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar

More information