A Generative Model for Parsing Natural Language to Meaning Representations

Jake Vasilakes, March 9, 2015

Outline
Background
  Key Concepts
  Purpose and Structure
Generative Model
  Process
  Tree probability
  Parameters
  Decoding
Discriminative reranking
  Averaged Perceptron
Evaluation
  Methodology
  Results

Key Concepts
Meaning Representation (MR): a formal representation of meaning, written in a meaning representation language (MRL).
Example MR production: NUM : count(state), where NUM is the semantic category, count is the function symbol, and state is the argument.
Semantic Parsing: the mapping of natural language (NL) sentences to meaning representations.

Purpose and Structure
Purpose: Learn a generative model that maps NL sentences to MR trees. Learn an implicit grammar.
System Structure: [figure not recovered]

Process
Goal: Simultaneous generation of the NL sentence and the MR structure.
Example sentence: How many states do not have rivers?

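Tree probability

$$P(\hat{w}, m, T) = P(M_a) \times P(m_a \mid M_a) \times P(w_1\, M_b\, w_2\, M_c \mid m_a) \times P(m_b \mid m_a, \mathrm{arg}=1) \times P(\ldots \mid m_b) \times P(m_c \mid m_a, \mathrm{arg}=2) \times P(\ldots \mid m_c)$$

$\hat{w}$: words; $m$: MR structures; $T$: hybrid tree

$$P(w_1\, M_b\, w_2\, M_c \mid m_a) = P(m \rightarrow \mathrm{wywz} \mid m_a) \times P(w_1 \mid m_a) \times P(M_b \mid m_a, w_1) \times P(w_2 \mid m_a, w_1, M_b) \times P(M_c \mid m_a, w_1, M_b, w_2) \times P(\mathrm{END} \mid m_a, w_1, M_b, w_2, M_c)$$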
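
The factorization of one expansion can be illustrated with a small numeric sketch. The probability values below are hypothetical and chosen only to show how the pattern factor and the emission factors multiply; working in log space is a common choice to avoid underflow on larger trees, not something the slides specify.

```python
import math

# Hypothetical factor values for expanding m_a with the pattern "wywz".
# None of these numbers come from the paper; they only illustrate the product.
pattern_prob = 0.2  # P(m -> wywz | m_a)
emission_probs = [
    0.5,  # P(w_1 | m_a)
    0.9,  # P(M_b | m_a, w_1)
    0.4,  # P(w_2 | m_a, w_1, M_b)
    0.8,  # P(M_c | m_a, w_1, M_b, w_2)
    0.7,  # P(END | m_a, w_1, M_b, w_2, M_c)
]


def expansion_log_prob(pattern_prob, emission_probs):
    """Log of P(w_1 M_b w_2 M_c | m_a): pattern factor times each emission factor."""
    return math.log(pattern_prob) + sum(math.log(p) for p in emission_probs)


log_p = expansion_log_prob(pattern_prob, emission_probs)
prob = math.exp(log_p)
print(round(prob, 5))
```

The full tree probability multiplies one such term per MR production in the tree, together with the MR model factors such as P(m_b | m_a, arg = 1).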

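Parameters
MR model parameters: $\sum_{m_i} \rho(m_i \mid m_j, \mathrm{arg}=k) = 1$ for all $j$ and $k = 1, 2$.
Pattern parameters: $\sum_{r} \phi(r \mid m_j) = 1$ for all $j$, where $r$ is a hybrid pattern, e.g. wywz.
Emission parameters: $\sum_{t} \theta(t \mid m_j, \Lambda) = 1$ for all $j$, where $t$ is any node in $T$ and $\Lambda$ is the preceding context.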

Parameters
Different contexts ($\Lambda$) result in different models.
Model I (Unigram): $\theta(t_k \mid m_j, \Lambda) = P(t_k \mid m_j)$
Model II (Bigram): $\theta(t_k \mid m_j, \Lambda) = P(t_k \mid m_j, t_{k-1})$
Model III (Interpolation): $\theta(t_k \mid m_j, \Lambda) = \frac{1}{2}(\text{Model I} + \text{Model II})$
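
The three emission models differ only in how much preceding context they condition on. The sketch below assumes the unigram and bigram probabilities are already available as lookup tables; the table layout, the function name `make_emission_models`, and the example values are illustrative assumptions, not part of the original system.

```python
def make_emission_models(unigram, bigram):
    """Build the three emission models from precomputed tables.

    unigram[(t, m)]      stands for P(t | m)
    bigram[(t, m, prev)] stands for P(t | m, prev)
    Both table layouts are assumptions made for this sketch.
    """
    def model_1(t, m, prev):
        # Unigram: ignores the previous node.
        return unigram.get((t, m), 0.0)

    def model_2(t, m, prev):
        # Bigram: conditions on the previous node t_{k-1}.
        return bigram.get((t, m, prev), 0.0)

    def model_3(t, m, prev):
        # Interpolation: equal-weight average of Model I and Model II.
        return 0.5 * (model_1(t, m, prev) + model_2(t, m, prev))

    return model_1, model_2, model_3


# Hypothetical probabilities for the word "states" under an MR production "STATE".
unigram = {("states", "STATE"): 0.25}
bigram = {("states", "STATE", "many"): 0.75}
model_1, model_2, model_3 = make_emission_models(unigram, bigram)
interpolated = model_3("states", "STATE", "many")
print(interpolated)
```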

Parameters: Estimation
MR model parameters: count and normalize.
Pattern and emission parameters: EM algorithm, because the alignment between NL words and MR structures is unknown in the training data.
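
Count-and-normalize estimation of the MR model parameters can be sketched as follows. The tree encoding (a production name paired with a list of child trees), the function name `estimate_mr_params`, and the toy productions are assumptions made for illustration; the pattern and emission parameters are not covered here because they require EM.

```python
from collections import Counter, defaultdict


def estimate_mr_params(trees):
    """Relative-frequency estimate of rho(child | parent, arg=k).

    Each tree is (production, [child_tree, ...]); this encoding is an
    assumption of the sketch. Returns a dict keyed by (child, parent, k).
    """
    counts = defaultdict(Counter)

    def visit(node):
        parent, children = node
        for k, child in enumerate(children, start=1):
            counts[(parent, k)][child[0]] += 1
            visit(child)

    for tree in trees:
        visit(tree)

    rho = {}
    for (parent, k), child_counts in counts.items():
        total = sum(child_counts.values())
        for child, c in child_counts.items():
            rho[(child, parent, k)] = c / total
    return rho


# Hypothetical training MR trees with toy production names.
trees = [
    ("count", [("state", [])]),
    ("count", [("state", [])]),
    ("count", [("river", [])]),
]
rho = estimate_mr_params(trees)
print(rho[("state", "count", 1)])
```

For each fixed parent and argument position, the estimates sum to one, which matches the normalization constraint on the MR model parameters.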

Parameters: EM, inside and outside probabilities
Inside and outside probabilities are used to calculate estimated counts.
One EM iteration takes $O(n^6 m)$ time, where $n$ is the length of the NL sentence and $m$ is the size of the MR structure.
A modification brings the complexity down to $O(n^3 m)$.

Parameters: Modification
Idea: aggregate probabilities of NL-MR subsequences for use in subsequent computations.
Aggregate probabilities for a given NL-MR subsequence pair $\langle m_v, w_v \rangle$ and a given pattern $r$, e.g. wywz.
This aggregate probability can be used to calculate the partial inside or outside probability for a given $\langle m_v, w_v \rangle$.
Summing over all $r$ gives the total inside or outside probability.

Decoding
Goal: find the most probable MR structure $m^*$ given the NL sentence $\hat{w}$.

$$m^* = \arg\max_m \sum_T P(m, T \mid \hat{w})$$

But summing over all possible trees $T$ is expensive. Approximate with the most likely tree (Viterbi approximation).

$$m^* = \arg\max_m \max_T P(m, T \mid \hat{w}) = \arg\max_m \max_T P(\hat{w}, m, T)$$

In practice, a ranked list of the k best trees is output.
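
The difference between the exact objective and the Viterbi approximation can be seen on a toy candidate list. The sketch below assumes the candidate (MR, tree, joint probability) triples are already enumerated, which a real decoder would avoid by using dynamic programming; the candidate values and function names are hypothetical.

```python
import heapq
from collections import defaultdict


def decode_sum(candidates):
    """Exact objective: pick the MR whose trees have the largest total probability."""
    totals = defaultdict(float)
    for mr, _tree, prob in candidates:
        totals[mr] += prob
    return max(totals, key=totals.get)


def decode_viterbi(candidates):
    """Viterbi approximation: pick the MR of the single most probable tree."""
    return max(candidates, key=lambda c: c[2])[0]


def k_best(candidates, k):
    """Ranked list of the k most probable (MR, tree, probability) candidates."""
    return heapq.nlargest(k, candidates, key=lambda c: c[2])


# Hypothetical joint probabilities P(w, m, T) for one sentence.
candidates = [
    ("m1", "T1", 0.30),
    ("m2", "T2", 0.25),
    ("m2", "T3", 0.25),
]
exact = decode_sum(candidates)        # m2: 0.25 + 0.25 beats 0.30
approx = decode_viterbi(candidates)   # m1: its single tree is the most probable
top2 = k_best(candidates, 2)
print(exact, approx)
```

In this toy case the two objectives disagree, which shows what the approximation gives up in exchange for cheaper decoding.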

Averaged Perceptron
The generative model cannot model long-range dependencies within trees.
Use a discriminative classifier to rerank the list of k best trees produced by the generative model (k = 50).
Classifier: averaged perceptron with separating plane.

Averaged Perceptron
A feature function maps a given tree T to a feature vector Φ(T). A weight vector w is associated with Φ(T). The T with the highest score under the weights is picked as output.
Separating Plane: After w is learned, set a threshold score value b. Reject a given T if its score is less than b. Choose the b that results in the maximum F-score.
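
Reranking with a rejection threshold can be sketched as below. The sketch assumes the weight vector has already been learned (perceptron training is not shown), uses a plain dot product as the score, and picks b from a given list of candidate thresholds on a held-out set; the feature vectors, data, and function names are hypothetical.

```python
def score(weights, features):
    """Dot product of the weight vector and a feature vector Phi(T)."""
    return sum(w * f for w, f in zip(weights, features))


def rerank(weights, candidates, b):
    """Return the highest-scoring tree, or None if its score is below b.

    candidates: list of (tree, feature_vector) pairs from the k-best list.
    """
    best_tree, best_features = max(candidates, key=lambda c: score(weights, c[1]))
    return best_tree if score(weights, best_features) >= b else None


def f_score(outputs, gold):
    """F-score where None counts as 'no output' (hurts recall, not precision)."""
    answered = [(o, g) for o, g in zip(outputs, gold) if o is not None]
    correct = sum(1 for o, g in answered if o == g)
    if correct == 0:
        return 0.0
    precision = correct / len(answered)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def choose_threshold(weights, dev_set, thresholds):
    """Pick the threshold b that gives the highest F-score on dev_set."""
    gold = [g for _cands, g in dev_set]
    return max(
        thresholds,
        key=lambda b: f_score([rerank(weights, c, b) for c, _g in dev_set], gold),
    )


# Hypothetical learned weights and a two-sentence held-out set.
weights = [1.0, 0.5]
dev_set = [
    ([("A", [1.0, 0.0]), ("B", [0.0, 1.0])], "A"),  # top tree is correct
    ([("C", [0.2, 0.0]), ("D", [0.0, 0.2])], "D"),  # top tree is wrong, low score
]
best_b = choose_threshold(weights, dev_set, [0.0, 0.5, 2.0])
print(best_b)
```

With these toy values, a threshold of 0.5 rejects the low-scoring wrong answer and keeps the correct one, so it gives a higher F-score than accepting everything or rejecting everything.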

Averaged Perceptron: Features
Features 1-5 are binary ({0,1}). Feature 6 is real-valued.

Methodology
Evaluated on two corpora: GEOQUERY and ROBOCUP. Precision, recall, and F-score are reported.
GEOQUERY: an MR structure is considered correct if it retrieves the same answer as the reference MR structure when used as a query to the database, regardless of differences in the string representation.
ROBOCUP: an MR structure is considered correct if it has the same string representation as the reference MR structure.
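
The two correctness criteria and the reported metrics can be sketched as follows. The `execute` callable stands in for a database query function and is hypothetical, as are the toy MR strings and answers. The metric definitions assume the usual convention for parsers that may decline to answer: precision is measured over sentences that received an output, recall over all sentences.

```python
def correct_geoquery(predicted_mr, reference_mr, execute):
    """Correct if both MRs retrieve the same answer from the database.

    `execute` is a hypothetical function mapping an MR string to its answer.
    """
    return execute(predicted_mr) == execute(reference_mr)


def correct_robocup(predicted_mr, reference_mr):
    """Correct only if the string representations are identical."""
    return predicted_mr == reference_mr


def precision_recall_f(judgements):
    """judgements: True (correct), False (wrong), or None (no output)."""
    answered = [j for j in judgements if j is not None]
    n_correct = sum(1 for j in answered if j)
    precision = n_correct / len(answered) if answered else 0.0
    recall = n_correct / len(judgements) if judgements else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f


# Toy stand-in for a database: two differently written MRs with the same answer.
toy_answers = {"count(state(all))": 50, "count(all_states)": 50}
same_answer = correct_geoquery("count(state(all))", "count(all_states)", toy_answers.get)
same_string = correct_robocup("count(state(all))", "count(all_states)")

precision, recall, f = precision_recall_f([True, True, False, None])
print(same_answer, same_string, round(f, 4))
```

The toy pair counts as correct under the GEOQUERY criterion but not under the ROBOCUP criterion, which is the distinction the two definitions draw.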

Results
Comparison to previous work (evaluated on a subset of GEOQUERY). [results tables not recovered]

Summary
Learn a generative model that outputs a list of the k best NL-MR hybrid trees for a given NL sentence.
Rerank the k-best list according to the score assigned by the averaged perceptron with separating plane.
Choose the tree with the highest score as output.

References
W. Lu, H. T. Ng, W. S. Lee, L. S. Zettlemoyer. A Generative Model for Parsing Natural Language to Meaning Representations. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.
