Constituency Parsing: Intro / Chart Parsing
Computational Linguistics: Jordan Boyd-Graber, University of Maryland
Motivation: A More Grounded Syntax Theory
  A central question in linguistics: how do we know when a sentence is grammatical?
  Chomsky's generative grammars attempted to mathematically formalize this question
  Linguistic phrases contain a universal, hierarchical structure, formalized as parse trees
Today
  A formalization
  Foundation of all computational syntax
  Learnable from data
Context Free Grammars: Definition
  N: finite set of non-terminal symbols
  Σ: finite set of terminal symbols
  R: productions of the form X → Y1 ... Yn, where X ∈ N and each Yi ∈ (N ∪ Σ)
  S: a start symbol within N
Examples of non-terminals: np (noun phrase), vp (verb phrase); non-terminals often correspond to multiword syntactic abstractions
Examples of terminals: dog, play, the
Examples of productions: n → dog, np → n, np → adj n
In NLP applications, by convention we use s as the start symbol
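To make the 4-tuple (N, Σ, R, S) concrete, here is a minimal sketch of a CFG as plain Python data; the symbol names follow the examples above, and the representation itself is just one illustrative choice.

```python
# A CFG as the 4-tuple (N, Sigma, R, S); symbols follow the slide's examples.
N = {"s", "np", "n", "adj"}              # non-terminals
SIGMA = {"dog", "play", "the"}           # terminals
R = [                                    # productions X -> Y1 ... Yn
    ("n", ("dog",)),                     # n -> dog
    ("np", ("n",)),                      # np -> n
    ("np", ("adj", "n")),                # np -> adj n
]
S = "s"

# Sanity check: LHS in N, each RHS symbol in N or Sigma, start symbol in N
assert S in N
assert all(lhs in N and all(y in N | SIGMA for y in rhs) for lhs, rhs in R)
```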
Flexibility of CFG Productions
  Unary rules: nn → man
  Mixing terminals and non-terminals on the RHS: np → Congress, vp → Vt the pooch, np → the nn
  Empty (ε) productions: np → ε, adj → ε
Derivations
A derivation is a sequence of strings s1 ... sT where
  s1 = S, the start symbol
  sT ∈ Σ*: i.e., the final string contains only terminals
  each si, i > 1, is derived from si−1 by taking some non-terminal X in si−1 and replacing it with some β, where X → β ∈ R
Example: a parse tree records a derivation
Example Derivation
Productions:
  s → np vp      np → Det nn     np → AdjP nn    np → pro
  vp → vz        vp → AdvP vz
  Det → the | a | an
  nn → dog | cat | mouse
  vz → barked | ran | sat
Derivation of "the cat sat" (each step rewrites the leftmost non-terminal):
  s1 = S
  s2 = np vp
  s3 = Det nn vp
  s4 = the nn vp
  s5 = the cat vp
  s6 = the cat vz
  s7 = the cat sat
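The derivation above can be replayed as literal string rewriting; a small sketch (the rule sequence mirrors the slide's steps):

```python
# Replay the derivation s1..s7 by rewriting the leftmost occurrence of each LHS.
steps = [
    ("S", "NP VP"), ("NP", "Det NN"), ("Det", "the"),
    ("NN", "cat"), ("VP", "VZ"), ("VZ", "sat"),
]
s = "S"
history = [s]
for lhs, rhs in steps:
    s = s.replace(lhs, rhs, 1)   # replace one (leftmost) occurrence
    history.append(s)

print(history[-1])   # the cat sat
```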
Ambiguous Yields
The yield of a parse tree is the sequence of terminals produced by the parse tree (for the tree above, "the cat sat").
Parsing / Decoding
Given a yield s and a grammar G, determine the set of parse trees that could have produced that sequence of terminals: TG(s).
Ambiguity
Example sentence: "the man saw the dog with the telescope"
  Grammatical: |TG(s)| > 0
  Ambiguous: |TG(s)| > 1
[Figure 3: Two parse trees (derivations) for the sentence "the man saw the dog with the telescope" under the CFG in Figure 1: one attaches the PP "with the telescope" inside the VP, the other inside the NP "the dog".]
Which should we prefer? One is more probable than the other: add probabilities!
Goals
What we want is a probability distribution over possible parse trees t ∈ TG(s):
  ∀t ∈ TG(s): p(t) ≥ 0, and Σ_{t ∈ TG(s)} p(t) = 1   (1)
Rest of this lecture:
  How do we define the function p(t) (parameterization)
  How do we learn p(t) from data (estimation)
  Given a sentence, how do we find the possible parse trees (parsing / decoding)
Parameterization: Defining a Score Function
  For every production α → β, we assume we have a function q(α → β)
  We read it as the conditional probability of the RHS β being derived from the LHS α, so for every non-terminal X:
    Σ_{α → β ∈ R : α = X} q(α → β) = 1   (2)
  The total probability of a tree t = {α1 → β1, ..., αn → βn} is
    p(t) = Π_{i=1}^{n} q(αi → βi)   (3)
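Equation 2 says that, for each non-terminal X, the q values of X's rules form a probability distribution. A quick sketch checking this for a grammar fragment (the probabilities mirror the toy PCFG used later in the lecture):

```python
from collections import defaultdict

# q(alpha -> beta) for a fragment of the lecture's toy PCFG
q = {
    ("vp", ("v", "np")): 0.7,
    ("vp", ("vp", "pp")): 0.2,
    ("vp", ("v",)): 0.1,
    ("p", ("with",)): 0.6,
    ("p", ("in",)): 0.4,
}

# Sum q over all rules sharing the same LHS; each total must be 1
totals = defaultdict(float)
for (lhs, rhs), prob in q.items():
    totals[lhs] += prob
assert all(abs(total - 1.0) < 1e-9 for total in totals.values())
```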
Estimation
  Get a bunch of grad students to make parse trees for a million sentences
  Mitch Marcus: Penn Treebank (Wall Street Journal)
  To compute the conditional probability of a rule:
    q(np → Det adj nn) = Count(np → Det adj nn) / Count(np)
  where Count is the number of times that rule (or non-terminal) appears in the treebank
  Why no smoothing?
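Relative-frequency estimation is just counting; a hedged sketch over a toy "treebank" (the observed rules below are made up for illustration):

```python
from collections import Counter

# Rules read off a toy, hand-made set of parse trees
observed_rules = [
    ("np", ("dt", "nn")), ("np", ("dt", "nn")), ("np", ("dt", "adj", "nn")),
    ("vp", ("v",)),
]

rule_counts = Counter(observed_rules)                    # Count(X -> beta)
lhs_counts = Counter(lhs for lhs, _ in observed_rules)   # Count(X)

def q(lhs, rhs):
    """q(X -> beta) = Count(X -> beta) / Count(X)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(q("np", ("dt", "nn")))   # 2 of the 3 np rewrites: 0.666...
```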
Dynamic Programming
  As with dependency parsing, we build a chart to consider all possible subtrees
  First, however, we'll just decide whether a sentence is grammatical or not
  Build up a chart with all possible derivations over spans
  Then inspect the entry with the start symbol over the entire sentence: those are all the grammatical parses
CYK Algorithm (deterministic)
Assumptions
  Assumes a binarized grammar (not too difficult to extend) and no ε-productions or unary cycles
  Given sentence w of length N, grammar (N, Σ, R, S)
  Initialize array C[s, t, n] of booleans, all false (⊥)
for i = 1 ... N do
    for each production a → wi do
        C[i, i, a] ← true
for l = 2 ... N do                      (length of span)
    for s = 1 ... N − l + 1 do          (start of span)
        for k = 1 ... l − 1 do          (split point within span)
            for each production α → β γ do
                if not C[s, s + l − 1, α] then
                    C[s, s + l − 1, α] ← C[s, s + k − 1, β] ∧ C[s + k, s + l − 1, γ]
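The pseudocode above, as a runnable sketch for a grammar in Chomsky normal form (0-indexed inclusive spans; the toy grammar at the bottom is illustrative, not the lecture's grammar):

```python
def cyk(words, lexical, binary, start="s"):
    """CYK recognizer: chart[i][j] = set of non-terminals deriving words[i..j]."""
    n = len(words)
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                  # length-1 spans: a -> w_i
        for lhs, term in lexical:
            if term == w:
                chart[i][i].add(lhs)
    for length in range(2, n + 1):                 # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                  # split point: [i..k] + [k+1..j]
                for lhs, (b, c) in binary:
                    if b in chart[i][k] and c in chart[k + 1][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n - 1]                # grammatical iff S covers it all

lexical = [("dt", "the"), ("nn", "dog"), ("v", "saw")]
binary = [("np", ("dt", "nn")), ("vp", ("v", "np")), ("s", ("np", "vp"))]
print(cyk("the dog saw the dog".split(), lexical, binary))   # True
```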
Chart Parsing
Example sentence: "Book the flight through Houston" (word boundaries numbered 0 through 5)
[Chart figure, built up incrementally: length-1 spans first receive the lexical categories (Book: V and N; the: Det; flight: N; through: P; Houston: N). Larger spans then combine: "the flight" forms a DP, "Book the flight" a VP and an S, "through Houston" a PP, "the flight through Houston" a DP, and finally "Book the flight through Houston" an S covering the whole sentence.]
Complexity?
  Chart has O(n²) cells
  Each cell considers O(n) split points
  Times the number of productions |G|
  Thus, O(n³ |G|)
How to deal with PCFG ambiguity
  In addition to keeping track of non-terminals in each cell, also store the maximum log probability of forming that non-terminal from subtrees:
    C[s, s + k, α] ← max( C[s, s + k, α],  ln q(α → β γ) + C[s, s + l − 1, β] + C[s + l, s + k, γ] )
  The score associated with s at the top of the chart is that of the best overall parse tree (given the yield)
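The same chart, with a max log probability per non-terminal instead of a boolean; a sketch under the same CNF assumption (back-pointers, which recover the tree itself, are omitted for brevity, and the toy grammar is illustrative):

```python
import math

def pcky(words, lexical, binary, start="s"):
    """Viterbi CKY: chart[i][j][A] = max log prob of an A-subtree over words[i..j]."""
    n = len(words)
    chart = [[{} for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for (lhs, term), p in lexical.items():
            if term == w:
                chart[i][i][lhs] = max(chart[i][i].get(lhs, -math.inf), math.log(p))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for (lhs, (b, c)), p in binary.items():
                    if b in chart[i][k] and c in chart[k + 1][j]:
                        score = math.log(p) + chart[i][k][b] + chart[k + 1][j][c]
                        if score > chart[i][j].get(lhs, -math.inf):
                            chart[i][j][lhs] = score
    return chart[0][n - 1].get(start, -math.inf)   # log prob of best parse

lexical = {("dt", "the"): 1.0, ("nn", "dog"): 0.5, ("v", "saw"): 0.6}
binary = {("np", ("dt", "nn")): 0.2, ("vp", ("v", "np")): 0.7, ("s", ("np", "vp")): 1.0}
logp = pcky("the dog saw the dog".split(), lexical, binary)
```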
Recap
  Hierarchical syntax model: context-free grammar
  Probabilistic interpretation: learn from data to resolve ambiguity
  In class (next time):
    Work through an example to resolve ambiguity
    Scoring a sentence
A PCFG
Assume the following grammar:
  s → np vp     1.0        v → sleeps       0.4
  vp → v np     0.7        v → saw          0.6
  vp → vp pp    0.2        nn → man         0.1
  vp → v        0.1        nn → woman       0.1
  np → dt nn    0.2        nn → telescope   0.3
  np → np pp    0.8        nn → dog         0.5
  pp → p np     1.0        dt → the         1.0
                           p → with         0.6
                           p → in           0.4
Evaluating the Probability of a Sentence
What is the probability of the parse
  (s (np (dt the) (nn dog)) (vp (v sleeps)))
Evaluating the Probability of a Sentence
  q(dt → the) = 1.0
  q(nn → dog) = 0.5
  q(v → sleeps) = 0.4
  q(vp → v) = 0.1
  q(np → dt nn) = 0.2
  q(s → np vp) = 1.0
  p(t) = 1.0 × 0.5 × 0.4 × 0.1 × 0.2 × 1.0 = 0.004
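The product can be checked directly, with the rule probabilities taken from the grammar two slides back (in particular q(v → sleeps) = 0.4 there):

```python
from math import prod   # Python 3.8+

# s -> np vp, np -> dt nn, dt -> the, nn -> dog, vp -> v, v -> sleeps
rule_probs = [1.0, 0.2, 1.0, 0.5, 0.1, 0.4]
p = prod(rule_probs)    # ~= 0.004 up to float rounding
```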
Span 0 (sentence: "the man saw the dog with the telescope")
1. C[8,8,nn] = ln(0.3) = −1.2
2. C[7,7,dt] = ln(1.0) = 0.0
3. C[6,6,p] = ln(0.6) = −0.51
4. C[5,5,nn] = ln(0.5) = −0.69
5. C[4,4,dt] = ln(1.0) = 0.0
6. C[3,3,v] = ln(0.6) = −0.51
7. C[3,3,vp] = ln(0.6) + ln(0.1) = −2.8
8. C[2,2,nn] = ln(0.1) = −2.3
9. C[1,1,dt] = ln(1.0) = 0.0
Span 1
1. C[1,2,np] = C[1,1,dt] + C[2,2,nn] + ln(0.2) [np → dt nn] = 0.0 − 2.3 − 1.6 = −3.9
2. C[4,5,np] = C[4,4,dt] + C[5,5,nn] + ln(0.2) [np → dt nn] = 0.0 − 0.69 − 1.6 = −2.3
3. C[7,8,np] = C[7,7,dt] + C[8,8,nn] + ln(0.2) [np → dt nn] = 0.0 − 1.2 − 1.6 = −2.8
Span 2
1. C[1,3,s] = C[1,2,np] + C[3,3,vp] + ln(1.0) [s → np vp] = −3.9 − 2.8 + 0.0 = −6.7
2. C[3,5,vp] = C[3,3,v] + C[4,5,np] + ln(0.7) [vp → v np] = −0.51 − 2.3 − 0.36 = −3.2
3. C[6,8,pp] = C[6,6,p] + C[7,8,np] + ln(1.0) [pp → p np] = −0.51 − 2.8 + 0.0 = −3.3
Span 4
1. C[1,5,s] = C[1,2,np] + C[3,5,vp] + ln(1.0) [s → np vp] = −3.9 − 3.2 + 0.0 = −7.1
2. C[4,8,np] = C[4,5,np] + C[6,8,pp] + ln(0.8) [np → np pp] = −2.3 − 3.3 − 0.22 = −5.8
Span 5
C[3,8,vp] = max(
    C[3,5,vp] + C[6,8,pp] + ln(0.2) [vp → vp pp] = −3.2 − 3.3 − 1.6 = −8.1,
    C[3,3,v] + C[4,8,np] + ln(0.7) [vp → v np] = −0.51 − 5.8 − 0.36 = −6.7
) = max(−8.1, −6.7) = −6.7
Span 7
1. C[1,8,s] = C[1,2,np] + C[3,8,vp] + ln(1.0) [s → np vp] = −3.9 − 6.7 = −10.6
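As a sanity check, the final cell can be recomputed from raw rule probabilities: per Span 5, the winning subtree uses vp → v np with the PP attached to "the dog" via np → np pp, and multiplying its rules reproduces C[1,8,s] up to the rounding in the chart values. A sketch:

```python
import math

# Rule probabilities of the best parse of "the man saw the dog with the telescope"
p = (1.0                    # s  -> np vp
     * 0.2 * 1.0 * 0.1      # np -> dt nn; dt -> the; nn -> man
     * 0.7 * 0.6            # vp -> v np;  v -> saw
     * 0.8                  # np -> np pp  (PP attaches to "the dog")
     * 0.2 * 1.0 * 0.5      # np -> dt nn; dt -> the; nn -> dog
     * 1.0 * 0.6            # pp -> p np;  p -> with
     * 0.2 * 1.0 * 0.3)     # np -> dt nn; dt -> the; nn -> telescope
print(round(math.log(p), 1))   # -10.6
```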