Resolving Dependency Ambiguity of Subordinate Clauses using Support Vector Machines

Sang-Soo Kim, Seong-Bae Park, and Sang-Jo Lee

Abstract

In this paper, we propose a method for resolving dependency ambiguities of Korean subordinate clauses based on Support Vector Machines (SVMs). Dependency analysis of clauses is well known to be one of the most difficult tasks in parsing sentences, especially in Korean. To solve this problem, we assume that the dependency relation of Korean subordinate clauses is the dependency relation among the verb phrases, that is, the verbs and endings, in the clauses. As a result, the problem is represented as a binary classification task. To apply SVMs to this problem, we select two kinds of features: static and dynamic features. The experimental results on the STEP2000 corpus show that our system achieves an accuracy of 73.5%.

Keywords: Dependency analysis, subordinate clauses, binary classification, support vector machines.

I. INTRODUCTION

In Korean, the dependency analysis of clauses is known as one of the most difficult tasks in parsing sentences because of the characteristics of the language: (i) it has a partially free word order, (ii) the omission of components is common, (iii) it is a head-final language, and (iv) the spacing unit is a composite of one or more words. What makes clause dependency analysis especially difficult is the third factor. The endings (Eomi) can be freely combined with a verb, and they carry the semantic relationship with other verbs.

The steps of parsing Korean sentences are as follows. First, the input sentence is analyzed into morphemes, and then the part-of-speech (POS) of each morpheme is determined by some means. Finally, the syntactic relation is analyzed using the results of the previous steps. Due to the characteristics of Korean, dependency grammar rather than phrase-structure grammar is generally used in parsing Korean [1]. This process is not much different from that of other languages.
However, since each word used in a sentence becomes a processing unit, the complexity of parsing grows too large, especially for long sentences, which results in severe ambiguities in parsing Korean. Recently, in order to solve this problem, enlarging the processing unit has gained much interest from researchers in Korean language processing. Many research results have been reported on Korean text chunking [2, 3], and they give relatively stable results. In addition, some researchers have studied how to find the boundaries of a clause [4]. When the clause boundaries are known, intra-clause parsing is a simple task compared to inter-clause parsing, because Korean is head-final.

This research was supported in part by MIC & IITA through the IT Leading R&D Support Project, and by grant No. R0-2006-000-96-0 from the Basic Research Program of the Korea Science & Engineering Foundation. The authors are with the Department of Computer Engineering, Kyungpook National University, Daegu 702-701, Korea (corresponding author e-mail: sskim@sejong.knu.ac.kr).

However, the relation between clauses is not determined by the information given by text chunking and clause boundaries. Due to the freedom of word order in Korean sentences, it is extremely difficult to determine the relation between clauses by looking only at neighboring words. That is, it has been believed that the surface form of a clause is not sufficient for analyzing the relation among clauses. As a result, many previous works on parsing Korean have focused on how to use semantic information of verb phrases in determining the clause relation [1]. However, building semantic knowledge for the task is very expensive and time-consuming.

In this paper, we propose a novel method for analyzing the dependency relation of Korean subordinate clauses without external knowledge. For this task, we observed that the most important component in determining the dependencies is the base verb phrase, composed of a verb and a few endings, rather than the complement and supplement components within a clause.
Therefore, in order to solve the problem, we assume that the dependency relation of Korean subordinate clauses is the dependency relation of base verb phrases. In addition, we formulate the dependency analysis of Korean clauses as a binary classification task. As the classifier for this task, we adopt a support vector machine (SVM), which is known to perform well on many kinds of real-world classification problems.

The rest of this paper is organized as follows. Section 2 surveys previous work on clause recognition and the analysis of inter-clause relations, and Section 3 introduces the support vector machine, which is adopted as the base learner for the task. Section 4 describes the proposed method for clausal dependency analysis using support vector machines. Section 5 explains the corpus used in the experiments and presents the experimental results. Finally, Section 6 draws conclusions and suggests some future work.

II. RELATED WORK

There have been a number of studies on analyzing the dependency relation of subordinate clauses, in both clause identification and dependency structure analysis. Dependency structure analysis is the task of recognizing the embeddedness of clauses, while clause identification is the task of finding the starting and ending points of clauses. In 2001, there was a competition for the latter task at the Conference on Computational Natural Language Learning (CoNLL). The two best methods were boosting trees [5] and a hidden Markov model [6]. However, unlike in Western languages, in Korean the dependency relation is not easily determined even if the clause boundaries are identified.

Most previous work on dependency analysis of sentences has focused on words rather than clauses. That is, instead of finding the dependency relation among clauses, the relation among verb phrases within a clause has been the core of the research. Uchimoto et al. used a maximum entropy model and various kinds of features to identify the dependency structure of sentences [7]. They reported experimental results on the relationship between feature types and dependency analysis. Kudo and Matsumoto formulated the analysis of dependency structure as a binary classification task and adopted support vector machines as the classifier [8]. The features used in training the support vector machines are grammatical features such as lexicons and part-of-speech tags, and some functional features such as function words and inflection information. Gao and Suzuki solved the problem of analyzing dependency relations by training a language model through unsupervised learning [9]. Utsuro et al. classified text chunks into several types according to the function words of the final word in a sentence. With the classified types, they determined the dependency relation among chunks [10].

In Korean language processing, most research on syntactic analysis has focused on Josa and Eomi and their dependency relations. As a result, most works are based on hand-crafted rules [1].
In particular, research on subordinate clauses has addressed the recognition of simple sentences and the restoration of omitted components in simple sentences. The first effort to use a machine learning algorithm in handling clauses was made for clause boundary detection. Lee et al. extracted n-gram information from a sentence and then recognized the boundary of a clause using that information [11]. However, their work was limited to the detection of clauses and did not suggest any method for analyzing their dependency. The main reason why machine learning algorithms are rarely used in handling Korean clauses is that there is no standard large-scale dataset for the task. Recently, substantial funding from the Korean government for building a large-scale tree-tagged corpus has made it possible to transform the corpus into data for clause detection and dependency analysis.

III. SUPPORT VECTOR MACHINES

The Support Vector Machine (SVM), proposed by Vapnik, is a machine learning algorithm that is well known as a highly successful binary classifier and has been applied to many classification tasks. In the field of natural language processing, it has been successfully applied to text categorization, spam-mail filtering, and chunk identification, and it is reported to achieve high performance without over-fitting even with a large number of features [12, 13].

Assume that the training data, each with either a positive or a negative class label, are given as follows:

$$(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_l, y_l), \qquad \mathbf{x}_i \in \mathbb{R}^n,\; y_i \in \{+1, -1\}$$

where $\mathbf{x}_i$ is the feature vector of the $i$-th training datum in an $n$-dimensional space, and $y_i$ is its class label. In the basic SVM framework, the hyperplane is defined as follows:

$$(\mathbf{w} \cdot \mathbf{x}) + b = 0, \qquad \mathbf{w} \in \mathbb{R}^n,\; b \in \mathbb{R}.$$

According to this definition, there can be an infinite number of hyperplanes that separate the training data into two classes correctly. Among such hyperplanes, we define the optimal hyperplane as the one with the largest margin between the two classes. Fig. 1 illustrates the notion of the margin.

Fig. 1 The margin of a hyperplane
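The solid line, the hyperplane, correctly divides the training data into two classes without misclassification. The two dashed lines parallel to the hyperplane represent the distance between the hyperplane and the closest instances. The distance between the two parallel dashed lines, $d$, is called the margin. Thus, assuming that the closest instances satisfy $|(\mathbf{w} \cdot \mathbf{x}) + b| = 1$, the margin can be rewritten as:

$$d = \frac{|(\mathbf{w} \cdot \mathbf{x}^{+}) + b| + |(\mathbf{w} \cdot \mathbf{x}^{-}) + b|}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|}$$

where $\mathbf{x}^{+}$ and $\mathbf{x}^{-}$ denote the closest positive and negative instances. Therefore, an SVM generates the hyperplane that maximizes the margin by minimizing $\|\mathbf{w}\|$ under the constraints:

$$y_i\,[(\mathbf{w} \cdot \mathbf{x}_i) + b] \ge 1 \qquad (i = 1, \ldots, l).$$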
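The relation between the constraints and the margin can be checked numerically. The following is a minimal sketch, not the training procedure used in the paper: the toy points and the two candidate hyperplanes are hypothetical and hand-picked, and the helper names (`functional_margins`, `is_feasible`, `margin`) are introduced here only for illustration. It verifies the constraints $y_i[(\mathbf{w} \cdot \mathbf{x}_i) + b] \ge 1$ for each candidate and compares the resulting margins $2 / \|\mathbf{w}\|$.

```python
import math

def functional_margins(w, b, points, labels):
    """Return y_i * ((w . x_i) + b) for every training instance."""
    return [y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
            for x, y in zip(points, labels)]

def is_feasible(w, b, points, labels, tol=1e-9):
    """True if every instance satisfies y_i * ((w . x_i) + b) >= 1."""
    return all(m >= 1.0 - tol
               for m in functional_margins(w, b, points, labels))

def margin(w):
    """Margin d = 2 / ||w|| of a hyperplane scaled as in the constraints."""
    return 2.0 / math.sqrt(sum(wk * wk for wk in w))

# Hypothetical toy data (not from the paper): two positive, two negative points.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]

# Two hand-picked candidate hyperplanes (w, b); both separate the data.
candidates = {
    "narrow": ((1.0, 1.0), -3.0),
    "wide": ((0.5, 0.5), -1.0),
}

# Keep only candidates that satisfy the constraints, then prefer the one
# with the smaller ||w||, that is, the larger margin.
feasible = {name: margin(w) for name, (w, b) in candidates.items()
            if is_feasible(w, b, points, labels)}
best = max(feasible, key=feasible.get)
print(best, round(feasible[best], 3))
```

Both candidates classify the toy data correctly, but the one with the smaller $\|\mathbf{w}\|$ leaves a wider band between the two classes, which is the hyperplane an SVM prefers. A real SVM solves this as an optimization problem rather than comparing a fixed list of candidates.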

SVMs have an advantage over conventional machine learning algorithms such as neural networks or decision trees: they show high generalization performance independent of the dimension of the feature vectors. Conventional machine learning algorithms usually require careful feature selection, which is often optimized heuristically to avoid over-fitting. In addition, by introducing a kernel function, SVMs can learn with all combinations of the given features without increasing computational complexity.

IV. ANALYZING DEPENDENCY RELATION OF SUBORDINATE CLAUSES

Fig. 2 An example of a dependency relation between clauses

A. The Probability Model and Generating Training Data

Let a sequence of clauses $\{c_1, c_2, \ldots, c_m\}$ be denoted by $C$, and the sequence of dependency patterns $\{Dep(1), Dep(2), \ldots, Dep(m-1)\}$ be denoted by $D$, where $Dep(i) = j$ means that the clause $c_i$ modifies the clause $c_j$. In Korean, under this framework, the dependency relation has to satisfy a constraint: every clause except the rightmost one has exactly one dependency relation, that is, it modifies exactly one clause. Dependency analysis is then defined as a search for the dependency pattern $D$ that maximizes the conditional probability $P(D \mid C)$. That is,

$$D_{best} = \arg\max_{D} P(D \mid C).$$

If we assume that the dependency probabilities are independent of one another, $P(D \mid C)$ can be rewritten as:

$$P(D \mid C) = \prod_{i=1}^{m-1} P(Dep(i) = j \mid \mathbf{f}_{ij}), \qquad \mathbf{f}_{ij} = \{f_1, \ldots, f_n\} \in \mathbb{R}^n$$

where $\mathbf{f}_{ij}$ is an $n$-dimensional feature vector that represents the relation between the clauses $c_i$ and $c_j$. In order to use SVMs in analyzing the clausal dependency, we generate positive and negative examples. We adopt a simple and effective method for this purpose.
$$\bigcup_{\substack{1 \le i \le m-1 \\ i+1 \le j \le m}} (\mathbf{f}_{ij}, y_{ij}) = \{(\mathbf{f}_{12}, y_{12}), (\mathbf{f}_{23}, y_{23}), \ldots, (\mathbf{f}_{m-1\,m}, y_{m-1\,m})\},$$

$$\mathbf{f}_{ij} = \{f_1, \ldots, f_n\} \in \mathbb{R}^n, \qquad y_{ij} \in \{\text{Dep}(+1), \text{Not-Dep}(-1)\}$$

According to the above equation, we generate pairs of clauses from the training data, and take a pair of clauses that are in a dependency relation as a positive example and a pair of clauses that appear in the same sentence but are not in a dependency relation as a negative example. Fig. 2 shows an example of extracting dependency relations between clauses. In this example, clauses 1, 2 and 3 mean "the shower room was moved to the gymnasium", "the office room was closed", and "a rest room was made in that place" (the original place of the shower room and the office room), respectively. In this case, we can generate one positive example (Case 2 in Fig. 2) and one negative example (Case 1 in Fig. 2).

B. Feature Selection for Analyzing Dependency Relation

In Korean, clauses are divided into three types: one that modifies another clause (called a conjunctive clause), one that modifies a noun phrase (called a prenominal clause), and one that marks the end of a sentence (called a final ending clause). Among these clause types, prenominal and conjunctive clauses create dependency relations. We focus on the dependency relation in which a conjunctive clause depends on another clause, because a prenominal clause forms a simple dependency relation by modifying the next noun phrase that appears. The conjunctive clause makes the dependency relation very complex, and thus the relation is difficult to recognize. It cannot be determined from simple syntactic information such as the verb type and the position in the sentence; it depends on the context of the sentence and the inflection of the endings.

In the previous section, we assumed that the dependency relation of Korean subordinate clauses is the dependency relation of the verb phrases, that is, the verbs and endings, in the clauses. According to this assumption, we select two kinds of features: static and dynamic features. The feature set is shown in Table I.

TABLE I FEATURES USED FOR ANALYZING DEPENDENCY RELATION

| Feature type | Group | Features |
| --- | --- | --- |
| Static features | Lexicon information, left clause | Word of verb; POS tag of verb; word of endings; POS tag of endings |
| Static features | Lexicon information, right clause | Word of verb; POS tag of verb; word of endings; POS tag of endings |
| Static features | Position information | Distance between left and right clauses; position index of left and right clauses |
| Dynamic features | | Syntactic relation between clauses |

We define the lexicon and position information appearing in a sentence as the static information. The lexicon information consists of the words and POS tags of the verb and endings in the left and right clauses of a pair. The positional information consists of the distance between the clauses and the position index, which is the location of the clauses in a sentence. We expect these static features to weakly represent the semantic information between clauses. Fig. 3 shows the static features for the example in Fig. 2.

| No. | Lexicon information: left clause | Lexicon information: right clause | Position information (distance, position index) |
| --- | --- | --- | --- |
| 1 | 옮기/pvg 고/ecc | 하/px ㄴ/etm | 1, 1 |
| 2 | 옮기/pvg 고/ecc | 하/xsv 었/ep 다/ef | 2, 2 |

Fig. 3 An example of static features

The dynamic features are the syntactic information in a sentence. We therefore build a simple CKY chart parser to capture the syntactic information in the sentence. Table II shows the rule set for the chart parser. CC denotes a conjunctive clause and PC denotes a prenominal clause.

TABLE II THE RULES USED FOR CHART PARSING

| Rule | Production |
| --- | --- |
| Rule 1 | CC → CC CC |
| Rule 2 | CC → PC CC |
| Rule 3 | PC → PC PC |
| Rule 4 | PC → CC PC |

Fig. 4 An example of determining a dependency relation using dynamic features

With the dynamic features, we can apply the syntactic relation of clauses to training the support vector machines. The syntactic relation states whether a clause is composed of just one simple clause or of more than one clause. Fig. 4 shows an example of analyzing a dependency relation using the dynamic feature. The static features of Case 1 and Case 2 are the same, but the dynamic features are different: the dynamic feature of Case 1 is PC CC, determined by Rule 2, whereas that of Case 2 is CC. Table III shows the whole feature set for Fig. 3.
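How the four rules of Table II combine clause types can be sketched with a small CKY-style chart. This is only an illustration under stated assumptions: the paper does not give its parser's input format or implementation, so the input here is assumed to be a plain left-to-right list of clause-type labels (`"CC"` or `"PC"`), and the function names `cky_chart` and `split_patterns` are hypothetical.

```python
# Binary rules from Table II, stored as (left child, right child) -> parent.
RULES = {
    ("CC", "CC"): "CC",  # Rule 1
    ("PC", "CC"): "CC",  # Rule 2
    ("PC", "PC"): "PC",  # Rule 3
    ("CC", "PC"): "PC",  # Rule 4
}

def cky_chart(clause_types):
    """Fill a CKY chart over a sequence of clause-type labels.

    chart[(start, end)] is the set of categories that can cover the
    clauses start .. end-1 (end is exclusive).
    """
    n = len(clause_types)
    chart = {(i, i + 1): {label} for i, label in enumerate(clause_types)}
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = set()
            for split in range(start + 1, end):
                for left in chart[(start, split)]:
                    for right in chart[(split, end)]:
                        parent = RULES.get((left, right))
                        if parent is not None:
                            cell.add(parent)
            chart[(start, end)] = cell
    return chart

def split_patterns(chart, start, split, end):
    """List the 'LEFT RIGHT' patterns licensed by a rule at one split point."""
    return sorted(f"{left} {right}"
                  for left in chart[(start, split)]
                  for right in chart[(split, end)]
                  if (left, right) in RULES)

# Hypothetical input: a conjunctive, a prenominal, and a conjunctive clause.
chart = cky_chart(["CC", "PC", "CC"])
print(chart[(0, 2)], chart[(0, 3)])
print(split_patterns(cky_chart(["PC", "CC"]), 0, 1, 2))
```

In all four rules the parent carries the category of its right child, which matches the head-final character of Korean noted in the introduction. A pattern such as `PC CC` for a span then corresponds to a clause built from more than one clause, while a single label corresponds to a simple clause.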
TABLE III THE WHOLE FEATURES

| No. | Static features: lexicon information | Static features: position information | Dynamic features |
| --- | --- | --- | --- |
| 1 | 옮기/pvg 고/ecc 하/px ㄴ/etm | 1, 1 | PC CC |
| 2 | 옮기/pvg 고/ecc 하/px ㄴ/etm | 1, 1 | CC |
| 3 | 옮기/pvg 고/ecc 하/xsv 었/ep 다/ef | 2, 2 | PC |

V. EXPERIMENTS

For the evaluation of the proposed method, a dataset for dependency analysis of Korean clauses was prepared. The dataset is derived from the parsed corpus produced by the STEP2000 project, which was supported by the Korean government. The corpus consists of 6,934 sentences with 26,876 clauses and is divided into two parts: a training set (90%) and a test set (10%). Table IV shows simple statistics on the corpus.

TABLE IV COUNTS ON THE DATASET

| Information | Training set | Test set |
| --- | --- | --- |
| No. of all sentences | 6,240 | 694 |
| No. of all clauses | 24,226 | 2,650 |
| No. of prenominal and final ending clauses | 15,457 | 1,666 |
| No. of conjunctive clauses | 8,769 | 984 |

Fig. 5 shows an example of a dependency relation in the subordinate clause dataset. For the format of this dataset, we follow that of the CoNLL-2001 shared task and additionally add the dependency relation of clauses. Each instance in the training and test data consists of six columns. The first column contains the lexicon, and the second presents the part-of-speech tag. The third column contains the chunk tag. The verb phrases in these columns are used as static features. The fourth and fifth columns mark the beginning, S, and the ending, E, of clauses. The sixth column gives the relation index of clauses.

We use SVM-light [14] as the support vector machine implementation and experiment with three cases. The first case uses only the words and POS tags of clauses, and the second uses all of the static features. The last case uses both static and dynamic features. The evaluation measure is defined as:

$$\text{Accuracy} = \frac{\text{number of correctly recognized dependency relations of clauses}}{\text{total number of dependency relations}} \times 100$$

When a clause forms candidate dependency pairs with more than one clause, we select the pair with the largest margin. Table V shows the experimental results. The baseline is the model that chooses the nearest clause as the governor of a clause.

TABLE V THE EXPERIMENTAL RESULTS

| Case | Features | Accuracy (%) |
| --- | --- | --- |
| Baseline | | 57.50 |
| Case 1 | Only words and POS tags of clauses | 64.40 |
| Case 2 | All static features | 68.59 |
| Case 3 | All static and dynamic features | 73.50 |

In Case 1, when only the words and POS tags of clauses are used, the accuracy is 64.40%. That is, the proposed model improves on the baseline by 6.90 percentage points, which implies that the verb and endings alone carry a weak dependency relation. The second case, which uses all of the static features, shows an accuracy of 68.59%, which means that the positional information in the static features affects the dependency relation. In the last case, the results with both static and dynamic features are far better than those without dynamic features: the model with dynamic features outperforms the one with static features only.
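The decoding rule and the evaluation measure described above can be sketched as follows. This is an illustration only: the decision values are hypothetical numbers standing in for SVM outputs, the gold governors are made up for a three-clause sentence, and the function names (`select_governors`, `nearest_baseline`, `accuracy`) are introduced here and are not taken from the paper or from SVM-light.

```python
def select_governors(scores):
    """Pick, for each clause i, the candidate governor j with the largest score.

    scores maps a clause pair (i, j), with i < j, to the classifier's
    decision value for "clause i depends on clause j".
    """
    best = {}
    for (i, j), value in scores.items():
        if i not in best or value > best[i][1]:
            best[i] = (j, value)
    return {i: j for i, (j, _) in best.items()}

def nearest_baseline(num_clauses):
    """Baseline: every clause except the last depends on the next clause."""
    return {i: i + 1 for i in range(1, num_clauses)}

def accuracy(predicted, gold):
    """Percentage of clauses whose predicted governor matches the gold one."""
    correct = sum(1 for i, j in gold.items() if predicted.get(i) == j)
    return 100.0 * correct / len(gold)

# Hypothetical decision values for a sentence with three clauses.
scores = {(1, 2): -0.4, (1, 3): 0.9, (2, 3): 0.7}
# Hypothetical gold governors: clauses 1 and 2 both depend on clause 3.
gold = {1: 3, 2: 3}

predicted = select_governors(scores)
print(predicted, accuracy(predicted, gold))
print(nearest_baseline(3), accuracy(nearest_baseline(3), gold))
```

Selecting the largest decision value per clause enforces the constraint from Section IV that each clause except the rightmost one modifies exactly one clause. The nearest-clause baseline fails on this toy sentence whenever the true governor is not adjacent, which is the situation the learned model is meant to handle.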
The performance of our approach is slightly lower than that of other studies that analyze dependency relations in Japanese and European languages. A likely reason is that our approach selects only the relations between clauses, without the relations between words and phrases, and it is easier to analyze the relations between words and phrases than the relations between clauses.

VI. CONCLUSION

We have proposed a method for analyzing the dependency relation of Korean subordinate clauses based on Support Vector Machines (SVMs). In order to solve this problem, we assumed that the dependency relation of Korean subordinate clauses is the dependency relation of the verb phrases, that is, the verbs and endings, in the clauses. We formulated the problem as a binary classification task. We selected two kinds of features, static and dynamic features, for applying SVMs to this problem. The static features are the word, the POS tag, and the positional information, while the dynamic features include the syntactic information of the clauses. For extracting the dynamic information, we built a simple CKY chart parser with simple rules. The experimental results on the STEP2000 corpus show that our system achieves an accuracy of 73.5%.

| No. | Lexicon | POS | Chunk | S | E | Relation | Gloss |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 샤워실 | ncn | B-NP | S | X | 0 | shower room |
| 2 | 을 | jco | I-NP | X | X | 0 | POST |
| 3 | 체육관 | ncn | B-NP | X | X | 0 | gymnasium |
| 4 | 으로 | jca | I-NP | X | X | 0 | POST |
| 5 | 옮기 | pvg | B-VP | X | X | 0 | move |
| 6 | 고 | ecc | I-NP | X | E | 1 | ENDING |
| 7 | 사무실 | ncn | B-NP | S | X | 0 | office room |
| 8 | 을 | jco | I-NP | X | X | 0 | POST |
| 9 | 폐쇄 | ncpa | B-VP | X | X | 0 | closed |
| 10 | 하 | xsv | I-VP | X | X | 0 | ENDING |
| 11 | ㄴ | etm | I-VP | X | E | 2 | ENDING |
| 12 | 그곳 | npd | B-NP | X | X | 0 | that place |
| 13 | 에 | jca | I-NP | X | X | 0 | ENDING |
| 14 | 휴게실 | ncn | B-NP | X | X | 0 | rest room |
| 15 | 을 | jco | I-NP | X | X | 0 | POST |
| 16 | 만들 | pvg | B-VP | X | X | 0 | make |
| 17 | 었 | ep | I-VP | X | X | 0 | ENDING |
| 18 | 다 | ef | I-VP | X | X | 0 | ENDING |
| 19 | . | sf | O | X | E | - | |

Fig. 5 An example of a dependency relation in the subordinate clause dataset

REFERENCES

[1] K.-J. Seo, A Korean language parser using syntactic dependency relations between word-phrases, M.S. Thesis, KAIST, 1993.
[2] S.-B. Park and B.-T. Zhang, Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning, In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 497-504, 2003.
[3] H.-P. Shin, Maximally Efficient Syntactic Parsing with Minimal Resources, In Proceedings of the Conference on Hangul and Korean Language Information Processing, pp. 242-244, 1999. (In Korean)
[4] H.-J. Lee, S.-B. Park, S.-J. Lee, and S.-Y. Park, Clause Boundary Recognition Using Support Vector Machines, In Proceedings of the 9th Pacific Rim International Conference on Artificial Intelligence, pp. 505-514, 2006.
[5] X. Carreras and L. Marquez, Boosting Trees for Clause Splitting, In Proceedings of the 5th Conference on Computational Natural Language Learning, pp. -3, 2001.
[6] A. Molina and F. Pla, Clause Detection using HMM, In Proceedings of the 5th Conference on Computational Natural Language Learning, pp. 70-72, 2001.
[7] K. Uchimoto, S. Sekine, and H. Isahara, Japanese Dependency Structure Analysis Based on Maximum Entropy Models, In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, pp. 196-203, 1999.
[8] T. Kudo and Y. Matsumoto, Japanese Dependency Structure Analysis Based on Support Vector Machines, In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 18-25, 2000.
[9] J. Gao and H. Suzuki, Unsupervised Learning of Dependency Structure for Language Modeling, In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 521-528, 2003.
[10] T. Utsuro, S. Nishiokayama, M. Fujio, and Y. Matsumoto, Analyzing Dependencies of Japanese Subordinate Clauses based on Statistics of Scope Embedding Preference, In Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics, pp. 110-117, 2000.
[11] H.-J. Lee, S.-B. Park, S.-J. Lee, and S.-Y. Park, Clause Boundary Recognition Using Support Vector Machines, In Proceedings of the 9th Pacific Rim International Conference on Artificial Intelligence, pp. 505-514, 2006.
[12] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, 2000.
[13] T. Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features, In Proceedings of the European Conference on Machine Learning, pp. 137-142, 1998.
[14] T. Joachims, Making Large-Scale SVM Learning Practical, LS8, Universitaet Dortmund, 1998.