A Comparison of Two Text Representations for Sentiment Analysis

2010 International Conference on Computer Application and System Modeling (ICCASM 2010)

Jianxiong Wang, School of Computer Science & Educational Software, Guangzhou University, Guangzhou, China, xiong.ong@gmail.com
Andy Dong, Design Lab, University of Sydney, Sydney, Australia, a.dong@arch.usyd.edu.au

Abstract: This paper compares two representations of text within the same experimental setting for sentiment orientation analysis, focusing in particular on the sensitivity of the analysis to text length. The two representations compared in this paper are bag-of-words (BoW) and a nine-dimensional vector (9Dim). The former represents text with a high-dimensional feature vector, which ignores grammatical structure and is lexicon-dependent. In contrast, the 9Dim representation encodes grammatical knowledge of clauses in sentences into a compact nine-dimensional vector, which is lexicon-independent. Because the grammatical structure of a single sentence or clause may not provide sufficient information for sentiment orientation classification, a convenient way to enrich the grammatical knowledge in a text is to compose the text from multiple sentences, thereby lengthening the sample. We consider the length of text to be an important factor in text classification. The aim of this paper is to demonstrate how the performance of text sentiment orientation classifiers improves when the length of the text comprising a training vector is varied. The experimental results indicate that the accuracy of the classifiers benefits from increasing text length, and also illustrate that the 9Dim method can provide results comparable to BoW under the same sentiment classification algorithm, support vector machines (SVM).

Keywords: sentiment analysis; text representations; bag-of-words; 9Dim

I.
INTRODUCTION

Sentiment analysis concentrates on classifying documents according to the opinions and emotions expressed by their authors. Judging a document's orientation as positive or negative is a common two-class problem in sentiment analysis [1-3], also known as sentiment orientation analysis in text classification. With the expansion of the internet, text classification has been found to be helpful in many respects. On the websites of some on-line e-commerce companies, visitors or customers are encouraged to leave comments or feedback. Summarising these comments or feedback with sentiment orientation analysis technology would help the websites to stimulate suppliers, attract potential suppliers and customers, or even support value-adding services [3]. Text classification is also helpful in information retrieval [4] and could be applied to recognise and filter spam. Furthermore, by analysing on-line communication such as on-line forums, chat rooms, newsgroups, and the Dark Web [5], sentiment orientation analysis could provide support in tracking extremist groups, terrorists and hate groups [2].

In this paper, we investigate the effectiveness of adopting the support vector machines classification algorithm for the sentiment orientation analysis problem. An interesting issue is the relationship between the performance of a text classifier and the length of the training text. For example, what is the optimal length for a training text example? With that known, a trained text classifier for sentiment orientation analysis would deliver the best performance with less memory. Thus, apart from presenting the results from the two text representation methods, we also analyse this issue to acquire a deeper understanding of the results we obtained.

The remainder of this paper is organised as follows. Section II presents a review of related work in sentiment orientation analysis. Section III identifies research gaps and questions.
Data collection and processing of design text is described in Section IV. Section V presents the experiments used to compare the BoW and 9Dim methods. Section VI concludes with closing remarks.

II.
RELATED WORK

In recent studies, the BoW text representation method has enjoyed much attention and achieved outstanding performance in sentiment analysis of semantic orientation in natural language [2]. The BoW representation expresses text without considering word order or word usage. BoW represents a document as follows: initially, a feature list (wordlist) is composed of all the words in the corpus the document belongs to. A numerical vector is then created to represent the given document; each entry of the vector corresponds to a word contained in the wordlist. The vector is initialised as a zero-vector; when a word is contained in both the document and the wordlist, the corresponding entry of the vector is set to 1 to mark the appearance, or to a higher positive integer to indicate the frequency of the word in the document. The BoW method has been successfully adopted in both natural language processing [6, 7] and computer vision [8]. Let us take the following example of a document with two clauses: (A) It is a great masterpiece. and (B) Martin is a good designer. The composed wordlist is {Martin, It, is, a,
978-1-4244-7237-6/$26.00 © 2010 IEEE

good, great, designer, masterpiece}. The representation vector for clause A is [0 1 1 1 0 1 0 1], and [1 0 1 1 1 0 1 0] for clause B.

Pang et al. [3] demonstrated a system with the BoW representation and the common machine learning classification algorithm, support vector machines (SVM), treating the task as a two-class classification problem. They compared Naïve Bayes, maximum entropy classification and SVM, machine learning methods known to be successful at topic classification tasks, on the semantic orientation of movie reviews. Within the theory of systemic-functional linguistics, Martin and White [9] provide a rigorous, network-based model for sentiment, which linguists characterise as the construal of emotions and interpersonal relations in language. The model has been partially implemented [10], and its performance improved by about 7% compared to results reported by other researchers on the same data set, movie reviews.

Given the superior performance of support vector machines in sentiment orientation analysis, we have opted to use support vector machines as the machine learning formalism for sentiment orientation analysis. The issue then becomes one of ascertaining the best representation of the text for the machine learning algorithm. In our previous research [11, 12], we showed how to represent a document by a nine-dimensional numerical vector (9Dim), which encodes grammatical knowledge and taxonomical semantic information about whether the text is about activities (Process), objects (Product) or agents (People), but is otherwise lexicon-independent. The structure of the 9Dim vector is as follows:

[PCN PCV PCA PDN PDV PDA PPN PPV PPA]

where PC = Process, PD = Product, PP = People, N = noun, V = verb, and A = adjective/adverb.
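The BoW construction illustrated by clauses A and B above can be sketched in a few lines of Python; the tokenisation and the helper name `bow_vector` are illustrative, not from the paper.

```python
def bow_vector(clause, wordlist):
    """Binary bag-of-words vector: entry i is 1 if wordlist[i] occurs in the clause."""
    tokens = [t.strip(".").lower() for t in clause.split()]
    return [1 if w.lower() in tokens else 0 for w in wordlist]

# Wordlist and clauses from the example above.
wordlist = ["Martin", "It", "is", "a", "good", "great", "designer", "masterpiece"]
vec_a = bow_vector("It is a great masterpiece.", wordlist)   # clause A
vec_b = bow_vector("Martin is a good designer.", wordlist)   # clause B
```

For frequency counts rather than binary presence, the `1` would be replaced by `tokens.count(w.lower())`.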
According to the noun's pertinence ratio with Process, Product or People in a design context [11, 13], a weight is distributed into each of these three categories. This pertinence ratio is a 3Dim numerical vector named K2. For each rated sentence, part-of-speech tagging [14] provides the phrase structure trees and typed dependencies needed to obtain the grammatical relationships. A noun-based clustering algorithm is then applied: the basic idea is to identify every noun in a sentence and group with it all verbs and modifiers (adjectives and adverbs) connected to that noun. Each noun is looked up (queried) in the WordNet [15] lexicographer database to ascertain the logical grouping that might indicate the appropriate category (Product, Process, People) for the word. The WordNet lexicographer database, with its syntactic categories and logical groupings, was used to categorise nouns as being about Product, Process or People. Verbs, adjectives and adverbs are categorised according to the category(ies) of the noun they relate to grammatically. These clusters of syntactically related words are called word groups. For the noun in each word group, rules were applied to identify which of the WordNet logical groupings would contain nouns in the categories [17]. Two correction factors are multiplied with the count of the frequency of occurrence of a word in the target clause: K1, which is inversely proportional to the number of possible Process-Product-People categories a WordNet logical grouping can belong to; and K2, mentioned above. Since the correction factor K2 for a word may have up to three values, it is normally expressed as a vector of the form K2(word) = [K2,PC, K2,PD, K2,PP]. The semantic orientation (SO) of the words in each word group is calculated using the SO-PMI measure, which is in turn based on their pointwise mutual information (PMI) [14].
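The word-group clustering and categorisation step can be sketched as follows. This is a toy illustration: a hand-made noun-category map and pre-extracted (word, POS, head-noun) triples stand in for the parser, typed dependencies and WordNet lexicographer lookups the paper actually uses, and the K1/K2 correction factors are omitted.

```python
# Toy category map standing in for the WordNet lexicographer lookup:
# PC = Process, PD = Product, PP = People.
NOUN_CATEGORY = {"designer": "PP", "masterpiece": "PD", "sketching": "PC"}

CATS = ("PC", "PD", "PP")
POS = ("N", "V", "A")  # noun, verb, adjective/adverb

def nine_dim_counts(triples):
    """triples: (word, pos, head_noun); verbs/modifiers inherit their noun's category.
    Returns raw counts in the order [PCN PCV PCA PDN PDV PDA PPN PPV PPA]."""
    counts = {c + p: 0 for c in CATS for p in POS}
    for word, pos, head in triples:
        cat = NOUN_CATEGORY.get(word if pos == "N" else head)
        if cat is not None:
            counts[cat + pos] += 1
    return [counts[c + p] for c in CATS for p in POS]

# Word group for "Martin is a good designer.": noun 'designer' plus modifier 'good'.
vec = nine_dim_counts([("designer", "N", "designer"), ("good", "A", "designer")])
```

With both words categorised as People (PP), the noun raises PPN and the modifier raises PPA, yielding a vector that is zero everywhere except those two positions.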
The strategy for calculating the SO-PMI is to compute the log-odds (Eq. (1)) of a canonical basket of positive words (Pwords) or negative words (Nwords) appearing with the target word, on the assumption that if a canonical good or bad word appears frequently with the target word then the target word has a similar semantic orientation. The log-odds that two words co-occur:

PMI-IR(word1, word2) = log [ p(word1 & word2) / (p(word1) p(word2)) ]   (1)

In this study, we used a Google query with the NEAR operator to look up the co-occurrence of the target word with the canonical basket of positive and negative words. The SO-PMI based on the NEAR operator is described by Eq. (2). The semantic orientation of a word based on mutual co-occurrence with a canonical basket of positive and negative words:

SO-PMI(word) = log [ ( ∏_{pword∈Pwords} hits(word NEAR pword) · ∏_{nword∈Nwords} hits(nword) ) / ( ∏_{pword∈Pwords} hits(pword) · ∏_{nword∈Nwords} hits(word NEAR nword) ) ]   (2)

We selected a basket of 12 canonical positive and negative words. Adjectives and adverbs were selected based on most frequent occurrence in written and spoken English according to the British National Corpus [11]. Because these lists are published separately, we joined both lists and ordered them by frequency per million words. We selected only those adjectives and adverbs judged to be positive or negative modifiers according to the General Inquirer corpus [http://www.wjh.harvard.edu/~inquirer/]. The basis for selecting these frequently occurring words as the canonical words is the increased likelihood of finding documents which contain both the canonical word and the word for which the PMI-IR is being calculated, which increases the accuracy of the SO-PMI measurement. Table I lists the canonical Pwords and Nwords and their frequency per million words. The SO-PMI of all unigrams (nouns, verbs, modifiers) in the target lexicon is pre-calculated and saved in a database to speed up the analysis.
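Eq. (2) can be computed directly from hit counts, as sketched below. The `fake_hits` function is a synthetic stub; the paper issues Google NEAR queries instead.

```python
import math

PWORDS = ["good", "well", "great", "important", "able", "clear"]
NWORDS = ["bad", "difficult", "dark", "cold", "cheap", "dangerous"]

def so_pmi(word, hits):
    """SO-PMI of Eq. (2): log of the product-of-hits ratio over the canonical baskets."""
    num, den = 1.0, 1.0
    for pword in PWORDS:
        num *= hits(f"{word} NEAR {pword}")
        den *= hits(pword)
    for nword in NWORDS:
        num *= hits(nword)
        den *= hits(f"{word} NEAR {nword}")
    return math.log(num / den)

def fake_hits(query):
    """Synthetic counts: the target co-occurs 4x more often with positive canonicals."""
    if "NEAR" not in query:
        return 100
    return 40 if any(p in query for p in PWORDS) else 10

score = so_pmi("superb", fake_hits)  # positive score => positive orientation
```

With these synthetic counts the baseline terms cancel and the score reduces to log(4^6), illustrating how the ratio isolates the relative association with the positive basket.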
A rated sentence is processed with both grammatical and semantic analyses. When all word clusters in a sentence have been processed, a complete 9-dimensional vector is generated. For a detailed description and implementation of the complete 9-dimensional vector, please refer to one of our previous papers [11].

TABLE I. CANONICAL POSITIVE AND NEGATIVE WORDS

Positive Words     Negative Words
good (176)         bad (64)
well (1119)        difficult (0)
great (635)        dark (104)
important (39)     cold (103)
able (304)         cheap (68)
clear (39)         dangerous (58)

III.
RESEARCH GAPS AND QUESTIONS

Based on the literature review, we have identified several important research gaps. The BoW method ignores semantic knowledge about words and grammar. It relies on a very high-dimensional representation that hinges on training a system on a text domain with high coverage of the words likely to appear in the target corpus. On the other hand, the 9Dim representation embeds grammatical knowledge in a lower-dimensional, lexicon-independent vector. It abstracts lexical knowledge toward potential sentiment content and does not need a connecting lexicon between the training corpora and the target corpora.

The differences between the BoW and 9Dim representations raise the following research questions. First, does the lower-dimensional representation method cost less memory in implementation? If so, we can obtain a text classifier with lower memory cost and ability comparable to its higher-dimensional counterpart. Secondly, 9Dim is a sentence-based representation, which means a single 9Dim vector comprising data from multiple sentences can be treated as one training example. Intuitively, the information contained in a simple sentence is less than that embedded in a complex sentence. Text length is known to affect text classification: researchers have indicated that sentence length is an important factor in text classification when each vector represents one sentence [16]. If so, is the length of the text (a paragraph or multiple sentences) represented in a single training vector an important factor in text classification as well?

IV.
DATA COLLECTION AND PROCESSING

This section presents the data collection process and the processing methods used to produce the tagged data for training and validating the computational system. To conduct this research, it was necessary to create labelled design text. We studied text from creative industries engaged in design; that is, we needed to create a new data set consisting of text about designed works, the process of designing, and designers, labelled for semantic orientation and category. Adopting a popular approach in computational linguistics [3] for creating data sets, a cohort of three native English speakers with backgrounds in a design-related discipline (e.g., engineering, architecture, and computer science) was tasked with reading and categorising various design texts. The texts included formal and informal design text from various on-line sources and across various design-related disciplines. All design texts were collected by the author, and each coder was paid to classify them. The rating cohort was trained for one hour to identify the proper category and its semantic orientation according to the context. During coding, two of the three coders had to agree on the semantic meaning (category), the semantic orientation (orientation), and the value of the orientation, that is, positive or negative. Working in two-hour time blocks, the coders read various design texts, including formal design reports, reviews of designed works, reviews of designers, and transcripts of conversations of designers working together. After the rated text data was collected, spell-checked and grammar-checked, it was saved in a sentence pool for composing the training and testing data. Differences between data sets generated in this way could be statistically significant, but were small enough to be practically unimportant.

V.
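The two-of-three coder agreement rule described above amounts to a simple majority vote over label pairs; a minimal sketch, with illustrative label tuples:

```python
from collections import Counter

def agreed_label(labels):
    """labels: the three coders' (category, orientation) pairs for one sentence.
    Returns the label at least two coders agree on, or None if there is no agreement."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

# Two coders agree exactly, so the sentence enters the rated pool with that label.
keep = agreed_label([("Product", "positive"),
                     ("Product", "positive"),
                     ("Process", "negative")])
# All three coders disagree, so the sentence is discarded.
drop = agreed_label([("Product", "positive"),
                     ("Process", "positive"),
                     ("People", "negative")])
```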
ANALYSIS AND EXPERIMENT RESULTS

This section analyses the space complexity of the BoW and 9Dim methods, then presents the results of the experiments on the appraisal system with different data sets.

To compare the memory cost of the two representations, the standard method is to compare their space complexity. Space complexity is the limiting behaviour of the memory use of an algorithm as the size of the problem goes to infinity [17]. As discussed before, the implementation of BoW consists of two steps: 1) compose a vector for each paragraph with the wordlist; and 2) train and validate an SVM classifier with the represented vectors. For the first step, the space complexity depends on the length of the wordlist; if that length is l_feature, the space complexity is O(l_feature). For the second step, the space complexity depends on the implementation of SVM. SVM-light is the implementation adopted in this study; its space complexity is O(n²) [18], where n is the number of training examples. The total space complexity of the BoW implementation is O_BoW = O(l_feature) + O(n²).

For the 9Dim method, there are three processing steps: 1) part-of-speech tagging; 2) looking up K2 to get the pertinence ratio for each selected word to compose the 9Dim vector; and 3) training and validating an SVM classifier with the represented vectors. For the first step, the space complexity is O(m) [14], where m is the number of rated sentences. For the second step, the space complexity depends on the length of K2, l_K2, so it is O(l_K2). The third step is the same as the second step of BoW, O(n²). The total space complexity of the 9Dim implementation is O_9Dim = O(m) + O(l_K2) + O(n²). Because n is the same in

both O_BoW and O_9Dim, the terms O(m), O(l_K2) and O(l_feature) are of lower order and can be ignored. Therefore, O_9Dim = O(n²) and O_BoW = O(n²): the 9Dim and BoW representation methods have the same asymptotic space complexity. In practice, however, due to its lower feature dimension, 9Dim has a lower memory cost in implementation than BoW: the feature dimension of the 9Dim representation is a constant, 9, whereas for BoW it is the length of the wordlist.

We implemented Pang's unigram experiment [3] and applied the same experimental setting to design text semantic orientation classification. Each sentence in the rated sentence pool is represented by a BoW vector over a 1111-word wordlist. The represented rated sentence pool is split into two parts, one for composing training examples and another for validation examples. The size of each of the training and validation example sets is 500. Each training or validation example is composed of one or more represented rated sentences chosen randomly from the training or validation sentence pool. The number of represented rated sentences in a training or validation example is adjusted gradually from one to 20. For each number, five iterations of training and validation are run; the accuracy bars of these experiments are shown in Figure 1.

For the 9Dim experiment, the rated sentence pool is represented by the 9Dim representation. Similar experimental conditions were adopted from the BoW experiment. The only differences were that the number of represented rated sentences in a training or validation example was adjusted gradually from one to 80 and the train-validation iteration number was set to 50.
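The composition of multi-sentence training examples described above can be sketched as follows. The paper does not state how the k sentence vectors are merged into one example; element-wise averaging is one plausible choice, used here purely for illustration.

```python
import random

def compose_example(sentence_vectors, k, rng):
    """Average k randomly chosen sentence vectors into one training vector."""
    chosen = rng.sample(sentence_vectors, k)
    dim = len(chosen[0])
    return [sum(v[i] for v in chosen) / k for i in range(dim)]

rng = random.Random(0)
# Toy 3-dimensional "sentence vectors" standing in for BoW or 9Dim vectors.
pool = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
example = compose_example(pool, 2, rng)  # dimensionality stays fixed at 3
```

Whatever the number of sentences per example, the composed vector keeps the fixed dimensionality of the underlying representation, which is what lets the same classifier be trained at every value of k.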
The accuracy bars of these 9Dim experiments are shown in Figure 2. Figures 1 and 2 show that the accuracy of sentiment orientation classification improves as the number of represented rated sentences in each example increases.

Figure 1. BoW-based design text sentiment orientation analysis.
Figure 2. 9Dim-based design text sentiment orientation analysis.

VI.
CONCLUSIONS

In this paper, we compared two text representation methods for design text in two respects: the space complexity of their implementation, and sentiment orientation classification. The results show that it is possible to encode semantic information and grammatical knowledge into a lower-dimensional vector to represent text for the purposes of sentiment classification. They also show that a grammatical-knowledge-embedding representation method can provide extra information for the classification algorithm to identify sentiment orientation and thereby reduce the space complexity of the implementation. The complexity analysis indicates that the 9Dim representation method is superior to BoW in space complexity in practice, while providing comparable classification accuracy. The results also point out that text length is an important factor in text classification. The reason may be that longer text contains more information about semantic orientation features. However, if the text is too long, it may include sentences of conflicting sentiment. So, while more sentences per vector may be desirable for training, a smaller number of sentences may be better at the classification stage.

ACKNOWLEDGMENT

This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP0557346). The first author would like to acknowledge an early career grant from Guangzhou University. This research was carried out while the first author was a PhD student at the University of Sydney.

REFERENCES

[1] P.D.
Turney, Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews, Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), 2002, pp. 417-424.
[2] A. Abbasi, H. Chen, and A. Salem, Sentiment analysis in multiple languages: Feature selection for opinion classification

in Web forums, ACM Trans. Inf. Syst. 26(3) (2008), pp. 1-34.
[3] B. Pang, L. Lee, and S. Vaithyanathan, Thumbs up? sentiment classification using machine learning techniques, Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP '02), 2002, pp. 79-86.
[4] W. Lam, M.E. Ruiz, and P. Srinivasan, Automatic text categorization and its application to text retrieval, IEEE Transactions on Knowledge and Data Engineering, 11(6), 1999, pp. 865-879.
[5] H. Chen, Intelligence and Security Informatics for International Security: Information Sharing and Data Mining. New York: Springer-Verlag, 2006.
[6] T.K. Landauer, P.W. Foltz, and D. Laham, An introduction to latent semantic analysis, Discourse Processes, 25(2-3), 1998, pp. 259-284.
[7] D.M. Blei, A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003), pp. 993-1022.
[8] G. Wang, Y. Zhang, and L. Fei-Fei, Using dependent regions for object categorization in a generative framework, Proc. of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2006, pp. 1597-1604.
[9] J.R. Martin and P.R.R. White, The Language of Evaluation: Appraisal in English, New York: Palgrave Macmillan, 2005.
[10] C. Whitelaw, N. Garg, and S. Argamon, Using appraisal groups for sentiment analysis, Proc. of the 14th ACM International Conference on Information and Knowledge Management, New York: ACM, 2005, pp. 625-631.
[11] J. Wang and A. Dong, A case study of computing appraisals in design text, in J.S. Gero (Ed.), Design Computing and Cognition '08 (DCC'08), Springer Netherlands, 2008, pp. 573-592.
[12] J. Wang and A. Dong, How am I doing: computing the language of appraisal in design, Proc. of the 16th International Conference on Engineering Design (ICED'07), 2007, pp. ICED'07/14.
[13] A. Dong, The Language of Design: Theory and Computation. London: Springer, 2009.
[14] M. Marneffe, B. MacCartney, and C.D. Manning, Generating typed dependency parses from phrase structure parses, Proc. of the IEEE/ACL 2006 Workshop on Spoken Language Technology, 2006.
[15] C. Fellbaum, WordNet: An Electronic Lexical Database, Cambridge: MIT Press, 1998.
[16] E. Kelih, P. Grzybek, G. Antić, and E. Stadlober, Quantitative text typology: the impact of sentence length, Proc. of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V., Berlin Heidelberg: Springer, 2006, pp. 382-389.
[17] U.S. National Institute of Standards and Technology, Algorithms and Theory of Computation Handbook, CRC Press, 1999.
[18] T. Joachims, Making large-scale SVM learning practical, in Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999, pp. 169-184.