Analysing the Style of Textual Labels in i* Models

Arian Storch 1, Ralf Laue 2, and Volker Gruhn 3

1 it factum GmbH, arian.storch@it-factum.de
2 University of Applied Sciences Zwickau, Department of Information Science, ralf.laue@fh-zwickau.de
3 Paluno - The Ruhr Institute for Software Technology, University of Duisburg-Essen, volker.gruhn@paluno.uni-due.de

Abstract. An important quality aspect of conceptual models (such as i* models) is the quality of their textual labels. Naming conventions aim to ensure that labels are written in a consistent manner. We present a tool that automatically checks whether a textual label in an i* model adheres to a set of naming conventions. This not only helps to enforce a consistent labelling style, it also helps to detect modelling errors such as goals in i* models that should be softgoals (or vice versa).

1 Introduction

i* is a frequently used visual language for modelling the social relationships between actors. The language contains graphical symbols for various concepts (such as goal or task) which have to be used correctly if the model is to be useful. In [1], Horkoff et al. analysed 30 i* models from student works and academic papers in order to discover model elements that are used contrary to the generally accepted guidelines published in the i* Wiki 1. They found a large number of problems in these models that result from using the wrong type of model element. In the 30 models, Horkoff et al. identified 10 problems of the type "softgoal should be goal", 15 problems "goal should be softgoal", 8 problems "task should be softgoal" and 7 problems "softgoal should be task".

We believe that one way to reduce such problems is to use a consistent labelling style throughout the model. By forcing the modeller to decide whether an element should be named "reduce waste" or "waste to be reduced", he or she is also forced to think about whether the concept should be represented as a task or as a goal. Furthermore, the use of common labelling styles can make a model easier to understand.

1 http://istar.rwth-aachen.de/tiki-index.php?page=i*+guide

2 Labelling Styles for i* Models

We are aware of three papers that suggest conventions for the style of i* element labels. In [2], the authors recommend the following style:

Element type | Syntax                                  | Example
Goal         | object + "be" + verb in passive voice   | Information is published
Softgoal     | quality attribute (+ object or task)    | Secure access
Task         | verb in infinitive + object             | Confirm agreement
Resource     | name of the object                      | User access

[3] suggests a similar style:

Element type | Syntax                 | Example
Goal         | subject + "be" + verb  | Result be correct
Softgoal     | softgoal [topic]       | Improve [IT skills]
Task         | verb + object          | Fill out application form
Resource     | noun                   | Confirmation

[4] suggests the labelling style:

Element type | Syntax                          | Example
Goal         | object + passive verb           | Information collected
Softgoal     | goal syntax + complement        | Information checked quickly
             | (object) complement             | Timely [Virus List] ([Dependum])
Task         | verb (+ object) (+ complement)  | Answer doubts by e-mail
Resource     | (adjective +) object            | Updated Virus List

In the above tables, parentheses denote optional elements; brackets are part of the label. To the best of our knowledge, current i* modelling tools are not able to validate labels with respect to such conventions. There is, however, a considerable amount of work on checking the labels in business process models. When developing our approach for style checks of i* labels, we made use of the experience with tools developed in this context (see [5-7]; the last reference contains an overview of further papers on the topic).

Although the styles shown in the above tables seem to be quite similar, a closer look reveals subtle differences. Let's assume that a Resource label should be the "name of the object". This requirement is more restrictive than asking for a "noun", because nouns can be accompanied by articles and attributive adjectives. In that case, "mail" would be valid, but "sent mail" would not. Similarly, if we required that a Softgoal be labelled with a quality attribute rather than with a goal syntax, "usable" would be valid, but "User interface is usable" would not. For this reason, we decided to regard a label as having the correct style if it adheres to the following superset of style rules:

Element type | Syntax                                          | Example
Goal         | object + passive verb                           | Trip advice is provided
Softgoal     | quality attribute (+ object) (+ complement)     | Precise information
             | goal syntax (+ complement)                      | Document is sent securely
Task         | verb (in present form) + object (+ complement)  | Use back-end user interface
Resource     | object                                          | Route card

3 Label-Checking Algorithm and Patterns

3.1 Third-Party Frameworks for Natural Language Processing

To analyse, process and validate a label, we first need to know what kinds of words it contains. This process is called part-of-speech (POS) tagging. POS taggers typically combine lexical databases with statistical algorithms to determine the kind of a word, a part of speech or even a phrase within a sentence [8]. One of the most popular POS taggers is the Stanford Parser 2, which is contained in a toolset developed by the Stanford Natural Language Processing Group 3. Another frequently used tool is WordNet [9] 4, a lexical database which provides information about semantic and lexical relations between words. A combined API for both frameworks is provided by a tool called qap (quality assurance project), which is currently being developed by the bflow* Toolbox team 5. The API provides easy access to the Stanford Parser and WordNet. Using that API, we could focus on the implementation of the labelling style checks, which are discussed in the following subsections.

2 http://nlp.stanford.edu/software/lex-parser.shtml
3 http://www-nlp.stanford.edu/
4 http://wordnet.princeton.edu/
5 http://www.bflow.org/

3.2 Our Approach

Our goal is to compare the label of an i* model element to the style rules for this element type. For this purpose, we make use of the Stanford Parser, a statistical parser that works out the grammatical structure of sentences. It can recognize which groups of words go together as phrases and which words are the subject or object of a verb 2. The parser produces reliable results when the input is a complete sentence. Unfortunately, this is often not the case for a label in an i* model. Assume that a Resource is labelled "log message". This phrase is ambiguous, because "log" can be either a verb or a noun. To determine the correctness, the parser needs more context information, which we can derive from the element type. When analysing "log message" in isolation, the parser finds that "log" is a verb and "message" is a noun. In that case, it cannot recognize that the phrase is an object, which would be valid for a Resource label. Similar problems exist with other element types.
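This effect can be reproduced directly with the parser. The following minimal sketch illustrates it; it is our own demonstration code, not part of qap, and it assumes a recent (3.x) release of the Stanford Parser together with its English PCFG model:

import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class LabelTagDemo {

    public static void main(String[] args) {
        // Load the English PCFG model that ships with the Stanford Parser.
        LexicalizedParser parser = LexicalizedParser.loadModel(
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

        // The bare label: "log" tends to be tagged as a verb.
        printTags(parser, "log message");

        // The label completed to a sentence: the noun-compound reading
        // ("log message" as an object) is found.
        printTags(parser, "There is a log message.");
    }

    private static void printTags(LexicalizedParser parser, String text) {
        Tree tree = parser.parse(text); // builds the phrase structure tree
        for (TaggedWord tw : tree.taggedYield()) {
            System.out.print(tw.word() + "/" + tw.tag() + " ");
        }
        System.out.println();
    }
}

For the bare label, the first call typically prints log/VB message/NN, whereas the completed sentence yields There/EX is/VBZ a/DT log/NN message/NN ./. with the desired noun-compound reading.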

To deal with such problems, we decided to add additional words to the labels, thus trying to create complete sentences. Our general approach can be summarized as follows: First, we complement the label with a prefix that depends on the element type, such that for correctly named labels we obtain a complete sentence. For the label to be valid, the resulting sentence has to be syntactically correct. In the next step, the Stanford Parser processes the sentence and creates its so-called phrase structure tree. It assigns to each word a part-of-speech (POS) tag such as CC (coordinating conjunction), DT (determiner), EX (existential "there"), IN (preposition), JJ (adjective), NN (noun), VB (verb), VBN (verb, past participle) and VBZ (verb, 3rd person singular present) [10]. We define a pattern of valid sequences of POS tags for each type of modelling element. The label is regarded as valid if its POS tags match this pattern, where the label may contain additional words after the pattern (this way, both "Test database" and "Test database for consistency" are regarded as tasks).

We describe the style rules by patterns in Extended Backus-Naur Form. First, we describe the POS tags of an Object. Examples of valid objects and the corresponding sequences of POS tags are:

Label                      | Sequence of POS tags
bank                       | NN
test run                   | NN NN
list of credit cards       | NN IN NN NN
list of valid credit cards | NN IN JJ NN NN
summarized balance sheet   | JJ NN NN

This is expressed by the following rules:

NNSEQ := NN, {NN};              (examples: "bank", "test run")
JNS := [JJ, {JJ}], NNSEQ;       (examples: "cheque", "valid cheque")
OBJECT := [DT], JNS, [IN, JNS]; (example: "a complete list of valid credit cards")

Resource style check. Given a label L, we complement it with the prefix "There is a" and the suffix "." (period), i.e. the parser analyses the (potential) sentence "There is a L.". We conclude that the label is correct if L matches the pattern of an object. We use "There is" as the prefix in order to increase the probability that "is" is identified as the only verb of the sentence. For instance, when validating the label "summarized balance sheet", "summarized" would (without this prefix) wrongly be tagged as a verb.
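In the implemented checks, such a rule amounts to matching the tag sequence of L against a regular expression derived from the EBNF. The following sketch shows the idea for the Resource check; it is a simplified illustration rather than the qap implementation, and it assumes that the tags of the added prefix "There is a" and of the final period have already been stripped and that all nouns are tagged NN:

import java.util.regex.Pattern;

public class ResourceStyleCheck {

    // Tag-sequence building blocks, translated from the EBNF rules:
    // NNSEQ := NN, {NN};  JNS := [JJ, {JJ}], NNSEQ;  OBJECT := [DT], JNS, [IN, JNS];
    static final String NNSEQ  = "NN(?: NN)*";
    static final String JNS    = "(?:JJ )*" + NNSEQ;
    static final String OBJECT = "(?:DT )?" + JNS + "(?: IN " + JNS + ")?";

    // Additional words after the pattern are tolerated.
    static final Pattern RESOURCE = Pattern.compile(OBJECT + "(?: .*)?");

    // The argument is the space-separated tag sequence of the label L,
    // e.g. "NN IN JJ NN NN" for "list of valid credit cards".
    static boolean isValidResource(String tags) {
        return RESOURCE.matcher(tags).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidResource("JJ NN NN")); // true:  "summarized balance sheet"
        System.out.println(isValidResource("VB NN"));    // false: a verb cannot start an object
    }
}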

Task style check. Given a label L, the parser analyses the (potential) sentence "I L.". L is required to match the pattern of a Task:

TASK := VB, [IN], OBJECT; (examples: "Test database", "Add to score")

Goal style check. A Goal label is required to match the pattern:

GOAL := OBJECT, ("is" | "are"), VBN; (example: "Results are corrected")

Softgoal style check. Because there are two competing rules, the validation is done in two steps. First, the label is validated against the quality attribute rule. For this purpose, the parser analyses the (potential) sentence "It is L.". Second, it is checked whether the label is a goal, followed by an arbitrary complement. L is required to match the pattern of a Softgoal (possibly followed by an arbitrary complement):

SG := QA | GOAL;          (quality attribute or goal)
QA := JJ, {JJ}, [OBJECT]; (example: "inexpensive delivery")

Fig. 1 shows the validation result for a model within the i* modelling tool openome 6.

[Fig. 1. Result of a Style Check in openome]

6 http://www.cs.toronto.edu/km/openome/
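Under the same simplifying assumptions as in the Resource sketch above, these three checks reduce to a few more expressions over the OBJECT building block. Note that on the tag level the alternative ("is" | "are") appears as VBZ or VBP, the Penn Treebank tags for these two word forms. Again, this is our own sketch of the idea, not the qap code:

import java.util.regex.Pattern;

public class ElementStyleChecks {

    static final String NNSEQ  = "NN(?: NN)*";
    static final String JNS    = "(?:JJ )*" + NNSEQ;
    static final String OBJECT = "(?:DT )?" + JNS + "(?: IN " + JNS + ")?";

    // TASK := VB, [IN], OBJECT;
    static final Pattern TASK =
            Pattern.compile("VB(?: IN)? " + OBJECT + "(?: .*)?");

    // GOAL := OBJECT, ("is" | "are"), VBN;  ("is" -> VBZ, "are" -> VBP)
    static final Pattern GOAL =
            Pattern.compile(OBJECT + " VB[ZP] VBN(?: .*)?");

    // QA := JJ, {JJ}, [OBJECT];
    static final Pattern QA =
            Pattern.compile("JJ(?: JJ)*(?: " + OBJECT + ")?(?: .*)?");

    // SG := QA | GOAL; implemented as the two-step validation described above.
    static boolean isValidSoftgoal(String tags) {
        return QA.matcher(tags).matches() || GOAL.matcher(tags).matches();
    }

    public static void main(String[] args) {
        System.out.println(TASK.matcher("VB NN").matches());      // true: "Test database"
        System.out.println(GOAL.matcher("NN VBP VBN").matches()); // true: "Results are corrected" (NNS simplified to NN)
        System.out.println(isValidSoftgoal("JJ NN"));             // true: "inexpensive delivery"
    }
}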

4 Conclusion

In this paper, we presented a set of POS tag patterns that can be used to validate the adherence of i* labels to a set of style recommendations. We derive the correctness of a label from its element type and the related style recommendation. We achieved reliable results by complementing the given (normally short) labels to (potentially) whole sentences. These patterns have been implemented using the tool qap, which provides an API to WordNet and the Stanford Parser. Our tool prototype is both extensible and configurable: it is easy to change our patterns, to remove them or to add new ones. Although we tested our patterns within openome, there is no technical dependency between openome and our prototype; qap and our style checks can be used with any other tool as well. A drawback we observed quite often is incorrect spelling, which impairs the reliability of the checks. For instance, "relevant advices" will lead to a different result than "relevant advises" because the POS tagger cannot identify the word correctly. In the future, we plan to add additional linguistic analysis functionality to our tool in order to make more sophisticated analyses possible.

References

1. Horkoff, J., Elahi, G., Abdulhadi, S., Yu, E.: Reflective analysis of the syntax and semantics of the i* framework. In: Advances in Conceptual Modeling - Challenges and Opportunities. Volume 5232 of LNCS, Springer (2008) 249-260
2. de Pádua Albuquerque Oliveira, A., do Prado Leite, J.C.S., Cysneiros, L.M.: Using i* meta modeling for verifying i* models. In: 4th International i* Workshop (2010) 76-80
3. de Pádua Albuquerque Oliveira, A., Cysneiros, L.M.: Defining strategic dependency situations in requirements elicitation. In: Workshop em Engenharia de Requisitos (2006) 12-23
4. Martínez, C.P.A.: Systematic Construction of Goal-Oriented COTS Taxonomies. PhD thesis, Universitat Politècnica de Catalunya (2008)
5. Becker, J., Delfmann, P., Herwig, S., Lis, L., Stein, A.: Formalizing linguistic conventions for conceptual models. (2009) 70-83
6. Leopold, H., Smirnov, S., Mendling, J.: On the refactoring of activity labels in business process models. Information Systems 37 (2012) 443-459
7. Leopold, H., Eid-Sabbagh, R.H., Mendling, J., Azevedo, L.G., Baião, F.A.: Detection of naming convention violations in process models for different languages. Decision Support Systems 56 (2013) 310-325
8. Megyesi, B.: Shallow parsing with POS taggers and linguistic features. Journal of Machine Learning Research 2 (2002) 639-668
9. Fellbaum, C., ed.: WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press (1998)
10. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19 (1993) 313-330