Available online at ScienceDirect. Procedia Computer Science 72 (2015 )

Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 72 (2015) 261-268
The Third Information Systems International Conference

A Survey on Ambiguity Awareness towards Malay System Requirement Specification (SRS) among Industrial IT Practitioners

Hazlina Haron a, Abdul Azim Abdul Ghani b
a School of Computing, College of Arts and Sciences, UUM Sintok, 06010 Kedah, Malaysia
b Faculty of Computer Science and Information Technology, UPM Serdang, 43000 Selangor, Malaysia

Abstract

We conducted a survey among IT practitioners in software development on their awareness of the ambiguity that occurs in System Requirement Specifications (SRS) written in the Malay language. Previous research shows that both acknowledged and unacknowledged ambiguity exist when people deal with SRS. Our objective is to investigate and confirm that IT practitioners in Malaysia tend to overlook potential ambiguities in SRS documentation written in the Malay language. An inaccurate SRS usually affects the end product of a software system. We believe it can complicate software development processes and result in an unsatisfying agreement between the users and the software product team. The results show that ambiguity and vagueness exist in industrial Malay textual SRS in the form of multiple interpretations: users had difficulty understanding what the SRS required, which significantly affected the design process.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of Information Systems International Conference (ISICO2015).

Keywords: linguistic ambiguity;
Malay ambiguity; syntactic analysis; vague concept; Malay grammar

* Hazlina Haron. Tel.: +601112217513. Email address: delinn1612@yahoo.com
1877-0509 © 2015 The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2015.12.139

1. Introduction

Natural language is prevalent in Requirement Specifications (RS). However, ambiguity is unavoidable in natural language. Ambiguity in RS is critical and may cause numerous problems in the software development life cycle. A sentence is ambiguous when different readers can give it more than one interpretation. Although an ambiguous RS may seem insignificant in the document, it has a serious impact at later stages if the errors are not resolved at the start of the process. Common types of bugs that may originate from ambiguous RS are the Design Bug, Functional Bug, Logical Bug, Performance Bug, Requirement Bug and UI Bug [1].

Awareness of ambiguity is still lacking on the part of readers. There are two common scenarios when one deals with textual requirements: (a) readers notice occurrences of ambiguity in a sentence but choose to ignore them, and (b) readers are not aware of occurrences of ambiguity when they read or write textual requirements [2]. An inaccurate textual system requirement significantly affects the end result of a software system, as people become confused about the exact initial intention. Consider a scenario where a Requirement Engineer (RE) writes an RS based on his understanding of a discussion with users, and both parties agree on it, yet the RS still leads to more than one interpretation on the part of the System Developers or Testers. Although the fault can be rectified by reconfirming it with either the RE or the users, doing so interrupts the flow of the software development process.

In a study of 11 requirement documents, 3404 out of 26829 sentences contained instances of ambiguity [3]. Although the percentage is statistically low (12.68%), that level of misunderstanding may jeopardize the accuracy of the users' initial intention. In a more recent, similar study, 92 out of 487 sentences taken from requirement statements were detected as ambiguous [1].

2. Related Works

The ability to detect and resolve or minimize ambiguity is important in software development. People who deal with RS may be diverted from the meaning intended by the owner of the system. The people who are usually involved with RS are Business Analysts, Requirement Engineers, System Developers and the Quality Assurance Team. Their ability to write an accurate Requirement Statement that remains as intended until the end of the process is somewhat arguable. When it is possible to interpret a statement in more than one way, it is defined as being ambiguous [4].
Linguistic ambiguity can come from many sources, including multiple word senses [5]; syntactic and structural ambiguity of sentences [5], such as negations and misuse of quantifiers [6]; long-range relationships in word referencing [5], [7]; imprecise usage of words [5]; misconception of word meanings [8]; customers who do not really know their requirements; and communication and knowledge gaps between customers, software engineers and project managers [9]. Previous research has grouped ambiguity into four common categories: lexical, syntactic, semantic and pragmatic ambiguity [8], [10], [11]. These types are described in Table 1.

Table 1: Types of ambiguity and the area involved in NL

Ambiguity type: Lexical
Description: Occurs when a word has several meanings, or when two words of different origin have the same spelling and phonetics [1], [6], [8], [12].
Element and area involved in NL: The words and terms used in the text.

Ambiguity type: Syntactic
Description: Occurs when a given sequence of words can be given more than one grammatical structure, each with a different meaning [1], [6], [8], [13].
Element and area involved in NL: Grammatical rules and word dependency relationships.

Ambiguity type: Semantic
Description: Occurs when a sentence can be read in more than one way within its context, although it contains no syntactic or lexical ambiguity [10].
Element and area involved in NL: Logical representation between words in the sentences.

Ambiguity type: Pragmatic
Description: Occurs when a sentence has several meanings in the context in which it is uttered [10].
Element and area involved in NL: Logical representation of the whole text.

Ambiguity and vagueness are similar but have different characteristics. Vagueness occurs when a phrase has a single meaning from a grammatical point of view but still leaves room for interpretation [10]. For example, in "The system should respond as fast as possible", the word "fast" is vague in that it allows more than one interpretation. Vagueness is a subset of ambiguity and gives rise to it.

A statement in a requirement is said to be ambiguous when different sets of people interpret the same sentence in multiple ways. A requirement specification is affected by textual ambiguity when it provokes more than one way of interpreting a statement. Consider, for example, "The customer enters a card and a numeric personal code. If it is not valid then the ATM rejects the card." This sentence is potentially ambiguous because the word "it" could refer to two distinct objects: either the card or the numeric personal code [6]. Another example is "Sistem perlu menghantar tindak balas sebelum petang" ("The system has to send feedback before evening"). The phrase "sebelum petang" ("before evening") holds more than one interpretation, as it could mean by 5 pm, 6 pm or 7 pm.

Words can be ambiguous in many ways. Berry [14] categorized linguistic ambiguity into several main groups: semantic, syntactic, pragmatic and lexical. This categorization has been widely accepted and later extended with other types of ambiguity, such as coordination ambiguity [15] and anaphoric ambiguity.

The Malay language is used widely in documents throughout Malaysia, Indonesia and Brunei. Many small software companies in Malaysia still use the Malay language in their RS. The structure of Malay is very different from that of other widely used languages such as English. For example, the position of verb agreements, the use of articles and comparative discourse all differ. Meanings in Malay may vary even when the same words, phrases or even sentences are used. Most of the structure of Malay is also dissimilar to that of English and of other languages such as Arabic, Chinese and Japanese.
For example, pronouns in Malay differ from those in English: the antecedent of he/she clearly defines the gender of a person, while the Malay pronouns dia and baginda could refer to either a male or a female [16]. Part-of-speech tagging of sentences is important because it is one of the common procedures for morphological analysis. Sentences and phrases need to be parsed into their root forms in order to detect their intended meaning [17], [18]. However, unlike in western languages, some words in Malay can be tagged with more than one grammatical class. Some Malay words that seem to correspond to verbs are also adverbs, and some nouns can be prepositions. For example, the word telefon can be categorized as both a noun and a verb. The word boleh can also be categorized as both a noun and a verb [19].

Rojas [20] suggested that potentially ambiguous word groups include vague adjectives, usually modifying nouns (such as acceptable, high, low, fast); non-deterministic adverbs, usually modifying verbs (such as continually, periodically, regularly); general verbs that reflect an inaccurate description (such as process, monitor, support); and non-deterministic constructs (such as and/or, any, not limited to).

3. Methodology

We conducted a study to investigate the awareness of the ambiguity and vagueness that usually occur in textual RS. The survey has four objectives: (i) to observe users' perception and interpretation of the sentences in textual requirements; (ii) to collect potentially vague and ambiguous Malay words from users for incorporation in a corpus; (iii) to investigate whether the system can be designed based on users' understanding and interpretation of a given SRS; and (iv) to assess the level of awareness among industrial IT practitioners of potential ambiguity in SRS written in the Malay language.
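The word groups from [20] described in Section 2, and the corpus of vague words targeted by objective (ii), both suggest a simple lexicon lookup. The sketch below is only an illustration of that idea and is not the authors' method or tool: the lexicon contains just the English example words quoted in this paper, while the group labels and the names VAGUE_LEXICON and flag_vague_terms are hypothetical. A Malay version would need entries verified by linguists.

```python
import re

# Illustrative lexicon: only the English example words quoted from [20].
# Group labels are hypothetical names chosen for this sketch.
VAGUE_LEXICON = {
    "vague_modifier": {"acceptable", "high", "low", "fast"},
    "non_deterministic_adverb": {"continually", "periodically", "regularly"},
    "general_verb": {"process", "monitor", "support"},
    "non_deterministic_construct": {"and/or", "any", "not limited to"},
}


def flag_vague_terms(sentence, lexicon=VAGUE_LEXICON):
    """Return (term, group) pairs for every lexicon term found in the sentence."""
    text = sentence.lower()
    hits = []
    for group, terms in lexicon.items():
        for term in sorted(terms):
            # Match whole words or phrases only, so "any" does not match "many".
            pattern = r"(?<![\w/])" + re.escape(term) + r"(?![\w/])"
            if re.search(pattern, text):
                hits.append((term, group))
    return hits


example = "The system should respond as fast as possible."
print(flag_vague_terms(example))
```

A lookup like this can only point at candidate words. Whether a flagged word is actually ambiguous in its context still needs human judgment, which is what the survey in this paper examines.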
To achieve these objectives, we sent out one set of working SRS written in Malay, collected from industry, and analyzed the users' understanding of the functional specification it states. In this way, the ability to design the desired system based on the requirements can be gauged [2], [12].

The first step is the instrument preparation. We collected 20 sets of SRS written in Malay from the industry. We analyzed the SRS to select the most ambiguous document that aligns with our research scope and definition. We then selected one module from one document by a government research agency on Operation Electronic Procurement (E-Perolehan Operasi). We chose only one document because we wanted to test the original working SRS from industry without the input of any outside words, and because we did not want to exhaust the respondents with many SRS. We simply wanted to observe and analyze the level of understanding the IT practitioners have of one set of SRS. The document was written by the vendor with the objective of building a system to replace a manual procurement process. It contains a total of nine RS statements about the e-procurement functions to be designed and developed, to which we added two questions about overall capability and the ability to design the system.

The second step is the development of the Malay Ambiguous Words (MAW) corpus. We created a data repository, which we named the Malay Ambiguous Words (MAW) corpus. This corpus contains a list of Malay words with high potential for ambiguity that were extracted from the collected SRS and verified by linguistic experts. The extraction was based on our constructed Model of Vagueness (MOV), which comprises the characteristics of Malay ambiguous words mapped to a list of quality indicators. We also compiled four basic criteria for potentially vague Malay words [21]. From our small-scale data and a few domains, 120 potentially vague Malay words were extracted, verified and stored in the MAW corpus. We chose linguists as our expert verifiers because the scope of this study is lexical ambiguity.

The third step is selecting experts as the respondents. The target respondents were selected experts who deal directly or indirectly with RS from the industry. Thirteen IT practitioners were selected from software companies in Kuala Lumpur, the majority of whom have between 11 and 20 years of working experience.
They were also selected from both the private and public sectors, with their main business related to software development.

The last step is the implementation and data analysis. The author met all 13 domain experts selected as respondents for this survey on a one-to-one basis and explained in detail the survey and what they had to do, although the instructions were already stated in the document. The respondents were given one set of working SRS and asked about their understanding and interpretation of the functional specification. At the end of the document, they were also asked about their capability to design the system based on the requirements.

4. Results and Discussion

The survey was conducted on thirteen respondents (IT practitioners) who work in either an IT company or an IT department. We offered 7 common IT positions that relate directly or indirectly to SRS as answer options. Out of the 13 respondents, the majority are System Analysts (6 people), followed by Project Managers (4 people). The others hold Programmer, Quality Assurance/Tester and Business Analyst positions. Eleven respondents have between 11 and 20 years of working experience and only 2 have between 1 and 10 years.

Table 2 below shows the number of interpretations and the level of understanding for the nine requirement statements. A total of 179 interpretations was found overall. Every statement received more than one interpretation from different users. The highest number of interpretations for a single statement is 32, as shown in S4 (statement 4), where eight respondents (61%) only partly understood what the statement required. In S4, for example, there are 11 answer options (A to K) to select from. The options list all the possible interpretations of the statement.
Respondents could select one or more options that they think match the interpretation. This shows that more than half of the respondents did not fully understand what the statement requires. This result satisfies the first objective of the study.

We have also gathered feedback on the respondents' understanding of the requirement statements. Referring to Table 3, out of the nine statements given, users mostly stated that they understood only part of what the statement wanted, while for two statements the opposite held. There were two users who did not understand at all what the RS statement wanted. Table 3 shows the two answers most often selected by the respondents. In S1, ten respondents selected answer 1 and four selected answer 2, the highest numbers of interpretations among the options. In S2, there were 6 interpretations for answer 1 and 5 interpretations for answer 2, giving 11 interpretations overall (55%). No RS statement yielded only a single interpretation. What we want to show here is that, for some of the requirement statements, the two most-selected answers account for more than 50% of all interpretations. This confirms that there is a high possibility of RS statements being interpreted differently by different users. Table 4 gives an example of one of the requirement statements (S4) and its list of possible interpretations.

Table 2: Number of interpretations and users' understanding

Q     INTERPRETATION    UNDERSTANDING
                        YES    PARTLY    NO
S1    25                6      7         0
S2    20                5      7         1
S3    18                5      7         1
S4    32                4      8         1
S5    20                5      7         1
S6    18                7      4         1
S7    16                6      4         3
S8    12                4      2         1
S9    18                5      8         1
T     179               47     54        10

Note: Q = question; S* = number of the requirement statement; T = total; Yes = respondents totally understood the statement; Partly = respondents understood only some parts of the statement; No = respondents did not understand at all.

Table 3: Percentage of interpretations for the two highest numbers of interpretations

Q     Highest and 2nd highest       TOTAL
      no. of interpretations
      Answer 1    Answer 2          No.    %
S1    10          4                 14     56%
S2    6           5                 11     55%
S3    6           5                 11     61%
S4    8           7                 15     47%
S5    8           4                 12     60%
S6    6           3                 9      50%
S7    6           4                 10     63%
S8    4           3                 7      58%
S9    6           5                 11     61%
T     60          40                100    56%

Note: Q = question; S* = number of the requirement statement; T = total.
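The percentages in Table 3 can be reproduced from the counts: each one is the number of selections for the two most-chosen answers divided by the total number of interpretations for that statement in Table 2. The sketch below is a minimal check, assuming the percentages were rounded half up to whole numbers (an assumption that matches the published values, for instance 62.5% reported as 63% for S7). The names COUNTS, percent_half_up and top_two_share are hypothetical.

```python
# Counts transcribed from Tables 2 and 3:
# statement -> (total interpretations, answer 1 selections, answer 2 selections)
COUNTS = {
    "S1": (25, 10, 4),
    "S2": (20, 6, 5),
    "S3": (18, 6, 5),
    "S4": (32, 8, 7),
    "S5": (20, 8, 4),
    "S6": (18, 6, 3),
    "S7": (16, 6, 4),
    "S8": (12, 4, 3),
    "S9": (18, 6, 5),
}


def percent_half_up(part, whole):
    """Whole-number percentage using integer arithmetic, rounding halves up."""
    return (200 * part + whole) // (2 * whole)


def top_two_share(counts):
    """Map each statement to (top-two selections, percentage of all interpretations)."""
    shares = {}
    for label, (total, first, second) in counts.items():
        top_two = first + second
        shares[label] = (top_two, percent_half_up(top_two, total))
    return shares


shares = top_two_share(COUNTS)
overall_top = sum(top for top, _ in shares.values())
overall_total = sum(total for total, _, _ in COUNTS.values())
overall_pct = percent_half_up(overall_top, overall_total)

for label, (top, pct) in shares.items():
    print(f"{label}: {top} of {COUNTS[label][0]} interpretations = {pct}%")
print(f"T: {overall_top} of {overall_total} interpretations = {overall_pct}%")
```

Integer arithmetic is used here because Python's built-in round() rounds halves to the nearest even number, which would give 62 rather than 63 for S7.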

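Table 4: Example of a requirement statement and its list of possible interpretations

STATEMENT 4 (S4)
Pelulus Ketua Projek meluluskan permohonan operasi dan memilih pelulus Pengarah Bahagian Pusat Khidmat. (The Project Leader approver approves the operation application and selects the approver, the Director of the Service Centre Division.)

INTERPRETATIONS
A. Ketua Projek meluluskan permohonan operasi melalui paparan maklumat di dalam sistem (The Project Leader approves the operation application through the information display in the system)
B. Ketua Projek meluluskan permohonan operasi melalui email yang dihantar dari sistem (The Project Leader approves the operation application through an email sent from the system)
C. Ketua Projek meluluskan permohonan operasi melalui notifikasi SMS yang dihantar oleh sistem (The Project Leader approves the operation application through an SMS notification sent by the system)
D. Ketua Projek meluluskan permohonan operasi melalui email dan juga paparan maklumat di dalam sistem (The Project Leader approves the operation application through email and also the information display in the system)
E. Ketua Projek meluluskan permohonan operasi melalui notifikasi SMS dan paparan maklumat di dalam sistem (The Project Leader approves the operation application through SMS notification and the information display in the system)
F. Ketua Projek meluluskan permohonan operasi melalui paparan maklumat di dalam sistem dan notifikasi SMS yang dihantar oleh sistem (The Project Leader approves the operation application through the information display in the system and an SMS notification sent by the system)
G. Ketua Projek meluluskan permohonan operasi melalui paparan maklumat di dalam sistem, email dan notifikasi SMS (The Project Leader approves the operation application through the information display in the system, email and SMS notification)
H. Ketua Projek memilih pelulus melalui senarai pelulus yang dipaparkan di dalam sistem (The Project Leader selects the approver from the list of approvers displayed in the system)
I. Ketua Projek memilih pelulus dengan menginput nama dan jawatan pelulus (The Project Leader selects the approver by entering the approver's name and position)
J. Lain-lain (nyatakan): (Others (please specify):)
K. Tiada jawapan di atas (None of the answers above)

We asked for the users' opinion on the occurrence of ambiguity in each RS statement. The result is shown in Table 5 below. We were interested in observing the users' awareness of ambiguity and their ability to design from the given SRS, in order to satisfy the third objective of the study. In all nine requirement statements, the majority (more than 50%) stated that there are instances of ambiguity in each of the statements and only one user answered otherwise. However, a number of users gave neither answer.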
As for the users' perceived ability to design the functions required by the statements, the users who could not design outnumbered those who could; nevertheless, more than 60% of the users did not answer the question. We assume that they either overlooked the question or were themselves not sure whether they could design. Our third objective is hereby satisfied.

Table 5: Users' awareness of ambiguity and ability to design

Q     AMBIGUOUS?                          CAN DESIGN?
      YES         NO         NULL         YES       NO         NULL
S1    9  (60%)    2 (15%)    2 (15%)      1 (8%)    2 (15%)    10 (77%)
S2    11 (85%)    2 (15%)    0 (0%)       0 (0%)    4 (31%)    9  (60%)
S3    11 (85%)    2 (15%)    0 (0%)       1 (8%)    3 (23%)    9  (60%)
S4    6  (46%)    3 (23%)    4 (31%)      1 (8%)    3 (23%)    9  (60%)
S5    9  (60%)    2 (15%)    2 (15%)      0 (0%)    2 (15%)    11 (85%)
S6    8  (62%)    3 (23%)    2 (15%)      1 (8%)    3 (23%)    9  (60%)
S7    8  (62%)    2 (15%)    3 (23%)      0 (0%)    3 (23%)    10 (77%)
S8    6  (46%)    1 (8%)     6 (46%)      0 (0%)    1 (8%)     12 (92%)
S9    5  (38%)    6 (46%)    2 (15%)      1 (8%)    1 (8%)     11 (85%)

Note: Q = question; S* = lines of requirement statements.

Table 6 below compares the respondents' interpretations with our data in the MAW corpus. There are seven potentially ambiguous words that the respondents did not manage to detect. However, there were four words that the respondents think are ambiguous but that are not recorded in our corpus. Because they were genuinely marked by the respondents, we will carefully consider all the potentially ambiguous words highlighted in this survey for inclusion in our MAW corpus in our next task. In the column Frequency Perceived by Users, the higher the frequency, the higher the possibility of the word being ambiguous. The comparison of words detected shows the users' level of ambiguity awareness as well as what our corpus lacks. Hence, we reached the second and the fourth objectives of this study.

Table 6: List of ambiguous words detected by users

Potential Ambiguous Malay Words/Phrase     Frequency Perceived    Comparison with MAW Corpus
                                           by Users
Maklumat                                   2
maklumat permohonan perolehan operasi      4                      (maklumat, permohonan, operasi)
Pengguna                                   3
Operasi                                    1
Permohonan                                 1
Menghantar                                 8                      X
sistem menghantar permohonan               4
pelulus                                    0
perolehan                                  0
menghantar permohonan                      2                      X
Meluluskan                                 2
meluluskan permohonan operasi              2
Notifikasi                                 2
notifikasi untuk kelulusan                 2                      (Notifikasi, untuk)
Mengeluarkan                               7                      X
mengeluarkan local order                   3                      X
Menerima                                   6
menerima local order                       2                      (Menerima)
mengemaskini maklumat                      5
untuk                                      0
dan                                        0
khidmat                                    0
sekiranya                                  0
dengan                                     0

The overall result shows that instances of ambiguity exist in the Malay RS from the industry. The fact that statements received more than one interpretation confirms that ambiguity does occur in textual documents written in Malay, specifically SRS. Users tend to be misled and to misunderstand what the SRS really requires, and their understanding differs from one user to another. Users have some difficulty in designing the functions stated in the requirements. Failing to understand and design a software system will always affect the end product. The stage after system testing is the User Acceptance Test (UAT), and this is where customers complain about the newly built system. Hence, more often than not, developers and system analysts need to redesign, customize or modify the already built system. This leads to cost overruns and budgeting problems.

5. Conclusion and Future Work

While research in this area has been dominated by studies on the English language, this research focuses on Malay language documents.
In this paper, we presented the results of our survey on the level of understanding and the number of interpretations that one SRS could receive from industrial IT practitioners. From the survey, we conclude that IT practitioners do tend not to notice the occurrence of ambiguity, particularly in SRS. The survey also confirms a high possibility of multiple interpretations of the same sentence among several readers. Although some users are aware of the ambiguities in the requirements, more than 50% of the users are still unaware of their occurrence. The need for a tool or technique to assist in handling and minimizing the problem cannot be denied. Our next task will be the development of an automated tool to help reduce this problem, which we have named MADS (Malay Ambiguity Detection System). This tool will be able to detect potentially ambiguous Malay words in SRS. It should not only assist in detecting ambiguity but also expedite the writing process.

References

[1] Nigam A, Arya N, Nigam B, Jain D, Tool for Automatic Discovery of Ambiguity in Requirements, vol. 9, no. 5, pp. 350-356, 2012.
[2] Chantree FJ, de Roeck A, Nuseibeh B, Willis A, Identifying Nocuous Ambiguity in Natural Language Requirements, The Open University, UK, 2006.
[3] Yang H, Willis A, de Roeck A, Nuseibeh B, Automatic detection of nocuous coordination ambiguities in natural language requirements, Proc. IEEE/ACM Int. Conf. Autom. Softw. Eng. - ASE '10, p. 53, 2010.
[4] Chantree F, Ambiguity Management in Natural Language Generation, in 7th Annual CLUK Research Colloquium, 2004.
[5] Burg JFM, Linguistic Instruments in Requirement Engineering. IOS Press Inc., 1989.
[6] Kamsties E, Paech B, Taming Ambiguity in Natural Language Requirements, in International Conference on System and Software Engineering and their Applications, 2000, pp. 1-8.
[7] Grenat MH, Taher MM, On a Translation of Structural Ambiguity, Al-Satil J., pp. 9-19, 2008.
[8] Berry MD, Kamsties E, Krieger MM, and WLS. & Lee T, From Contract Drafting to Software Specification: Linguistic Sources of Ambiguity. 2003.
[9] Yang Y, Xia F, Zhang W, Xiao X, Li Y, Li X, Towards Semantic Requirement Engineering, Semantic Computing and Systems, 2008. WSCS '08. IEEE International Workshop, 14-15 July 2008. IEEE Xplore, Huangshan, pp. 67-71, 2008.
[10] Gleich B, Creighton O, Kof L, Ambiguity Detection: Towards a Tool Explaining Ambiguity Sources, Springer-Verlag Berlin Heidelberg 2010, vol. 6182, 2010.
[11] Kamsties E, Understanding Ambiguity in Requirements Engineering, in Engineering and Managing Software Requirements, Springer-Verlag Berlin Heidelberg, 2005.
[12] Tjong F, Avoiding Ambiguity in Requirements Specifications, no. February, 2008.
[13] Chantree FJ, Kilgarriff A, de Roeck A, Willis A, Using a Distributional Thesaurus to Resolve Coordination Ambiguities, Department of Computing, Faculty of Mathematics and Computing, The Open University, UK, 2005.
[14] Berry DM, Ambiguity in Natural Language Requirements Document, Monterey Workshop 2007. 2007.
[15] Chantree FJ, Nuseibeh B, De Roeck A, Willis A, Nocuous Ambiguities in Requirement Specifications, The Open University, UK, 2005.
[16] Noor NKM, Noah SA, Aziz MJA, Hamzah MP, Anaphora Resolution of Malay Text: Issues and Proposed Solution Model, 2010 Int. Conf. Asian Lang. Process., pp. 174-177, Dec. 2010.
[17] Ahmad-Nazri MZ, Shamsuddin SM, Abu-Bakar A, An Exploratory Study on Malay Processing Tool for Acquisition of Taxonomy Using FCA, Eighth International Conference on Intelligent Systems Design and Applications. IEEE, pp. 375-380, 2008.
[18] Al-Fawareh HM, Jusoh S, Sheikh-Osman WR, Ambiguity in Text Mining, in Proceedings of the International Conference on Computer and Communication Engineering 2008, 2008, pp. 1172-1176.
[19] Knowles G, Mohd.Don Z, Tagging a corpus of Malay texts and coping with syntactic drift, pp. 422-488, 2003.
[20] Rojas AB, Sliesarieva GB, Automated detection of language issues affecting accuracy, ambiguity and verifiability in software requirements written in natural language, Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas. Association for Computational Linguistics, Los Angeles, California, 2010.
[21] Haron H, Abdul Ghani AA, A Method to identify potential ambiguous Malay words through ambiguity attributes mapping: An exploratory study, in The Fourth Conference of Computer Science and Information Technology (CCST2014), 2014, pp. 1-8.