International Journal of Computer Engineering and Applications, Volume XI, Issue IX, September 17, ISSN

A NOVEL APPROACH TO TEXT STEGANOGRAPHY USING DEVANAGARI SCRIPT FEATURES

Susmita Mahato, Dilip Kumar Yadav and Danish Ali Khan
Department of Computer Applications, National Institute of Technology, Jamshedpur-831014 (INDIA)
a sus_ps@rediffmail.com, b dkyadav1@gmail.com, c dakhan.ca@nitjsr.ac.in

ABSTRACT

Steganography is a means of secret communication between a sender and a receiver: one party communicates with the other through a cover medium that has a secret message embedded in it. In this paper, we propose a novel approach to text steganography using features of the Devanagari script. The idea behind the technique is that some words containing conjunct consonants can be written in more than one way, and substituting one form for the other does not change the meaning of the sentence. This feature can be used to hide information. Although the embedding rate of the technique is not very high, it is more reliable than other substitution-based linguistic steganography because it does not introduce any noise into the document, and the meaning of each sentence is preserved after message embedding, which keeps the document free of suspicion. The techniques for hiding and revealing the secret data are presented.

Keywords: Devanagari; Script feature; Cryptography; Steganography

[1] INTRODUCTION

The Internet era is built on electronic communication, which creates a need to secure digital information while it is stored and shared. Cryptography and steganography are used to protect significant data sent over communication media. Steganography is a technique for hiding secret information in a cover medium such as text, image, video or audio. Text steganography is hard to achieve because text offers fewer bits in which to hold the secret information than an image, audio or video file does.

This paper reviews developments in the field of text steganography and presents a new way to perform text steganography using Devanagari script features. A large number of people use the Devanagari script for communication, and its flexible writing conventions provide opportunities for steganography. We design an algorithm for hiding information in a Devanagari script document. The method uses Devanagari script features to hide a message without changing the meaning of the cover text.

The rest of the paper is organized as follows: Section 2 describes related works, Section 3 discusses Devanagari script features, Section 4 presents the proposed technique, Section 5 demonstrates the working of our approach, and Section 6 gives the conclusion.

[2] RELATED WORKS

This section describes some of the related works in the field of text steganography.

Low et al. [7] proposed a text steganography technique based on line shifting, in which lines of text are shifted up or down by a small amount, such as 1/300 inch, so that the resulting distinctive layouts of the text hide the information.

Changder and Debnath [4] proposed a text steganography technique based on the matra of Hindi alphabets. In this approach, the secret message is hidden by shifting specific matras toward the left or right.

Shirali-Shahreza [12] proposed a technique for text steganography based on different spellings of words. In British and American English, several words have different spellings, such as color and colour.

M. H. Shirali-Shahreza and M. Shirali-Shahreza [13] proposed a technique for text steganography based on the different terms used for the same thing in the UK and the USA. These terms are substituted to hide data in English text. For example, the term account in American English is referred to as bill in British English.

Khairullah [8] proposed a technique for text steganography using the font color of invisible characters in Microsoft Word documents.
The secret data were hidden by setting a foreground color for invisible characters such as the space or the new line, which are not visible in the document.

Huang et al. [5] proposed a technique for steganography based on placing extra white space between words.

Susmita Mahato et al. [9] proposed a text steganography technique for Microsoft Word documents. Information is embedded by applying a slight variation to the font size of the blank space character. A slight difference between the font size of the invisible space character and that of other characters is not reflected in the appearance of the document or in the disk size it requires.

Moerland [10] proposed a technique for text steganography using chosen characters from words. For example, the first letter of every paragraph can be used to conceal the secret message; placing all those characters together yields the message.

K. Alla et al. [1] proposed a technique for data hiding using Hindi letters and their diacritics. The technique uses consonants, vowels and compound letters to hide information. The secret message is converted into binary form. Hindi consonants and vowels in the sentence denote 0, and compound letters denote 1. These Hindi letters are then used to form a meaningful Hindi sentence. A Hindi word has a specific structure when used in a sentence; for example, the consonant is always followed by a vowel at the end, without exception.

K. Alla et al. [2] proposed three techniques for hiding binary bits in Hindi sentences. The first is based on classifying the position of the matraye of Hindi characters. The second is based on the classification of Hindi characters by various OCR tools for the Hindi language. The third is based on the hex katapayadi scheme.

M. Srivastava et al. [15] proposed three techniques for hiding binary bits with the help of the Hindi language. The first is based on punctuation marks. Punctuation marks available in Hindi are used to store the hidden bit sequence. Encoding and decoding are performed with a mapping table: the bits to be encoded are mapped to the appropriate punctuation marks, which are then used in a Hindi sentence. The second is based on synonyms. A four-level mapping table is created, which exploits the availability of multiple synonyms in Hindi to hide two bits at a time. At the encoding side, the secret message is first converted to binary, and appropriate words are then chosen with the help of the mapping table. The third is based on Sanskrit classification. Here a two-level mapping table is generated containing Tatbhav and Tatsama words. Tatbhav is the actual word or a synonym of a word, whereas Tatsama is not an exact synonym but is still very closely related to the main word. Tatbhav hides 0 and Tatsama hides 1. The advantage of these techniques is that they are easy to implement using dictionaries. Their limitation is that hiding even a small amount of information requires a very large paragraph or essay of Hindi sentences.

A. K. Agarwal et al. [3] proposed a steganography technique that converts English words into Hindi words of similar meaning with the help of an English-Hindi dictionary. The Hindi alphabets are then converted to numerical values using the Katapayadi system mapping. Next, 1 is added at the odd positions of the resulting numerical set.
This new set of numerical values is again checked against the Katapayadi system to obtain possible meaningful Hindi words. Once meaningful Hindi words have been found, a Hindi-English dictionary is used to find the appropriate English words, which form the encrypted cipher text. The drawback of this method is that it is confusing and difficult to implement.

L. Singh et al. [14] proposed a technique in which English plain text is converted to a cipher text in Hindi. The conversion does not follow a dictionary. Using a simple substitution cipher, the English sentences or words are first treated as a sequence of alphabets, which are converted into their ASCII values. A mapping table is maintained in which the ASCII code of each English alphabet is mapped to the code value of a corresponding Hindi alphabet. After the mapping is completed, a set of Hindi alphabets is obtained; this is the cipher text. At the receiver side, the reverse process is carried out to recover the hidden English word. The major drawback of the system is that the Hindi cipher text is a meaningless sequence of Hindi alphabets, which might catch the eye of an eavesdropper.

Nitin N. Patil and J. B. Patil [11] introduced a novel text watermarking technique to secure the intellectual property rights of the original author of a Devanagari text. They developed an embedding algorithm that uses a unique construct of the Devanagari language, the sarvanam (pronoun), to generate effective watermarks in combination with additional security phrases.

[3] DEVANAGARI SCRIPT FEATURES

Hindi is mostly written in a script called Devanagari. Hindi is normally spoken using a combination of around 52 sounds: ten vowels, 40 consonants, nasalization and a kind of aspiration [6]. Devanagari characters can be combined to indicate combinations of sounds. A conjunct consonant is a combination of two or more consonants that are pronounced together without the inherent vowel अ (a) between them; the form of one or both of the consonants is changed and they are joined together. Conjunct consonants can be divided into six groups depending on the type of modification of a consonant that takes place.

Among the nasal consonants ङ, ञ, ण, न and म, the consonants ङ and ञ are usually represented by a dot above the line in modern Hindi. For instance: कंघी : kanghi (comb), पंखा : pankha (fan), चंचल : chanchal (restless).

ण sometimes appears in a half form and sometimes as a dot above the line. For instance: अण्डा / अंडा : anda (egg), ठण्डा / ठंडा : thanda (cold), घण्टा / घंटा : ghanta (hour/bell).

न with त, थ, द or ध can be written in half form or as a dot. With श or स it should be written as a dot, and before ह it should be written in its half form. For instance: संत : sant (saint), नन्हा : nanha (tiny).

म with प, फ, ब or भ may be written as a dot above the line, but with म, न, य, ल or ह the conjunct (half) form of म is used. For instance: चम्मच : chammach (spoon), कम्बल / कंबल : kambal (blanket).

The rounded characters ट, ठ, ड, ढ, द and ह behave in a similar way. They can be made into half characters by the use of the हलंत : halant symbol ( ् ) or by the use of a modified (stacked) form. For instance: चिठ्‌ठी / चिठ्ठी : chiththi (letter), छुट्‌टी / छुट्टी : chuththi (holiday), where the first form of each pair shows the halant and the second uses the stacked conjunct.

The consonant र as the first consonant in a conjunct consonant is written above the line, after the consonant it is joined to (including any vowel mātrā that may be attached to the second consonant). For instance: धर्म : dharm (dharma).

Nasalization of vowels in Hindi can be represented by two symbols written above the line: the symbol ँ, called chandrabindu, and the symbol ं, called bindu. If a vowel mātrā written above the line crowds the space, the chandrabindu is reduced to a dot (bindu).
Nowadays there is a tendency in written Hindi to use the bindu in place of the chandrabindu. For instance: आँख : āṁkh, हाँ (or हां) : hāṁ.

Nasal consonants can be represented by a dot, called anusvara (अनुस्वार), above the head stroke of the preceding character. The nasal that the dot stands for depends on the character that follows:

i) before क, ख, ग, घ it represents ङ; for instance, अंक represents अङ्क.
ii) before च, छ, ज, झ it represents ञ; for instance, अंचल represents अञ्चल.
iii) before ट, ठ, ड, ढ it represents ण; for instance, ठंडा represents ठण्डा.
iv) before त, थ, द, ध it represents न; for instance, चंद represents चन्द.
v) before प, फ, ब, भ it represents म; for instance, लंब represents लम्ब.

These features can be used intelligently to perform information hiding.

[4] THE PROPOSED TECHNIQUE

We propose a new technique for text steganography based on different features of the Devanagari script. As mentioned above, some words can be written in two ways, for example अण्डा / अंडा, ठण्डा / ठंडा, घण्टा / घंटा, कम्बल / कंबल, चिठ्‌ठी / चिठ्ठी, छुट्‌टी / छुट्टी, चंद / चन्द, लंब / लम्ब. A secret message bit is hidden by substituting one form of such a word for the other.

To conceal the message in a Word document file, the number of bits to be hidden is first calculated. The number of words in the cover document that have such an alternative spelling should be greater than the total number of bits to be hidden. To hide 1, the word is replaced with its alternative form; to hide 0, it is left as it is. The resulting stego document is sent to the destination, where the secret message bits are extracted by checking which form of each such word is present.

Algorithm for hiding secret bits in a Word document:

Step 1: Select the secret bits.
Step 2: Count the number of bits present in the secret message.
Step 3: Select a Hindi story document.
Step 4: Visit each word containing a nasal conjunct character; change it to its alternative form to hide 1, and leave it as it is to hide 0. The stego message is thus formed.
Step 5: Send the stego document to the destination.

Algorithm for extracting the secret message from a Word document:

Step 1: Take the stego Word document.
Step 2: Check each word containing a nasal conjunct character in the document; if a variation is seen, take 1, otherwise take 0.
Step 3: The secret bits are thus extracted.

[5] WORKING OF OUR APPROACH

This section shows the working of our approach with sample message bits and a cover document.

Example: Secret bits: 1101011001

Figure 1. Cover document

Figure 2. Cover document with highlighted words having scope for bit embedding
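The hiding and extraction steps of Section 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the word table `PAIRS`, the function names `embed` and `extract`, whitespace tokenization, and the choice of the dot (anusvara) spelling as the default form are all hypothetical. The sketch also assumes that the receiver knows the number of hidden bits and that the cover text is first normalized to the default spellings, so that any alternative spelling in the stego text can be read as a 1.

```python
# Hypothetical table of interchangeable spellings: (default form, alternative form).
# The pairs are taken from the examples in the paper; the table itself is an assumption.
PAIRS = [
    ("अंडा", "अण्डा"),
    ("ठंडा", "ठण्डा"),
    ("घंटा", "घण्टा"),
    ("कंबल", "कम्बल"),
]
TO_ALTERNATIVE = dict(PAIRS)
TO_DEFAULT = {alt: default for default, alt in PAIRS}


def embed(cover: str, bits: str) -> str:
    """Hide bits in cover: switch a word to its alternative form for 1, keep it for 0."""
    # Assumption: normalize the cover so every substitutable word starts in default form.
    words = [TO_DEFAULT.get(w, w) for w in cover.split(" ")]
    slots = [i for i, w in enumerate(words) if w in TO_ALTERNATIVE]
    if len(bits) > len(slots):
        raise ValueError("cover has too few substitutable words")
    for i, bit in zip(slots, bits):
        if bit == "1":
            words[i] = TO_ALTERNATIVE[words[i]]
    return " ".join(words)


def extract(stego: str, n_bits: int) -> str:
    """Read one bit per substitutable word: alternative form is 1, default form is 0."""
    out = []
    for w in stego.split(" "):
        if w in TO_DEFAULT:
            out.append("1")
        elif w in TO_ALTERNATIVE:
            out.append("0")
    return "".join(out[:n_bits])


# Round trip on a toy cover built from the default forms in the table.
cover = " ".join(default for default, _ in PAIRS)
stego = embed(cover, "1011")
assert extract(stego, 4) == "1011"
```

Splitting on single spaces keeps the sketch short, but a word followed directly by punctuation (for example the danda) would not match the table; a fuller version would need proper tokenization and Unicode normalization of the text.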

Figure 3. Stego document (after embedding the secret bits 1101011001)

Figures 1, 2 and 3 show the cover document, the cover document with highlighted words having scope for bit embedding, and the stego document after embedding the secret bits 1101011001, respectively. For this sample the bit rate is 1 bit per sentence, which is low compared with other similar information hiding techniques, but the changes made do not affect the meaning of the sentences. Hence the generated stego document is free of suspicion.

[6] CONCLUSION

In this paper, a text steganography technique for information hiding is presented. The paper proposes a novel approach to data security that hides secret bits in Devanagari script. As Devanagari is a common script in India, it can be used as a cover medium for text steganography. Secret data is hidden in variations of conjunct consonants. Although the embedding capacity of the proposed algorithm is very low, the generated stego document does not contain any noticeable noise, which makes it less suspicious. Future research can aim to increase the embedding efficiency of the algorithm. The approach can also be applied to other regional languages to perform information hiding.

REFERENCES

[1] Alla K., and Siva Rama Prasad R., A Novel Hindi Text Steganography using Diacritics and its Compound words, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 12, 2008, pp. 404-409.
[2] Alla K., and Siva Rama Prasad R., A New Approach to Hindi Text Steganography Using Matraye, Core Classification And HHK Scheme, Seventh International Conference on Information Technology, IEEE, 2010, pp. 1223-1224.
[3] Agarwal A. K. and Srivastava D. K., Ancient Katapayadi System Sanskrit Encryption Technique Unified, International Conference on Signal Propagation and Computer Technology (ICSPCT), IEEE, 12-13 July 2014.
[4] Changder S. and Debnath N. C., A new approach to Hindi text steganography by shifting matra, International Conference on Advances in Recent Technologies in Communication and Computing, 2009, pp. 199-202.
[5] Huang D. and Yan H., Inter word distance changes represented by sine waves for watermarking text images, IEEE Trans. Circuits Syst. Video Technol., 11(12), 2001, pp. 1237-1245.
[6] Introduction to Hindi Script, available from: www.bodhgayanews.net/hindi/hin11_script_intro.pdf, accessed on 08/04/17.
[7] Low S. H., Maxemchuk N. F., Brassil J. T. and O'Gorman L., Document marking and identification using both line and word shifting, Proceedings of the 14th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 95), vol. 2, 2-6 April 1995, pp. 853-860.
[8] Khairullah M., A novel text steganography system using font color of the invisible characters in Microsoft Word documents, Second International Conference on Computer and Electrical Engineering, 2009, pp. 482-484.
[9] Mahato S., Yadav D. K. and Khan D. A., A Novel Approach to Text Steganography Using Font Size of Invisible Space Characters in Microsoft Word Document, in: Mohapatra D., Patnaik S. (eds), Intelligent Computing, Networking, and Informatics, Advances in Intelligent Systems and Computing, vol. 243, Springer, New Delhi, 2014.
[10] Moerland T., Steganography and Steganalysis, www.liacs.nl/home/tmoerlan/privtech.pdf, 15 May 2003.
[11] Patil Nitin N. and Patil J. B., Implementation of a Novel Watermarking Technique for Devanagari Text, International Journal of Information and Electronics Engineering, Vol. 5, No. 5, September 2015.
[12] Shirali-Shahreza M. H., Text steganography by changing words spelling, Proceedings of the 10th International Conference on Advanced Communication Technology, 2008, pp. 1912-1913.
[13] Shirali-Shahreza M. H. and Shirali-Shahreza M., A new synonym text steganography, Proceedings of the 4th IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2008), 15-17 August 2008, pp. 1524-1526.
[14] Singh L. and Johari R., Cross Language Cipher Technique, Security in Computing and Communications: Third International Symposium, SSCC 2015, Kochi, India, Springer International Publishing Switzerland, 10-13 August 2015.
[15] Srivastava M., Rafiq M. Q. and Tiwari R. K., A Novel Approach to Hindi Text Steganography, Second International Conference on Advances in Communication, Network, and Computing, CNC, Springer, 2011, pp. 295-298.