International Journal of Advanced Research in Computer Science and Software Engineering


Volume 3, Issue 7, July 2013 ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com

Implementation of DJ Rule Based Algorithm for Dhuni-Vishleshan of Compound Punjabi Words

Deepjot Kaur*, Navjot Kaur
Department of Computer Science & Engineering
Sri Guru Granth Sahib World University, Fatehgarh Sahib, Punjab, India.

Abstract:- Dhuni-vishleshan is the process by which one word is broken into its constituent words. This paper presents new software that performs it for the Punjabi language. In Punjabi, a word is a sequence of characters and can be of two types: simple and compound. A simple word consists of a root. A compound word, also called a co-joined word, can be broken up into two or more words. Little work has been completed in this area. The problem this paper addresses is breaking Punjabi compound words into their constituent simple words by applying the rules of dhuni-vishleshan: the rules are defined first, and words are then processed according to them.

Keywords:- compound words, DJ rule based algorithm, dhuni-vishleshan, phonetics, Gurmukhi script.

I. Introduction

A) Phonetics: Phonetics is the study of the human speech sounds that appear in all human languages to represent meaning. Phonetics deals with sounds rather than letters, and its task is to provide a description of speech. It plays an important role in improving communication, and it is essential that letters and words are spelled correctly [1]. E.g. a child cries to inform its mother that it is hungry. No language is used in this situation; only the sound matters. Language used for communication can be spoken or written. Most sounds are produced by an air-stream from the lungs passing through the other speech organs. This air-stream is the root of speech sounds.

B) History of Punjabi Language: Punjabi, sometimes spelled Panjabi, belongs to the Indic group of the Indo-European family of languages. Punjabi is a tonal language, meaning that it differentiates words by tone [2]. It is used in both parts of Punjab, in India and in Pakistan. In both countries the written standard for Punjabi is known as Majhi, named after the Majha region of Punjab. Punjabi is the mother tongue of more than 100 million people in Pakistan, India, Canada and America. In India it is the official language of Punjab state and is also spoken in the neighbouring states of Haryana and Himachal Pradesh. The Punjabi language is closely connected with the Sikh religion. Its alphabet, known as Gurmukhi, was the vehicle for recording the teachings of the Sikh gurus. The script was created in the 16th century by the second of the gurus, Guru Angad Dev Ji. The word Gurmukhi means "Guru's mouth". Punjabi, written in the Gurmukhi script, is the 11th most widely spoken language in the world.

1) Gurmukhi Consonants: The Gurmukhi script, the Punjabi alphabet, has thirty-five akhar (letters), comprising three vowels and thirty-two consonants. Each character represents a phonetic sound. In alphabetical order the letters are arranged in a grid of five columns and seven rows. Some characters have a nasal sound [3].

2) Gurmukhi Vowels: In Punjabi, letters are joined by a line at the top, and there is no concept of upper-case and lower-case letters. The Gurmukhi script can be divided into three zones: upper, middle and lower.
Ten vowels, three semi-vowels and three half-characters are used in Punjabi [4]. In spoken language a vowel is a sound pronounced using the vocal tract and its articulators, such as the teeth, lips and tongue. Vowels are an influential class of sounds in any language and play a significant role in the pronunciation of every word.

II. Proposed Algorithm

Phonetics is the study of the human speech sounds that appear in all human languages to represent meaning. Work in this area has been done for English and similar languages. Punjabi is the 11th most widely spoken language, yet very little work has been completed in this field. Developing programs that understand a natural language is a difficult task, because a natural language contains an infinite variety of sentences. The problem this paper addresses is breaking up Punjabi compound words into constituent words. E.g. + ਆ +. Sometimes a person cannot pronounce a difficult word, and separating the word into parts by applying several rules makes it easier to pronounce. The rules are built on the Punjabi laga (vowel signs) and on the Punjabi vowels ਅ ਆ ਇ ਈ ਉ ਊ ਏ ਐ ਓ ਔ. Once the rules are defined, words are processed according to them. Some examples of dhuni-vishleshan of Punjabi compound words are:

Table I: Compound Words with their Outputs (Dhuni-vishleshan)
Compound word    Dhuni-vishleshan
ਇ ਈ ਓ ਏ ਅ ਈ ਉ ਐ ਅ

Algorithm:- Dhuni-vishleshan is recently developed software for the Punjabi language. It is an application developed in .NET. The algorithm used to implement this module is the DJ Rule Based Algorithm.

Step 1: Load data from the database.
Step 2: Select a word from the database or enter it manually.
Step 3: Split the string character by character.
Step 4: Compare each character:
a) If character = , replace it with ਆ
b) Else if character = , replace it with ਇ
c) Else if character = , replace it with ਈ
d) Else if character = , replace it with ਉ
e) Else if character = , replace it with ਊ
f) Else if character = , replace it with ਏ
g) Else if character = , replace it with ਐ
h) Else if character = , replace it with ਓ
i) Else if character = , replace it with ਔ
j) Else if character = , replace it with ਨ
k) Else if character = , replace it with ਨ
l) Else if character = , replace it with ਅ
m) Else if character = , replace it with
n) Else if character = , replace it with
o) Else character = , replace it with
Step 5: Concatenate the resulting characters to form the final output.

III. Experiment & Result

The Dhuni-vishleshan application uses MS Access as the back-end tool and .NET as the front-end tool. The algorithm used to implement the module is the Rule Based Algorithm. Accuracy is the most significant issue to be examined, so to measure the accuracy of our algorithm we ran experiments on a number of different words.

Whenever the application is started, the window shown in Fig. 1 appears. It contains a text area where the user can enter text. The user can either choose a word from the database or enter a word manually.

Fig. 1: Snapshot of main screen.

Fig. 2 shows the working of Dhuni-vishleshan. The user first chooses a word from the database or enters it manually, and then obtains the output by clicking the button ਆਉਟ ਟ ਉ. Fig. 3 shows its word-to-sound rule.
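The character-by-character replacement in Steps 3 to 5 of the algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' .NET implementation: the function name `dhuni_vishleshan`, the table name `LAGA_TO_SOUND`, and the table's contents are assumptions. In particular, the pairing of each Unicode dependent vowel sign with its independent vowel, and of bindi and tippi with ਨ, is a guess based on the replacement targets listed in Step 4; the paper's remaining rules are not reproduced, and unmapped characters are simply passed through.

```python
# Sketch only: this table is an ASSUMPTION, not the paper's actual rule set.
# Keys are Unicode Gurmukhi dependent signs; values are the replacement letters.
LAGA_TO_SOUND = {
    "\u0A3E": "\u0A06",  # vowel sign AA -> ਆ
    "\u0A3F": "\u0A07",  # vowel sign I  -> ਇ
    "\u0A40": "\u0A08",  # vowel sign II -> ਈ
    "\u0A41": "\u0A09",  # vowel sign U  -> ਉ
    "\u0A42": "\u0A0A",  # vowel sign UU -> ਊ
    "\u0A47": "\u0A0F",  # vowel sign EE -> ਏ
    "\u0A48": "\u0A10",  # vowel sign AI -> ਐ
    "\u0A4B": "\u0A13",  # vowel sign OO -> ਓ
    "\u0A4C": "\u0A14",  # vowel sign AU -> ਔ
    "\u0A02": "\u0A28",  # bindi -> ਨ (assumed nasal rule)
    "\u0A70": "\u0A28",  # tippi -> ਨ (assumed nasal rule)
}


def dhuni_vishleshan(word: str, sep: str = " ") -> str:
    """Split a word into characters, replace mapped signs, and join the result."""
    characters = list(word)  # Step 3: split the string character by character
    # Step 4: replace each mapped sign; any other character is kept unchanged
    replaced = [LAGA_TO_SOUND.get(ch, ch) for ch in characters]
    return sep.join(replaced)  # Step 5: concatenate into the final output


if __name__ == "__main__":
    # KA + sign I + TA + sign AA + BA (an illustrative word, not from the paper)
    print(dhuni_vishleshan("\u0A15\u0A3F\u0A24\u0A3E\u0A2C"))
```

A dictionary lookup stands in for the chain of if/else comparisons in Step 4, so adding or correcting a rule only requires changing one table entry.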

In Fig. 3, the word is entered manually by the user.

Fig. 2: Loaded Data

Fig. 3: Word -ਅ ਨ is entered manually

If the user types the word -ਅ ਨ manually, the application shows the output ਇ ਅ - ਅ ਆ ਨ, which is correct. In Fig. 4, the word is taken from the database by the user.

Fig. 4: Dhuni-vishleshan for the word - ਆ

If the user chooses the word - ਆ from the database, the application shows the output ਇ ਆ - ਉ ਇ ਆ, which is correct. To measure the accuracy of our algorithm, we tested it on 24231 words and obtained 99.9% accuracy.

Fig. 5: Histogram showing the accuracy for Punjabi words (bars: input words from database, accurately segmented words).

Results:- We tested the system first with input from the database, which contains approximately twenty-four thousand words, and the system produced no errors. We then entered words manually, which also produced no errors. We can therefore say that our system has good accuracy. A sample of the tested words and their outputs is listed in the table that closes this section.
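The accuracy figure reported above is the share of test words whose output matches the expected segmentation. A minimal sketch of that calculation is given here. The function name `accuracy_percent` and the sample pairs are hypothetical placeholders, not data from the paper, and the paper does not state how expected outputs were checked.

```python
def accuracy_percent(results):
    """Return the percentage of (expected, actual) pairs that match exactly."""
    results = list(results)
    if not results:
        raise ValueError("no test words given")
    correct = sum(1 for expected, actual in results if expected == actual)
    return 100.0 * correct / len(results)


if __name__ == "__main__":
    # Hypothetical placeholder pairs: (expected segmentation, system output)
    sample = [
        ("a b", "a b"),
        ("c d", "c d"),
        ("e f", "e g"),  # one mismatch
        ("h i", "h i"),
    ]
    print(accuracy_percent(sample))  # 75.0
```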

Word    Output    Comment
ਅ ਅ ਇ ਆ ਅ ਅ ਓ ਨ ਨ ਆ ਏ ਆ ਔ ਨ ਅ ਅ ਇ ਈ ਇ ਆ ਨ ਇ ਟ ਨ ਨ ਇ ਨ ਟ ਨ ਏ ਨ ਉ ਅ ਈ ਐ ਓ ਏ - ਉ ਨ - ਨ ਏ ਆ ਈ ਔ ਆ ਨ ਏ ਆ ਨ- ਉ ਨ ਨ - ਉ ਈ

IV. Conclusion

In this work, we have developed the DJ Rule-Based algorithm, which processes words according to their rules. With this algorithm we observed an accuracy of 99.9%, which depends on the number of rules implemented. As future work, a sound button can be added for pronouncing the word, and the implementation can be extended to lines or paragraphs. This software can benefit people who are learning Punjabi. With it, one can learn a very important aspect of Punjabi grammar, dhuni-vishleshan, in a straightforward and interesting way that adds a new dimension to the traditional approach to Punjabi teaching. It can also be used to solve and test problems related to Punjabi grammar.

References

[1] Deepjot Kaur, Navjot Kaur, A Review: An Efficient Review of Phonetics Algorithms, International Journal of Computer Science & Engineering Technology (IJCSET), ISSN: 2229-3345, Vol. 4, No. 05, May 2013.
[2] Meenu Bhagat, Spelling Error Pattern Analysis of Punjabi Typed Text, Thesis report, Thapar University, Patiala (2007).
[3] Parminder Singh and Gurpreet Singh Lehal, Text-To-Speech Synthesis System for Punjabi Language.
[4] Gurmukhi Vowels, http://sikhism.about.com/od/learntoreadgurmukhi/ig/gurmukhi-vowels-illustrated/
[5] Rakesh Chandra Balabantaray, Sanjaya Kumar Lenka, An Automatic Approximate Matching Technique Based on Phonetic Encoding, IIIT Bhubaneswar, International Journal of Computer Science Issues, Vol. 9, Issue 3, No. 3, May 2012.
[6] Sheilly Paddal, Nidhi, Punjabi Phonetic: Punjabi Text to IPA Conversion, Department of Computer Science & Engineering, SVIET Banur, Punjab, International Journal of Emerging Technology and Advanced Engineering Issues, Vol. 2, Oct. 2012.
[7] Priyanka Gupta and Vishal Goyal, Implementation of Rule Based Algorithm for Sandh-Vicheda of Compound Hindi Words, Department of Computer Science, Punjabi University Patiala, International Journal of Computer Science Issues, Vol. 3, 2009.
[8] Kare Sjolander, Automatic alignment of phonetic segments, Centre for Speech Technology, Department of Speech, Music (2001).
[9] Walter D. Andrews, Mary A. Kohler and Joseph P. Campbell, Phonetic Speaker Recognition, Department of Defense Speech Processing Research. http://jcarreras.homestead.com/rrphonetics1.html
[10] David Pinto, Darnes Vilariño, Yuridiana Alem, The Soundex Phonetic Algorithm Revisited for SMS-based Information Retrieval, Department of Computer Science, Mexico.
[11] Contractor, D., Kothari, G., Faruquie, T.A., Subramaniam, L.V., Negi, S.: Handling noise queries in cross language FAQ retrieval. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, Stroudsburg, PA, USA, Association for Computational Linguistics (2010) 87-96.
[12] Gurpreet Singh Lehal, A Survey of the State of the Art in Punjabi Language Processing, Language in India, Vol. 9, No. 10, pp. 9-23, 2009.
[13] Bodo Winter, Pseudoreplication in Phonetic Research, Department of Linguistics, Germany, August 2011.
[14] Ashby, New Directions in Learning, Teaching and Assessment for Phonetics, Estudios de Fonética Experimental, XVII, 19-44, 2008.
[15] Rajkovic, P., Jankovic, D.: Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian names. In: XVII Conference on Applied Mathematics (2007).

2013, IJARCSSE All Rights Reserved