Nurturing Living Languages. IDN variant TLDs WELCOME. A study of issues related to the delegation of IDN variant TLDs

IDN variant TLDs
A study of issues related to the delegation of IDN variant TLDs
Mahesh D. Kulkarni, Programme Coordinator & Head, GIST, Centre for Development of Advanced Computing, Pune, India
mdk@cdac.in
Venue: ICANN 2011, San Francisco, 16th March 2011

The multilingual diversity of India: some facts and figures
- The Constitution recognizes 22 languages, termed Scheduled Languages.
- Two major script systems are used: Perso-Arabic based and Brahmi based.
- Sindhi, Kashmiri and Urdu use the Perso-Arabic system, with notational changes in Sindhi.
- The remaining 19 languages use 11 derivations of the Brahmi script.
- The relationship between language and script is both one-to-many and many-to-one.
- Santali and Sindhi use more than one script.
- The Devanagari script is used for Sanskrit, Hindi, Marathi, Nepali, Konkani, Maithili, Dogri and Bodo.

Indian language complexities
- Syllable formation level
- Alternate spellings
- Rendering order level
- Alternate forms
- Different input mechanisms in Indian languages

Need for variant identification: Indian language scenario
- Most Indian languages are multi-tier in nature.
- When conjuncts come into the picture, the number of resulting glyph shapes increases manifold.

Types of variants
1. Homographic variants (similar looking): 1 / l in Latin; द न in Devanagari
2. Homophonic variants (similar sounding / alternate spelling): color / colour in Latin; ह द / ह द in Devanagari
3. Case variants: C / c in Latin (no such case in Indian languages)

Homographic variants: confusingly similar
- Most browsers and applications that use IDNs display labels at a minimal size.
- This creates the greatest scope for spoofing and phishing attacks.
- Multi-tier scripts, such as those used in Indian languages, are less readable in the address bar.
- Unicode normalization rules have also been considered as variants.
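
One way to read the point about normalization: two code point sequences that normalize to the same string cannot safely be treated as distinct labels. The following is a minimal Python sketch, assuming NFC as the comparison form (the slides do not say which form the .in policy uses). The helper name `same_after_nfc` is hypothetical.

```python
import unicodedata

def same_after_nfc(a: str, b: str) -> bool:
    """Return True if two labels are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# DEVANAGARI LETTER QA as a single code point, versus KA followed by NUKTA.
precomposed = "\u0958"
decomposed = "\u0915\u093c"

print(precomposed == decomposed)                 # False: raw code points differ
print(same_after_nfc(precomposed, decomposed))   # True: equal once normalized
```

A registry could run such a comparison before any visual-similarity check, since normalization equivalence is defined by Unicode and needs no script-specific table.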

Homographic variants
[Figure: examples of Telugu variants and Tamil variants]

Homophonic variants and alternate spellings
- Valid homophones: ह द versus ह द
- Common misspellings: इ डय versus इ डय
- While formulating the IDN policy for .in, we have not treated these as variants, because historically other domains have always treated alternate spellings such as www.color.com and www.colour.com as separate entities.

Case variants
- Case variants are not applicable to Indian languages.
- However, Indian languages are rich in synonyms.

Need for variant identification
- Invisible characters such as ZWJ (zero-width joiner) and ZWNJ (zero-width non-joiner) greatly increase the possibilities for visual spoofing. If permitted, their placement within the domain name/label should be restricted to the cases where they are strictly necessary.
- In some cases, two languages written in the same script need different conjunct formation rules.
- Rendering is not the same across operating systems, rendering engines and their versions.

Need for variant identification
- Indian scripts introduce syllabic variants.
- Such homographs need to be considered while identifying variants.

Need for variant identification: display aspect
- ZWJ and ZWNJ: invisible characters such as ZWJ and ZWNJ greatly increase the possibilities for visual spoofing.
- A clear decision needs to be taken on whether to include them in TLDs and, if they are included, on their placement within the domain name/label.
[Figure: examples]
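
To make the ZWJ/ZWNJ concern concrete, here is a short Python sketch that locates these invisible characters in a label and shows that two labels can differ only by a joiner. The function names `find_joiners` and `strip_joiners` are hypothetical, and the sketch does not decide where a joiner is legitimate; that decision is a policy question.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

def find_joiners(label: str) -> list[tuple[int, str]]:
    """List (index, name) for every ZWJ or ZWNJ found in the label."""
    names = {ZWNJ: "ZWNJ", ZWJ: "ZWJ"}
    return [(i, names[ch]) for i, ch in enumerate(label) if ch in names]

def strip_joiners(label: str) -> str:
    """Remove ZWJ and ZWNJ so labels can be compared without them."""
    return label.replace(ZWNJ, "").replace(ZWJ, "")

plain = "\u0915\u094d\u0937"            # KA + VIRAMA + SSA
with_zwj = "\u0915\u094d\u200d\u0937"   # the same letters with a ZWJ after the virama

print(find_joiners(with_zwj))               # [(2, 'ZWJ')]
print(plain == with_zwj)                    # False: different code point sequences
print(strip_joiners(with_zwj) == plain)     # True: they differ only by the joiner
```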

IDN variant TLDs
- .com stands for "commercial": it is the first three letters of a meaningful English word.
- In English, one can easily correlate such short forms with the type of activity or content the domain may have.
- Transliteration is not always acceptable, for the following reasons:
  - Some scripts may not have the characters needed to represent the sounds of a word. E.g. Tamil does not have "bha", so "bharat" will map to "parat".
  - Associating transliterated IDNs with the real world will be difficult.
  - A transliterated string may convey an entirely different meaning in another language or region.
- In Indian languages, such short forms do not exist.

Examples
Example word: "PAL"
- In Tamil, PAL -> பால் means "milk".
- In Marathi, PAL -> पाल means "lizard".

IDN variant TLDs
- Another solution is to translate the TLDs into different languages.
- However, since a TLD does not convey language information, a translation suitable for one region may not be suitable for another (because of regional translation requirements).
- This issue is particularly relevant where scripts or languages are shared across borders.

Need for checking well-formedness of labels
- Rendering engine lacunae: the well-formed word किताब as seen in the address bar of Safari (version 3.1.2 (4525.22)) on Mac OS version 10.4.11 (Tiger).
[Figure: actual display versus expected display]
- The bidi algorithms needed for Urdu, Sindhi and Kashmiri are more complex.

Need for checking well-formedness of labels
- Rendering engine lacunae: an ill-formed word composed of the sequence 0915 + 093F + 094D + 0924 + 093E + 092C as seen in IE (version 8) on Windows XP.
[Figure: actual display versus expected display]
- Some applications are incapable of showing IDN labels and show Punycode instead.
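
The code point sequence on this slide can be inspected directly. The sketch below, in Python, rebuilds the ill-formed string, compares it with a well-formed counterpart (assumed here to be the same word without the 094D virama), and flags a virama that directly follows a dependent vowel sign. It also shows the ASCII form an application might display instead of the script, using Python's built-in `punycode` codec with the `xn--` prefix added by hand. The helper names are hypothetical, and a real registry would apply full IDNA processing, not raw Punycode.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA

def from_code_points(hex_points: str) -> str:
    """Build a string from a '+'-separated list of hexadecimal code points."""
    return "".join(chr(int(p.strip(), 16)) for p in hex_points.split("+"))

def virama_after_vowel_sign(label: str) -> bool:
    """True if a virama directly follows a Devanagari dependent vowel sign."""
    for prev, cur in zip(label, label[1:]):
        name = unicodedata.name(prev, "")
        if cur == VIRAMA and name.startswith("DEVANAGARI VOWEL SIGN"):
            return True
    return False

ill_formed = from_code_points("0915 + 093F + 094D + 0924 + 093E + 092C")
well_formed = from_code_points("0915 + 093F + 0924 + 093E + 092C")  # assumed counterpart

print(virama_after_vowel_sign(ill_formed))    # True
print(virama_after_vowel_sign(well_formed))   # False

# ASCII-compatible form of the well-formed label (raw Punycode plus prefix).
ace = b"xn--" + well_formed.encode("punycode")
print(ace)
print(ace[4:].decode("punycode") == well_formed)  # True: the encoding round-trips
```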

Indian IDNs in the browser address bar: rendering issues
- Various operating system and browser combinations were considered during testing of IDN .in (.भारत).

Implementation of IDNs in the .in (.भारत) ccTLD
- A formalism based on ABNF has been put in place to validate the desired domain name for each language, based on its syllabic structure.
- The applicable character sets for all official languages have been identified from the respective Unicode code charts for the script of each language.
- No intermixing of scripts is allowed.
- Variant rules have been formalized for the domain name label.
- Syllables in which variants occur have been identified within each language.
- The variant set has been kept optimal, ensuring the safety of citizens without being too restrictive.
- Link: http://pune.cdac.in/html/gist/down/idn_d.asp
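
The slides do not reproduce the ABNF formalism itself, so the following Python sketch only illustrates the general idea of syllable-based validation combined with a single-script rule. The grammar is a deliberately simplified assumption for Devanagari (consonant cluster, optional vowel sign, optional modifier) and is not the .in rule set. All names are hypothetical.

```python
import re

# Simplified Devanagari classes (an assumption, not the .in ABNF).
CONSONANT = "[\u0915-\u0939]\u093c?"                 # consonant with optional nukta
CLUSTER = f"{CONSONANT}(?:\u094d{CONSONANT})*"        # consonants joined by virama
VOWEL = "[\u0904-\u0914]"                             # independent vowels
MATRA = "[\u093e-\u094c]"                             # dependent vowel signs
MODIFIER = "[\u0901-\u0903]"                          # candrabindu, anusvara, visarga

SYLLABLE = f"(?:{VOWEL}|{CLUSTER}{MATRA}?){MODIFIER}?"
LABEL_RE = re.compile(f"(?:{SYLLABLE})+")

def is_devanagari_only(label: str) -> bool:
    """Single-script rule: every character lies in the Devanagari block."""
    return all("\u0900" <= ch <= "\u097f" for ch in label)

def is_well_formed(label: str) -> bool:
    """True if the label is one script and parses as a run of syllables."""
    return is_devanagari_only(label) and LABEL_RE.fullmatch(label) is not None

print(is_well_formed("\u092d\u093e\u0930\u0924"))              # True
print(is_well_formed("\u0915\u093f\u094d\u0924\u093e\u092c"))  # False: virama after vowel sign
print(is_well_formed("\u092d\u093e\u0930\u0924a"))             # False: Latin letter mixed in
```

Expressing the rule as a grammar over syllables, instead of a list of allowed characters, is what lets it reject sequences whose individual code points are all valid.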

Best practices that can be carried forward to TLDs
Suggested qualification criteria for an IDN TLD:
- Validation as per the formalism: proper length, proper character set, proper formation.
- Non-variant nature with respect to any pre-registered TLD.
- Symbols (currency, logos, sentiment) should be avoided.
- Tonal stress markers are needed for languages such as Bodo and should be permitted. For example, code point 02BC is required for Bodo, Dogri, Assamese and Maithili, but is not part of the respective code pages.
- Political/stakeholder opinions.
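
The criterion about symbols can be approximated with Unicode general categories. The Python sketch below treats the symbol categories (Sc, Sk, Sm, So) as disallowed; this mapping from "currency, logos, sentiment" to categories is an assumption, not something the slides specify. It also shows that code point 02BC is a modifier letter and therefore passes such a filter. The name `symbols_in` is hypothetical.

```python
import unicodedata

# Unicode general categories for symbols: currency, modifier, math, other.
SYMBOL_CATEGORIES = {"Sc", "Sk", "Sm", "So"}

def symbols_in(label: str) -> list[str]:
    """Return the characters of the label that fall in a symbol category."""
    return [ch for ch in label if unicodedata.category(ch) in SYMBOL_CATEGORIES]

bharat = "\u092d\u093e\u0930\u0924"

print(symbols_in(bharat))               # []
print(symbols_in(bharat + "\u20b9"))    # ['₹'] (INDIAN RUPEE SIGN, category Sc)
print(unicodedata.category("\u02bc"))   # Lm (MODIFIER LETTER APOSTROPHE)
print(symbols_in("\u02bc"))             # []
```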

Thank you
www.cdac.in
http://www.xn--11bx2e6a3b.com/