Introduction, Organization Overview of NLP, Main Issues

HG2051 Language and the Computer
Computational Linguistics with Python
Introduction, Organization
Overview of NLP, Main Issues
Francis Bond
Division of Linguistics and Multilingual Studies
http://www3.ntu.edu.sg/home/fcbond/
bond@ieee.org
Lecture 1
http://compling.hss.ntu.edu.sg/courses/hg2051/

Introduction
  Self Introduction
  Administrivia
  Why use computers in linguistics
  What this course is (and isn't)
  Introduction to the Natural Language Tool Kit (external slides)
  Texts and Words; Natural Language Understanding (HTML)
  Tutorial
    Getting to know each other (what do you want)
    Getting to know the NLTK

Self Introduction
  BA in Japanese and Mathematics
  BEng in Power and Control
  PhD on Determiners and Number in English contrasted with Japanese, as exemplified in Machine Translation
  1991-2006 NTT (Nippon Telegraph and Telephone)
    Japanese - English/Malay Machine Translation
    Japanese corpus, grammar and ontology (Hinoki)
  2006-2009 NICT (National Inst. for Info. and Comm. Technology)
    Japanese - English/Chinese Machine Translation
    Japanese WordNet
  2009- NTU

Assessment
  Continuous Assessment (100%)
    Assignment One (30%)
    Assignment Two (30%)
      Group work
    Final In-Class On-Line Open-Book Programming Challenge (30%)
      Individual work (one program each)
      In class 6 hour exam
    Participation (10%)
      Every week there will be short problems to do in class

Extra Credit
  If you submit a patch [1] that gets accepted to the NLTK or another tool we use, you can get 1-5% extra credit (depending on the size/difficulty)
    Mark n ≈ 10^(n-1) lines of code/documentation
    You can't go over 100%
  A patch can involve
    fixing a bug in code
    extending the code with new capabilities
    fixing a bug in or extending documentation
      spelling error
      rewording
      translating
  [1] a short set of commands to correct a bug in a computer program

Why use Computers in Linguistics?
  Linguistics without computers is like taking a walk (or a long hard hike)
    It can be very pleasant
    You can see a lot of details
    There is only so much ground you can cover
  Using a software tool is like catching the MRT
    Very efficient for set routes
    You have to adapt to it
    Hard to customize
  Programming is like driving a car
    It is expensive to start off (you have to learn!)
    You are free to go where you want to

The goal of this course
  To learn enough programming to flexibly analyze data and then do something with it
  The language will be Python
  We will use the NLTK toolkit
  You will be able to write your own programs by the end
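As a taste of what "analyze data and then do something with it" can look like, here is a minimal sketch in plain Python. It is not taken from the course materials: the function name word_frequencies, the sample sentence, and the simple regular-expression tokenizer are all assumptions made for illustration, and the NLTK offers more careful tools for the same job.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count lower-cased word tokens in a string (hypothetical helper)."""
    # Assumption: a word is a run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Illustrative sample text, not from the course.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# "Do something with it": show the two most frequent words.
for word, count in freqs.most_common(2):
    print(word, count)
```

The counting is the analysis; what you then do with the counts (print them, plot them, compare two texts) is up to you.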

HG2051 Prerequisites
  A little linguistic knowledge
    You know what a word is
    You know what a part of speech is
    You know what a parse tree is
    If you don't know these, you will have to do a little background reading
  No computational knowledge
    You have to be ready to learn
    If you are a very experienced Python programmer, then you will not learn so much
    If you can program, but in a different framework, then you will learn something new, and I will expect more from your code

What HG2051 isn't
  We won't be learning how to build cars
    this is the prerequisite for further NLP courses
    ... but we won't be writing taggers and parsers yet
  Just an introduction to Python
    We will be motivated by NLP
  Very easy (this is a feature, not a bug)
    but it is very fun

The Three Virtues of a Programmer
  Laziness
    The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it.
  Impatience
    The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to.
  Hubris
    The quality that makes you write (and maintain) programs that other people won't want to say bad things about.
  Larry Wall, Tom Christiansen, Randal L. Schwartz and Stephen Potter (1996) Programming Perl 2nd Ed, O'Reilly

Schedule
  On-line: compling.hss.ntu.edu.sg/courses/hg2051/

But enough about me
  Language Poll
    Natural
      Mandarin
      Bahasa Malay
      Tamil
      ...
    Artificial
      PERL
      C/C++
      Java
      ...

Readings
  Core readings are all from the NLTK book.
  Supplementary readings may be assigned, but all the sources will be online.
    All Wikipedia articles have been checked by me, and I will watch them for changes. (extend the web of trust)
  You must read the material before class
    I will assume that you have done so
  You get good at programming by programming
    that is how we should spend our time

Acknowledgments
  Thanks to Graham Wilcox for the inspiration for this course, and permission to adapt his course notes.
  Thanks to Steven Bird, Ewan Klein and Edward Loper for releasing the NLTK.
  Thanks to Guido van Rossum, Python Benevolent Dictator for Life (BDFL).
  Definitions from WordNet 3.0