LING 410X - Language as Data


Spring Semester 2017
Iowa State University
LING 410X - Language as Data
Course Handbook

Instructor: Sowmya Vajjala
Office: 331 Ross Hall
Email: sowmya@iastate.edu

Course Objectives:
This course aims to introduce students to methods of discovering language patterns in text documents and applying them to solve practical text analysis problems in their disciplines. Data of every form (text, numbers, images, etc.) is now available in larger amounts than ever before. Text is one of the major forms of big data, and text analysis is therefore in high demand in the information technology industry. Apart from its technological applications, text analysis is also useful in disciplines such as business intelligence, sociology, psychology and literature, to name a few. For example, keyword extraction and sentiment analysis are very useful in business analytics. Authorship detection and stylometric analyses are example applications for literature. Studying mental disorders through patient-written samples is gaining prominence in clinical psychology.

Against this background, this course introduces some commonly used methods for working with textual data. After a brief primer on the fundamentals of linguistics and its role in text analysis, the course will introduce students to writing R scripts to perform text analysis and visualize textual data. R is used because it makes exploratory analysis and visualization easier without requiring students to learn a lot of programming principles.

Learning Outcomes
After finishing this course, students will know:
1. some common methods for performing automatic text analysis
2. some real-life applications of text analysis
3. how to apply these methods to solve text analysis problems in their domain areas
4. how to visualize textual data using various tools and methods

Pre-requisites:
Junior Standing. LING 120 is a preferred but not a mandatory prerequisite.

Course Details:
Tuesdays and Thursdays, 9:30-10:50 am. Meets in Ross 0406 on Tuesdays and in Ross 0137 (Lab) on Thursdays. Note that the Thursday classroom is different from the one listed on the class scheduler.

Office hours:
Tuesdays and Thursdays, 11-12 noon. Please email beforehand if there are specific issues to discuss. If you cannot make it during these hours, send me an email to fix an appointment.

Credits:
Credit Points: 3 (Expect to spend 5-6 hours per week outside the class working on the problems and assignments.)

Nature of the course and expectations:
The primary mode of instruction is lectures and hands-on lab sessions. The course will have regular assignments that deal with various methods of corpus creation and text analysis using software tools, and a final project. The corpora and resources used in this course will address methods for solving various text analysis problems related to the student's discipline. Students enrolled in the course are expected to
1. regularly and actively participate in class, and submit the assignments on time (80% of the grade)
2. work hard, and prepare well for the classes

Grading Policy
There are 6 assignments in this course covering 75% of your final grade, and a final project carrying 25 marks (which involves visualizing textual data). Plus/minus grading will be used (93% = A, 90% = A-, 87% = B+, 83% = B, 80% = B-, etc.). The following are the scheduled deadlines for this class:

Assignment 1: 24 January (10 marks)
Assignment 2: 7 February (10 marks)
Assignment 3: 21 February (15 marks)
Assignment 4: 10 March (15 marks)
Assignment 5: 2 April (15 marks)
Assignment 6: 15 April (10 marks)

Group project (25 marks total):
Initial report due: 21 March (5 marks)
Project presentation: 25 and 27 April (5 marks)
Project report and code submission: finals week, 5 May, midnight (15 marks)
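The arithmetic behind the grading policy can be checked with a short sketch. The course itself works in R; Python is used here only for illustration. The marks and letter cutoffs are copied from the policy above, while the function names and the shape of the `earned` dictionary are hypothetical. Cutoffs below B- are not listed in the handbook, so the sketch does not guess them.

```python
# Marks per component, copied from the grading policy in this handbook.
ASSIGNMENT_MARKS = {"A1": 10, "A2": 10, "A3": 15, "A4": 15, "A5": 15, "A6": 10}
PROJECT_MARKS = {"initial_report": 5, "presentation": 5, "report_and_code": 15}

# Only the cutoffs the handbook states explicitly; the rest fall under "etc."
LETTER_CUTOFFS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-")]


def total_available():
    """Total marks on offer across assignments and the group project."""
    return sum(ASSIGNMENT_MARKS.values()) + sum(PROJECT_MARKS.values())


def course_percent(earned):
    """earned: hypothetical dict mapping component name -> marks earned."""
    return 100 * sum(earned.values()) / total_available()


def letter_grade(percent):
    """Return the letter for a percentage, or None below the listed cutoffs."""
    for cutoff, letter in LETTER_CUTOFFS:
        if percent >= cutoff:
            return letter
    return None  # below 80%: cutoffs are not given in the handbook


if __name__ == "__main__":
    # Hypothetical student: full marks everywhere except 10/15 on A4.
    earned = {**ASSIGNMENT_MARKS, **PROJECT_MARKS, "A4": 10}
    pct = course_percent(earned)
    print(pct, letter_grade(pct))
```

The six assignments sum to 75 marks and the project components to 25, which matches the 75% / 25 marks split stated in the policy.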

Attendance Policy
Attendance is not mandatory, but recommended.

Class etiquette:
Please do not read or work on materials for other classes in this class. Come to class on time and do not pack up early. Electronic devices such as mobile phones and tablets should not be used in class. Laptops should not be open in class unless there is a concrete, assigned activity. If you must leave early, have an important call coming in, or have to miss class for an important reason, please let me know by email and get it approved before the class. Being absent from a class does not exempt you from submitting any assignments that were assigned in that class.

Academic Conduct:
Generally, you are encouraged to work in groups, discuss, and exchange ideas. At the same time, you are expected to do your assignments by yourself and with honesty. For the text you write, you must always provide explicit references for any ideas or passages you reuse from somewhere else. Note that this includes text taken from the web. Cite the URL of the website if no more official publication is available. Specifically, the class will follow the University policy on academic dishonesty. Anyone suspected of academic dishonesty will be reported to the Dean of Students Office: http://www.dso.iastate.edu/ja/academic/misconduct.html

Disability Accommodation
Iowa State University complies with the Americans with Disabilities Act and Section 504 of the Rehabilitation Act. If you have a disability and anticipate needing accommodations in this course, please contact the instructor to set up a meeting within the first two weeks of the semester or as soon as you become aware of your need. Before meeting with the instructor, you will need to obtain a SAAR form with recommendations for accommodations from Student Disability Resources, located in Room 1076 on the main floor of the Student Services Building. Their telephone number is 515-294-7220, and their email is disabilityresources@iastate.edu. Retroactive requests for accommodations will not be honored.

Harassment and Discrimination
Iowa State University strives to maintain our campus as a place of work and study for faculty, staff, and students that is free of all forms of prohibited discrimination and harassment based upon race, ethnicity, sex (including sexual assault), pregnancy, color, religion, national origin, physical or mental disability, age, marital status, sexual orientation, gender identity, genetic information, or status as a U.S. veteran. Any student who has concerns about such behavior should contact his/her instructor, Student Assistance at 515-294-1020 or dso-sas@iastate.edu, or the Office of Equal Opportunity and Compliance at 515-294-7612.

Dead Week Policy
This class follows the Iowa State University Dead Week policy as noted in section 10.6.4 of the Faculty Handbook: http://www.provost.iastate.edu/resources/faculty-handbook

Textbooks
The primary textbook is Text Analysis with R for Students of Literature by M. L. Jockers. You are not obligated to buy it. The course will also rely on a wide range of freely accessible online tutorials and videos related to various methods of text analysis (example: https://github.com/kbenoit/itaur-short).

Syllabus - topics covered
1. Introduction: text analysis, real-world applications, usefulness for various disciplines
2. Introduction to linguistics and the role of linguistic knowledge in solving text analysis problems, with examples
3. Installing R and working with it
4. Corpus preparation: methods to select, process and clean corpora
5. Keyword and key-phrase extraction methods
6. Text classification methods and their application for sentiment detection
7. Topic modeling and its applications
8. Methods of visualizing textual information

Scheduling and Deadlines (tentative)
Note that the following session plan is subject to change; it only reflects the current state of our planning as the semester unfolds.

1. Tuesday, January 10: Introduction to the course, expectations etc.
2. Thursday, January 12: R setup, basics practice (lab). A1 assigned. Due on 24th January.
3. Tuesday, January 17: No class (instructor was sick).
4. Thursday, January 19: Introducing text processing in R
5. Tuesday, January 24: Corpus preprocessing and cleaning: introduction and issues involved. A2 assigned on pre-processing text. Due on 7th February.
6. Thursday, January 26: Corpus cleaning continued (reading from HTML, PDF, XML, JSON etc.) + practice
7. Tuesday, January 31: Scraping data from Twitter, NYT etc.
8. Thursday, February 2: Corpus cleaning: conclusion + learning to use R Markdown
9. Tuesday, February 7: Introduction to vocabulary analysis. Keyword and phrase extraction: overview and applications. A3 assigned on vocabulary and phrase analysis. Due on 21st February.
10. Thursday, February 9: KWIC and other such tools: usage, analysis and lab
11. Tuesday, February 14: Words to phrases (n-grams etc.)
12. Thursday, February 16: Conclusion of the topic and exercises
13. Tuesday, February 21: Text classification overview. A4 assigned. Due on 10th March. 15 marks.
14. Thursday, February 23: Text classification and R
15. Tuesday, February 28: Text classification continued
16. Thursday, March 2: Text classification conclusion
17. Tuesday, March 7: Revision of concepts so far. Description of final project ideas.
18. Thursday, March 9: Revision. Final project ideas discussed and decisions made.
19. Tuesday, March 14: Spring break
20. Thursday, March 16: Spring break
21. Tuesday, March 21: Topic modeling. A5 assigned. Due on 2 April. 10 marks.
22. Thursday, March 23: Topic modeling
23. Tuesday, March 28: Topic modeling
24. Thursday, March 30: Topic modeling
25. Tuesday, April 4: Visualizing textual data. A6 assigned. Due on 15th April. 10 marks.
26. Thursday, April 6: Visualizing textual data
27. Tuesday, April 11: Visualizing textual data
28. Thursday, April 13: Visualizing textual data
29. Tuesday, April 18: Conclusion and revision
30. Thursday, April 20: Conclusion and revision. Group exercises on exploring domain-specific problems and solving them.
31. Tuesday, April 25: Group project presentations
32. Thursday, April 27: Group project presentations
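As a rough illustration of what the vocabulary-analysis sessions cover (KWIC, and moving from words to n-grams), here is a minimal sketch. It is not course material: the course works in R, and Python is used here only to keep the example short. The tokenizer, the function names and the sample sentence are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kwic(tokens, keyword, window=2):
    """Keyword in context: (left context, keyword, right context) per hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    # Hypothetical sample text, not taken from any course corpus.
    sample = "Text analysis is fun. Text is data."
    tokens = tokenize(sample)
    print(ngrams(tokens, 2))
    for left, key, right in kwic(tokens, "text", window=1):
        print(f"{left:>10} [{key}] {right}")
```

A real analysis would need a more careful tokenizer (numbers, hyphenation, non-English text), which is part of what the corpus cleaning sessions address.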