Predictive Analysis of Text: Concepts, Instances, and Classifiers. Heejun Kim


Predictive Analysis of Text: Concepts, Instances, and Classifiers Heejun Kim May 29, 2018

Predictive Analysis of Text
Objective: developing computer programs that automatically predict a particular concept within a span of text

Procedure
[diagram] Training Data (a table of instances with feature columns color, size, sides, equal sides, ... and a label column) passes through a Representation step into the Learning Algorithm, which produces a Model; in the Performance Test, the Model is applied to Test Data of the same form.

basic ingredients
Training data: a set of examples of the labeled concept we want to automatically recognize
Representation: a set of features that we believe are useful in recognizing the desired concept
Learning algorithm: a computer program that uses the training data to learn a predictive model of the concept

basic ingredients
Model: a function that describes a predictive relationship between feature values and the presence/absence of the concept
Test data: a set of previously unseen examples used to estimate the model's effectiveness
Performance metrics: a set of statistics used to measure the predictive effectiveness of the model

training and testing
[diagram] Labeled examples are fed to the machine learning algorithm during training, producing a model; during testing, the model is applied to new, unlabeled examples to produce predictions.

instances
Predictive Analysis: concepts, instances, and features
The feature columns describe each instance; the label column is the concept:
color  size   # sides  equal sides  ...  label
red    big    3        no           ...  yes
green  big    3        yes          ...  yes
blue   small  inf      yes          ...  no
blue   small  4        yes          ...  no
...
red    big    3        yes          ...  yes

Types of features
Nominal: values that are distinct symbols (e.g., male and female). No ordering or distance.
Ordinal: ranked order of the categories (e.g., hot, mild, and cool). No distance.
Numeric:
Interval: ordered and measured in fixed and equal units (e.g., temperature and school year). Zero is arbitrary.
Ratio: the measurement method inherently defines a zero point (e.g., distance). Ordered and measured in fixed and equal units.

 
training and testing
[diagram] Labeled examples (a table with columns color, size, # sides, equal sides, ..., label, with known labels such as yes/no) are fed to the machine learning algorithm during training, producing a model. During testing, the model is applied to new, unlabeled examples of the same form (label = ?) to produce predictions.

questions
Is a particular concept appropriate for predictive analysis?
What should the unit of analysis be?
How should I divide the data into training and test sets?
What is a good feature representation for this task?
What type of learning algorithm should I use?
How should I evaluate my model's performance?

Concepts
Learning algorithms can recognize some concepts better than others. What are some properties of concepts that are easier to recognize?

Concepts
Option 1: can a human recognize the concept?
Option 2: can two or more humans recognize the concept independently, and do they agree?
Option 2 is better. In fact, models are sometimes evaluated as an independent assessor: how does the model's performance compare to the performance of one assessor with respect to another? One assessor produces the ground truth and the other produces the predictions.

measures of agreement: percent agreement
Percent agreement: the percentage of instances for which both assessors agree that the concept occurs or does not occur:
percent agreement = (A + D) / (A + B + C + D)
                 assessor 2: yes   assessor 2: no
assessor 1: yes  A                 B
assessor 1: no   C                 D

measures of agreement: percent agreement
Problem: percent agreement does not account for agreement due to random chance. How can we compute the expected agreement due to random chance?

measures of agreement: percent agreement
                 assessor 2: yes   assessor 2: no
assessor 1: yes  80                5
assessor 1: no   5                 10
percent agreement = (80 + 10) / (80 + 5 + 5 + 10) = 0.90
Agreement due to random chance?

measures of agreement: kappa agreement
How can we compute the expected agreement due to random chance?
Kappa agreement: percent agreement after correcting for the expected agreement due to chance (not covered in this course).
For more details, refer to the Wikipedia article on Cohen's kappa or an online video.
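For the curious, the chance correction can be sketched in a few lines of Python. This is a sketch of Cohen's kappa, which the slides leave out; the counts are the ones from the worked example above:

```python
# Percent agreement and chance-corrected agreement (Cohen's kappa)
# for a 2x2 table: a = both yes, b = yes/no, c = no/yes, d = both no.
def agreement_stats(a, b, c, d):
    n = a + b + c + d
    observed = (a + d) / n                     # percent agreement
    # Expected chance agreement: each assessor says "yes" at their own
    # marginal rate, independently of the other assessor.
    both_yes = ((a + b) / n) * ((a + c) / n)
    both_no = ((c + d) / n) * ((b + d) / n)
    chance = both_yes + both_no
    kappa = (observed - chance) / (1 - chance)
    return observed, chance, kappa

# Counts from the worked example on the earlier slide.
observed, chance, kappa = agreement_stats(80, 5, 5, 10)
```

Here the observed agreement is 0.90, but the agreement expected by chance is already 0.745, so kappa comes out to only about 0.61.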

questions Is a particular concept appropriate for predictive analysis? What should the unit of analysis be? How should I divide the data into training and test sets? What is a good feature representation for this task? What type of learning algorithm should I use? How should I evaluate my model s performance?

turning data into training and test instances
For many text-mining applications, turning the data into instances for training and testing is fairly straightforward.
Easy case: instances are self-contained, independent units of analysis.
topic categorization: instances = documents
opinion mining: instances = product reviews
bias detection: instances = political blog posts
emotion detection: instances = support group posts

instances
Topic categorization: predicting health-related documents
The columns w_1 ... w_n are the features; the label column is the concept:
w_1  w_2  w_3  ...  w_n  label
1    1    0    ...  0    health
0    0    0    ...  0    other
0    0    0    ...  0    other
0    1    0    ...  1    other
...
1    0    0    ...  1    health

instances
Opinion mining: predicting positive/negative movie reviews
w_1  w_2  w_3  ...  w_n  label
1    1    0    ...  0    positive
0    0    0    ...  0    negative
0    0    0    ...  0    negative
0    1    0    ...  1    negative
...
1    0    0    ...  1    positive

instances
Bias detection: predicting liberal/conservative blog posts
w_1  w_2  w_3  ...  w_n  label
1    1    0    ...  0    liberal
0    0    0    ...  0    conservative
0    0    0    ...  0    conservative
0    1    0    ...  1    conservative
...
1    0    0    ...  1    liberal
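Tables like these can be produced mechanically: each document becomes a binary vector over a fixed vocabulary. A minimal sketch in Python, using made-up toy documents and labels:

```python
# Turn raw texts into binary bag-of-words instances: feature w_i is 1
# if vocabulary word i occurs in the document, 0 otherwise.
docs = [
    ("the flu vaccine is effective", "health"),   # toy data
    ("the election results are in", "other"),
]
vocab = sorted({w for text, _ in docs for w in text.split()})

def to_instance(text):
    words = set(text.split())
    return [1 if w in words else 0 for w in vocab]

instances = [(to_instance(text), label) for text, label in docs]
```

In practice the vocabulary is built from the training data only, so unseen test words simply map to all-zero features.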

questions Is a particular concept appropriate for predictive analysis? What should the unit of analysis be? How should I divide the data into training and test sets? What is a good feature representation for this task? What type of learning algorithm should I use? How should I evaluate my model s performance?

training and test data
We want our model to learn to recognize a concept. So, what does it mean to learn?

training and test data
The machine learning definition of learning: "A machine learns with respect to a particular task T, performance metric P, and experience E, if the system improves its performance P at task T, following experience E." -- Tom Mitchell

can we use the same data for testing?
[diagram] Training Data is used to train the machine learning algorithm, producing a Spam Detection Model. Should the model be tested on the same Training Data, or on New Data used as Test Data?

training and test data
We want our model to improve its generalization performance! That is, its performance on previously unseen data!
Generalize: "to derive or induce a general conception or principle from particulars." -- Merriam-Webster
In order to test generalization performance, the training and test data cannot be the same. Why?

Training data + Representation: what could possibly go wrong?

training and test data
While we don't want to test on training data, we want the training and test sets to be derived from the same probability distribution. What does that mean?

training and test data
[diagram] A pool of labeled data, a mix of positive and negative instances, must somehow be divided into Training Data and Test Data. How?

training and test data
Is this a good partitioning? Why or why not?
[diagram] one particular split of the labeled data into Training Data and Test Data

training and test data
[diagram] Drawing one random sample of the labeled data as Training Data and another as Test Data.

training and test data
On average, random sampling should produce comparable data for training and testing.
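The random-sampling split sketched above, in Python. The instances here are synthetic placeholders; the point is the shuffle-then-cut pattern:

```python
import random

# Split labeled data into training and test sets by random sampling:
# shuffle a copy, then cut at the desired proportion (80/20 here).
random.seed(0)                       # reproducible shuffle
data = [((i, i % 2), "pos" if i % 2 else "neg") for i in range(100)]

shuffled = data[:]                   # copy so the original order survives
random.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]
```

Every instance lands in exactly one of the two sets, which is what makes the test set a fair estimate of performance on unseen data.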

Statistical Estimation

training and test data
[chart: SAP stock price over time]

training and test data
If you want to predict stock price by analyzing tweets, how should the training and test data be separated?
[timeline t_0 through t_4, with Test Data drawn from the earlier period and Training Data from the later period]

training and test data
If you want to predict stock price by analyzing tweets, how should the training and test data be separated?
[timeline t_0 through t_4, with Training Data drawn from the earlier period and Test Data from the later period]
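The chronological split shown in the second timeline can be sketched like this (the timestamps and tweet IDs are invented):

```python
# For time-dependent prediction, train on the past and test on the
# future: a random split would leak future information into training.
stream = [(t, f"tweet_{t}") for t in range(10)]   # ordered by time
cutoff = 8                                         # boundary timestamp
train = [x for x in stream if x[0] < cutoff]
test = [x for x in stream if x[0] >= cutoff]
```

The invariant worth checking is that every training timestamp precedes every test timestamp.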

training and test data
Models usually perform best when the training and test sets have:
a similar proportion of positive and negative examples
a similar co-occurrence of feature values and each target-class value

training and test data
Caution: in some situations, partitioning the data randomly might inflate performance in an unrealistic way!
How the data is split into training and test sets determines what we can claim about generalization performance.
The appropriate split between training and test sets is usually determined on a case-by-case basis.

discussion
Spam detection: should the training and test sets contain email messages from the same sender, same recipient, and/or same timeframe?
Topic segmentation: should the training and test sets contain potential boundaries from the same discourse?
Opinion mining for movie reviews: should the training and test sets contain reviews for the same movie?
Sentiment analysis: should the training and test sets contain blog posts from the same discussion thread?

questions
Is a particular concept appropriate for predictive analysis?
What should the unit of analysis be?
How should I divide the data into training and test sets?
What is a good feature representation for this task?
What type of learning algorithm should I use?
How should I evaluate my model's performance?

three types of classifiers Linear classifiers Decision tree classifiers Instance-based classifiers

three types of classifiers
All types of classifiers learn to make predictions based on the input feature values. However, different types of classifiers combine the input feature values in different ways.


Learning Algorithm + Model: what could possibly go wrong?
[scatter plot: "Relationship between Usefulness and word count"; y-axis: number of usefulness votes (0 to 12), x-axis: Word_Count (10 to 300)]

Predictive Analysis
linear classifiers: perceptron algorithm
output = w_0 + (w_1 x f_1) + (w_2 x f_2) + ... + (w_n x f_n)
The weights w_i are parameters learned by the model; thresholding the output gives the predicted value (e.g., 1 = positive, 0 = negative).

Predictive Analysis
linear classifiers: perceptron algorithm
test instance:  f_1 = 0.5, f_2 = 1.0, f_3 = 0.2
model weights:  w_0 = 2, w_1 = -5, w_2 = 2, w_3 = 1
output = 2.0 + (0.5 x -5.0) + (1.0 x 2.0) + (0.2 x 1.0) = 1.7
output > 0, so prediction = positive
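The arithmetic on this slide can be checked directly; a small sketch reproducing it:

```python
# Perceptron output: bias w_0 plus the weighted sum of feature values.
def perceptron_output(weights, features):
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * f for w, f in zip(rest, features))

weights = [2.0, -5.0, 2.0, 1.0]   # w_0, w_1, w_2, w_3 from the slide
features = [0.5, 1.0, 0.2]        # f_1, f_2, f_3 of the test instance

output = perceptron_output(weights, features)          # comes out to 1.7
prediction = "positive" if output > 0 else "negative"
```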

Predictive Analysis linear classifiers: perceptron algorithm (two-feature example borrowed from Witten et al. textbook)

Predictive Analysis
linear classifiers: logistic regression
[plot: the logistic (sigmoid) curve]
(source: https://en.wikipedia.org/wiki/Logistic_regression#/media/File:Logistic-curve.svg)
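The curve on this slide is the logistic (sigmoid) function; a quick sketch of it and the usual decision rule:

```python
import math

# Logistic function: squashes any real-valued score into (0, 1),
# which logistic regression reads as P(positive | features).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Usual decision rule: predict positive when sigmoid(z) >= 0.5,
# which is the same as predicting positive when z >= 0.
p = sigmoid(1.7)   # score from the perceptron example two slides back
```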

would a linear classifier work?
[scatter plot over x1 and x2, each ranging from 0 to 1; the two classes form a pattern that no single straight line can separate]

three types of classifiers Linear classifiers Decision tree classifiers Instance-based classifiers

Predictive Analysis
decision tree classifiers
[diagram of a tree with its parts labeled: node, edge, leaf]

Predictive Analysis
decision tree classifiers
Decision tree: decision rules organized in the form of a tree data structure, which helps us understand the relationship between the attributes and the class labels. Attributes become nodes, edges represent the values of these attributes, and predictions are made at each leaf.

decision tree classifiers
[scatter plot over x1 and x2, each ranging from 0 to 1]
Draw a decision tree that would perform perfectly on this training data!

examples of decision tree classifiers
[scatter plot over x1 and x2, with the corresponding tree:]
X1 > 0.5?
yes: X2 > 0.5?  yes -> black, no -> white
no:  X2 > 0.5?  yes -> white, no -> black
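The tree on this slide, written out as nested conditionals (a direct transcription of the drawn tree, not a learned model):

```python
# Each internal node tests one feature; each leaf predicts a class.
def classify(x1, x2):
    if x1 > 0.5:                       # root node
        return "black" if x2 > 0.5 else "white"
    return "white" if x2 > 0.5 else "black"
```

The four leaves reproduce exactly the checkerboard pattern that a single straight line could not separate.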

three types of classifiers Linear classifiers Decision tree classifiers Instance-based classifiers

instance-based classifiers
[scatter plot over x1 and x2, with an unlabeled instance marked "?"]
predict the class associated with the most similar training examples

instance-based classifiers
[the same scatter plot, with the unlabeled instance "?" in a different region]
predict the class associated with the most similar training examples

instance-based classifiers
Assumption: instances with similar feature values should have a similar label.
Given a test instance, predict the label associated with its nearest neighbors.
There are many different similarity metrics for computing the distance between training/test instances.
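A minimal 1-nearest-neighbor sketch using Euclidean distance as the similarity metric. The four training points are invented to match the checkerboard pattern in the plots:

```python
import math

# Predict the label of the single closest training instance.
train = [((0.9, 0.9), "black"), ((0.1, 0.1), "black"),
         ((0.9, 0.1), "white"), ((0.1, 0.9), "white")]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(query):
    features, label = min(train, key=lambda inst: euclidean(inst[0], query))
    return label
```

Using the k nearest neighbors with a majority vote, rather than just the single nearest one, makes the prediction less sensitive to noisy training points.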

questions
Is a particular concept appropriate for predictive analysis?
What should the unit of analysis be?
How should I divide the data into training and test sets?
What is a good feature representation for this task?
What type of learning algorithm should I use?
How should I evaluate my model's performance?

Any Questions?

Next Class: Text Representation