Prerequisite Relation Learning for Concepts in MOOCs

Reporter: Liangming PAN
Authors: Liangming PAN, Chengjiang LI, Juanzi LI, Jie TANG
Knowledge Engineering Group, Tsinghua University
2017-04-19

Outline: Backgrounds, Problem Definition, Methods, Experiments and Analysis, Conclusion

Backgrounds

Massive open online courses (MOOCs) have become increasingly popular and offer students around the world the opportunity to take online courses from prestigious universities.

A prerequisite is usually a concept or requirement that must be met before one can proceed to the next one. The prerequisite relation exists as a natural dependency among concepts in cognitive processes when people learn, organize, apply, and generate knowledge (Laurence and Margolis, 1999).

Partha Pratim Talukdar and William W. Cohen. Crowdsourced comprehension: predicting prerequisite structure in Wikipedia. 2012.

Motivation 1. Manually building a concept map in MOOCs is infeasible. In the era of MOOCs, it is becoming infeasible to manually organize the knowledge structures of thousands of online courses from different providers.
Motivation 2. Improve the learning experience of students. Students from different backgrounds can easily explore the knowledge space and better design their personalized learning schedules.

Question: Where should a student start if she wants to learn the concept of conditional random field?

Problem Definition
Input
MOOC Corpus: a set of courses, where C_i is one course
Course C: a sequence of videos, where v_i is the i-th video of course C
Video v: a sequence of sentences, where s_i is the i-th sentence of video v
Course Concepts: the union of the sets K_i, where K_i is the set of course concepts in C_i
Output
Prerequisite Function: the function PF predicts whether concept a is a prerequisite concept of b
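
The input and output above can be pictured as plain data types. The following is a minimal sketch, not the authors' code: the class names `Video` and `Course`, the alias `PrerequisiteFunction`, and the helper `course_concepts` are hypothetical names chosen for illustration, and the boolean return type of PF is an assumption based on the word "whether".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Video:
    # s_1, ..., s_|v|: the sentences (subtitles) of one video, in order
    sentences: List[str]


@dataclass
class Course:
    # v_1, ..., v_|C|: the videos of one course, in teaching order
    videos: List[Video]
    # K_i: the course concepts annotated for this course
    concepts: Set[str] = field(default_factory=set)


# Hypothetical alias: PF(a, b) is True when a is predicted to be a prerequisite of b.
PrerequisiteFunction = Callable[[str, str], bool]


def course_concepts(corpus: List[Course]) -> Set[str]:
    """Return the union of the per-course concept sets K_i."""
    result: Set[str] = set()
    for course in corpus:
        result |= course.concepts
    return result
```

Any classifier that maps an ordered concept pair to a yes/no decision fits the `PrerequisiteFunction` shape.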

Methods

Features Overview
Semantic Features: Semantic Relatedness
Contextual Features: Video Reference Distance, Sentence Reference Distance, Wikipedia Reference Distance
Structural Features: Average Position Distance, Distributional Asymmetry Distance, Complexity Level Distance

Semantic Features: Semantic Relatedness
Semantic relatedness plays an important role in prerequisite relations between concepts. If two concepts have very different semantic meanings, it is unlikely that they have a prerequisite relation.
Example concepts: Matrix, Anthropology, Gradient Descent, Neural Networks

Semantic Features: Concept Embeddings
Procedure of concept embeddings on the Wikipedia corpus:
1. Entity Annotation: We label all the entities in the Wikipedia corpus based on the hyperlinks in Wikipedia, and get a new corpus OE and a wiki entity set ES, where each token x_i of OE corresponds to a word w or an entity e ∈ ES.
2. Word Embeddings: We apply the skip-gram model to train word embeddings on OE.
3. Concept Representation: After training, we can obtain the vector for each concept in ES. For any non-wiki concept, we obtain its vector via the vector addition of its individual word vectors.
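
Step 3 can be sketched as follows, assuming the trained vectors are already available as plain dictionaries. The names `entity_vecs`, `word_vecs`, `concept_vector`, and `cosine` are hypothetical, and using cosine similarity as the relatedness score is an assumption; the slides do not state which similarity measure is used.

```python
import math
from typing import Dict, List


def concept_vector(concept: str,
                   entity_vecs: Dict[str, List[float]],
                   word_vecs: Dict[str, List[float]]) -> List[float]:
    """Look up a wiki concept directly; otherwise add up its word vectors."""
    if concept in entity_vecs:
        return entity_vecs[concept]
    words = [w for w in concept.lower().split() if w in word_vecs]
    if not words:
        raise KeyError(f"no vectors available for concept {concept!r}")
    dim = len(word_vecs[words[0]])
    # Vector addition of the individual word vectors
    return [sum(word_vecs[w][i] for w in words) for i in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In practice the dictionaries would be filled from a skip-gram model trained on the annotated corpus OE.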

Contextual Features: Video Reference Distance
If, in videos where concept A is frequently discussed, the teacher also needs to refer to concept B a lot, but not vice versa, then B is more likely to be a prerequisite of A.
[Figure: mentions between Back Propagation and Gradient Descent]

Contextual Features: Video Reference Distance
Video Reference Weight from A to B, computed over the video set of the MOOC corpus, where
f(A, v): the term frequency of concept A in video v
r(v, B) ∈ {0, 1}: whether concept B appears in video v
It indicates how B is referred to by A's videos.
The Video Reference Distance of (A, B) is derived from these weights.
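
A minimal sketch of one formulation consistent with the definitions above. It assumes the weight from A to B is the frequency-weighted fraction of A's mentions that occur in videos where B also appears, and that the distance of (A, B) is the weight from B to A minus the weight from A to B. Both the exact formula and the sign convention are assumptions, and the dictionary-based `Video` representation is hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: each video maps a concept to its term frequency f(concept, v).
Video = Dict[str, int]


def video_reference_weight(a: str, b: str, videos: List[Video]) -> float:
    """Share of A's mentions that fall in videos where B appears (r(v, B) = 1)."""
    total = sum(v.get(a, 0) for v in videos)
    if total == 0:
        return 0.0
    referred = sum(v.get(a, 0) for v in videos if v.get(b, 0) > 0)
    return referred / total


def video_reference_distance(a: str, b: str, videos: List[Video]) -> float:
    """Assumed convention: positive when B's videos refer to A more than A's refer to B."""
    return video_reference_weight(b, a, videos) - video_reference_weight(a, b, videos)
```

Under this convention, a positive distance for (Gradient Descent, Back Propagation) would suggest that Gradient Descent is the prerequisite.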

Contextual Features: Generalized Video Reference Distance
Generalized Video Reference Weight from A to B, where
{a_1, ..., a_K}: the top-K most similar concepts of A, where a_1, ..., a_K ∈ T
w(a_i, A): the similarity between a_i and A
It indicates how B is referred to by A's related concepts in their videos.
The Generalized Video Reference Distance of (A, B) is derived from these weights.
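
The generalization can be sketched as a similarity-weighted average of the plain reference weights of A's neighbours. This is an assumption about the form of the formula, not a transcription of it; the function name `generalized_reference_weight` and its parameters are hypothetical, and whether A counts among its own top-K neighbours is left to the caller.

```python
from typing import Callable, List, Tuple


def generalized_reference_weight(
    neighbours: List[Tuple[str, float]],
    b: str,
    weight: Callable[[str, str], float],
) -> float:
    """Similarity-weighted average of weight(a_i, b) over A's top-K neighbours.

    neighbours: (a_i, w(a_i, A)) pairs for the top-K concepts most similar to A
    weight:     any function returning the plain reference weight from a_i to b
    """
    norm = sum(w for _, w in neighbours)
    if norm == 0:
        return 0.0
    return sum(w * weight(a_i, b) for a_i, w in neighbours) / norm
```

A generalized distance would then combine the two directions in the same way as the plain distance does.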

Structural Features: Average Position Distance, Distributional Asymmetry Distance, Complexity Level Distance
In teaching videos, knowledge concepts are usually introduced according to their learning dependencies, so the structure of MOOC courses also contributes significantly to prerequisite relation inference in MOOCs. We investigate three kinds of structural information: the positions at which concepts appear, the learning dependencies of videos, and the complexity levels of concepts.

Structural Features: Average Position Distance
Assumption: In a course, for a specific concept, its prerequisite concepts tend to be introduced before this concept, and its subsequent concepts tend to be introduced after it.
TOC Distance of (A, B), where
C(A, B): the set of courses in which A and B both appear
AP(A, C): the average index of videos containing concept A in course C (the average position of concept A in course C)
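
A minimal sketch of AP(A, C) and a position-based distance, assuming the distance is the mean of AP(A, C) - AP(B, C) over the courses in C(A, B), and 0 when the two concepts never share a course. The averaging and the sign convention are assumptions, and the set-based `Course` representation is hypothetical.

```python
from typing import List, Optional, Set

# Hypothetical representation: a course is an ordered list of videos,
# each video being the set of concepts it contains.
Course = List[Set[str]]


def average_position(concept: str, course: Course) -> Optional[float]:
    """AP(A, C): mean 1-based index of the videos containing the concept."""
    indices = [i for i, video in enumerate(course, start=1) if concept in video]
    return sum(indices) / len(indices) if indices else None


def average_position_distance(a: str, b: str, corpus: List[Course]) -> float:
    """Mean of AP(a, C) - AP(b, C) over courses where both concepts appear."""
    diffs = []
    for course in corpus:
        pos_a = average_position(a, course)
        pos_b = average_position(b, course)
        if pos_a is not None and pos_b is not None:
            diffs.append(pos_a - pos_b)
    return sum(diffs) / len(diffs) if diffs else 0.0
```

With this convention, a negative value means A tends to appear earlier than B.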

Structural Features: Distributional Asymmetry Distance
Assumption: The learning dependency of course videos is also helpful for inferring the learning dependency of course concepts. Specifically, if video V_a is a precursor video of V_b, and a is a prerequisite concept of b, then it is likely that f(b, V_a) < f(a, V_b).
Example: mentions between Gradient Descent and Back Propagation (concepts A and B)

Structural Features: Distributional Asymmetry Distance
The Distributional Asymmetry Distance is computed over all possible video pairs that have a sequential relation.
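
The assumption f(b, V_a) < f(a, V_b) can be turned into a score as sketched below. The sketch assumes that a sequential pair is an earlier video containing a followed by a later video containing b in the same course, and that the distance is the mean of f(a, later) - f(b, earlier) over all such pairs. Both choices are assumptions, as is the dictionary-based `Video` representation.

```python
from typing import Dict, List

# Hypothetical representation: each video maps a concept to its term frequency.
Video = Dict[str, int]
Course = List[Video]


def distributional_asymmetry_distance(a: str, b: str, corpus: List[Course]) -> float:
    """Mean of f(a, later) - f(b, earlier) over sequential (earlier, later) video pairs."""
    total = 0.0
    pairs = 0
    for course in corpus:
        for i, earlier in enumerate(course):
            if earlier.get(a, 0) == 0:
                continue  # the earlier video must be about a
            for later in course[i + 1:]:
                if later.get(b, 0) == 0:
                    continue  # the later video must be about b
                total += later.get(a, 0) - earlier.get(b, 0)
                pairs += 1
    return total / pairs if pairs else 0.0
```

A positive value means later videos on b keep mentioning a more than earlier videos on a mention b, which is the pattern the assumption associates with a being a prerequisite of b.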

Structural Features: Complexity Level Distance
Assumption: If two related concepts have a prerequisite relationship, they may differ in complexity level: one concept is more basic while the other is more advanced.
Example: Training Set, Test Set, Data Set

Structural Features: Complexity Level Distance
Assumption: For a specific concept, if it covers more videos in a course or survives for a longer time in a course, then it is more likely to be a general concept rather than a specific one.
Quantities defined: average video coverage of A, average survival time of A, and Complexity Level Distance of (A, B)
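
A minimal sketch of the three quantities, under stated assumptions: coverage is the fraction of a course's videos that contain the concept, survival time is the span from first to last appearance divided by the course length, both are averaged over the courses where the concept appears, and the distance is the difference of the coverage-times-survival products. None of these formulas is given in the slide text, and all function names are hypothetical.

```python
from typing import Callable, List, Set

# Hypothetical representation: a course is an ordered list of videos,
# each video being the set of concepts it contains.
Course = List[Set[str]]


def video_coverage(concept: str, course: Course) -> float:
    """Fraction of the course's videos that contain the concept."""
    return sum(1 for video in course if concept in video) / len(course)


def survival_time(concept: str, course: Course) -> float:
    """Span between first and last appearance, relative to course length."""
    indices = [i for i, video in enumerate(course) if concept in video]
    return (indices[-1] - indices[0] + 1) / len(course)


def _average(concept: str, corpus: List[Course],
             measure: Callable[[str, Course], float]) -> float:
    """Average a per-course measure over the courses containing the concept."""
    values = [measure(concept, c) for c in corpus
              if any(concept in video for video in c)]
    return sum(values) / len(values) if values else 0.0


def complexity_level_distance(a: str, b: str, corpus: List[Course]) -> float:
    """Assumed form: avc(a) * ast(a) - avc(b) * ast(b)."""
    score_a = _average(a, corpus, video_coverage) * _average(a, corpus, survival_time)
    score_b = _average(b, corpus, video_coverage) * _average(b, corpus, survival_time)
    return score_a - score_b
```

In the slide's example, a general concept such as Data Set would score higher than Training Set or Test Set.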

Experiments and Analysis

Experimental Datasets
Collecting Course Videos: Machine Learning (ML), Data Structure and Algorithms (DSA), and Calculus (CAL) courses from Coursera.
Course Concepts Annotation: Extract candidate concepts from the documents of video subtitles, then label each candidate as "course concept" or "not course concept".
Prerequisite Relation Annotation: We manually annotate the prerequisite relations among the labeled course concepts.

Experimental Datasets: Dataset Statistics
3 novel datasets extracted from Coursera:
ML: 5 Machine Learning courses
DSA: 8 Data Structure and Algorithms courses
CAL: 7 Calculus courses

Evaluation Results
Models: Naïve Bayes (NB), Logistic Regression (LR), SVM with linear kernel (SVM), Random Forest (RF)
Metrics: Precision (P), Recall (R), F1-Score (F1)
5-Fold Cross Validation
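
The metrics and the cross-validation split can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, labels are assumed to be 1 for a prerequisite pair and 0 otherwise, and the interleaved fold assignment is one simple choice among many (the slides do not say how folds were drawn).

```python
from typing import Iterator, List, Tuple


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; fold i tests on items i, i + k, i + 2k, ..."""
    for fold in range(k):
        test = list(range(fold, n, k))
        train = [i for i in range(n) if i % k != fold]
        yield train, test
```

Each classifier would be trained on the `train` indices and scored on the `test` indices, with the three metrics averaged over the five folds.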

Comparison with Baselines: Comparison Methods
Hyponym Pattern Method (HPM): This method simply treats the concept pairs with IS-A relations as prerequisite concept pairs.
Reference Distance (RD): This method was proposed by Liang et al. (2015). However, it is only applicable to Wikipedia concepts.
Supervised Relationship Identification (SRI): Wang et al. (2016) employed several features to infer prerequisite relations of Wikipedia concepts in textbooks, including 3 textbook features and 6 Wikipedia features.
(1) T-SRI: only textbook features are used to train the classifier.
(2) F-SRI: the original version, in which all features are used.

Comparison with Baselines
W-ML, W-DSA, and W-CAL are the subsets containing Wikipedia concepts.
HPM achieves relatively high precision but low recall.
T-SRI only considers relatively simple features.
Incorporating Wikipedia-based features yields some improvement in performance.

Setting: Each time, one feature or one group of features is removed, and we record the decrease in F1-score for each setting.
Conclusion: All the proposed features are useful. Complexity Level Distance is the most important; Semantic Relatedness is the least important.
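
The ablation setting can be sketched as a loop that removes one feature at a time and records the drop in F1. The function `ablation` and its `evaluate` callback are hypothetical; `evaluate` stands in for whatever procedure trains a classifier on a feature subset and returns its cross-validated F1-score.

```python
from typing import Callable, Dict, List


def ablation(features: List[str],
             evaluate: Callable[[List[str]], float]) -> Dict[str, float]:
    """Return, for each feature, the F1 decrease observed when it is removed."""
    full_score = evaluate(features)
    drops = {}
    for removed in features:
        remaining = [f for f in features if f != removed]
        drops[removed] = full_score - evaluate(remaining)
    return drops
```

The feature with the largest drop is the most important one under this setting; removing a group of features works the same way with a list of names filtered out at once.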

Liangming Pan, KEG, THU, peterpan10211020@163.com