KNIME & Teacher Bots: From Workflows to Micro Services

Kathrin Melcher, Vincenzo Tursi, Rosaria Silipo, Phil Winters
2018 KNIME AG. All Rights Reserved.

The History of Bots
- 1950: Alan Turing
- A.L.I.C.E.: Artificial Linguistic Internet Computer Entity

Bots
A bot is software designed to automate the kinds of tasks you would usually do on your own (or another human would do for you).
- Search Bots
- Teaching Bots
- Communication Bots
- Personal Assistant Bots
- Data & Developer Bots
- Team Bots

The Human Internet Search Process
[Flowchart: Ask Question -> Translate to Keyword(s) -> Keywords / Categories (Index) -> Best Answer? (Yes / Not Yet / None)]

The Search Challenge: Context
A Jewish Holiday
Paul Shapiro, professional search marketer (and huge KNIME fan): "Optimizing for Hanukkah: Sometimes it's still strings, not things"
Spellings: Hannukah, Chanukah, Hanukah, Channukah, Chanuka, Chanukkah, Hanukka, Chanukka, Hannukkah, Ḥanukkah, Channuka, Festival of Lights, Feast of Dedication
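The "strings, not things" problem on this slide can be illustrated with a small normalization table. This is only a sketch: the names `fold`, `CANONICAL`, and `normalize_keyword` are hypothetical, the choice of "hanukkah" as the canonical keyword is an assumption, and the slides do not say how (or whether) the KNIME workflow handles spelling variants.

```python
import unicodedata

# Spelling variants listed on the slide. Mapping them all to one canonical
# keyword is an illustrative assumption, not something the slides describe.
HANUKKAH_VARIANTS = [
    "Hannukah", "Chanukah", "Hanukah", "Channukah", "Chanuka", "Chanukkah",
    "Hanukka", "Chanukka", "Hannukkah", "Ḥanukkah", "Channuka",
    "Festival of Lights", "Feast of Dedication",
]


def fold(text: str) -> str:
    """Lowercase the text and strip diacritics (e.g. 'Ḥ' becomes 'h')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


# Lookup table: folded variant -> canonical keyword (hypothetical choice).
CANONICAL = {fold(variant): "hanukkah" for variant in HANUKKAH_VARIANTS}
CANONICAL["hanukkah"] = "hanukkah"


def normalize_keyword(term: str) -> str:
    """Return the canonical keyword for a known variant, else the folded term."""
    folded = fold(term)
    return CANONICAL.get(folded, folded)


print(normalize_keyword("Chanukah"))
print(normalize_keyword("Festival of Lights"))
```

A hand-written table like this only covers variants someone has already thought of, which is the limitation that motivates the ontology-based approach in the following slides.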

A Data Science Project
[Diagram: data with classes, split into a Training set and a Test set; steps: Data Preparation, Training, Apply, Scoring]

Reality Check
[Diagram: the same pipeline (classes, Training set, Test set, Data Preparation, Training, Apply, Scoring)]

Ontology Definition
- Ontology is the philosophical study of the nature of being, becoming, existence, or reality, as well as the basic categories of being and their relations
- Introduced by Greek philosophers (Parmenides)
- Parmenides was among the first to propose an ontological characterization of the fundamental nature of reality
- In computer science and information science, an ontology is a formal naming and definition of the types, properties, and interrelationships of entities

Ontology Example
[Figure: Uberon, an integrative multi-species anatomy ontology]
Relationship of major animal lineages with indication of how long ago these animals shared a common ancestor. On the left, important organs are shown, which allows us to determine how long ago these may have evolved.

The Specialist Topic Search Process
[Flowchart: Ask Question -> Translate to Keyword(s) -> Keywords / Categories (Special Ontology) -> Best Answer? (Yes / Not Yet / None)]
Special ontologies:
- Medicine: symptoms, diseases, treatments
- Pharmaceutical: drugs, dosages, allergies

Creating an Ontology: Simple! Build Context
[Flowchart: Ask Question -> Translate to Keyword(s) -> Keywords / Categories (Ontology) -> Best Answer? (Yes / Not Yet / None)]

A Real World Ontology
Need: I want to learn KNIME. I have a question..
- Terms, Concepts, Context, Background, Depth, Breadth, Language
- E-learning, Forum, Blog, Other

Our Own Ontology (20 Classes)
Sources: from e-learning course, from other resources, from experience
Classes: Installation, Data Access, ETL, Mining, Control, Deployment, DataViz, Use Cases, Text Processing, Big Data, Server, Image Processing, Reporting, Development, Integration, Optimizing KNIME, Life Science, Announcement, Bug, Legal

Active Learning Cycle
[Cycle diagram: Training Set [Forum Questions], 1st attempt Class Labels, Training, Extract most uncertain predictions, Re-labeling, Class Label Extension]

The Human Learn KNIME Process
[Flowchart: Ask Question -> Translate to Keyword(s) -> Keywords / Categories (Index) -> Best Answer? (Yes / Not Yet / None)]

Teacher Bot Emil: A Bot to help Learn KNIME
[Flowchart: Ask Question -> Teaching bot -> Translate to Keyword(s) -> Keywords / Categories (KNIME Ontology) -> Best Answer? (Yes / Not Yet / None); Email]

Teacher Bot Emil: Question
Emil, our Teacher Bot!

Teacher Bot Emil: Category

Teacher Bot Emil: Links

Teacher Bot Emil: Goodbye!

Teacher Bot Emil
[Workflow: ML, Find Resources, Update Datasets]

Creation of an Initial Ontology
[Flowchart: the teaching-bot loop (Ask Question -> Teaching bot -> Translate to Keyword(s) -> Keywords / Categories from the KNIME Ontology -> Best Answer? Yes / Not Yet / None; Email), plus Training, Training Set, and Initial Labelling]

Web Crawling KNIME Resources
Only three nodes

Step 0: Initial Labeling
Labeling a Training Set based on Distance (and no Clue)
[Diagram: Resources -> Extract Keywords; Forum -> Extract Keywords; Distance measure -> Find Closest Resource -> Labels from Ontology -> Training Set v0]

Step 0: Initial Labeling
Labeling a Training Set based on Distance (and no Clue)
[Workflow: Chi-Square, N-gram, Tanimoto]
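Step 0 can be sketched in a few lines: each forum question takes the ontology label of the closest resource, measured on extracted keywords. The sketch below assumes keywords are plain sets and uses the Tanimoto distance alone. The function names and sample data are hypothetical, and the slides do not show how the Chi-Square and N-gram steps are combined with the distance in the actual workflow.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """Return 1 - |A & B| / |A | B|; treated as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def label_by_closest_resource(question_keywords: set, resources: list) -> str:
    """Pick the label of the resource whose keyword set is closest.

    `resources` is a list of (keyword_set, ontology_label) pairs.
    """
    closest = min(resources, key=lambda r: tanimoto_distance(question_keywords, r[0]))
    return closest[1]


# Hypothetical resources, each already labeled with an ontology class.
resources = [
    ({"install", "download", "setup"}, "Installation"),
    ({"database", "reader", "csv", "file"}, "Data Access"),
    ({"chart", "plot", "view"}, "DataViz"),
]

# Hypothetical keywords extracted from one forum question.
question = {"read", "csv", "file"}
print(label_by_closest_resource(question, resources))  # Data Access
```

Labels produced this way are only a starting point ("no Clue"), which is why the later slides add expert correction through active learning.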

Step 1 - Training

Adding Active Learning to the Cycle
[Flowchart: the teaching-bot loop (Ask Question -> Teaching bot -> Translate to Keyword(s) -> Keywords / Categories from the KNIME Ontology -> Best Answer? Yes / Not Yet / None; Email), with Training, Training Set, Initial Labelling, and the Active Learning Cycle]

Active Learning
- Training: Random Forest
- Subset chosen to be labeled: 10% most uncertain classes (difference between the three top probabilities for each predicted class)
- Labeling, Category Assign: Predicted Classes or Something Else
- Labeling, Category Define: manually labeling all of Something Else
- Extend: initial labelling extended based on distance, k-nn (k=1)
- Training Set: updated in the Active Learning Cycle
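The selection of the most uncertain predictions can be sketched as follows. The slide mentions the difference between the three top probabilities but does not give the exact formula, so this sketch swaps in the common top-two margin as a stand-in. The function names and the probability rows are hypothetical. In practice the rows would be the per-class probabilities produced by the Random Forest.

```python
import math


def margin(probabilities: list) -> float:
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top = sorted(probabilities, reverse=True)
    return top[0] - top[1]


def most_uncertain(prob_rows: list, fraction: float = 0.10) -> list:
    """Return the row indices of the `fraction` of predictions with the smallest margin."""
    n_select = max(1, math.ceil(fraction * len(prob_rows)))
    ranked = sorted(range(len(prob_rows)), key=lambda i: margin(prob_rows[i]))
    return ranked[:n_select]


# Hypothetical class probabilities for ten forum questions and three classes.
rows = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.50, 0.30, 0.20],
    [0.55, 0.25, 0.20],
    [0.95, 0.03, 0.02],
    [0.65, 0.20, 0.15],
]

# Indices of the questions an expert would be asked to label.
print(most_uncertain(rows))
```

The 10% fraction is one of the parameters the closing slides suggest investigating, so it is kept as an argument rather than hard-coded.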

Adding Active Learning to the Cycle
[Flowchart: the teaching-bot loop with Training, Training Set, Initial Labelling, and the Active Learning Cycle; added: 10% most uncertain -> Category Assign]

Category Assign
[Screenshot: Category]

Step 2a - Category Assign
[Workflow: Reading Data, Update Datasets]

Adding Active Learning to the Cycle
[Flowchart: the teaching-bot loop with Training, Training Set, Initial Labelling, and the Active Learning Cycle; 10% most uncertain -> Category Assign; added: Category Define]

Step 2b - Category Define
[Screenshot: Adding Label]

Step 2b - Category Define

Adding Active Learning to the Cycle
[Flowchart: the teaching-bot loop with Training, Training Set, Initial Labelling, and the Active Learning Cycle; 10% lowest probability -> Category Assign, Category Define; added: Extend]

Step 3: Extend with k-nn
- Expert has labelled uncertain samples
- k-nn (k=1) extends the expert classes to their neighbor sample

Step 3 - Extend with k-nn
[Workflow: Chi-Square, k-nn (k=1)]
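One possible reading of Step 3 is sketched below: each expert-labeled sample passes its label to its single nearest neighbor (k=1) among the remaining samples. The Tanimoto distance on keyword sets, the function names, and the sample data are assumptions for illustration. The slides do not specify which distance the k-nn step uses or how conflicts are resolved.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """Return 1 - |A & B| / |A | B|; treated as 1.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 1.0


def extend_labels(expert_labeled: list, others: list) -> dict:
    """Copy each expert label to the nearest sample in `others` (k=1).

    `expert_labeled` is a list of (keyword_set, label) pairs and `others`
    a list of keyword sets. Returns {index into others: label}. If two
    expert samples share a nearest neighbor, the later one wins.
    """
    extended = {}
    if not others:
        return extended
    for keywords, label in expert_labeled:
        nearest = min(
            range(len(others)),
            key=lambda i: tanimoto_distance(keywords, others[i]),
        )
        extended[nearest] = label
    return extended


# Hypothetical samples: two expert-labeled questions and three others.
expert = [
    ({"loop", "variable", "flow"}, "Control"),
    ({"pdf", "report", "birt"}, "Reporting"),
]
others = [{"loop", "flow", "end"}, {"report", "pdf"}, {"spark", "hive"}]

print(extend_labels(expert, others))  # {0: 'Control', 1: 'Reporting'}
```

With k=1 every expert label spreads to exactly one additional sample, so the training set changes slowly between iterations.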

Combining the Teaching Bot and the Active Learning Cycle
[Flowchart: the teaching-bot loop (Ask Question -> Teaching bot -> Translate to Keyword(s) -> Keywords / Categories from the KNIME Ontology -> Best Answer? Yes / Not Yet / None; Email) joined to the Active Learning Cycle (Training, Training Set, Initial Labelling, 10% lowest probability, Category Assign, Category Define, Extend)]

Changes in Training Set
[Panels: AL Iteration 0, AL Iteration 1, AL Iteration 2]

Answer Evolution

AL #  Input Dataset   Output            Ver  Accuracy  Timestamp
0     Traning_Set_v0  Random_Forest_v0  0.0  0.59      19/2/2018
1     Traning_Set_v1  Random_Forest_v1  1.0  0.56      23/2/2018
2     Traning_Set_v2  Random_Forest_v2  2.0  0.52      26/2/2018

[Panels: Version 0, Version 2]

From building one time to reusing Components: MicroServices
[Flowchart: the combined teaching-bot loop and Active Learning Cycle (Training, Training Set, Initial Labelling, 10% lowest probability, Category Assign, Category Define, Extend)]

Microservices

Microservices - Converting reusable Subflows into Microservices
[Workflow: Metanode Templates, Microservices]
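The idea of turning a reusable subflow into a microservice can be illustrated outside KNIME with a function that takes a JSON request and returns a JSON response. This is a language-level analogy only: the stopword list, `extract_keywords`, and `keyword_service` are hypothetical, the transport layer (HTTP, server deployment) is omitted, and nothing here reflects the actual KNIME nodes used in the slides.

```python
import json

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "the", "how", "do", "i", "to", "in", "is", "of"}


def extract_keywords(text: str) -> list:
    """Reusable step: lowercase, strip punctuation, drop stopwords."""
    words = [w.strip("?.,!").lower() for w in text.split()]
    return sorted({w for w in words if w and w not in STOPWORDS})


def keyword_service(request_body: str) -> str:
    """Service wrapper: JSON in, JSON out, so any caller can reuse the step."""
    payload = json.loads(request_body)
    return json.dumps({"keywords": extract_keywords(payload["question"])})


print(keyword_service('{"question": "How do I read a CSV file?"}'))
```

Because callers depend only on the JSON contract, the same keyword step could serve both the bot and the labeling workflows without being copied into each.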

What we have tried to show
- Creating a basic bot
- Building an Ontology with Active Learning
- Automating the process
- Converting reusable subflows into micro services

What did we learn?
- The KNIME forum is used as an educational tool
- Support is search
- Keyword extraction is a plus with respect to just keyword search
- Re-adjust your class system (and goals) from time to time
- Accuracy is not all
- New educational page on DataViz
- Optimizing KNIME -> Maybe another blog post?

How could this be extended?
- Improve the text processing phase (tagging)
- Use word embedding
  Problem: Document Vector leads to big and sparse feature spaces
  Solution: Train a vector representation for each word using Word2Vec
- Use the Keras integration to replace the Random Forest with a Neural Network which uses LSTM layers
- Investigate the role of parameters: 10% of uncertain; k=1 in k-nearest neighbors
- Forgetting functions?
- Add speech recognition?
- KNIME YouTube videos as additional resource

Where to find more
- Presentation available immediately
- Series of blog posts in the next weeks
- Workflows on EXAMPLE Server
- Collection of blog posts in a whitepaper

The KNIME trademark and logo and OPEN FOR INNOVATION trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME is also registered in Germany.