Value Oriented Big Data Processing with Applications


Wright State University CORE Scholar
Kno.e.sis Publications, The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis), 2015

Value Oriented Big Data Processing with Applications
Krishnaprasad Thirunarayan, Wright State University - Main Campus, t.k.prasad@wright.edu

Part of the Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, and the Science and Technology Studies Commons

Repository Citation: Thirunarayan, K. (2015). Value Oriented Big Data Processing with Applications. http://corescholar.libraries.wright.edu/knoesis/1087

This Presentation is brought to you for free and open access by The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis) at CORE Scholar. It has been accepted for inclusion in Kno.e.sis Publications by an authorized administrator of CORE Scholar.

Value-Oriented Big Data Processing with Applications Krishnaprasad Thirunarayan (T. K. Prasad) Kno.e.sis Ohio Center of Excellence in Knowledge-enabled Computing

Outline
  5 Vs of Big Data Research
  Semantic Perception for Scalability and Decision Making
  Lightweight semantics to manage heterogeneity
  Cost-benefit trade-off continuum
  Hybrid Knowledge Representation and Reasoning
  Anomaly, Correlation, Causation

Gartner's 2014 Hype Cycle for Emerging Technologies

5 Vs of Big Data Research
  Volume, Velocity, Variety, Veracity, Value
  Big Data => Smart Data

Volume: Assorted Examples
  Check engine light analogy

Volume: Challenge
  Sensors (due to IoT) offer unprecedented access to granular data that can be transformed into powerful knowledge. Without an integrated business analytics platform, though, sensor data will just add to information overload and escalating noise.
  Source: http://www.sas.com/en_us/insights/big-data/internet-of-things.html

Volume: (1) Semantic Perception

Weather Use Case

Parkinson's Disease Use Case

Heart Failure Use Case

Asthma Use Case

Traffic Use Case

Heterogeneity in a Physical-Cyber-Social System
  [Figure labels: 511.org; slow moving traffic; link description; scheduled event; traffic monitoring; schedule information]

Traffic Data Analysis
  Histogram of speed values collected from June 1st 12:00 AM to June 2nd 12:00 AM
  Histogram of travel time values collected from June 1st 12:00 AM to June 2nd 12:00 AM

Relating Sensor Time Series Data to Scheduled/Unscheduled Events
  Multiple events interact with each other
  Varying influence
  Image credit: http://traffic.511.org/index
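One way to relate a sensor time series to events is to tag each reading with the set of events active at its timestamp and then compare readings across those contexts. The sketch below is only an illustration under stated assumptions: the `Event` class, the event labels, the year, and every speed value are hypothetical and are not taken from the 511.org data shown in the slides.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class Event:
    """A scheduled or unscheduled event with a half-open interval [start, end)."""
    label: str
    start: datetime
    end: datetime


def active_events(ts, events):
    """Return the sorted labels of all events whose interval contains ts."""
    return tuple(sorted(e.label for e in events if e.start <= ts < e.end))


def mean_speed_by_context(readings, events):
    """Group (timestamp, speed) readings by the set of concurrently active events."""
    groups = {}
    for ts, speed in readings:
        groups.setdefault(active_events(ts, events), []).append(speed)
    return {context: mean(speeds) for context, speeds in groups.items()}


# Hypothetical events and speed readings (mph); illustrative values only.
EVENTS = [
    Event("game", datetime(2014, 6, 1, 17, 0), datetime(2014, 6, 1, 20, 0)),
    Event("accident", datetime(2014, 6, 1, 18, 15), datetime(2014, 6, 1, 19, 0)),
]
READINGS = [
    (datetime(2014, 6, 1, 10, 0), 60.0),   # no event active
    (datetime(2014, 6, 1, 18, 0), 40.0),   # game only
    (datetime(2014, 6, 1, 18, 30), 20.0),  # game and accident overlap
    (datetime(2014, 6, 1, 19, 30), 30.0),  # game only
]
RESULT = mean_speed_by_context(READINGS, EVENTS)
```

Keying the groups by the full set of active events, rather than by one event at a time, keeps overlapping events visible as their own context, which is where interacting events with varying influence would show up.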

Heterogeneity in a Physical-Cyber-Social System

Volume: (2) Exploiting Embarrassing Parallelism

Volume with a Twist
  Resource-constrained reasoning on mobile devices

Cory Henson's Thesis Statement
  Machine perception can be formalized using semantic web technologies to derive abstractions from sensor data using background knowledge on the Web, and efficiently executed on resource-constrained devices.

Perception Cycle* that exploits background knowledge / domain models
  1. Explanation: abstracting raw data for human comprehension (observe property, perceive feature, using prior knowledge)
  2. Discrimination: focus generation for disambiguation and action (incl. human in the loop)
  * Based on Neisser's cognitive model of perception
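The two phases of the cycle can be sketched over a small bipartite graph of features and properties. This is a minimal set-based illustration, not the encoding used in the work described here. It assumes a single feature must account for all observed properties, and the knowledge base `KB` with its feature and property names is hypothetical.

```python
# Hypothetical prior knowledge: a bipartite graph linking each feature
# to the properties through which it can be observed.
KB = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing"},
    "allergy": {"sneezing", "itchy_eyes"},
}


def explain(observed, kb):
    """Explanation: features that account for every observed property.

    Assumes one feature must explain all observations (a simplification).
    """
    return {feature for feature, props in kb.items() if observed <= props}


def discriminate(observed, kb):
    """Discrimination: unobserved properties worth checking next.

    A property helps only if some, but not all, of the current explanations
    predict it; observing it (or not) then narrows the set of explanations.
    """
    candidates = explain(observed, kb)
    if len(candidates) < 2:
        return set()  # nothing left to disambiguate
    predicted_by_some = set().union(*(kb[f] for f in candidates))
    predicted_by_all = set.intersection(*(kb[f] for f in candidates))
    return (predicted_by_some - predicted_by_all) - observed
```

With these toy values, observing only `cough` leaves `flu` and `cold` as explanations, and `discriminate` suggests checking `fever`, `fatigue`, or `sneezing`. Once `fever` is also observed, only `flu` remains and no further focus is needed.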

Prior knowledge on the Web
  W3C Semantic Sensor Network (SSN) Ontology
  Bi-partite Graph

Virtues of Our Approach to Semantic Perception
  Blends simplicity, effectiveness, and scalability.
  Declarative specification of explanation and discrimination;
  With contemporary relevant applications (e.g., healthcare);
  Using improved encodings/algorithms that are significant (asymptotic order-of-magnitude gain) and necessary (tractable resource needs for typical problem sizes); and
  Prototyped using extant PCs and mobile devices.

Evaluation on a mobile device
  Efficiency Improvement
  Problem size increased from 10s to 1000s of nodes
  Time reduced from minutes to milliseconds
  Complexity growth reduced from polynomial, O(n^3) < x < O(n^4), to linear, O(n)

Variety
  Syntactic and semantic heterogeneity
    in textual and sensor data,
    in (legacy) materials data,
    in (long tail) geosciences data

Variety (What?): Materials/Geosciences Use Case
  Structured Data (e.g., relational)
  Semi-structured, Heterogeneous Documents (e.g., publications and technical specs, which usually include text, numerics, maps and images)
  Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating irregular entries)

Variety (How?): (1) Granularity of Semantics & Applications
  Lightweight semantics: File and document-level annotation to enable discovery and sharing
  Richer semantics: Data-level annotation and extraction for semantic search and summarization
  Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data
  Cost-benefit trade-off continuum

Variety (What?): Sensor Data Use Case
  Develop/learn domain models to exploit complementary and corroborative information to obtain improved situational awareness
    To relate patterns in multimodal data to situation
    To integrate machine sensed and human sensed data
  Example Application: SemSOS: Semantic Sensor Observation Service

Variety: (2) Hybrid KRR
  Blending data-driven models with declarative knowledge
    Data-driven: Bottom-up, correlation-based, statistical
    Declarative: Top-down, causal/taxonomical, logical
  Refine structure to better estimate parameters
  E.g., Traffic Analytics using PGMs + KBs

Variety (Why?): Hybrid KRR
  "Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions." -- David Brooks of New York Times
  However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.

Variety (How?): Hybrid KRR
  Blending data-driven models with declarative knowledge
  Structure learning from data
  Enhance structure
    By refining direction of dependency
    Disambiguation
    Filtering
    By augmenting with taxonomy nomenclature and relationships
  Improved Parameter learning from data
  E.g., Traffic Analytics using PGMs + KBs
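The blending idea can be illustrated with a toy two-step pipeline. Correlations in the data propose undirected dependencies (bottom-up), and declarative causal knowledge then orients the ones it covers (top-down). This is a sketch under stated assumptions rather than the method used in the traffic analytics work. It substitutes a plain correlation threshold for a real structure-learning algorithm, and the variable names, observations, and `CAUSAL_KB` entries are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (non-constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def learn_skeleton(data, threshold=0.5):
    """Data-driven step: propose an undirected edge for each strong correlation."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(data), 2)
        if abs(pearson(data[a], data[b])) >= threshold
    }


def orient(edges, causal_kb):
    """Knowledge-driven step: direct edges the KB covers; leave the rest undecided."""
    directed, undecided = set(), set()
    for edge in edges:
        a, b = sorted(edge)
        if (a, b) in causal_kb:
            directed.add((a, b))
        elif (b, a) in causal_kb:
            directed.add((b, a))
        else:
            undecided.add(edge)
    return directed, undecided


# Hypothetical observations and declarative knowledge (cause, effect).
DATA = {
    "scheduled_event": [0, 0, 0, 1, 1, 1],
    "traffic_volume": [10, 12, 11, 30, 28, 32],
    "speed": [60, 58, 61, 25, 27, 22],
}
CAUSAL_KB = {("scheduled_event", "traffic_volume"), ("traffic_volume", "speed")}

DIRECTED, UNDECIDED = orient(learn_skeleton(DATA), CAUSAL_KB)
```

In this toy data all three variables are strongly correlated, so the data alone proposes three edges. The knowledge base orients two of them and leaves the event-to-speed edge undecided, which makes it a candidate for filtering as an indirect dependency.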

Anomalies, Correlations, Causation
  Due to common cause or origin
    E.g., Planets: Copernicus > Kepler > Newton > Einstein
  Coincidental due to data skew or misrepresentation
    E.g., Tall policy claims made by politicians!
  Coincidental new discovery
    E.g., Hurricanes and Strawberry Pop-Tarts Sales
  Strong correlation vs causation
    E.g., Spicy foods vs Helicobacter pylori: Stomach Ulcers
  Anomalous and accidental
    E.g., CO2 levels and Obesity
  Correlation turning into causations
    E.g., Pavlovian learning: conditional reflex
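The common-cause case can be made concrete with partial correlation. Two variables driven by the same cause can correlate strongly, yet the correlation vanishes once the cause is controlled for. The sketch below uses the standard first-order partial correlation formula. The data are synthetic, constructed so the arithmetic is exact, and do not correspond to any example on the slide.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (non-constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data: Z is a common cause, and X and Y each add their own
# independent noise (N1, N2 are orthogonal to Z and to each other).
Z = [1, 1, 1, 1, -1, -1, -1, -1]
N1 = [1, 1, -1, -1, 1, 1, -1, -1]
N2 = [1, -1, 1, -1, 1, -1, 1, -1]
X = [2 * z + n for z, n in zip(Z, N1)]
Y = [2 * z + n for z, n in zip(Z, N2)]

R_XY = pearson(X, Y)                       # strong raw correlation
R_XY_GIVEN_Z = partial_correlation(X, Y, Z)  # vanishes given the common cause
```

Here X and Y have a raw correlation of 0.8 even though neither influences the other, and the partial correlation given Z is zero up to floating-point error. A near-zero partial correlation supports a common-cause reading but does not by itself establish causal structure.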

Veracity
  A lot of existing work on Trust ontologies, metrics and models, and on Provenance tracking
    Homogeneous data: Statistical techniques
    Heterogeneous data: Semantic models

Veracity: Confession of sorts!
  Trust is well-known, but is not well-understood.
  "The utility of a notion testifies not to its clarity but rather to the philosophical importance of clarifying it." -- Nelson Goodman (Fact, Fiction and Forecast, 1955)

(More on) Value
  Learning domain models from big data for prediction
  E.g., Harnessing Twitter "Big Data" for Automatic Emotion Identification

(More on) Value
  Discovering gaps and enriching domain models using data
  E.g., Data driven knowledge acquisition method for domain knowledge enrichment in the healthcare

Conclusions
  Glimpse of our research organized around the 5 Vs of Big Data
  Discussed role in harnessing Value
    Semantic Perception (Volume)
    Continuum of Semantic models to manage Heterogeneity (Variety)
    Hybrid KRR: Probabilistic + Logical (Variety)
    Continuous Semantics (Velocity)
    Trust Models (Veracity)

Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
  Thank You
  http://knoesis.wright.edu/tkprasad
  Krishnaprasad Thirunarayan, Amit P. Sheth: Semantics-Empowered Big Data Processing with Applications. AI Magazine 36(1): 39-54 (2015)
  Special Thanks to: Pramod Anantharam, Dr. Cory Henson
  Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, USA