Bayesian networks in environmental applications

Bayesian networks in environmental applications Pedro Aguilera, Antonio Fernández, Rosa Fernández, Rafael Rumí, Antonio Salmerón Dept. Plant Biology and Ecology - Dept. Statistics and Applied Mathematics. University of Almería May 12, 2011 Aguilera,Fernández,Fernández,Rumí,Salmerón Bayesian networks in environmental applications 1/ 21

Outline
Motivation
Review paper
Search for published papers
Paper structure
Main results
Some on-going common projects

Motivation: Motivation 1
Task 4.2.3: Applications of Bayesian networks (BNs) in environment.
  Water quality / water pollution.
  Factors influencing the transformation of the landscape.
  Species modelling.
What is done in environmental modelling, and how is it done with respect to BNs?

Motivation: Motivation 2
Cooperation with P. Aguilera, from the Ecology Dept.
Previous work: P. A. Aguilera, A. Fernández, F. Reche, R. Rumí (2010). Hybrid Bayesian Network Classifiers: Application to species distribution models. Environmental Modelling and Software.
PhD student Rosa Fernández: PhD thesis project.
Basic foundations of what to do and how to do it, in terms of cooperation with experts.
Create a "BNs for environmental modelling" group (not just a specific cooperation).

Review paper: Questions proposed
Are BNs successfully applied in environmental sciences?
How is it done? How should it be done? (mixing both points of view)
Quantify the results and identify weak aspects of the application of BNs.
Propose some conclusions and general ideas to be taken into account.

Search for literature: General search setting
Database: ISI Web of Knowledge.
Period: January 1990 - September 2010.
Keyword: Bayesian Networks or Bayesian Belief Networks.
Document type: papers or reviews.
Total number of documents: 1316.

Search for literature
[Figure: number of papers per year, 1990 to 2010; vertical axis from 0 to 250.]

Search for literature
Scientific area            Percentage
Computer Sciences          27.3
Mathematics                20.9
Engineering                16.2
Health Sciences            15.0
Life Sciences              10.9
Sociology and Education     4.4
Environmental Sciences      4.2
Others                      1.0

Selection of papers: Specific environment-related search
Selecting only papers inside Environmental Sciences left out many papers of interest to the research.
The search was extended to recover those papers by including other areas, such as Agriculture, Water Resources, Marine & Freshwater Biology, ...
It was also extended to journals and papers outside the scope of ISI Web of Knowledge.
The list was refined manually to exclude papers not related to Environmental Sciences or to BNs.
Output: 118 papers selected.

Classification of papers
Category                                # Papers (Percentage)
Environmental Science & Ecology          32 (27.12%)
Water Resources                          29 (24.57%)
Agriculture                               8 (6.78%)
Geology                                   6 (5.08%)
Marine & Freshwater Biology               8 (6.78%)
Biodiversity & Conservation               8 (6.78%)
Forestry                                  5 (4.24%)
Fisheries                                 5 (4.24%)
Meteorology and Atmospheric Sciences      3 (2.53%)
Others                                   14 (11.86%)
Total                                   118 (100%)
"Others" means papers that discuss BNs but do not build or use any model.
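
Recomputing the shares from the counts is a quick consistency check on a table like this one. The sketch below uses only the counts listed on the slide; the function name `shares` is introduced here for illustration, and values are rounded to two decimals, so they may differ from the slide in the last digit.

```python
# Counts per category as listed on the slide.
counts = {
    "Environmental Science & Ecology": 32,
    "Water Resources": 29,
    "Agriculture": 8,
    "Geology": 6,
    "Marine & Freshwater Biology": 8,
    "Biodiversity & Conservation": 8,
    "Forestry": 5,
    "Fisheries": 5,
    "Meteorology and Atmospheric Sciences": 3,
    "Others": 14,
}


def shares(table):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total_count = sum(table.values())
    return {name: round(100.0 * n / total_count, 2) for name, n in table.items()}


total = sum(counts.values())
# Papers that actually build or use a model (everything except "Others").
modelling_papers = total - counts["Others"]
percentages = shares(counts)

for name, pct in percentages.items():
    print(f"{name:40s} {counts[name]:4d}  {pct:6.2f}%")
print("total:", total, "| papers with a model:", modelling_papers)
```

The number of papers with a model is the total that the later per-step tables are expressed against.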

Analysis of the papers
Main goal of the paper
  Analyze the use/misuse of BNs in this field.
  Give some guidelines for applying BNs properly.
Methodology
  Define a general model implementation procedure:
    1. Identify the aim of the model
    2. Preprocess the data
    3. Learn the model
    4. Validate the model
  Describe each of these steps briefly.
  Analyze the performance of the papers within each step of this procedure.

Aim of the model: Aim
We define 4 different (non-exclusive) possible aims:
Aim                         # Papers (Percentage)
Inference                    77 (74%)
Characterize                  5 (4.8%)
Classification (general)     15 (14.4%)
Classification (fixed)        6 (5.8%)
Regression                    1 (1.0%)
Total                       104 (100%)
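
Since inference is by far the most common aim, a minimal sketch of what it involves may help: computing the posterior of one variable given evidence on another. The three-node network (Nutrients -> Bloom <- Temperature) and all probabilities below are hypothetical and chosen only for illustration; the posterior is computed by plain enumeration, which is only practical for very small networks.

```python
# Hypothetical network: Nutrients -> Bloom <- Temperature.
# All numbers are illustrative assumptions, not values from the review.
p_nutrients = {"high": 0.3, "low": 0.7}
p_temperature = {"warm": 0.4, "cold": 0.6}
# P(Bloom = "yes" | Nutrients, Temperature)
p_bloom_yes = {
    ("high", "warm"): 0.9,
    ("high", "cold"): 0.5,
    ("low", "warm"): 0.3,
    ("low", "cold"): 0.05,
}


def joint(nutrients, temperature, bloom):
    """Joint probability from the chain rule of the network."""
    p_yes = p_bloom_yes[(nutrients, temperature)]
    p_b = p_yes if bloom == "yes" else 1.0 - p_yes
    return p_nutrients[nutrients] * p_temperature[temperature] * p_b


def posterior_nutrients(bloom):
    """P(Nutrients | Bloom = bloom), summing out Temperature and normalizing."""
    unnormalized = {
        n: sum(joint(n, t, bloom) for t in p_temperature) for n in p_nutrients
    }
    z = sum(unnormalized.values())
    return {n: v / z for n, v in unnormalized.items()}


posterior = posterior_nutrients("yes")
print(posterior)
```

Observing a bloom raises the probability of high nutrients above its prior of 0.3, which is the kind of diagnostic reasoning the "Inference" row refers to.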

Data preprocessing: Type of variables
4 possible data types are considered:
Type               # Papers (Percentage)
Discrete            55 (52.9%)
Discretized         31 (29.8%)
Continuous           4 (3.8%)
Hybrid               2 (1.9%)
No information      12 (11.5%)
Total              104 (100%)

Data preprocessing: Discretization procedure
Among the 31 papers that discretize their variables, we consider several discretization procedures:
Discretization       # Papers (Percentage)
Experts               8 (25.8%)
Software              1 (3.2%)
Eq. Frequency         1 (3.2%)
Min. Entropy          1 (3.2%)
Deterministic Eq.     3 (9.7%)
Several               2 (6.5%)
No information       15 (48.4%)
Total                31 (100%)
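
To make the difference between procedures concrete, here is a minimal sketch of equal-frequency discretization, which appears in the table, next to equal-width discretization, which is added here only for contrast and is not claimed to correspond to any row of the table. Function names and the sample values are illustrative assumptions.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Interior cut points that split [min, max] into k intervals of equal length."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points that put roughly len(values) / k observations in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def discretize(values, edges):
    """Map each value to the index of the bin it falls into (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]


# Skewed, made-up measurements: most values are small, a few are large.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 5.0, 9.0, 10.0]
by_width = discretize(values, equal_width_edges(values, 3))
by_frequency = discretize(values, equal_frequency_edges(values, 3))
print("equal width:    ", by_width)
print("equal frequency:", by_frequency)
```

On skewed data the two procedures give very different bins, which is one reason a paper that does not report its discretization is hard to reproduce.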

Model learning: Model learning procedure
How was the model built?
Procedure          # Papers (Percentage)
Data                17 (16.3%)
Experts             36 (34.6%)
Both                44 (42.3%)
No information       7 (6.7%)
Total              104 (100%)
Note that 76.9% of the papers use experts in the modelling step, and 34.6% of all papers use experts only.
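
One common way to combine experts and data when estimating a conditional probability table is to treat the expert's probabilities as pseudo-counts added to the observed counts. The sketch below shows that idea only; it is not claimed to be what any of the reviewed papers did. The function `learn_cpt`, the `ess` (equivalent sample size) weight, the variable names and the records are all hypothetical.

```python
from collections import Counter


def learn_cpt(records, child, parent, child_states, expert_cpt=None, ess=0.0):
    """Estimate P(child | parent) from a list of dict records.

    expert_cpt[parent_state][child_state] holds expert probabilities; each row
    is weighted as `ess` pseudo-observations. With no expert input this is the
    plain relative-frequency (maximum likelihood) estimate.
    """
    expert_cpt = expert_cpt or {}
    counts = Counter((r[parent], r[child]) for r in records)
    parent_states = sorted({r[parent] for r in records} | set(expert_cpt))
    cpt = {}
    for p in parent_states:
        row = {
            c: counts[(p, c)] + ess * expert_cpt.get(p, {}).get(c, 0.0)
            for c in child_states
        }
        total = sum(row.values())
        if total == 0:
            # No data and no expert weight: fall back to a uniform row.
            cpt[p] = {c: 1.0 / len(child_states) for c in child_states}
        else:
            cpt[p] = {c: v / total for c, v in row.items()}
    return cpt


# Made-up observations of land use and water quality.
records = (
    [{"land_use": "farm", "quality": "poor"}] * 3
    + [{"land_use": "farm", "quality": "good"}] * 1
    + [{"land_use": "forest", "quality": "good"}] * 4
)
states = ["poor", "good"]
expert = {
    "farm": {"poor": 0.5, "good": 0.5},
    "forest": {"poor": 0.2, "good": 0.8},
}

data_only = learn_cpt(records, "quality", "land_use", states)
combined = learn_cpt(records, "quality", "land_use", states, expert, ess=4.0)
print("data only:", data_only)
print("combined: ", combined)
```

Reporting the weight given to the expert (here `ess`) is the kind of detail that would make a "Both" model reproducible.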

Validation: Validation procedures
How do we validate the model built?
Procedure               # Papers (Percentage)
Train & Test             10 (9.6%)
CV                        7 (6.7%)
Experts                  14 (13.5%)
Previous models           3 (2.9%)
Sensitivity analysis     13 (12.5%)
Goodness of fit           3 (2.9%)
Several                  14 (13.5%)
No validation            40 (38.5%)
Total                   104 (100%)
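
As an illustration of the "CV" row, the following is a minimal sketch of k-fold cross-validation written against a generic `fit`/`predict` pair. The majority-class classifier and the data are placeholders invented for the example; in practice `fit` would learn a BN classifier. The sketch assumes `k` is no larger than the number of records, so that no fold is empty.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


def fit_majority(xs, ys):
    """Placeholder model: most frequent class for each observed feature value."""
    table = {}
    for x, y in zip(xs, ys):
        table.setdefault(x, Counter())[y] += 1
    overall = Counter(ys).most_common(1)[0][0]
    return table, overall


def predict_majority(model, x):
    table, overall = model
    if x in table:
        return table[x].most_common(1)[0][0]
    return overall  # unseen feature value: fall back to the overall majority


# Made-up, perfectly separable data.
xs = ["farm"] * 10 + ["forest"] * 10
ys = ["poor"] * 10 + ["good"] * 10
folds = k_fold_indices(len(xs), 5)
accuracy = cross_validate(xs, ys, fit_majority, predict_majority, k=5)
print("mean accuracy:", accuracy)
```

Train & Test is the special case of a single split; cross-validation reuses every record for testing exactly once.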

Software: Software used
Which software was used to build the model and obtain the results?
Software           # Papers (Percentage)
Analytica            2 (1.9%)
WINBUGS              1 (1%)
B-course             1 (1%)
Elvira               1 (1%)
C++                  1 (1%)
Genie                1 (1%)
Hugin               22 (21.2%)
Netica              36 (34.6%)
SamIam               1 (1%)
Several              5 (4.8%)
No information      33 (31.7%)
Total              104 (100%)

Conclusions: Surprising results
Only 5% of the papers deal with continuous variables.
About half of the papers that discretize do not say how they did it.
Most of the papers use experts in the learning step (only 16.3% use our standard learning algorithms alone).
Around 40% do not validate the model, and 12.5% use sensitivity analysis.
Missing data are never mentioned (I guess they are simply removed).
The software used in practice reduces to Netica and Hugin.
Many efficient algorithms are not really available to practitioners.

Conclusions: General conclusions
The papers are difficult to understand:
  Authors do not state what the model is built for.
  The word "Expert" solves many problems. However, authors do not state how the combination of experts and algorithms is carried out.
  Authors do not use the same vocabulary, e.g. prediction vs. inference, discrete vs. discretized, ...
  Authors make no effort to make their work reproducible.
The review paper aims to show how BNs are actually applied, and also to serve as a general guideline for environmental researchers on how to proceed, advising them of different ways to carry out each step.

Current cooperation
Research contract with EGMASA (Environmental Management Corporation) to determine the influence of the driving forces of change on the human services available in the basins of the rivers Adra and Nacimiento. Discretization failed, so a CLG model was used (model learning finished; future paper).
Research project proposal to compute a composite indicator of the human services that a National Park provides to society, and to identify its main driving forces (submitted for evaluation).
Master thesis on water quality (green water - blue water) using discretized BNs (done), and a proposal to use the MTE model (future paper).
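
For readers unfamiliar with the CLG (conditional linear Gaussian) model mentioned above: a continuous node keeps its scale instead of being discretized, and for each configuration of its discrete parents it follows a normal distribution whose mean is linear in its continuous parents. The sketch below shows one such node with one discrete and one continuous parent; the state names and all parameter values are hypothetical and have nothing to do with the EGMASA project.

```python
import math

# Hypothetical CLG node Y with discrete parent D and continuous parent X:
#   Y | X = x, D = d  ~  Normal(intercept_d + slope_d * x, sd_d ** 2)
# One (intercept, slope, sd) triple per discrete state; values are made up.
clg_params = {
    "dry": (2.0, 0.5, 1.0),
    "wet": (5.0, 1.5, 2.0),
}


def clg_mean(x, d):
    """Conditional mean of Y given X = x and D = d."""
    intercept, slope, _ = clg_params[d]
    return intercept + slope * x


def clg_density(y, x, d):
    """Conditional density f(y | x, d) of the CLG node."""
    _, _, sd = clg_params[d]
    z = (y - clg_mean(x, d)) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


print(clg_mean(2.0, "wet"), clg_density(8.0, 2.0, "wet"))
```

A known restriction of CLG networks is that discrete nodes cannot have continuous parents.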

Thanks for your attention. Questions?