Bayesian networks in environmental applications

Pedro Aguilera, Antonio Fernández, Rosa Fernández, Rafael Rumí, Antonio Salmerón
Dept. Plant Biology and Ecology - Dept. Statistics and Applied Mathematics, University of Almería
May 12, 2011

Outline
- Motivation
- Review paper
- Search for published papers
- Paper structure
- Main results
- Some ongoing joint projects

Motivation (1)
- Task 4.2.3: Applications of Bayesian networks (BNs) in environment.
  - Water quality / water pollution.
  - Factors influencing the transformation of the landscape.
  - Species modelling.
- What is done in environmental modelling, and how is it done with respect to BNs?

Motivation (2)
- Cooperation with P. Aguilera, from the Ecology Dept.
- Previous work: P. A. Aguilera, A. Fernández, F. Reche, R. Rumí (2010). Hybrid Bayesian Network Classifiers: Application to species distribution models. Environmental Modelling and Software.
- PhD student Rosa Fernández: PhD thesis project.
- Lay the basic foundations of what to do and how to do it when cooperating with experts.
- Create a group on BNs for environmental modelling (not just a specific cooperation).

Review paper: questions proposed
- Are BNs successfully applied in environmental sciences?
- How is it done? How should it be done? (mixing both points of view)
- Quantify the results and identify the weak aspects of BN applications.
- Propose some conclusions and general ideas to be taken into account.

Search for literature: general search settings
- Database: ISI Web of Knowledge.
- Period: January 1990 - September 2010.
- Keywords: "Bayesian Networks" or "Bayesian Belief Networks".
- Document types: papers or reviews.
- Total number of documents: 1316.

Search for literature
[Figure: number of papers per year, 1990-2010. x-axis: Year; y-axis: Number of papers (0-250).]

Search for literature

Scientific area            Percentage
Computer Sciences          27.3
Mathematics                20.9
Engineering                16.2
Health Sciences            15.0
Life Sciences              10.9
Sociology and Education     4.4
Environmental Sciences      4.2
Others                      1.0

Selection of papers: specific environment-related search
- Selecting only papers inside Environmental Sciences left out many papers relevant to the research.
- The search was extended to recover those papers by including other areas, such as Agriculture, Water Resources, Marine & Freshwater Biology, ...
- It was also extended to journals and papers outside the scope of ISI Web of Knowledge.
- The list was refined manually to exclude papers not related to Environmental Sciences or to BNs.
- Output: 118 papers selected.

Classification of papers

Category                               # Papers (Percentage)
Environmental Science & Ecology        32 (27.12%)
Water Resources                        29 (24.57%)
Agriculture                             8 (6.78%)
Geology                                 6 (5.08%)
Marine & Freshwater Biology             8 (6.78%)
Biodiversity & Conservation             8 (6.78%)
Forestry                                5 (4.24%)
Fisheries                               5 (4.24%)
Meteorology and Atmospheric Sciences    3 (2.53%)
Others                                 14 (11.86%)
Total                                 118 (100%)

"Others" means papers that discuss BNs but do not build or use any model.

Analysis of the papers

Main goal of the paper
- Analyze the use/misuse of BNs in this field.
- Give some guidelines for applying BNs properly.

Methodology
- Define a general model implementation procedure:
  1. Identify the aim of the model
  2. Preprocess the data
  3. Learn the model
  4. Validate the model
- Briefly describe each of these steps.
- Analyze how the papers perform within each step of this procedure.

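The four-step procedure can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and does not come from the reviewed papers: the toy nitrate records, the 10.0 threshold, the helper names (`discretize`, `learn_cpt`, `predict`) and the two-node structure nitrate bin -> water quality.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (nitrate concentration, water quality label).
records = [
    (2.0, "good"), (3.5, "good"), (4.0, "good"), (12.0, "poor"),
    (15.5, "poor"), (11.0, "poor"), (1.5, "good"), (14.0, "poor"),
    (5.0, "good"), (13.0, "poor"),
]

# Step 1, aim: classify water quality from nitrate (a classification aim).

# Step 2, preprocessing: discretize nitrate with an assumed threshold.
def discretize(nitrate, threshold=10.0):
    return "high" if nitrate >= threshold else "low"

data = [(discretize(x), y) for x, y in records]
train, test = data[:7], data[7:]  # simple hold-out split

# Step 3, learning: estimate P(quality | nitrate bin) by counting,
# with Laplace smoothing (alpha) to avoid zero probabilities.
def learn_cpt(rows, labels=("good", "poor"), alpha=1.0):
    counts = defaultdict(Counter)
    for parent, child in rows:
        counts[parent][child] += 1
    cpt = {}
    for parent, c in counts.items():
        total = sum(c.values()) + alpha * len(labels)
        cpt[parent] = {lab: (c[lab] + alpha) / total for lab in labels}
    return cpt

cpt = learn_cpt(train)

# Step 4, validation: accuracy on the held-out records.
def predict(cpt, parent):
    dist = cpt[parent]
    return max(dist, key=dist.get)

accuracy = sum(predict(cpt, p) == y for p, y in test) / len(test)
print(cpt)
print(accuracy)
```

A real application would involve many more variables, a structure obtained from data, experts or both, and a validation scheme stronger than a single small hold-out set.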
Aim of the model

We define 4 different (non-exclusive) possible aims; classification is split into two rows:

Aim                        # Papers (Percentage)
Inference                  77 (74.0%)
Characterize                5 (4.8%)
Classification (general)   15 (14.4%)
Classification (fixed)      6 (5.8%)
Regression                  1 (1.0%)
Total                     104 (100%)

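As a reminder of what the most frequent aim, inference, amounts to, here is a minimal sketch on a hypothetical two-node network Pollution -> Bloom. The variable names and all probabilities are invented for illustration; the posterior is obtained by enumeration and normalization (Bayes' rule).

```python
# Hypothetical two-node network: Pollution -> Bloom. All numbers are illustrative.
prior = {"yes": 0.2, "no": 0.8}  # P(Pollution)
likelihood = {
    "yes": {"yes": 0.7, "no": 0.3},  # P(Bloom | Pollution = yes)
    "no": {"yes": 0.1, "no": 0.9},   # P(Bloom | Pollution = no)
}

def posterior(evidence):
    """Return P(Pollution | Bloom = evidence) by enumeration and normalization."""
    joint = {p: prior[p] * likelihood[p][evidence] for p in prior}
    z = sum(joint.values())
    return {p: v / z for p, v in joint.items()}

post = posterior("yes")
print(post)  # P(Pollution = yes | Bloom = yes) = 0.14 / 0.22
```

In larger networks the same computation is carried out by dedicated propagation algorithms rather than by direct enumeration.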
Data preprocessing: type of variables

4 possible data types are considered:

Type              # Papers (Percentage)
Discrete          55 (52.9%)
Discretized       31 (29.8%)
Continuous         4 (3.8%)
Hybrid             2 (1.9%)
No information    12 (11.5%)
Total            104 (100%)

Data preprocessing: discretization procedure

Among the 31 papers that discretize their variables, we distinguish several discretization procedures:

Discretization       # Papers (Percentage)
Experts               8 (25.8%)
Software              1 (3.2%)
Eq. Frequency         1 (3.2%)
Min. Entropy          1 (3.2%)
Deterministic Eq.     3 (9.7%)
Several               2 (6.5%)
No information       15 (48.4%)
Total                31 (100%)

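To make the difference between discretization procedures concrete, the sketch below contrasts equal-frequency binning (one of the procedures in the table) with equal-width binning (a common alternative, not necessarily one of the table's categories). The function names and the toy values are assumptions for illustration only.

```python
def equal_width_edges(values, k):
    """Interior cut points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]

def equal_frequency_edges(values, k):
    """Interior cut points so that each bin holds roughly the same number of values."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

def assign_bin(x, edges):
    """Index of the bin containing x: the number of cut points that x reaches."""
    return sum(x >= e for e in edges)

# Toy sample with one extreme value, as is common in environmental measurements.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

ew = equal_width_edges(values, 3)
ef = equal_frequency_edges(values, 3)
ew_bins = [assign_bin(v, ew) for v in values]
ef_bins = [assign_bin(v, ef) for v in values]
print(ew, ew_bins)
print(ef, ef_bins)
```

With the extreme value present, equal-width binning puts almost every observation in the first bin, while equal-frequency binning keeps the bins balanced. This is one reason why reporting the procedure used matters for reproducibility.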
Model learning: model learning procedure

How was the model built?

Procedure         # Papers (Percentage)
Data              17 (16.3%)
Experts           36 (34.6%)
Both              44 (42.3%)
No information     7 (6.7%)
Total            104 (100%)

Note that 76.9% of the papers use experts in the modelling step (Experts plus Both), and 34.6% of all papers rely on experts only.

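The percentages in this table, including the 76.9% figure, follow directly from the counts. A short check, using only the counts shown in the table (the variable names are arbitrary):

```python
# Counts taken from the model learning table.
counts = {"Data": 17, "Experts": 36, "Both": 44, "No information": 7}

total = sum(counts.values())
share = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Papers that involve experts at all: "Experts" only plus "Both".
experts_any = round(100 * (counts["Experts"] + counts["Both"]) / total, 1)

print(total)        # 104
print(share)
print(experts_any)  # 76.9
```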
Validation: validation procedures

How do we validate the model built?

Procedure               # Papers (Percentage)
Train & Test            10 (9.6%)
CV                       7 (6.7%)
Experts                 14 (13.5%)
Previous models          3 (2.9%)
Sensitivity analysis    13 (12.5%)
Goodness of fit          3 (2.9%)
Several                 14 (13.5%)
No validation           40 (38.5%)
Total                  104 (100%)

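For readers less familiar with the "CV" row, the following is a minimal sketch of how k-fold cross-validation partitions a data set. The function name `k_fold_indices` is hypothetical, and the folds are contiguous for simplicity; in practice the records would normally be shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    # Spread the remainder over the first n % k folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(test_idx, train_idx)
```

Each record is used for testing exactly once, and the model is learned k times on the remaining records; the k test scores are then averaged.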
Software: software used

How was the model built and how were the results obtained?

Software          # Papers (Percentage)
Analytica          2 (1.9%)
WinBUGS            1 (1%)
B-Course           1 (1%)
Elvira             1 (1%)
C++                1 (1%)
GeNIe              1 (1%)
Hugin             22 (21.2%)
Netica            36 (34.6%)
SamIam             1 (1%)
Several            5 (4.8%)
No information    33 (31.7%)
Total            104 (100%)

Conclusions: surprising results
- Only about 6% of the papers deal with continuous variables (3.8% continuous, 1.9% hybrid).
- Almost half (48.4%) of the papers that discretize do not report how they did it.
- Most of the papers use experts in the learning step (only 16.3% use our standard learning algorithms alone).
- Around 40% (38.5%) do not validate the model, and 12.5% use sensitivity analysis.
- Missing data are not mentioned (presumably incomplete records are simply removed).
- Software use is concentrated in Netica (34.6%) and Hugin (21.2%).
- Many efficient algorithms are not really available to practitioners.

Conclusions: general conclusions
- The papers are difficult to understand:
  - Authors do not state what the model is built for.
  - The word "expert" solves many problems, yet authors do not state how experts and algorithms are combined.
  - Authors do not use a common vocabulary, e.g. prediction vs. inference, discrete vs. discretized, ...
  - Authors make no effort to make their work reproducible.
- This review paper aims to show how BNs are actually applied, and also to serve as a general guideline for environmental researchers on how to proceed, advising them on different ways to carry out each step.

Current cooperation
- Research contract with EGMASA (Environmental Management Corporation) to determine the influence of change driving forces on the human services available in the basins of the rivers Adra and Nacimiento. Discretization failed, so a CLG (conditional linear Gaussian) model was used. (Model learning finished; future paper.)
- Research project proposal to compute a composite indicator of the human services that a National Park provides to society, and to identify its main driving forces. (Submitted for evaluation.)
- Master's thesis on water quality (green water - blue water) using discretized BNs (done), and a proposal to use the MTE (mixtures of truncated exponentials) model (future paper).

Thanks for your attention. Questions?