Hybrid Soft Computing Challenges, Perspectives and Applications Ajith Abraham Norwegian Center of Excellence, Norwegian University of Science and Technology, Trondheim Norway http://www.softcomputing.net ajith.abraham@ieee.org
Presentation Overview Soft Computing Ingredients : Neural networks, fuzzy inference systems, evolutionary algorithms, probabilistic reasoning etc. Need for Hybridization? Engineering Hybrid Architectures Applications - E-commerce (business intelligence) - Network Security - Data Mining Conclusions
What is Intelligence? "Intelligence is what we use when we don't know what to do." Intelligence requires an ability to: perform complex tasks; recognize complex patterns; solve unseen problems; learn from experience; learn from instruction; use natural language; be aware of self (consciousness); use tools.
Parking a Car. Generally, a car can be parked rather easily. If the parking position were specified to within, say, a fraction of a millimeter, it would take hours of maneuvering and precise measurements of distance and angular position to solve the problem. High precision carries a high cost. The challenge is to exploit the tolerance for imprecision by devising methods of computation that lead to an acceptable solution at low cost. This, in essence, is the guiding principle of modern intelligent computing.
Intelligent Systems Ingredients. FL (fuzzy logic): algorithms for dealing with imprecision and uncertainty. RS (rough sets): handling uncertainty arising from the granularity in the domain of discourse. NC (neurocomputing): machinery for function approximation. EA/SI (evolutionary algorithms/swarm intelligence): algorithms for global search and optimization. FL, RS, NC and EA/SI are complementary rather than competitive.
Computational Theory of Perceptions. Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements or computations. Reflecting the finite ability of the sensory organs (and ultimately the brain) to resolve detail, perceptions are inherently imprecise. The computational theory of perceptions provides the capability to compute and reason with perception-based information.
How to Model Perceptions. Perceptions are both fuzzy and granular: boundaries of perceived classes are unsharp, and values of attributes are granulated. Example: granules in age: very young, young, not so old. Perceptions are described by propositions drawn from a natural language.
Machine Intelligence and related fields: knowledge-based systems; fuzzy logic; rough sets; hybrid systems (NN-FL, NN-EA, FL-EA, NN-FL-EA, etc.); non-linear dynamics; chaos theory; signal processing; fractals; pattern recognition; machine learning; data mining; Web intelligence.
Problem Solving Techniques. Conventional (hard) computing: precise models; symbolic logic reasoning; traditional numerical modeling and search. Soft computing: approximate models; approximate reasoning; functional approximation and randomized search.
Soft Computing: Main Components. Approximate reasoning: probabilistic models, fuzzy logic. Functional approximation/randomized search: neural networks, evolutionary algorithms.
Artificial Neural Networks
Artificial Neural Networks - Features. Typically, the structure of a neural network is fixed first, and one of a variety of mathematical algorithms is then used to determine the weights of the interconnections that maximize the accuracy of the outputs. This process, by which the synaptic weights of a neural network are adapted to the problem environment, is popularly known as learning. There are broadly three types of learning: supervised learning, unsupervised learning and reinforcement learning.
Different Neural Network Architectures Multi layered feedforward network Recurrent network Competitive network Jordan network
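Backpropagation Algorithm. The weight update with momentum is $\Delta w_{ij}(n) = -\varepsilon \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(n-1)$, where $E$ is the error criterion to be minimized, $w_{ij}$ is the weight from the i-th input unit to the j-th output unit, and $\varepsilon$ and $\alpha$ are the learning rate and the momentum, respectively.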
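A minimal NumPy sketch of the update rule above, assuming a network with one input, one tanh hidden layer and one linear output, batch training, and the error criterion E = 0.5 * mean squared error. The data set (a sine curve), the layer size and the values of ε (lr) and α (momentum) are illustrative assumptions, not settings from these slides.

```python
import numpy as np

def train_mlp(x, y, n_hidden=10, lr=0.05, momentum=0.9, epochs=2000, seed=0):
    """Batch backpropagation with momentum for a 1-input, 1-output network.

    Returns the mean squared error at the first and at the last epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(1, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    deltas = [np.zeros_like(p) for p in params]  # previous updates, i.e. Δw(n-1)
    X, T = x.reshape(-1, 1), y.reshape(-1, 1)
    history = []
    for _ in range(epochs):
        # Forward pass.
        H = np.tanh(X @ W1 + b1)
        out = H @ W2 + b2
        err = out - T
        history.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of E = 0.5 * mean(err**2).
        g_out = err / len(X)
        g_W2, g_b2 = H.T @ g_out, g_out.sum(axis=0)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)  # tanh'(a) = 1 - tanh(a)^2
        g_W1, g_b1 = X.T @ g_hid, g_hid.sum(axis=0)
        # Momentum update Δw(n) = -lr * dE/dw + momentum * Δw(n-1), applied in place.
        for p, g, d in zip(params, [g_W1, g_b1, g_W2, g_b2], deltas):
            d *= momentum
            d -= lr * g
            p += d
    return history[0], history[-1]

x = np.linspace(-3, 3, 100)
first, last = train_mlp(x, np.sin(x))
```

Setting `momentum=0.0` reduces the loop to plain gradient descent, which makes it easy to compare the two behaviours on the same data.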
Choosing Hidden Neurons. A large number of hidden neurons makes it easy for the network to learn the training data and to correctly predict the data it has been trained on, but its performance on new data, its ability to generalise, is compromised. With too few hidden neurons, the network may be unable to learn the relationships among the data, and the error will fail to fall below an acceptable level. Selecting the number of hidden neurons is therefore a crucial decision, and a trial-and-error approach is often taken.
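The trial-and-error approach can be sketched as a loop over candidate sizes scored on a held-out validation set. To keep the sketch short and deterministic, it assumes a shortcut: the hidden weights are drawn at random and only the output layer is fitted by least squares (an extreme-learning-machine-style simplification, not backpropagation). The function names and candidate sizes are hypothetical.

```python
import numpy as np

def hidden_layer_rmse(n_hidden, x_tr, y_tr, x_va, y_va, seed=0):
    """Validation RMSE of a one-hidden-layer tanh network.

    Simplification (assumption): random hidden weights, output weights fitted
    by least squares instead of full backpropagation training.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(1, n_hidden))
    b = rng.normal(size=n_hidden)

    def features(x):
        H = np.tanh(x.reshape(-1, 1) @ W + b)
        return np.hstack([H, np.ones((len(x), 1))])  # bias column for the output unit

    beta, *_ = np.linalg.lstsq(features(x_tr), y_tr, rcond=None)
    return float(np.sqrt(np.mean((features(x_va) @ beta - y_va) ** 2)))

def choose_hidden_neurons(candidates, x_tr, y_tr, x_va, y_va):
    """Trial and error: try each size and keep the one with the lowest validation RMSE."""
    scores = {n: hidden_layer_rmse(n, x_tr, y_tr, x_va, y_va) for n in candidates}
    return min(scores, key=scores.get), scores
```

Scoring on data the network was not fitted on is what exposes the generalisation loss of an oversized hidden layer; the training error alone would keep falling as neurons are added.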
Use of Momentum: helps the search get out of local minima; smooths out the variations in the weight updates.
Effects of Different Learning Rates
Effect of the Number of Hidden Neurons - Mackey-Glass Series. Lowest RMSE for LM (Levenberg-Marquardt) = 0.0004 (24 hidden neurons)
Effect of the Number of Hidden Neurons - Mackey-Glass Series. Lowest RMSE for LM = 0.0009 (24 hidden neurons)
Effect of the Number of Hidden Neurons - Gas Furnace Series. Lowest RMSE for LM = 0.009 (24 hidden neurons)
Effect of the Number of Hidden Neurons - Gas Furnace Series. Lowest RMSE for SCG (scaled conjugate gradient) = 0.033 (16 hidden neurons)
No Free Lunch Theorem. Even though artificial neural networks can perform a wide variety of tasks, in practice they sometimes deliver only marginal performance. There is little reason to expect that one can find a uniformly best algorithm for selecting the weights of a feedforward artificial neural network. This is in accordance with the no free lunch theorem, which states that, for any algorithm, any elevated performance over one class of problems is exactly paid for in performance over another class.
Fuzzy Logic
How Are Fuzzy Sets Constructed? The construction of a fuzzy set depends on two things: identification of a suitable universe of discourse and specification of an appropriate membership function. Figure: the set A of old people represented as a crisp set and as a fuzzy set (axes: age in years vs. membership degree).
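A small sketch contrasting the two representations of "old people". The universe of discourse is age in years; the crisp threshold of 80 and the linear fuzzy ramp from 50 to 80 years are hypothetical membership functions chosen for illustration, not the exact curves in the figure.

```python
def crisp_old(age, threshold=80.0):
    """Crisp set (hypothetical threshold): membership jumps from 0 to 1."""
    return 1.0 if age >= threshold else 0.0

def fuzzy_old(age, start=50.0, full=80.0):
    """Fuzzy set (hypothetical ramp): membership rises linearly from 0 at
    `start` to 1 at `full`."""
    if age <= start:
        return 0.0
    if age >= full:
        return 1.0
    return (age - start) / (full - start)
```

Under these assumptions a 79-year-old is not old at all in the crisp set but is old to degree (79 - 50) / 30 ≈ 0.97 in the fuzzy set.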
Fuzzy If-Then Rules. Mamdani fuzzy inference system: If pressure is high, then volume is small. Takagi-Sugeno fuzzy inference system: If pressure is medium, then volume = 5 × pressure.
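A sketch of Takagi-Sugeno inference, where each rule's consequent is a function of the input and the crisp output is the firing-strength-weighted average of the consequents. Only the "medium" rule comes from the slide; the "low" rule, the Gaussian membership functions and their parameters are hypothetical.

```python
import math

def gauss(u, mean, sigma):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((u - mean) / sigma) ** 2)

# Hypothetical rule base: (antecedent membership, consequent function).
RULES = [
    (lambda p: gauss(p, 2.0, 1.0), lambda p: 10.0 - p),  # if pressure is low then volume = 10 - pressure (assumed)
    (lambda p: gauss(p, 5.0, 1.0), lambda p: 5.0 * p),   # if pressure is medium then volume = 5 x pressure
]

def takagi_sugeno(pressure):
    """Weighted average of the rule consequents, weighted by firing strength."""
    weights = [mf(pressure) for mf, _ in RULES]
    outputs = [f(pressure) for _, f in RULES]
    return sum(w * z for w, z in zip(weights, outputs)) / sum(weights)
```

At pressure 5 the "medium" rule fires fully and the "low" rule only weakly, so the output is slightly below 5 × 5 = 25.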
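Mamdani Inference System. Figure: two rules, "If x is A1 and y is B1 then z is C1" and "If x is A2 and y is B2 then z is C2", with input MFs A1, A2, B1, B2 and output MFs C1, C2. For a crisp input (x, y), the rule outputs are aggregated and the crisp output Z is obtained as the centroid of area.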
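A sketch of the two-rule Mamdani system in the figure, assuming triangular membership functions on [0, 10] (all parameters hypothetical), min for the AND connective, min (clipping) for implication, max for aggregation, and a centroid computed on a discretized output universe.

```python
import numpy as np

def tri(u, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((u - a) / (b - a), (c - u) / (c - b)), 0.0)

# Hypothetical MFs (feet and peak) for inputs x, y and output z.
A = [(0, 2, 6), (4, 8, 10)]   # A1, A2
B = [(0, 3, 6), (4, 7, 10)]   # B1, B2
C = [(0, 2, 5), (5, 8, 10)]   # C1, C2

def mamdani(x, y, z=np.linspace(0, 10, 1001)):
    aggregated = np.zeros_like(z)
    for a, b, c in zip(A, B, C):
        w = min(tri(x, *a), tri(y, *b))                # firing strength: AND via min
        clipped = np.minimum(w, tri(z, *c))            # implication via min (clipping)
        aggregated = np.maximum(aggregated, clipped)   # aggregation via max
    if aggregated.sum() == 0:
        raise ValueError("no rule fired")
    return float((z * aggregated).sum() / aggregated.sum())  # centroid of area
```

For (x, y) = (2, 3) only the first rule fires, so the output is the centroid of C1, (0 + 2 + 5) / 3 ≈ 2.33.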
Fuzzy Expert System. A fuzzy expert system was built to forecast the reactive power (P) at time t+1 from the load current (I) and voltage (V) at time t. The experiment consists of two stages: developing the fuzzy expert system and evaluating its performance on the test data. The model has two input variables (V and I) and one output variable (P). Training and test data sets were extracted randomly from the master data set: 60% of the data was used for training and the remaining 40% for testing.
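The random 60/40 split can be sketched as follows. Each record could be, for example, a (V, I, P) tuple; that layout, the seed and the function name are hypothetical.

```python
import random

def train_test_split(records, train_fraction=0.6, seed=0):
    """Randomly split records into training and test sets (60% / 40% by default)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```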
Fuzzy Expert System - Some Illustrations: different numbers of membership functions (root mean squared error, training/test). 2 MFs: Mamdani FIS 0.401/0.397; Takagi-Sugeno FIS 0.024/0.023. 3 MFs: Mamdani FIS 0.348/0.334; Takagi-Sugeno FIS 0.017/0.016.
Fuzzy Expert System - Some Illustrations: different shapes of membership functions (root mean squared error, training/test). Mamdani FIS 0.243/0.240; Takagi-Sugeno FIS 0.021/0.019.
Fuzzy Expert System - Some Illustrations: different fuzzy operators (root mean squared error, training/test). Mamdani FIS 0.221/0.219; Takagi-Sugeno FIS 0.019/0.018.
Fuzzy Expert System - Some Illustrations: different defuzzification operators (RMSE, training/test). Mamdani FIS: centroid 0.221/0.219; mean of maximum (MOM) 0.230/0.232; bisector of area (BOA) 0.218/0.216; smallest of maximum (SOM) 0.229/0.232. Takagi-Sugeno FIS: weighted sum 0.019/0.018; weighted average 0.085/0.084.
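A sketch of the four Mamdani defuzzification operators compared above, assuming the aggregated output fuzzy set is sampled on a uniform grid `z` with membership values `mu`. The discretization and the function name are assumptions.

```python
import numpy as np

def defuzzify(z, mu, method):
    """Crisp value of a sampled fuzzy set using one of four common operators."""
    if method == "centroid":
        return float(np.sum(z * mu) / np.sum(mu))
    if method == "boa":  # bisector of area: splits the area into two equal halves
        cum = np.cumsum(mu)
        return float(z[np.searchsorted(cum, cum[-1] / 2.0)])
    peak = z[mu == mu.max()]  # all points with maximal membership
    if method == "mom":  # mean of maximum
        return float(peak.mean())
    if method == "som":  # smallest of maximum
        return float(peak.min())
    raise ValueError(f"unknown method: {method}")

# A clipped symmetric triangle: flat top between roughly 1.5 and 4.5.
z = np.linspace(0, 10, 1001)
mu = np.minimum(0.5, np.maximum(0.0, 1.0 - np.abs(z - 3.0) / 3.0))
```

For this symmetric shape the centroid, BOA and MOM all land near 3, while SOM picks the left edge of the plateau (about 1.5); on asymmetric output sets the operators diverge, which is why the table reports different errors for each.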
Summary of Fuzzy Modeling. Surface structure: relevant input and output variables; relevant fuzzy inference system; number of linguistic terms associated with each input/output variable; if-then rules. Deep structure: type of membership functions; building up the knowledge base; fine-tuning the parameters of the MFs using regression and optimization techniques.
Evolutionary Computation
Evolutionary Algorithms. Evolutionary algorithms include evolution strategies, evolutionary programming, genetic algorithms and genetic programming. They can be described by $x[t+1] = s(v(x[t]))$, where $x[t]$ is the population at time t under representation x, v is the reproduction operator and s is the selection operator.
Evolutionary Algorithm Flow Chart. Figure: a current generation of binary strings (10010110, 01100010, 10100100, 10011001, 01111101, ...) is transformed into the next generation (10010110, 01100010, 10100100, 10011101, 01111001, ...) by selection, elitism and reproduction.
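A minimal genetic algorithm on binary strings that follows the scheme x[t+1] = s(v(x[t])): v applies one-point crossover and bit-flip mutation, and s applies elitism plus binary tournament selection. The toy fitness (number of 1s, the "OneMax" problem) and all parameter values are assumptions for illustration. Note that here s also looks at the current generation, because elitism copies its best members forward unchanged.

```python
import random

def onemax(bits):
    """Toy fitness (assumption): number of 1s in the bit string."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=100, p_mut=0.05, n_elite=2, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def reproduce(population):
        """v: one-point crossover of random parent pairs plus bit-flip mutation."""
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        return children

    def select(parents, children):
        """s: keep the n_elite best parents (elitism), fill the rest by binary tournaments."""
        elite = sorted(parents, key=onemax, reverse=True)[:n_elite]
        winners = [max(rng.sample(children, 2), key=onemax) for _ in range(pop_size - n_elite)]
        return elite + winners

    for _ in range(generations):
        pop = select(pop, reproduce(pop))  # x[t+1] = s(v(x[t])), elites taken from x[t]
    return max(pop, key=onemax)

best = evolve()
```

Because of elitism the best fitness in the population can never decrease from one generation to the next.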
Evolutionary Algorithm Parameter Settings. Before the run: parameter tuning. During the run: parameter control, which can be deterministic or adaptive.
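The two kinds of parameter control can be contrasted in a few lines. The deterministic rule below changes the mutation rate on a fixed schedule in the generation counter t, ignoring the search state; the adaptive rule uses feedback from the search, here Rechenberg's 1/5 success rule for a mutation step size. The linear schedule, its end points and the factor c = 0.85 are illustrative assumptions.

```python
def deterministic_rate(t, T, p_start=0.5, p_end=0.01):
    """Deterministic control: mutation rate decreases linearly with generation t of T."""
    return p_start + (p_end - p_start) * t / T

def adaptive_step(sigma, success_ratio, c=0.85):
    """Adaptive control (1/5 success rule): enlarge the step if more than 1/5 of
    recent mutations improved fitness, shrink it if fewer did."""
    if success_ratio > 0.2:
        return sigma / c
    if success_ratio < 0.2:
        return sigma * c
    return sigma
```

Parameter tuning, by contrast, happens before the run, for example by trying several fixed settings and keeping the best one.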
Evolutionary Algorithm Behaviour. The behaviour of an evolutionary algorithm is determined by the balance between exploitation and exploration maintained throughout the run. Adaptive evolutionary algorithms have been built to induce exploitation/exploration relationships that avoid premature convergence and improve the final results. If poor settings are used, the EA's performance can be severely affected.
Where to hybridize?
Comparison of Different Intelligent Systems (grades for FIS / ANN / EC / symbolic AI). Mathematical model: SG / B / B / SB. Learning ability: B / G / SG / B. Knowledge representation: G / B / SB / G. Expert knowledge: G / B / B / G. Nonlinearity: G / G / G / SB. Optimization ability: B / SG / G / B. Fault tolerance: G / G / G / B. Uncertainty tolerance: G / G / G / B. Real-time operation: G / SG / SB / B. Fuzzy terms used for grading: good (G), slightly good (SG), slightly bad (SB) and bad (B).
Hybrid Soft Computing
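Hybrid Soft Computing Architecture - 1. Figure: the problem provides inputs x1(n) and x2(n) to Soft Computing 1 and Soft Computing 2, whose outputs y1(n) and y2(n) are each a solution.
Hybrid Soft Computing Architecture - 2. Figure: the problem provides inputs x1(n) and x2(n) to Soft Computing 1 and Soft Computing 2; their outputs y1(n) and y2(n) together form the solution.
Hybrid Soft Computing Architecture - 3. Figure: the problem input x1(n) goes to Soft Computing 1, whose output y1(n) is the solution; Soft Computing 2 (output y2(n)) supplies a feedback correction Δ.
Hybrid Soft Computing Architecture - 4. Figure: Soft Computing 1 and Soft Computing 2 in series; the problem input x1(n) is mapped to y1(n) and then to z1(n), the solution.
Hybrid Soft Computing Architecture - 5. Figure: the problem input x1(n) goes to Soft Computing 1, which produces the solution z1(n); Soft Computing 2 is coupled to it through the signal y1(n).
Hybrid Soft Computing Architecture - 6. Figure: the problem input x1(n) goes to Soft Computing 1, which produces the solution z1(n); Soft Computing 2 is coupled to it through the signal y1(n).
Hybrid Soft Computing Architecture - 7. Figure: input x1(n) goes to Soft Computing 1, which produces the solution z1(n); Soft Computing 2 is coupled to it through the signal y1(n).
Hybrid Soft Computing Architecture - 8. Figure: the problem input x1(n) goes to Soft Computing 1, which produces the solution z1(n); Soft Computing 2 is coupled to it through the signal y1(n) with a feedback correction Δ.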
Hybrid Soft Computing Architecture - 9
Application examples 1. Business Intelligence 2. Data Mining
Business Intelligence
"The key in business is to know something that nobody else knows." (Aristotle Onassis) "To understand is to perceive patterns." (Sir Isaiah Berlin)
Coping with Information Computerization of daily life produces data Point-of-sale, Internet shopping (& browsing), credit cards, banks... Information on credit cards, purchase patterns, product preferences, payment history, sites visited... Travel: One trip by one person generates info on destination, airline preferences, seat selection, hotel, rental car, name, address, restaurant choices... Data cannot be processed or even inspected manually
Data Overload. Only a small portion of the data collected is analyzed (estimate: 5%). Vast quantities of data are collected and stored out of fear that important information will be missed. Data volume grows so fast that old data is never analyzed. Database systems do not support queries such as "Who is likely to buy product X?", "List all reports of problems similar to this one" or "Flag all fraudulent transactions". But these may be the most important questions!
What is Business Intelligence? Business intelligence is a component of business process management. It is knowing exactly what is happening in an organization: taking its pulse. It helps businesses make better decisions, provides a strong means of measuring the company's performance, monitors the financial and operational health of the organization, and provides two-way integration with operational systems and information feedback analysis.
E-Commerce Technologies Used. Stages of a transaction: buyer finds seller; selection of goods; negotiation; sale; payment; delivery; post-sale activity. Technologies used: search engine; on-line catalog; recommender agent; configurator; shopping bot; aggregator; automated agents; transaction processor; data interchange; cryptography; e-payment systems; tracking agent; on-line help; browser sharing; Internet telephony. Information gathered: search behavior; browsing behavior; customer preferences; effectiveness of promotions; bargaining strategies; price sensitivities; personal data; market basket; credit/payment information; delivery requirements; on-line problem reports; customer satisfaction; follow-on sales opportunities.
What is Web Mining? Web mining is the application of data mining or other information processing techniques to the World Wide Web (WWW) to find useful patterns.
Due to intense competition on the one hand and customers' ability to choose from several alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management. Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of users with the Web. Web usage mining has become critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis and so on.
Analyzing Web access logs can help in understanding user behaviour and the Web structure. From the business and applications point of view, knowledge obtained from Web usage patterns can be applied directly to manage activities related to e-business, e-services, etc. efficiently. Accurate Web usage information can help to attract new customers, retain current customers, improve cross-marketing/sales and the effectiveness of promotional campaigns, track departing customers, and find the most effective logical structure for a Web space.
Web Usage Mining. Contrary to popular belief, everything necessary for data mining Web traffic is NOT always automatically collected. Process (figure): raw usage data, together with content and structure data → preprocessing → preprocessed clickstream data → pattern discovery → rules, patterns and statistics → pattern analysis → "interesting" rules, patterns and statistics.
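The preprocessing step of this pipeline can be sketched as sessionization of a server log. The sketch assumes logs in the Common Log Format, uses the host (IP address) as a proxy for the user, discards embedded files and failed requests, and applies the common 30-minute inactivity heuristic to split sessions. These choices and all names are assumptions, not part of the framework described here.

```python
import re
from datetime import datetime, timedelta

# Common Log Format, e.g. 10.0.0.1 - - [01/Jan/2002:10:00:00 +1000] "GET /index.html HTTP/1.0" 200 1043
CLF = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
                 r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+')
ASSET_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # embedded files, not page views (assumption)

def parse(line):
    """Return (host, timestamp, path, status), or None for lines that do not match."""
    m = CLF.match(line)
    if m is None:
        return None
    t = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return m["host"], t, m["path"], int(m["status"])

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Keep successful page views, then split each host's requests into
    sessions whenever the gap between consecutive requests exceeds `timeout`."""
    records = [r for r in map(parse, lines)
               if r is not None and 200 <= r[3] < 400
               and not r[2].lower().endswith(ASSET_SUFFIXES)]
    records.sort(key=lambda r: (r[0], r[1]))  # group by host, then order by time
    sessions, prev_host, prev_time = [], None, None
    for host, t, path, _ in records:
        if host != prev_host or t - prev_time > timeout:
            sessions.append({"host": host, "pages": []})
        sessions[-1]["pages"].append(path)
        prev_host, prev_time = host, t
    return sessions
```

The resulting page sequences per session are the "preprocessed clickstream data" that pattern discovery (clustering, rule mining) then works on.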
Web Usage Data Sources. Sources: client level, server level. Abstractions: user, page file, page view, server session. Figure: a client computer connects through a modem and phone line to an ISP server and the "Internet", reaching the Web server and content server; data sources shown are user behaviors, server logs and site content.
Taxonomy of Web Mining Methods. Method categories: predictive modeling, data clustering, link analysis, text mining, deviation detection. Techniques: decision trees, neural networks, machine learning, clustering (K-means, fuzzy, ACC), rule association, semantic maps, visualization.
Predictive Modeling Objective: use data about the past to predict future behavior Sample problems: Will this (new) customer pay his bill on time? (classification) What will the Dow-Jones Industrial Average be on October 15? (prediction)
Predictive Modeling. Example: two groups of people, Honest (Horia, Maria, Daniel) and Crooked (James, Jeff, John). Which characteristics distinguish the two groups?
Web Log File Sample <cs.okstate.edu>
Web Usage Mining Framework
Web Usage Mining Framework
Evolutionary Fuzzy Clustering. Usually, a number of cluster centers are randomly initialized, and the fuzzy c-means (FCM) algorithm iteratively approximates the minimum of its objective function starting from that position, converging to one of its local minima. There is no guarantee that FCM converges to an optimal solution (it can be trapped in local extrema while optimizing the clustering criterion), and its performance is very sensitive to the initialization of the cluster centers. An evolutionary algorithm is therefore used to decide the optimal number of clusters and their cluster centers.
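A NumPy sketch of the standard FCM iteration (membership update, then membership-weighted center update), showing where the initial centers enter. Only the FCM core is shown: in the evolutionary variant described above, an EA would search over the number of clusters `c` and the initial centers `init`, for example using the objective `J` as part of the fitness; that wrapper, the fuzzifier m = 2 and the parameter names are assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, init=None, seed=0, eps=1e-12):
    """Fuzzy c-means. Returns (centers, memberships, objective J_m)."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), size=c, replace=False)]  # random initialization
    else:
        centers = np.asarray(init, dtype=float)
    for _ in range(iters):
        # d[k, i]: distance from point k to center i (eps avoids division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        # Membership update: u[k, i] = 1 / sum_j (d[k, i] / d[k, j]) ** (2 / (m - 1)).
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        um = u ** m
        J = float(np.sum(um * d ** 2))  # objective for the current partition
        # Center update: membership-weighted mean of the data.
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
    return centers, u, J
```

Running `fcm` from different `init` values and comparing `J` makes the sensitivity to initialization visible; an EA replaces that blind restarting with a guided search.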
Learning Rules with Evolutionary Algorithms. The chromosome encodes individual rules, and only the best individual is considered to form part of the solution. Initial rules were generated by grid partitioning (figure: input x partitioned by fuzzy sets A1, A2 and input y by B1, B2). EAs were then used to evaluate these rules and incorporate them into the final rule set using an iterative learning approach that penalizes less-contributing rules.
Genetic Representation of Fuzzy Rules. A chromosome represents m fuzzy rules: 1 stands for a selected rule and 0 for a non-selected rule. The length of the string depends on the number of input and output variables; example: 3 input variables composed of 3, 2 and 2 fuzzy sets, and 1 output variable composed of 3 fuzzy sets. A high-level representation reduces computational complexity.
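A sketch of this binary rule-selection encoding for the example above. It assumes that every combination of one fuzzy set per input plus one output set is a candidate rule produced by grid partitioning, which gives a string of 3 × 2 × 2 × 3 = 36 bits; that counting convention and the linguistic labels are hypothetical.

```python
from itertools import product

def candidate_rules(input_sets, output_sets):
    """Grid partitioning: every combination of one fuzzy set per input plus one output set."""
    return list(product(*input_sets, output_sets))

def decode(chromosome, rules):
    """Keep the rules whose bit is 1 (selected) and drop those whose bit is 0."""
    if len(chromosome) != len(rules):
        raise ValueError("chromosome length must equal the number of candidate rules")
    return [rule for bit, rule in zip(chromosome, rules) if bit == 1]

# Hypothetical labels: 3 inputs with 3, 2 and 2 fuzzy sets, 1 output with 3 fuzzy sets.
inputs = [["low", "medium", "high"], ["low", "high"], ["low", "high"]]
outputs = ["small", "medium", "large"]
rules = candidate_rules(inputs, outputs)
```

An EA then only manipulates fixed-length bit strings, while `decode` turns each individual back into a readable rule base for evaluation.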
Chromosome structure of the i-miner
Monash University's central Web site (Melbourne, Australia): over 7 million hits in a week!
Hourly Web Traffic
Daily Web Traffic
Data Complexity. Over 7 million hits in a week! Due to the enormous traffic volume and chaotic access behaviour, predicting Web user access patterns becomes more difficult and complex. Tasks: pattern discovery and trend analysis; formulation of clusters; discovering hidden information; daily and hourly trends (volume of hits).
Ant Colony Clustering. "Workers have been reported to sort their larvae or form piles of corpses, literally cemeteries, to clean up their nests." (Eric Bonabeau, Marco Dorigo and Guy Théraulaz, 1999. Swarm Intelligence: From Natural to Artificial Systems, Santa Fe Institute Studies in the Sciences of Complexity, Oxford University Press, New York.) The basic mechanism underlying this type of aggregation phenomenon is an attraction between dead items mediated by the ant workers: small clusters of items grow by attracting workers to deposit more items. The general idea is that isolated items should be picked up and dropped at some other location where more items of that type are present.
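The pick-up/drop idea can be sketched with the response-threshold probabilities of the basic model described by Bonabeau, Dorigo and Théraulaz: an unladen ant picks up an item with probability (k1 / (k1 + f))², and a laden ant drops it with probability (f / (k2 + f))², where f is the perceived density of similar items. Here f is estimated, as a simplifying assumption, as the fraction of the 8 neighbouring cells on a toroidal grid holding the same item type; the constants k1 and k2 are illustrative.

```python
import numpy as np

def local_density(grid, r, c, item_type):
    """Fraction of the 8 neighbouring cells (toroidal wrap-around) holding item_type."""
    n_rows, n_cols = grid.shape
    same = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            if grid[(r + dr) % n_rows, (c + dc) % n_cols] == item_type:
                same += 1
    return same / 8.0

def p_pick(f, k1=0.1):
    """Isolated items (small f) are picked up with high probability."""
    return (k1 / (k1 + f)) ** 2

def p_drop(f, k2=0.15):
    """Items are dropped with high probability where similar items are dense."""
    return (f / (k2 + f)) ** 2
```

Repeating "move randomly, pick up with `p_pick`, drop with `p_drop`" for many ants and steps lets small clusters grow into larger ones, as in the grid snapshots shown later.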
Parameter Settings. The statistical/text data from 01 January 2002 to 07 July were used. Takagi-Sugeno fuzzy inference system (TSFIS): 81 fuzzy if-then rules; 50 epochs. Backpropagation neural networks (BPNNs): 14/17 neurons; momentum 0.05/0.2; 3000 epochs. Linear genetic programming (LGP): population size 500; 200,000 tournaments; crossover/mutation rate 0.9; maximum program size 256.
Parameter Settings (i-Miner). Population size: 30. Maximum number of generations: 35. Fuzzy inference system: Takagi-Sugeno. Rule antecedent membership functions: 3 membership functions per input variable (parameterized Gaussian). Rule consequent parameters: linear parameters. Gradient descent learning: 10 epochs. Rank-based selection: 0.50. Elitism: 5%. Starting mutation rate: 0.50.
Hidden Knowledge from SOM Clusters (panels: daily traffic, hourly traffic)
Hidden Knowledge From Clusters
Hidden Knowledge From Clusters
Hidden Knowledge From Clusters
E-FCM Clusters
E-FCM Clusters
E-FCM Clusters Fuzzy clustering of visitors based on the day of access
ACO Clustering. Daily Web traffic data on a 25 × 25 non-parametric toroidal grid with 14 ants (snapshots at t = 1, 100, 500, 900 and 10,000,000).
ACO Clustering. Hourly Web traffic data on a 45 × 45 non-parametric toroidal grid with 48 ants (snapshots at t = 1, 100, 500, 900 and 10,000,000).
Performance of i-miner (Training)
Performance of i-miner (Test)
Performance of the Different Paradigms: daily (1 day ahead) prediction (RMSE training / RMSE test / CC). ANT-LGP: 0.0191 / 0.0291 / 0.9963. i-miner (FCM-FIS): 0.0044 / 0.0053 / 0.9967. SOM-ANN: 0.0345 / 0.0481 / 0.9292. SOM-LGP: 0.0543 / 0.0749 / 0.9315.
Performance of the Different Paradigms: hourly (1 hour ahead) prediction (RMSE training / RMSE test / CC). ANT-LGP: 0.2561 / 0.035 / 0.9921. i-miner (FCM-FIS): 0.0012 / 0.0041 / 0.9981. SOM-ANN: 0.0546 / 0.0639 / 0.9493. SOM-LGP: 0.0654 / 0.0516 / 0.9446.
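The two metrics in these tables can be computed as below. The slides do not define CC; this sketch assumes it is Pearson's correlation coefficient between actual and predicted values.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

The two measure different things: CC rewards getting the shape of the trend right even if predictions are offset or scaled, while RMSE penalizes any absolute deviation, so a model can have a high CC and still a comparatively large RMSE.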
Test results of the daily trends for 6 days
Test results of the average hourly trends
Data Mining
Automatic Design of Hierarchical Takagi-Sugeno Type Fuzzy Systems. To overcome the curse of dimensionality, it has been suggested to arrange several low-dimensional rule bases in a hierarchical structure, i.e., a tree, so that the number of possible rules grows linearly with the number of inputs. Building a hierarchical fuzzy system is a difficult task, because we need to define the architecture of the system (the modules, the input variables of each module and the interactions between modules) as well as the rules of each module.
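A quick calculation makes the linear-versus-exponential claim concrete. It assumes n inputs with m fuzzy sets each, a complete rule base, and a hierarchical chain of two-input modules in which the first module takes x1 and x2 and each later module takes the previous module's output plus one new input, with that intermediate output also partitioned into m fuzzy sets. The structure and function names are illustrative assumptions.

```python
def flat_rule_count(n_inputs, n_sets):
    """Complete rule base of one flat fuzzy system: one rule per combination."""
    return n_sets ** n_inputs

def hierarchical_rule_count(n_inputs, n_sets):
    """Chain of (n_inputs - 1) two-input modules, each with n_sets ** 2 rules (assumed structure)."""
    return (n_inputs - 1) * n_sets ** 2
```

With 6 inputs and 3 fuzzy sets per variable, a flat system needs 3^6 = 729 rules, while the chain needs only 5 × 9 = 45.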
Automatic Design of Hierarchical Takagi-Sugeno Type Fuzzy Systems. Two approaches can be used to tackle this problem. - An expert supplies all the knowledge required to build the system. - Machine learning and/or optimization techniques are used to construct/adapt the system. Several machine learning and optimization techniques have been applied to aid the process of building hierarchical fuzzy systems.
Automatic Design of Hierarchical Takagi-Sugeno Type Fuzzy Systems. The problems in designing a hierarchical fuzzy logic system include the following: selecting an appropriate hierarchical structure; selecting the inputs for each fuzzy TS sub-model; determining the rule base for each fuzzy TS sub-model; optimizing the parameters in the antecedent parts and the linear weights in the consequent parts.
Automatic Design of Hierarchical Takagi-Sugeno Type Fuzzy Systems
Proposed Approach The hierarchical structure is evolved using a Probabilistic Incremental Program Evolution (PIPE). The fine tuning of the rule's parameters encoded in the structure is accomplished using Evolutionary Programming (EP). The proposed method interleaves both PIPE and EP optimizations. Starting with random structures and rules' parameters, it first tries to improve the hierarchical structure and then as soon as an improved structure is found, it fine tunes its rules' parameters. It then goes back to improve the structure again and, provided it finds a better structure, it again fine tunes the rules' parameters. This loop continues until a satisfactory solution (hierarchical TS-FS model) is found or a time limit is reached.
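The interleaving of structure search and parameter tuning can be sketched on a toy problem. This is not the PIPE/EP method itself: as simplifying assumptions, the "structure" is just the subset of inputs a linear model may use (not a hierarchical TS tree), the structure step is a single bit flip accepted only if it improves the result (a stand-in for PIPE), and the parameter step is a simple (mu + mu) evolutionary programming loop with Gaussian mutation and a decaying step size. All names, the loss and the penalty are hypothetical.

```python
import numpy as np

def ep_tune(loss, dim, rng, mu=10, generations=60, sigma=0.5):
    """(mu + mu) evolutionary programming: Gaussian mutation, keep the mu best."""
    pop = rng.normal(size=(mu, dim))
    fit = np.array([loss(p) for p in pop])
    for _ in range(generations):
        children = pop + rng.normal(scale=sigma, size=pop.shape)
        child_fit = np.array([loss(ch) for ch in children])
        everyone = np.vstack([pop, children])
        everyone_fit = np.concatenate([fit, child_fit])
        keep = np.argsort(everyone_fit)[:mu]   # sorted, so pop[0] is the best
        pop, fit = everyone[keep], everyone_fit[keep]
        sigma *= 0.95                          # deterministic step-size decay (assumption)
    return pop[0], float(fit[0])

def interleaved_search(X, y, rounds=20, penalty=0.01, seed=0):
    """Alternate a structure step (which inputs to use) with EP parameter tuning."""
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]

    def tune(mask):
        cols = np.flatnonzero(mask)
        def loss(w):  # mean squared error plus a small complexity penalty
            return float(np.mean((X[:, cols] @ w - y) ** 2)) + penalty * len(cols)
        return ep_tune(loss, len(cols), rng)

    mask = rng.integers(0, 2, size=n_inputs)
    mask[rng.integers(n_inputs)] = 1              # use at least one input
    weights, best = tune(mask)
    history = [best]
    for _ in range(rounds):
        candidate = mask.copy()
        candidate[rng.integers(n_inputs)] ^= 1    # structure step: flip one input (stand-in for PIPE)
        if candidate.sum() == 0:
            continue
        cand_weights, cand_loss = tune(candidate) # parameter step: EP fine-tuning
        if cand_loss < best:                      # keep improved structures only
            mask, weights, best = candidate, cand_weights, cand_loss
            history.append(best)
    return mask, weights, history
```

The fixed number of `rounds` plays the role of the time limit; a tolerance on `best` could serve as the "satisfactory solution" stopping test.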
Encoding. A tree-structure-based encoding method is used. The reasons for choosing this representation: (1) the tree has a natural, typical hierarchical layer structure; (2) with predefined instruction sets, the tree can be created and evolved using existing tree-structure-based approaches, i.e., Genetic Programming (GP) and PIPE algorithms.
Encoding. Assume that the instruction set used is I = {+2, +3, x1, x2, x3, x4}, where +2 and +3 denote non-leaf node instructions taking 2 and 3 arguments, respectively, and x1, x2, x3, x4 are leaf node instructions taking zero arguments each.
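A sketch of this tree encoding with the instruction set I = {+2, +3, x1, x2, x3, x4}. In the hierarchical TS-FS each non-leaf node stands for a TS fuzzy sub-model; this sketch only models the tree structure, and the tuple representation, depth limit and function names are assumptions.

```python
import random

INSTRUCTIONS = {"+2": 2, "+3": 3, "x1": 0, "x2": 0, "x3": 0, "x4": 0}  # instruction -> arity

def random_tree(rng, depth=0, max_depth=3):
    """Grow a random tree (instruction, children); only leaves are allowed at max_depth."""
    choices = [i for i, arity in INSTRUCTIONS.items() if depth < max_depth or arity == 0]
    instr = rng.choice(choices)
    return (instr, [random_tree(rng, depth + 1, max_depth) for _ in range(INSTRUCTIONS[instr])])

def to_string(tree):
    instr, children = tree
    return instr if not children else f"{instr}(" + ", ".join(map(to_string, children)) + ")"

def inputs_used(tree):
    """Set of input variables (leaf instructions) appearing in the tree."""
    instr, children = tree
    return {instr} if not children else set().union(*map(inputs_used, children))

tree = ("+2", [("x1", []), ("+3", [("x2", []), ("x3", []), ("x4", [])])])
random_example = random_tree(random.Random(0))
```

Here `to_string(tree)` gives `+2(x1, +3(x2, x3, x4))`: a two-input sub-model whose inputs are x1 and the output of a three-input sub-model over x2, x3 and x4.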
PIPE. PIPE combines probability vector coding of program instructions, population-based incremental learning and tree-coded programs. PIPE iteratively generates successive populations of functional programs according to an adaptive probability distribution over all possible programs, represented as a Probabilistic Prototype Tree (PPT). Each iteration uses the best program to refine the distribution. Thus, the structures of promising individuals are learned and encoded in the PPT.
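A simplified sketch of the PIPE loop: a probabilistic prototype tree stores one instruction probability vector per node position, programs are sampled top-down from it, and the vectors along the best program's nodes are moved toward the instructions it used. This omits several features of the actual PIPE algorithm (for example its probability-target learning rule, PPT mutation and pruning); the initial probabilities, learning rate, depth limit and toy fitness are assumptions.

```python
import random

INSTRUCTIONS = ["+2", "+3", "x1", "x2", "x3", "x4"]
ARITY = {"+2": 2, "+3": 3}  # every other instruction is a leaf (arity 0)

class PPT:
    """Probabilistic prototype tree: one instruction probability vector per node position."""

    def __init__(self, p_terminal=0.6):
        self.p_terminal = p_terminal
        self.nodes = {}  # node path (tuple of child indices) -> {instruction: probability}

    def probs(self, path):
        if path not in self.nodes:  # create node vectors lazily on first visit
            n_leaf = len(INSTRUCTIONS) - len(ARITY)
            self.nodes[path] = {
                i: (1 - self.p_terminal) / len(ARITY) if i in ARITY else self.p_terminal / n_leaf
                for i in INSTRUCTIONS
            }
        return self.nodes[path]

    def sample(self, rng, path=(), max_depth=4):
        """Grow a program top-down by sampling an instruction at every node."""
        p = self.probs(path)
        allowed = [i for i in INSTRUCTIONS if len(path) < max_depth or i not in ARITY]
        instr = rng.choices(allowed, weights=[p[i] for i in allowed])[0]
        return (instr, [self.sample(rng, path + (k,), max_depth) for k in range(ARITY.get(instr, 0))])

    def learn(self, program, lr=0.2, path=()):
        """Move each visited node's vector toward the instruction the program used there."""
        instr, children = program
        p = self.probs(path)
        for i in p:
            p[i] = (1 - lr) * p[i] + (lr if i == instr else 0.0)  # still sums to 1
        for k, child in enumerate(children):
            self.learn(child, lr, path + (k,))

def pipe(fitness, generations=30, pop_size=20, seed=0):
    rng, ppt, elite = random.Random(seed), PPT(), None
    for _ in range(generations):
        population = [ppt.sample(rng) for _ in range(pop_size)]
        best = max(population, key=fitness)
        if elite is None or fitness(best) > fitness(elite):
            elite = best
        ppt.learn(best)  # refine the distribution with the generation's best program
    return elite
```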
Program Development. Figure: example of node $N_{1,0}$'s instruction probability vector $P_{1,0}$ (left); probabilistic prototype tree, PPT (middle); a possible extracted program (right).
Comparison of the incremental type multilevel FRS (IFRS), the aggregated type multilevel FRS (AFRS) and the hierarchical TS-FS for Mackey-Glass time-series prediction (layers; no. of rules; no. of parameters; RMSE training; RMSE test). IFRS: 4; 25; 58; 0.0240; 0.0253. AFRS: 5; 36; 78; 0.0267; 0.0256. HTS-FS: 3; 24; 33; 0.0179; 0.0167. Duan, J.-C. and Chung, F.-L.: Multilevel fuzzy relational systems: structure and identification. Soft Computing, Vol. 6 (2002) 71-86.
The structure of the evolved hierarchical TS-FS model for predicting the Mackey-Glass time series, and the importance degree of each input variable: Impo(x0) = 0.247, Impo(x1) = 0.332, Impo(x2) = 0.072, Impo(x3) = 0.113, Impo(x4) = 0.056, Impo(x5) = 0.180.
The developed optimal H-TS-FS architectures (Iris data)
The developed optimal H-TS-FS architectures (Wine data)
Hybrid Soft Computing: Some Challenges. Lots of success stories! We need programs that can deal with common-sense informatic situations. Most existing frameworks rely on user-specified parameters. Intelligent systems should be able to learn from data in a continuous, incremental way, grow as they operate, update their knowledge and refine the model through interaction with the environment. The adaptation process should learn from successes and mistakes and apply that knowledge to new problems. Managing computational complexity is a further challenge.
What color is this rectangle?
Is this called yellow?
People define the limits of a color such as yellow, and different people have different ideas of what is yellow. This knowledge is acquired by learning; personal situation, drugs, job, etc. can all affect it!
Limitations of the Human Mind. Naming of colors: based on learning, not on absolute standards. Face recognition: cannot be passed on to another person by explanation. Object recognition: people cannot properly explain how they recognize objects.
Moore's Law
Thank You &