
UNIVERSITY OF SURREY

B.Sc. Undergraduate Programmes in Computing
B.Sc. Undergraduate Programmes in Mathematical Studies

Level HE3 Examination

MODULE CS364 Artificial Intelligence

Time allowed: 2 hours. Autumn Semester 2005.

Attempt TWO questions from THREE; each question is worth 50 marks. If any candidate attempts more than TWO questions, only the best TWO solutions will be taken into account.

1. This question is about (a) Knowledge Acquisition and (b) Knowledge Representation.

(a) Knowledge Acquisition is defined as an important step that characterises the Knowledge Representation process and the system implementation thereafter.

(i) Elaborate on this definition of Knowledge Acquisition, and specify the main players in Knowledge Acquisition.

Knowledge acquisition can be regarded as the method by which a knowledge engineer gathers information, mainly from experts but also from textbooks, technical manuals, research papers and other authoritative sources, for ultimate translation into a knowledge base understandable by both machines and humans. The person undertaking the knowledge acquisition, the knowledge engineer, must convert the acquired knowledge into an electronic format that a computer program can use.

(ii) Briefly describe the process of Knowledge Acquisition. [9 marks]

In the process of Knowledge Acquisition for an expert system project, the knowledge engineer performs four major tasks in sequence.

First, the engineer ensures that he or she understands the aims and objectives of the proposed expert system, to get a feeling for the potential scope of the project.

Second, the engineer develops a working knowledge of the problem domain by mastering its terminology, consulting technical dictionaries and terminology databases. For this task the key sources of knowledge are identified: textbooks, papers, technical reports, manuals, codes of practice, users and domain experts.

Third, the knowledge engineer interacts with the experts via meetings or interviews to acquire, verify and validate their knowledge.

Fourth, the knowledge engineer produces a "document knowledge base": a document or group of documents (nowadays in electronic format) which forms an intermediate stage in the translation of knowledge from source to computer program. This comprises:
o the interview transcripts,
o the analysis of the information they contain,
o a full description of the major domain entities (e.g. tasks, rules and objects).

(b) John Sowa is one of the key proponents of conceptual graphs (CGs) for Knowledge Representation.

(i) Define what conceptual graphs are, and briefly describe their main characteristics. [8 marks]

Conceptual graphs form a knowledge representation language based on the one hand in linguistics, psychology and philosophy, and on the other in data structures and data processing techniques. The main aim is to map perception onto an abstract representation and reasoning system.

A conceptual graph consists of concept nodes and relation nodes. The concept nodes represent entities, attributes, states and events.

The relation nodes show how the concepts are interconnected.

Conceptual graphs are finite, connected, bipartite graphs.
Finite: because any graph (in 'human brain' or 'computer storage') can only have a finite number of concepts and conceptual relations.
Connected: because two parts that are not connected would simply be called two conceptual graphs.
Bipartite: because there are two different kinds of nodes, concepts and conceptual relations, and every arc links a node of one kind to a node of the other kind.

(ii) According to the concept nodes, relations and arrow directions, write in words what the following conceptual graphs mean: [6 marks]

[Person]<-(Agnt)<-[Walk]
[Man: John]->(Poss)->[PC]->(Attr)->[Powerful]
[Mouse: Jerry]->(Chrc)->[Colour: Brown]

[Person]<-(Agnt)<-[Walk]
A Person is the agent of an act, which is Walk. or Walking has an agent, which is a Person.

[Man: John]->(Poss)->[PC]->(Attr)->[Powerful]
A Man, John, has a possession which is a PC. This PC has an attribute, which is Powerful. or

Powerful is an attribute of a PC, which is a possession of a Man, who is John.

[Mouse: Jerry]->(Chrc)->[Colour: Brown]
Jerry, the Mouse, has a characteristic which is a Colour, Brown. or The Colour Brown is a characteristic of a Mouse, Jerry.

(iii) Create the conceptual graphs of the following sentences: "Bus number 9 is going to Copenhagen", "John was singing", "Romeo marries Juliet". [6 marks]

"Bus number 9 is going to Copenhagen"
[Bus: #9]<-(Agnt)<-[Go]->(Dest)->[City: Copenhagen]

"John was singing"
(Past)->[Situation: [Person: John]<-(Agnt)<-[Sing] ]

"Romeo marries Juliet"
[Person: Romeo]<-(Agnt)<-[Marry]->(Benf)->[Person: Juliet]
or [Lover: Romeo]<-(Agnt)<-[Marry]->(Benf)->[Lover: Juliet]
or [Man: Romeo]<-(Agnt)<-[Marry]->(Benf)->[Woman: Juliet]

(iv) New conceptual graphs may be derived from other canonical graphs either by generalising or specialising.

Considering the following two graphs (g1 and g2), construct a new single graph that may be derived from the two graphs by applying the generalisation and/or specialisation rules accordingly.

g1: [play]->(agent)->[man]->(age)->[old], [play]->(object)->[guitar]
    (an old man plays a guitar)

g2: [person: "Tom"]->(age)->[old], [person: "Tom"]->(location)->[pub]
    (Tom, an old person, is in a pub)

[8 marks]

The restriction of g2 (based on g1), replacing the type label person with its subtype man, is:

g3: [man: "Tom"]->(location)->[pub], [man: "Tom"]->(age)->[old]

The join of g1 and g3, joining the two graphs on their common concept, is:

g4: [play]->(agent)->[man: "Tom"], [play]->(object)->[guitar], [man: "Tom"]->(location)->[pub], [man: "Tom"]->(age)->[old], [man: "Tom"]->(age)->[old]

The simplification of g4 removes the duplicated (age) relation:

g4: [play]->(agent)->[man: "Tom"], [play]->(object)->[guitar], [man: "Tom"]->(location)->[pub], [man: "Tom"]->(age)->[old]

(c) Collins and Quillian's semantic networks are found to be logically inadequate. The notion of concept nodes within a conceptual graph tackles this problem.

(i) Explain what concept nodes represent and how these are used to tackle the semantic network disadvantages. Please use examples to support your statements. [8 marks]

Collins and Quillian's semantic networks are found to be logically inadequate. This situation was not resolved in some of the subsequent formulations of semantic networks. Specifically, it was difficult in a typical semantic network notation to distinguish between nodes describing:
classes and subclasses

classes and members

In the sentence "Tom is a cat, a feline mammal":

Tom is_a cat (individual -> species)
cat is_a feline (species -> subclass)
feline is_a mammal (subclass -> class)

the single relation "is_a" is used to describe relationships between concepts that are in fact quite different. A good representation should allow us to distinguish between:
Individuals and species
Species and classes
Classes and subclasses

Individuals may have properties that do not influence their belonging to a subclass:

Tom is a brown tabby

should not influence the observation that

A tabby cat is a kind of cat.

In CG theory, 'every concept is a unique individual of a particular type'. Concept nodes are labelled with descriptors or names like "dog", "cat", "gravity", etc. The labels refer to the class or type of individual represented by the node. Each concept node is used to refer to an individual concept or a generic concept. CGs allow:
nodes to be labelled simultaneously with the name of the individual the node represents and its type, the two separated by a colon (":")
specific but unnamed individuals to be represented by a unique prescribed number
general markers to be used to refer to an unspecified individual
named variables to be used to refer to an individual
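The node-labelling convention above can be made concrete with a small sketch. This is purely illustrative and not part of the model answer: it stores a conceptual graph as (concept, relation, concept) triples, with concept labels following the "Type: individual" convention, and produces a naive English reading like the ones given in part (b)(ii).

```python
# Illustrative sketch: a conceptual graph as a list of
# (concept, relation, concept) triples. Concept labels follow the CG
# convention "Type: individual" (a colon separates type from name).

def reading(triples):
    """Produce a naive English reading of a conceptual graph."""
    return "; ".join(f"[{src}] has {rel} [{dst}]" for src, rel, dst in triples)

# [Man: John]->(Poss)->[PC]->(Attr)->[Powerful]
graph = [("Man: John", "Poss", "PC"), ("PC", "Attr", "Powerful")]
print(reading(graph))  # [Man: John] has Poss [PC]; [PC] has Attr [Powerful]
```

A real implementation would also mark which element of each triple is a relation node, so the bipartite property (arcs only between concepts and relations) could be checked mechanically.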

2. This question is about Uncertainty Management and Fuzzy Logic.

(a) Define what uncertainty management is, and provide three main areas of uncertain knowledge, with example(s). [8 marks]

Uncertainty is defined as the lack of the exact knowledge that would enable us to reach a perfectly reliable conclusion. Classical logic permits only exact reasoning: it assumes that perfect knowledge always exists and that the law of the excluded middle can always be applied.

The main sources of uncertain knowledge are:

Weak implications. Domain experts and knowledge engineers have the painful task of establishing concrete correlations between the IF (condition) and THEN (action) parts of rules. Expert systems therefore need the ability to handle vague associations, for example by expressing the degree of correlation as a numerical certainty factor.

Imprecise language. Our natural language is ambiguous and imprecise: we describe facts with such terms as often and sometimes, frequently and hardly ever. As a result, it can be difficult to express knowledge in the precise IF-THEN form of production rules. However, if the meaning of the facts is quantified, it can be used in expert systems. In 1944 Ray Simpson asked 355 high school and college students to place 20 terms like often on a scale between 1 and 100; in 1968 Milton Hakel repeated the experiment. The results made clear that the same term was given different values in each study.

Unknown data. When the data is incomplete or missing, the only solution is to accept the value "unknown" and proceed to approximate reasoning with this value.

Combining the views of different experts. Large expert systems usually combine the knowledge and expertise of a number of experts. Unfortunately, experts often hold contradictory opinions and produce conflicting rules. To resolve the conflict, the knowledge engineer has to attach a weight to each expert and then calculate the composite conclusion. But no systematic method exists to obtain these weights.
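The "weak implications" point mentions handling vague associations as numerical certainty factors. The exam text does not specify a combination scheme, so the sketch below is a hedged illustration using the well-known MYCIN-style rule for combining two certainty factors that bear on the same hypothesis; it is an assumption, not part of the model answer.

```python
# Hedged illustration (assumption): the MYCIN-style rule for combining
# two certainty factors cf1, cf2 in [-1, 1] for the same hypothesis.
# Two supporting rules reinforce each other without exceeding 1.

def combine_cf(cf1, cf2):
    """Combine two certainty factors for the same hypothesis."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)          # both supporting
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)          # both opposing
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))  # conflicting

print(combine_cf(0.8, 0.6))  # two supporting rules: 0.92
```

Note how the combined value stays below 1 no matter how many supporting rules fire, which is exactly the behaviour wanted for accumulating weak evidence.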

(b) Fuzzy Logic is one of the methods used to tackle uncertainty. Consider the example of a washing machine, one of the first devices to use fuzzy logic. Basically, the problem is to identify the appropriate time t needed to wash a load, given the dirtiness of the load and the volume of the load. The following table shows the rule base for the washing-time problem:

                     load dirtiness
  load volume     vd     md     ld     nd
  fl             vlot   vlot   lot    lit
  ml             vlot   mt     mt     lit
  ll             lot    lot    lit    lit

where:
  vd: very dirty       fl: full load       vlot: very long time
  md: medium dirty     ml: medium load     lot: long time
  ld: lightly dirty    ll: low load        mt: medium time
  nd: not dirty                            lit: little time

The rule table consists of 12 rules and is interpreted in the following way. For instance, the entry in the second row and the third column of the table specifies the rule:

If load volume is medium load and load dirtiness is lightly dirty then washing time is medium time

The fuzzy sets for load dirtiness, load volume and washing time are based on the linear equation μ(x) = ax + b and are defined by the following tables (a dash means the set has no boundary on that side):

  Load Dirtiness (D)    nd   ld   md   vd
  μ(D) = 0 if D ≤        -    1    2    6
  μ(D) = 1 if D =        0    3    5   10
  μ(D) = 0 if D ≥        2    5    7    -

  Load Volume (V)       ll   ml   fl
  μ(V) = 0 if V ≤        -    2    6
  μ(V) = 1 if V =        0    5   10
  μ(V) = 0 if V ≥        4    8    -

  Washing Time (T)     lit   mt   lot  vlot
  μ(T) = 0 if T ≤        -   20    50    90
  μ(T) = 1 if T =       10   50    80   120
  μ(T) = 0 if T ≥       30   60   100     -

The fuzzy sets tables are interpreted in the following way. For instance, the mt set of T is:

  μ_mt(T) = 0              if T ≤ 20
  μ_mt(T) = (T - 20)/30    if 20 < T < 50
  μ_mt(T) = 1              if T = 50
  μ_mt(T) = (60 - T)/10    if 50 < T < 60
  μ_mt(T) = 0              if T ≥ 60

(i) Based on the fuzzy sets tables above, draw three individual graphs, one for each of D, V and T, showing the fuzzy sets. [6 marks]

[Graph: the four triangular membership functions nd, ld, md and vd for Load Dirtiness, plotted for D from 0 to 10, with membership on the vertical axis.]

[Graph: the three triangular membership functions ll, ml and fl for Load Volume, plotted for V from 0 to 10.]

[Graph: the four triangular membership functions lit, mt, lot and vlot for Washing Time, plotted for T from 0 to 120.]

(ii) The inference of a fuzzy expert system based on the Mamdani method depends on the execution of four major tasks: Fuzzification, Rule Evaluation, Aggregation and Defuzzification. Consider the case where the input variables are D = 9 and V = 3. Using the rule base, execute each of the four inference tasks to compute the washing time T necessary to wash the load, using the Centre of Gravity in the Defuzzification task.

NOTE: To calculate the degrees of truth μ(x) for a given member you can either use triangular proportions, or calculate and use the appropriate linear function μ(x) = ax + b. [20 marks]

Fuzzification

The two rules firing are:

RULE 1: If D is very dirty and V is low load then T is long time
RULE 2: If D is very dirty and V is medium load then T is very long time

For D = 9 the relevant linguistic term is very dirty, and the corresponding membership function is:

  μ_vd(D) = (D - 6)/4   for 6 ≤ D ≤ 10,   therefore   μ_vd(9) = 3/4

For V = 3 the relevant linguistic terms are low load and medium load, and the corresponding membership functions are:

  μ_ll(V) = -V/4 + 1    for 0 ≤ V ≤ 4,    therefore   μ_ll(3) = 1/4
  μ_ml(V) = (V - 2)/3   for 2 ≤ V ≤ 5,    therefore   μ_ml(3) = 1/3

Rule Evaluation

As the two premises in RULE 1 are conjunctive, we take the minimum of the two: min{3/4, 1/4} = 1/4.
As the two premises in RULE 2 are conjunctive, we take the minimum of the two: min{3/4, 1/3} = 1/3.

Aggregation

[Graph: the aggregated output set for Washing Time, i.e. lot clipped at 1/4 and vlot clipped at 1/3, plotted for T from 0 to 120.]

Defuzzification

Sampling the aggregated set every 10 minutes and applying the Centre of Gravity (CoG):

  CoG = [(60 + 70 + 80 + 90) × 1/4 + (100 + 110 + 120) × 1/3] / (4 × 1/4 + 3 × 1/3)
      = (75 + 110) / (1 + 1) = 185/2 = 92.5

(iii) How, in your opinion, would you create the output sets for T if using Sugeno's concept of spikes? Draw the graph comprising the fuzzy sets for T and explain your thinking.

We use four spikes, one to represent each linguistic term: the first spike at position 10, the second at position 50, the third at position 80 and the fourth at position 120. The position of each spike is the member of the corresponding set with the highest membership value. Alternatively, we could have used the CoG to calculate the position of each spike on the x-axis, or even consulted an expert's opinion. The graph is shown below:

[Graph: four singleton spikes for Washing Time: lit at T = 10, mt at T = 50, lot at T = 80 and vlot at T = 120.]

(iv) Calculate the output of the washing machine, given the same inputs, if Sugeno was used instead of the Mamdani inference mechanism. [6 marks]

Using the same inference process as in the Mamdani method, we calculate the weighted average (WA) output:

  WA = (10 × 0 + 50 × 0 + 80 × 1/4 + 120 × 1/3) / (0 + 0 + 1/4 + 1/3)
     = 60 / 0.58 ≈ 103

(v) Which one of the two inference methods is more appropriate for the problem, and why?

For a problem like the washing time of a washing machine, it is more appropriate to use the Mamdani inference method because of its accuracy. The Sugeno inference method is based on spikes, which work better for problems that require calculation speed rather than a precise output. Precise timing for the wash is ideal here: we do not want to risk washing the clothes insufficiently, or washing them too much, which is uneconomical and may even damage the clothes.
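The Mamdani and Sugeno calculations for D = 9, V = 3 can be checked with a short script. This is an illustrative sketch, not part of the model answer: the triangular membership helper `tri` and the 10-minute sampling of T are assumptions chosen to mirror the working shown in the question.

```python
# Sketch (not part of the model answer): verify the Mamdani CoG and the
# Sugeno weighted average for the washing-machine example with D = 9, V = 3.

def tri(x, left, peak, right):
    """Triangular membership rising from `left` to `peak`, falling to `right`.
    Use left == peak (or peak == right) for a one-sided set."""
    if x == peak:
        return 1.0
    if left < x < peak:
        return (x - left) / (peak - left)
    if peak < x < right:
        return (right - x) / (right - peak)
    return 0.0

D, V = 9, 3

# Fuzzification and rule evaluation (conjunctive premises -> min):
fire_lot  = min(tri(D, 6, 10, 10), tri(V, 0, 0, 4))   # RULE 1: vd and ll -> lot  = 1/4
fire_vlot = min(tri(D, 6, 10, 10), tri(V, 2, 5, 8))   # RULE 2: vd and ml -> vlot = 1/3

# Mamdani: aggregate the clipped output sets (max), sampled every 10
# minutes over T = 0..120, then defuzzify with the Centre of Gravity.
agg = {t: max(min(fire_lot, tri(t, 50, 80, 100)),
              min(fire_vlot, tri(t, 90, 120, 120))) for t in range(0, 121, 10)}
cog = sum(t * m for t, m in agg.items()) / sum(agg.values())

# Sugeno: singleton spikes at 10, 50, 80 and 120; weighted average.
wa = (80 * fire_lot + 120 * fire_vlot) / (fire_lot + fire_vlot)

print(round(cog, 1))  # 92.5
print(round(wa))      # 103
```

Only the lot spike (80) and vlot spike (120) contribute to the Sugeno average, since the lit and mt rules fire with strength 0 for these inputs.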

3. This question is about Machine Learning and Decision Trees.

(a) Describe three key characteristics of machine learning techniques. When would you use machine learning compared to other artificial intelligence techniques?

1. Learning from experience: through examples, analogy or discovery.
2. Adaptation: changing in response to interaction.
3. Generalisation: using experience to form a response to novel situations.

[3 marks]

Machine learning is used where the complexity of the task does not allow traditional techniques to be used (the knowledge engineering bottleneck), and for problems that cannot be fully defined (data mining) or that require learning (modelling cognition).

[2 marks]

(b) Neural networks and decision trees are two popular machine learning techniques, with decision trees often favoured for classification. Describe the reasons why you would use a decision tree rather than a neural network for a classification task.

Decision trees:
1. Transparent: provide a map of the decision process.
2. Good for classification problems.
3. Rules can be extracted from the tree.

Neural networks:
1. Not transparent: black boxes.
2. Difficult to extract rules.

(c) Data for an example classification task is given in the following table:

  Example   Blood Test   Build    Diagnosis
  1         Present      Slight   Positive
  2         Clear        Medium   Negative
  3         Present      Heavy    Positive
  4         Clear        Slight   Positive
  5         Present      Medium   Negative
  6         Clear        Heavy    Negative

The table shows six example medical diagnoses. The two attributes show the presence of a particular substance in the blood of the example patient

('Present', 'Clear') and the patient's build ('Slight', 'Medium' or 'Heavy'). The diagnosis has two outcomes ('Positive', 'Negative'). A decision tree can be constructed to assist in the diagnosis of future examples using the ID3 algorithm. The ID3 algorithm uses the value of the Entropy (E) for each attribute/value pair:

  E(a = v) = - Σ_{i=1..c} p_i log2(p_i)

where c is the number of classification categories ('Diagnosis'), a is the attribute ('Blood Test', 'Build') with value v, and p_i is the probability of a particular diagnosis for an attribute with the given value. The Entropy values for each attribute/value are then used to calculate the Information Gain for each attribute:

  Gain(T, a) = E(T) - Σ_{j=1..v} (|T_{a=j}| / |T|) × E(T_{a=j})

where T is the set of examples, E(T) is the Entropy for all of the examples, v is the number of values for the given attribute, |T| is the total number of examples and |T_{a=j}| is the number of examples with the given attribute/value pair.

(i) Using the ID3 algorithm, determine which of the two attributes ('Blood Test' or 'Build') should be used as the root node in a decision tree. You should show the Entropy values for each attribute and value pair, together with the Information Gain for each attribute. Describe how you would use these values to select the root attribute. [24 marks]

  Attribute    Value     Pr(+)   Pr(-)   Entropy   Gain
  Build        Slight    2/2     0/2     0         0.667
               Medium    0/2     2/2     0
               Heavy     1/2     1/2     1
  Blood Test   Present   2/3     1/3     0.918     0.082
               Clear     1/3     2/3     0.918

Marks: 5 (1 each) for Pr(+), 5 (1 each) for Pr(-), 5 (1 each) for Entropy, 6 (3 each) for Gain. [21 marks (method and values)]

The Build attribute should be chosen as the root node because it has the largest value of Information Gain. The ID3 algorithm uses information theory to select the attribute that results in the predictor with the maximum separating power, as indicated by the gain value. [3 marks]
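As a check on the values in the table above, the entropy and information gain can be computed directly from the six training examples. This is an illustrative sketch, not part of the model answer; column indices 0 and 1 stand for Blood Test and Build respectively.

```python
# Sketch: compute E and Gain for the diagnosis data with the ID3 formulas.
from math import log2

examples = [  # (blood_test, build, diagnosis)
    ("Present", "Slight", "Positive"),
    ("Clear",   "Medium", "Negative"),
    ("Present", "Heavy",  "Positive"),
    ("Clear",   "Slight", "Positive"),
    ("Present", "Medium", "Negative"),
    ("Clear",   "Heavy",  "Negative"),
]

def entropy(rows):
    """E = -sum_i p_i log2 p_i over the diagnosis labels in `rows`."""
    n = len(rows)
    counts = {}
    for *_, label in rows:
        counts[label] = counts.get(label, 0) + 1
    return -sum(c / n * log2(c / n) for c in counts.values())

def gain(rows, col):
    """Gain(T, a) = E(T) - sum_j |T_{a=j}|/|T| * E(T_{a=j})."""
    remainder = 0.0
    for v in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

print(round(gain(examples, 1), 3))  # Build: 0.667
print(round(gain(examples, 0), 3))  # Blood Test: 0.082
```

E(T) = 1 here (three Positive, three Negative), so Build's gain of 0.667 confirms it as the root attribute.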

(ii) Draw the resulting decision tree.

                       Build
           /             |             \
        Slight         Medium         Heavy
        {1,4}          {2,5}          {3,6}
       Positive       Negative      Blood Test
                                    /        \
                                Present     Clear
                                  {3}        {6}
                                Positive   Negative

(iii) Explain why you think your tree is the best representation. Illustrate your answer with a drawing of an alternative tree.

ID3 selects the attributes that are the best (or near best) separators at each branch of the tree. By way of example, an alternative tree is:

                          Blood Test
                     /                 \
                 Present              Clear
                 {1,3,5}             {2,4,6}
                  Build               Build
             /      |      \      /      |      \
         Slight  Medium  Heavy  Slight Medium  Heavy
          {1}     {5}     {3}    {4}    {2}     {6}
         Pos.     Neg.    Pos.   Pos.   Neg.    Neg.

This tree has six paths compared with four previously, and therefore six rules (albeit simple ones). The tree with Build as the root attribute is therefore simpler.

(d) Machine learning techniques rely upon the data to construct an appropriate classifier. What techniques would you use to ensure that you construct the

best possible classifier given the available data? Illustrate your answer by giving examples of the techniques applied to the construction of a decision tree. [6 marks]

Use prior knowledge where available: selection of attributes/nodes.
Understand the data: examples may be noisy and may contain irrelevant attributes, so explore the attributes.
Make sure that the classifier does not overfit the data: use training/validation/test data sets, prune the tree, and stop when the desired performance is achieved.
Normalisation.

[3 marks per technique and example]

INTERNAL EXAMINERS: DR B. VRUSIAS, DR M. CASEY
EXTERNAL EXAMINER: