Kodaganallur, V., Weitz, R., Heffernan, N. T., & Rosenthal, D. (Submitted). Approaches to Model-Tracing in Cognitive Tutors. Proceedings of the 13th Conference on Artificial Intelligence in Education. IOS Press.

Approaches to Model-Tracing in Cognitive Tutors

Viswanathan KODAGANALLUR 1, Rob WEITZ 1, Neil HEFFERNAN 2, David ROSENTHAL 1
1 School of Business, Seton Hall University, South Orange, NJ 07079
2 Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609

Abstract. Cognitive (or Model-Tracing) Tutors, a type of intelligent tutor, have demonstrated effectiveness in helping students learn. Model-tracing algorithms are central to cognitive tutors, but have not been the focus of much published research. In this paper we briefly review the existing approaches and suggest an alternative approach that is very simple, but is suitable only when tracing each student action requires just a single rule. In this approach the process of rule execution alone suffices for model-tracing; this eliminates the need for a costly tree/graph search over the search space of the problem's state transitions. We also address the issue of goal structure and suggest a way of writing the production rules based on top-down goal decomposition. Contrary to common practice, this approach views the goal structure of a problem as a hierarchical rather than a linear structure and hence provides the student with a richer goal model.

1. Introduction

Cognitive Tutors [1, 2] have been successfully deployed in a wide range of domains including college-level physics, high school algebra, geometry, and computer programming. (See [3] for an overview.) The underlying paradigm of cognitive tutors has its origins in the ACT-R theory [4]. According to ACT-R, acquiring cognitive knowledge involves the formulation of thousands of rules relating task goals and task states to actions and consequences.
Generally, the claim for cognitive tutors is that their use results in as much as a one standard deviation improvement in student performance beyond standard classroom instruction. Intelligent tutoring is generally set in a problem-solving context: the student is presented with a problem and the tutor provides feedback as the student works. Cognitive tutors are also able to provide planning advice when the student is unsure about how to proceed from a given situation. At the heart of a cognitive tutor is a knowledge base consisting of production rules. Cognitive tutors use a model-tracing algorithm to identify the rules that the student appears to be using, and are also called model-tracing tutors for this reason. Although model-tracing is a very important part of cognitive tutors, algorithms for model-tracing have not been the focus of much published research. One such algorithm is described in [5] and [6]. In this paper we review the model-tracing approaches that have been employed in fielded tutors and suggest a new approach that is simpler, but applicable only under special conditions. Specifically, the proposed approach works only when each student action can be traced by the application of a single rule. It is therefore suitable in situations where the problem has been broken down into elementary steps and the student is expected to proceed step by step. This holds, for example, for the canonical addition problem [4]. We have also used it to build a tutor for statistical hypothesis testing [3]. This approach allows for very targeted remediation.
As part of the approach, we also suggest a way of writing rules that emphasizes top-down goal decomposition, such that the goal hierarchy of the problem is clearly enshrined in the rules. This enables the tutor to naturally give strong procedural remediation and to communicate the goal hierarchy explicitly to the student. Section 2 briefly reviews model-tracing and the approaches that have been employed thus far. Section 3 describes the proposed approach and section 4 describes our approach to top-down goal decomposition.

2. Current Approaches to Model-Tracing

Cognitive tutors aim to provide remediation by inferring the production rules that a student used in order to arrive at a given problem state. They do this by modeling the problem-solving process (both valid and invalid) with a set of production rules. Some of these rules are expert rules, or rules that a competent problem solver in the domain might adopt. Others are buggy rules, or rules that model faulty reasoning that is known to occur in real problem-solving contexts. When the tutor is given a student's solution, it traces the solution by identifying a set of production rules that could have generated it. If such a set of rules is found, and it uses only expert rules, then the student has not committed any mistakes. On the other hand, if the trace employs one or more buggy rules, then the tutor knows exactly what conceptual errors the student committed and can provide appropriate remediation. While it is possible to design cognitive tutors that evaluate a student's actions only when explicitly requested, it is common for cognitive tutors to provide immediate feedback upon each student action. Abstractly, the set of production rules can be seen as forming a directed acyclic graph with the problem states as nodes and state transitions as arcs (Table 1). The graph models all anticipated states and state transitions for a particular problem or set of similar problems.
A production rule corresponding to the arc between nodes n1 and n3 would be:

IF the current state is n1 THEN move to state n3

Although there could conceivably be a large number of possible states that could follow n1, it is necessary only to provide productions for states that the tutor author expects some student(s) to reach. Suppose a student who was in state s moved to state t. Viewed in the context of Table 1, the role of a model-tracing algorithm is then to identify a path from s to t in the graph. It is quite possible that there are several such paths, because several combinations of rule firings could lead from s to t. In that case, the tutor would have to employ some mechanism to choose one of the paths for remediation if needed.
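Viewed this way, model-tracing reduces to path enumeration in the rule graph. The following Python sketch illustrates the idea; the state names, rule names, and graph contents are invented for exposition and do not come from any fielded tutor:

```python
# Minimal sketch: the anticipated state space as a directed acyclic graph.
# Each arc (source state, destination state) is labeled with the production
# rule that effects the transition. All names here are illustrative only.
rules = {
    ("n1", "n3"): "expert-rule-A",   # an expert transition
    ("n1", "n2"): "buggy-rule-B",    # a known buggy transition
    ("n2", "n3"): "expert-rule-C",
}

def find_paths(s, t, path=()):
    """Enumerate every rule sequence that leads from state s to state t."""
    if s == t:
        yield list(path)
        return
    for (src, dst), rule in rules.items():
        if src == s:
            yield from find_paths(dst, t, path + (rule,))

# Tracing a student who moved from n1 to n3 may yield several candidate
# paths; the tutor must then choose one of them for remediation.
print(list(find_paths("n1", "n3")))
```

Note that the student's single observed move from n1 to n3 is explained by two different rule sequences here, which is exactly the ambiguity the tutor's choice mechanism must resolve.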
Table 1: Problem state space viewed as a directed acyclic graph

The first approach to model-tracing was based on the work of Pelletier [5], who created Tertl, a goal-driven production system. Pelletier developed a forward-chaining production system and incorporated a model-tracing algorithm into it. The unit of computation in the Tertl system is a cycle. In each cycle the system first generates a conflict tree, which enables it to find the various combinations of production rules that could lead to the goal (or state reached by the student). It then goes through a resolution phase in which it chooses one among the possibly many combinations of rule firings. Finally, it commits the chosen solution by firing the rules in the chosen combination. This approach to model-tracing was used in the Tutor Development Kit (TDK) [6], used in the Human-Computer Interaction Institute (HCII) at Carnegie Mellon University for many years to build cognitive tutors. More recently the use of TDK has been discontinued, and researchers at HCII have shifted to using CTAT, the Cognitive Tutor Authoring Tools [7]. CTAT is based on JESS, the Java Expert System Shell [8], a forward-chaining production system. Unlike Tertl, JESS does not have a built-in model-tracing element. In CTAT a domain-independent model-tracing algorithm works in concert with a set of domain-specific production rules written in the JESS language. These production rules specify, for each significant problem state that the student can reach, the possible successor states [9]. In terms of Table 1, CTAT searches for a path from s to t by depth-first iterative-deepening (DFID) [10]. It first finds the list of rules that can be fired from state s and checks whether firing any of these leads to the state t (the state is reset to s before each new rule in the list is tried). If one does, then the input has been traced and the associated rule is known.
If a single rule firing is insufficient to trace the input, the algorithm considers all possible two-rule firings from state s to get to t (again, the state is reset to s before each two-rule sequence is tried). This process continues, progressively increasing the number of rule firings (the search depth), until either a set of rule firings is found or it is clear that t cannot be reached from s. The whole DFID search is a separate process that repeatedly invokes JESS through its Java API. The DFID process can be costly when either the rule depth is large or the number of productions is large. It is also known that DFID can perform duplicate node expansions when the search space is a strict graph; this can be a performance disadvantage as well. The TDK and CTAT approaches are both dynamic approaches in the sense that they generate the nodes in the relevant portion of the graph on each invocation. An approach that trades off space for time would be one that pre-generates all possible nodes of the state space and stores them, along with their associated rule combinations, in some indexed form for easy retrieval. If this is done, then the process of model-tracing reduces to one of indexed table lookup. Given that memory has become extremely cheap, this approach to speeding up run time might be feasible even for problems with very large state spaces.
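The iterative-deepening search just described can be sketched as follows. This is an illustrative reconstruction, not CTAT's actual code: in CTAT the search drives JESS through its Java API, whereas here `apply_rule` and the state/rule representations are invented stand-ins:

```python
# Illustrative sketch of depth-first iterative-deepening (DFID) model tracing.
# Rules are (source state, destination state, rule name) triples; a real
# system would instead fire productions inside a rule engine.

def apply_rule(state, rule):
    """Fire one production: return the successor state, or None if the
    rule cannot fire in this state."""
    src, dst, _name = rule
    return dst if state == src else None

def dfid_trace(rules, s, t, max_depth=5):
    """Search for a rule sequence leading from state s to state t, trying
    depth 1, then depth 2, and so on up to max_depth."""
    def dfs(state, depth, fired):
        if state == t:
            return fired
        if depth == 0:
            return None
        for rule in rules:
            # The state is "reset" before each trial simply because we
            # never mutate it; each branch works on its own successor.
            nxt = apply_rule(state, rule)
            if nxt is not None:
                found = dfs(nxt, depth - 1, fired + [rule[2]])
                if found is not None:
                    return found
        return None

    for depth in range(1, max_depth + 1):
        found = dfs(s, depth, [])
        if found is not None:
            return found
    return None  # t is unreachable from s within max_depth firings

rules = [("s", "a", "r1"), ("a", "t", "r2"), ("s", "b", "r3")]
print(dfid_trace(rules, "s", "t"))  # a two-rule trace: ['r1', 'r2']
```

The sketch also makes the cost visible: every increase in depth re-expands all shallower firings from scratch, which is why the paper notes that DFID becomes expensive when the rule depth or the number of productions is large.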
3. New Approach to Model-Tracing

Although the task of model-tracing, in the general case, involves identifying several rule invocations to account for a single student action, there are several situations where this might not be necessary. In light of Anderson's [4] finding that immediate feedback is more effective, it might be pedagogically useful in certain situations to break up the problem into several small steps and have the student follow these steps while providing remediation along the way. This might be especially true in tutors intended for beginning learners of a discipline. When such an approach is adopted, it often turns out that the task of model-tracing becomes much simpler, as most student actions can be traced to just a single rule application. Based on this reasoning, we present an approach wherein model-tracing is accomplished simply by the process of a rule engine executing rules; no additional algorithm is needed. In this sense it is simpler. The productions in the new approach are similar to those used in CTAT, but the antecedents of some of the productions are augmented with a check to see if the state anticipated by the production is the state reached by the student. If the states match, then the rule employed by the student has been identified and the student's input can be considered to have been traced. Of course, this approach works only if each student action can be traced by the application of exactly one rule. An example production (with the additional antecedent shown in bold-italic typeface) is:

IF the current state is n1 and the student has moved to state n3 THEN provide appropriate feedback

In the traditional way of writing the rule, the bold portion of the antecedent would be absent and the rule consequent would be "then move to state n3".
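Under the single-rule assumption, tracing amounts to testing the student's observed move against each rule's augmented antecedent. The following Python sketch captures the scheme; the rule contents and feedback messages are invented for illustration (the actual tutor expresses them as JESS productions):

```python
# Sketch of model-tracing by rule execution alone: each rule's antecedent
# checks both the current state AND the student's observed move, so the
# rule that fires directly identifies the trace. No external search runs.

def make_rule(name, current, anticipated, feedback):
    """Build an augmented production as (name, antecedent, feedback)."""
    def antecedent(state, student_move):
        # Augmented antecedent: the current state and the student's
        # actual move must both match what the rule anticipates.
        return state == current and student_move == anticipated
    return (name, antecedent, feedback)

rules = [
    make_rule("expert-n1-n3", "n1", "n3", "Correct."),
    make_rule("buggy-n1-n2", "n1", "n2", "You forgot to carry the ten."),
]

def trace(state, student_move):
    """Return (rule name, feedback) for the single rule that fires, if any."""
    for name, antecedent, feedback in rules:
        if antecedent(state, student_move):
            return name, feedback
    return None  # unanticipated action: no rule models it

print(trace("n1", "n2"))  # → ('buggy-n1-n2', 'You forgot to carry the ten.')
```

Because the move itself is part of the match, at most one rule fires per student action and the trace falls out of ordinary rule execution, which is the essence of the proposed approach.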
With the productions augmented in this manner, the comparison of the student's action with the actions anticipated by the rules is done within the rule engine itself, and a costly external search is not needed. The normal process of rule execution alone suffices for model-tracing as well. The advantages are that the system is simplified, and the power of the rule engine's optimizations for determining which rules can fire is brought to bear on the model-tracing process. An example of a rule written in this way for the domain of statistical hypothesis testing is given in Table 2.

Table 2: Example of an augmented production rule written in JESS

; Rule 1
(defrule decision-1pm-leq-1
  (ready-to-decide)
  (problem (problemtype "1PM<=")
           (zalpharight ?cutoff&~nil)
           (zvalue ?z&~nil)
           (decision ?d&~nil&:(eq ?d "Reject null")))
  (test (> ?z ?cutoff))
  =>
  (addresult (fetch results) nil CORRECT (create$ "decision")))

Rule 1 in Table 2 deals with the situation when the student has calculated the cutoff and z values for the problem and is therefore in a position to arrive at the final decision (either "Reject the null hypothesis" or "Do not reject the null hypothesis"). This rule applies when the (sample) z value is greater than the cutoff (critical z) value, and the student has chosen the
option to "Reject the null hypothesis". For this type of problem (as tested in the rule antecedent), and these relative values of the cutoff and z values, this decision is correct. That is, under the given conditions, the expert action is to reject the null hypothesis. Instead of putting this as an action in the rule consequent, we have included it in the rule antecedent (bold portion). If this rule fires, then it implies that the student has indeed applied the corresponding expert rule, and hence the student's action has been traced.

4. Top-Down Goal Decomposition

We found that (at least in some problem domains) it might be an advantage to model the productions based on a top-down goal decomposition structure. McKendree [11] has shown the important role played by goal structure in tutoring effectiveness. As an example of goal structure, consider the expert goal structure of the statistical hypothesis testing problem ([12]) shown in Table 3. Each node in Table 3 represents a goal, and the nodes emanating from a node represent its subgoals. The numbers attached to the subgoals indicate the sequence in which they need to be satisfied, with identical sequence numbers at a given level indicating indifference. In some instances, the ordering simply reflects pedagogically desirable sequencing. (These ideas hold for any problem that requires a series of steps in its solution; it is not required that the steps be followed in a single, sequential order. Another example of such a problem domain is physics mechanics problems.) This approach stands in contrast to a bottom-up scheme that just specifies the sequencing of the goals without any indication of the hierarchical goal structure.

Table 3: Expert goal structure for hypothesis testing

Table 4 shows some of the production rules for the goal structure shown in Table 3.
We use a template called problem whose slots represent the state variables of the problem (variables for which the student supplies values, and other variables that represent problem data). We use the backward-chaining capabilities of JESS to induce subgoals from a main goal through JESS backward-chaining reactive templates. Briefly, Rule 1 says that if the student has successfully met all the subgoals needed for making a decision, and the decision is to reject the null hypothesis, and the z value is greater than the cutoff value (both of which would have been calculated while satisfying the subgoals), then mark the decision as correct. In the conventional way of writing the productions, the antecedent would not have the clause dealing
with the decision slot; instead the consequent would state that the decision should be to reject the null hypothesis. At the start, when the student has done nothing, the ready-to-decide fact is unavailable, but since it is declared as backward-chaining reactive, JESS will try to fire a rule that can assert it. It does this by automatically asserting a need-ready-to-decide fact, which is the first antecedent of Rule 2. The following three antecedents of Rule 2 are also of backward-chaining reactive templates, and this causes further rule firings that try to get those facts asserted. Rule 2 shows how the ready-to-decide fact is asserted once its subgoals are satisfied. Rule 3 demonstrates how the backward chaining goes one step further, asserting the hypotheses-established fact. In this manner JESS backward chaining is able to unfurl the goal structure of Table 3. Based on Table 3, the very first act that the student can legally perform is to calculate a value for MuZero. Although all the pertinent rules are not shown, if the rules are structured as shown, either Rule 4 (expert) or Rule 5 (bug) dealing with MuZero will eventually fire, depending on the value supplied by the student. Another possible bug in this context is that the student supplies the null hypothesis before supplying a value for MuZero. Rule 6 shows this bug rule.
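Outside JESS, the unfurling of the goal structure can be approximated as recursive expansion of the first unsatisfied subgoal. In this Python sketch the goal names loosely follow the hypothesis-testing example, but the tree and the strict ordering are simplified for exposition (the actual goal structure allows some subgoals in either order):

```python
# Sketch of top-down goal decomposition: each goal is either a leaf that a
# student action satisfies directly, or decomposes into ordered subgoals.
# Goal names and the tree shape are simplified, illustrative stand-ins.

goal_tree = {
    "make-decision": ["establish-hypotheses", "compute-critical-value",
                      "compute-statistic-value"],
    "establish-hypotheses": ["compute-mu-zero", "state-null-hypothesis",
                             "state-alternate-hypothesis"],
}

def next_goals(goal, satisfied):
    """Unfurl the goal structure: return the leaf subgoal the student
    could legally attempt next, given the already-satisfied goals."""
    if goal in satisfied:
        return []
    subgoals = goal_tree.get(goal)
    if subgoals is None:        # a leaf goal: attempt it directly
        return [goal]
    for sub in subgoals:        # simplification: subgoals in strict order
        if sub not in satisfied:
            return next_goals(sub, satisfied)
    return []                   # all subgoals met; parent can be asserted

# At the start nothing is satisfied, so the first legal act is MuZero:
print(next_goals("make-decision", set()))
# After MuZero, the null hypothesis comes next:
print(next_goals("make-decision", {"compute-mu-zero"}))
```

The chain of recursive calls here mirrors the chain of need-* facts that JESS's backward chaining asserts, and the path from the root to the returned leaf is exactly the material a guidance message can recite.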
Table 4: Sample production rules for statistical hypothesis testing illustrating top-down goal decomposition

; Rule 1
(defrule decision-1pm-leq-1
  (ready-to-decide)
  (problem (problemtype "1PM<=")
           (zalpharight ?cutoff&~nil)
           (zvalue ?z&~nil)
           (decision ?d&~nil&:(eq ?d "Reject null")))
  (test (> ?z ?cutoff))
  =>
  (addresult (fetch results) nil CORRECT (create$ "decision")))

; Rule 2
(defrule decompose-ready-to-decide
  (need-ready-to-decide)
  (hypotheses-established)
  (critical-value-computed)
  (statistic-value-computed)
  =>
  (assert (ready-to-decide)))

; Rule 3
(defrule decompose-hypotheses-established
  (need-hypotheses-established)
  (null-hypothesis-established)
  (alternate-hypothesis-established)
  =>
  (assert (hypotheses-established)))

; Similar rules to decompose the null and alternate hypotheses are not shown.
; Those rules will backward chain to Rule 4 below.

; Rule 4
(defrule mu-zero-correct
  (need-mu-zero-computed)
  (problem (muzero ?muzero) (studentmuzero ?muzero&~nil))
  =>
  (addresult (fetch results) nil CORRECT (create$ "studentmuzero"))
  (assert (mu-zero-computed)))

; Rule 5
(defrule mu-zero-wrong
  (need-mu-zero-computed)
  (problem (muzero ?muzero) (studentmuzero ?smz&~nil&:(not-eq ?muzero ?smz)))
  =>
  (addresult (fetch results) "wrong muzero" WRONG (create$ "studentmuzero")))

; Rule 6
(defrule null-hyp-instead-of-mu-zero
  (need-mu-zero-computed)
  (problem (muzero nil) (nullhyp ?nh&~nil))
  =>
  (addresult (fetch results) "wrong muzero" WRONG (create$ "null-hyp-for-muzero")))

The top-down goal decomposition shown in Table 4 enables the tutor to provide planning advice via a set of guidance rules (an example is shown in Table 5). Suppose that at some stage one possible next step for the student is to specify the null hypothesis, and the student asks for guidance. Based on the subgoals already satisfied, the tutor knows exactly where the student stands. Based on the goal decomposition implied by the rules already fired, it can provide guidance along the lines of: "In order to arrive at a decision, you need to establish the hypotheses, calculate the critical value and calculate the statistic value. In order to establish the hypotheses, you need to calculate MuZero, which you have already done. You now need to establish the null hypothesis." Rather than just telling the student what the next step is, the tutor is able to do so in the context of the overall problem goals. The rule's consequent assembles the guidance message for the user interface from messages associated with each goal/subgoal.

Table 5: Example of a guidance rule

(defrule null-hypothesis-guidance
  (need-null-hypothesis-established)
  (mu-zero-computed)
  (problem (nullhypsign nil))
  =>
  (store-guidance))

5. Conclusions

Model-tracing algorithms are central to cognitive tutors, but have not been a focus of much published research. We have reviewed the approaches that have been taken thus far to the problem.
While the general task of model-tracing is complex, we have found that in many cases it involves only a single rule. In such cases a simpler approach is appropriate, and we have presented such an approach in which the process of rule execution alone suffices for model-tracing. This eliminates the need for a costly tree/graph search over the search space of the problem's state transitions. Furthermore, we have suggested a way of writing the production rules based on top-down goal decomposition. This approach allows planning advice to be automatically anchored to the problem's goal structure, thereby providing the student with a rationale for the next step rather than just suggesting it.

Acknowledgment

The authors gratefully acknowledge Ken Koedinger for his valuable feedback.

References

[1] Koedinger, K. R. (2001). Cognitive tutors as modeling tool and instructional model. In Forbus, K. D. & Feltovich, P. J. (Eds.), Smart Machines in Education: The Coming Revolution in Educational Technology (pp. 145-168). Menlo Park, CA: AAAI/MIT Press.
[2] Koedinger, K. & Anderson, J. (1997). Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8, 30-43.
[3] Kodaganallur, V., Weitz, R., & Rosenthal, D. (2005). A comparison of model-tracing and constraint-based intelligent tutoring paradigms. International Journal of Artificial Intelligence in Education, 15(2).
[4] Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Erlbaum.
[5] Pelletier, R. (1993). The TDK Production Rule System. Master's thesis, Carnegie Mellon University.
[6] Anderson, J. R. & Pelletier, R. (1991). A development system for model-tracing tutors. In Proceedings of the International Conference of the Learning Sciences (pp. 1-8). Evanston, IL.
[7] Aleven, V., McLaren, B. M., Sewall, J., & Koedinger, K. (2006). The Cognitive Tutor Authoring Tools (CTAT): Preliminary evaluation of efficiency gains. In M. Ikeda, K. D. Ashley, & T. W. Chan (Eds.), Proceedings of the 8th International Conference on Intelligent Tutoring Systems (ITS 2006) (pp. 61-70). Berlin: Springer Verlag.
[8] Friedman-Hill, E. (2003). Jess in Action: Rule-Based Systems in Java. Greenwich, CT: Manning Publications.
(See also the JESS homepage at http://herzberg.ca.sandia.gov/jess/index.shtml.)
[9] Koedinger, K. R., Aleven, V., McLaren, B., & VanLehn, K. (2005). Lecture notes, 1st Annual PSLC LearnLab Summer School, June 27 - July 1, 2005, Carnegie Mellon University, Pittsburgh, PA.
[10] Korf, R. E. (1985). Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 27, 97-107.
[11] McKendree, J. (1990). Effective feedback content for tutoring complex skills. Human-Computer Interaction, 5, 381-414.
[12] Levine, D. M., Stephan, D., Krehbiel, T. C., & Berenson, M. L. (2001). Statistics for Managers Using Microsoft Excel (3rd ed.). Upper Saddle River, NJ: Prentice Hall.