Personalization of Mathematical Documents

Ingo Dahn
University Koblenz-Landau, Knowledge Media Institute
October 31, 2002

Abstract

In mathematical education and research, books are rarely used completely and in full detail. Each course and each problem requires specific parts of mathematical knowledge. Slicing Book Technology enables authors of mathematical books to help their readers select what they actually need. Most importantly, these services can be added to already existing books. This will be demonstrated with results from the Trial-Solution project. We will also explain the work that has to be done to augment mathematical documents with these additional features for reuse, and the tools that have been developed for this purpose.

1 Personalization: the Math Case

In mathematical education and research, books are rarely used completely and in full detail. Each course and each problem requires specific parts of mathematical knowledge. Teachers and researchers make their selections from the available material. In the following we describe a way to support mathematicians in this work. Such support is also needed in education, where the learner necessarily has incomplete knowledge and is not aware of his needs. Moreover, as the learner proceeds with his studies, the character of his information needs will change considerably.

These problems are not peculiar to mathematics; they occur in a similar way in other fields. However, mathematics is rather explicit about its content. A proof or a definition will refer, explicitly or implicitly, to the presupposed knowledge, or this knowledge will simply be given by the context in which the item is presented.
The crucial observation for the following discussion is that the services we need to help our teacher and student can be provided automatically, by exploiting the natural structure of mathematical documents and by making explicit only those few properties of mathematical knowledge that are really needed to provide these services.

We confine ourselves to mathematics in what follows because this is where we have the most experience. In this field, more than in others, documents tend to have a clear and precise structure with very fine-grained and clearly identifiable knowledge objects. The major ingredient of our approach is to make use of this document structure. But clearly structured documents occur in many fields (textbooks in other fields, legal documents and technical documentation are the most obvious examples), and our technology can be transferred to these fields too.

Most mathematical documents are currently available in the quasi-standard format LaTeX. LaTeX permits easy adaptation to different presentation styles and has standard ways to encode some relations between documents (citations) and within documents (references). Even more favourably, LaTeX provides standard ways to encode the structure of documents and the specific constructs that occur in mathematical documents. As in popular word processors, the author has a lot of freedom to influence the structure and the layout of the documents. However, unlike these word processors, LaTeX discourages deviations from the quasi-standard. Moreover, the easiest way for an author to introduce his or her own constructs in LaTeX is to define them in a central style file, where they can easily be found and adapted later if the need arises. As a result, mathematical documents are better structured and easier to adapt to different presentation needs than documents from many other fields. Note, however, that all these features apply at least to the same extent to XML documents from any field. Unfortunately, such documents are still rare.

2 Slicing Book Technology

Documents with a machine-recognizable structure and machine-readable meta information (for example, to distinguish types of knowledge such as theorems or proofs, and relations between pieces of knowledge) open up the possibility of accessing the knowledge contained in them directly and reusing it outside the context it was written for.
This is the essence of Slicing Book Technology, which was introduced by the author in 1999. It consists of the following steps.

1. Decompose existing books into semantic units
2. Add a knowledge base of meta data
3. Design an intelligent advisory system that uses these meta data
4. Compose personalized documents, tailored to the learner's current needs, on the fly

Slicing Book Technology was presented at the LearnTec Conference 2000 [1] and at the AERA Conference 2000 [2]. The first book using this technology [3] was published by Springer-Verlag in 2000 using the SIT-Reader, a server software from Slicing Information Technology (http://www.slicing-infotech.de).

The European project Trial-Solution was launched in February 2000. It investigates the potential of Slicing Book Technology for combining semantic units from different sources. The remainder of this paper summarizes some of the experience gained within this project.

Within the project, tools for Slicing Book Technology are being developed and a library of textbooks on undergraduate mathematics is being reengineered. The library of semantic units built in the project now comprises more than 25,000 units extracted from 3,600 pages of text. These semantic units are reusable parts which can be as small as an exercise, an example or a theorem, or as large as a proof. We mention in passing that the decision on the size of these slices is dictated, to a certain extent, by the intended reuse and by the economic constraints on the reengineering effort.

Mathematical knowledge is organized in a systematic way. New knowledge is derived from other knowledge. Different pieces of knowledge (theorems, definitions, examples etc.) serve different purposes. Thus mathematical knowledge is organized in a semantic network of interrelated objects of different types. When we understand this network and the way in which it is used in practice, we can design principles for the automated generation of documents for a particular purpose and for a particular user with specific knowledge. This means that knowledge objects cannot be considered in isolation. Not every collection of such objects will be meaningful or useful.

The first step in supporting the generation of personalized documents that suit the reader's needs is to make the semantic network of knowledge objects explicit. To achieve this, the Trial-Solution project has implemented a number of tools. The first of these tools, the Splitter, takes a LaTeX document and slices it into a hierarchy of semantic units. This tool works automatically, though some configuration for a particular author's special usage of LaTeX may be needed. The Splitter will also identify definitions, theorems etc. and the key phrases assigned by the author. In the next step, an automated key phrase assignment system tries to propose key phrases that characterize the content of groups of slices. This is as far as we can go automatically. At this stage, mathematical competence is required to revise what the automated tools have produced.
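The slicing step can be illustrated with a toy example. The following is a minimal sketch, not the Splitter of the Trial-Solution project: it assumes that sections are marked with `\section{...}` and that typed units appear as non-nested environments with the names listed in `SLICE_TYPES`. The names `Slice`, `SLICE_TYPES`, `split_body` and `split_document` are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: these environment names mark typed slices in the source.
SLICE_TYPES = ("definition", "theorem", "proof", "example", "exercise")

ENV_RE = re.compile(
    r"\\begin\{(" + "|".join(SLICE_TYPES) + r")\}(.*?)\\end\{\1\}", re.S
)
SECTION_RE = re.compile(r"\\section\{([^}]*)\}")


@dataclass
class Slice:
    kind: str                    # "document", "section", "text" or a slice type
    title: str = ""
    body: str = ""
    children: list[Slice] = field(default_factory=list)


def split_body(body: str) -> list[Slice]:
    """Cut a section body into typed slices and the plain text between them."""
    slices, pos = [], 0
    for match in ENV_RE.finditer(body):
        text = body[pos:match.start()].strip()
        if text:
            slices.append(Slice("text", body=text))
        slices.append(Slice(match.group(1), body=match.group(2).strip()))
        pos = match.end()
    tail = body[pos:].strip()
    if tail:
        slices.append(Slice("text", body=tail))
    return slices


def split_document(source: str) -> Slice:
    """Build a two-level hierarchy: document -> sections -> slices."""
    # re.split with one capture group yields [preamble, title1, body1, ...].
    parts = SECTION_RE.split(source)
    root = Slice("document", children=split_body(parts[0]))
    for title, body in zip(parts[1::2], parts[2::2]):
        root.children.append(
            Slice("section", title=title, children=split_body(body))
        )
    return root


SAMPLE = r"""
\section{Limits}
We recall a basic notion.
\begin{definition}A sequence converges if ...\end{definition}
\begin{theorem}Every convergent sequence is bounded.\end{theorem}
\begin{proof}Choose $\varepsilon = 1$.\end{proof}
"""

doc = split_document(SAMPLE)
kinds = [s.kind for s in doc.children[0].children]
print(kinds)
```

A real document would require the author-specific configuration mentioned above, for instance to handle custom environment names, nested environments and deeper sectioning levels.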
New key phrases must be assigned and relations between the slices must be determined. The most important relation states that understanding one slice is a prerequisite for understanding another. This work of assigning metadata is done mostly by students who have mastered the subject, supervised by experienced teachers. The authoring tool developed in the Trial-Solution project supports this work.

When working with a library of sliced books, we face the problem that different people may have introduced different key phrases for the same concept. These key phrases must be reconciled so that related content in different books can be determined automatically. The Trial-Solution Metadata Server is the tool to achieve this. It has an automated component, but it also lets human experts revise thesauri submitted by authors or slicing book reengineers. As a result, the Metadata Server generates a configuration file that the authoring tool can use to unify the key phrase assignments.

The last tool of the Trial-Solution tool set is the Delivery Tool. This is what the teacher and the student will use. It has a built-in automated Knowledge Management System. This system applies Artificial Intelligence methods based on Mathematical Logic. From information about the user and from the metadata of the sliced book, it determines the relevant context that has to be provided for particular slices in order to make up useful documents for a particular user in a particular situation. The Knowledge Management System is configured by a set of rules that describe the properties the intended documents must satisfy.
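The unification of key phrases described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the format or logic of the Trial-Solution Metadata Server: the thesaurus is assumed to be a plain mapping from a canonical key phrase to its synonyms, and the names `normalize`, `build_canonical_map` and `unify` are hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lower-case a phrase and collapse whitespace, so variants compare equal."""
    return " ".join(phrase.lower().split())


def build_canonical_map(thesaurus: dict[str, list[str]]) -> dict[str, str]:
    """Map every known spelling (canonical or synonym) to its canonical phrase."""
    mapping = {}
    for canonical, synonyms in thesaurus.items():
        mapping[normalize(canonical)] = canonical
        for synonym in synonyms:
            mapping[normalize(synonym)] = canonical
    return mapping


def unify(slice_phrases: dict[str, list[str]],
          mapping: dict[str, str]) -> dict[str, list[str]]:
    """Replace each assigned key phrase by its canonical form, if one is known."""
    return {
        slice_id: sorted({mapping.get(normalize(p), p) for p in phrases})
        for slice_id, phrases in slice_phrases.items()
    }


# Hypothetical thesaurus and key phrase assignments from two books.
thesaurus = {"derivative": ["differential quotient", "derivative of a function"]}
assigned = {
    "bookA/3.1": ["Differential  quotient"],
    "bookB/2.4": ["derivative", "chain rule"],
}

unified = unify(assigned, build_canonical_map(thesaurus))
print(unified)
```

After unification, both slices carry the key phrase `derivative`, so a tool can relate them even though they come from different books. Phrases that the thesaurus does not know, such as `chain rule` here, are left unchanged for a human expert to review.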

Slicing Book Technology enables a series of new services for the reuse of knowledge. These services will be described in the next section.

3 Slicing Book Technology at Work

Slicing Information Technology (SIT) and the Trial-Solution project have developed tools that realize such services. Both tools are server-based and require only a recent web browser and a viewer for PDF files on the reader's side. The SIT-Reader is a commercial product that supports the personalization of individual textbooks. The Trial-Solution Delivery Tool is a prototype that can handle multiple books within the same system. In the following we describe an application of the Trial-Solution Delivery Tool. The user interface of the SIT-Reader works in a similar way; in fact, the user interface of the Trial-Solution Delivery Tool was implemented by SIT on the basis of the SIT-Reader.

As an example, let us consider the preparation for an exam. To find suitable problems, the teacher can use the tool to search specifically for exercises on a particular topic. Topics are described by key phrases. It does not matter if she does not know exactly which key phrases are available in the library of sliced books: it suffices to enter a substring of a phrase of interest. The key phrase will be found even if the substring entered belongs not to the key phrase itself but to one of its synonyms. Exercises on related topics and subtopics can also be retrieved on request.

This kind of search would not be possible in a library of printed books. The slices in the library of the Trial-Solution project are quite small; on average, each page of a book is decomposed into 9 slices. Some of these slices, especially exercises, can be as small as a single line. Others, such as proofs, can extend over several pages. In order to locate particular slices of this size, a large system of key phrases is needed.
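The substring search over key phrases and their synonyms can be sketched as follows. This is a minimal illustration, not the search implemented in the Delivery Tool: the data layout (a thesaurus mapping key phrases to synonyms, and slices with a `type` and a list of `key_phrases`) and the function names `find_key_phrases` and `find_exercises` are assumptions made for this example.

```python
def find_key_phrases(query: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Return key phrases whose own text or any synonym contains the query."""
    needle = query.lower()
    hits = []
    for key_phrase, synonyms in thesaurus.items():
        if any(needle in name.lower() for name in [key_phrase, *synonyms]):
            hits.append(key_phrase)
    return sorted(hits)


def find_exercises(query: str,
                   thesaurus: dict[str, list[str]],
                   slices: dict[str, dict]) -> list[str]:
    """Return ids of exercise slices tagged with a key phrase matching the query."""
    phrases = set(find_key_phrases(query, thesaurus))
    return [
        slice_id
        for slice_id, meta in slices.items()
        if meta["type"] == "exercise" and phrases & set(meta["key_phrases"])
    ]


# Hypothetical thesaurus and slice metadata.
thesaurus = {
    "derivative": ["differential quotient"],
    "integral": ["antiderivative"],
}
slices = {
    "ex1": {"type": "exercise", "key_phrases": ["derivative"]},
    "thm1": {"type": "theorem", "key_phrases": ["derivative"]},
    "ex2": {"type": "exercise", "key_phrases": ["integral"]},
}

print(find_key_phrases("quot", thesaurus))        # matches via a synonym
print(find_exercises("quot", thesaurus, slices))  # exercises only
```

Note that a short query can match more than intended: in this data, `deriv` matches both `derivative` and the synonym `antiderivative`, so both key phrases would be returned.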
It is unrealistic to expect the user to browse a list of some 1000 key phrases; with an electronic search system, however, they are easy to use.

Our teacher may inspect the proposed exercises and select some of them for the exam. She collects them in a personal book on the server; if necessary, she repeats this procedure to find more exercises.

To help students prepare for the exam, she may compile in parallel a collection of similar exercises for the students to use. But each student may need different information: some need more explanation, others less. So she will not just publish the collection of training exercises in print or as a PDF file (mathematical documents place high demands on typesetting, so PDF generated from LaTeX is the preferred output format). Instead, she will send the students a description of her collection. Each student can import this description and add further information according to his personal needs.

After the import, the student will find a new personal book on the system. To get help with particular exercises, the student can select them and apply the scenario Get prerequisites. The system will then add all slices that are referred to by the selected exercises. In this way, the student gets, for each exercise, the slices describing the theory needed to solve it. So even if the book does not contain solutions, the student may be pointed in the right direction for finding the solution himself.
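The Get prerequisites scenario can be sketched as a lookup in the reference relation between slices. This is an illustration under stated assumptions rather than the rule set of the Delivery Tool: `refers_to` is a hypothetical mapping from a slice to the slices it refers to, `order` is the assumed order of slices in the source book, and only direct references are followed.

```python
def get_prerequisites(personal_book: list[str],
                      selected: list[str],
                      refers_to: dict[str, list[str]],
                      order: list[str]) -> list[str]:
    """Extend a personal book by the slices the selected slices refer to.

    The result is arranged in the order of the source book, so that the
    referenced theory appears before the exercises that use it.
    """
    wanted = set(personal_book)
    for slice_id in selected:
        wanted.update(refers_to.get(slice_id, ()))
    return [slice_id for slice_id in order if slice_id in wanted]


# Hypothetical book order and references.
order = ["def-limit", "thm-bounded", "ex-1", "def-deriv", "ex-2"]
refers_to = {
    "ex-1": ["def-limit", "thm-bounded"],
    "ex-2": ["def-deriv"],
}

personal_book = ["ex-1", "ex-2"]
extended = get_prerequisites(personal_book, ["ex-2"], refers_to, order)
print(extended)
```

Only `ex-2` was selected, so the definition it refers to is added, while the slices referenced by `ex-1` are left out until the student asks for them.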

Our student may require more help. The scenario Related examples will add it, possibly even from other books. In his user profile the student may have specified that he prefers content from a book that is particularly close to his interests. The related examples will then be collected from this book if possible. The Trial-Solution Delivery Tool knows how to build useful documents. For example, if an interesting example says "We modify the function from the previous example as follows", the previous example will be added too.

Even all this may not be enough help; the student may finally realize that he has to learn the theory in order to understand the examples. So he selects the scenario Add theory. Much as in the planning of a lecture, the tool will compose a script that comprises exactly those slices that lead to the selected examples and exercises, and will omit all parts that are not needed for this purpose.

So far, the tool knows nothing about the user. Therefore it must come up with a large script starting right at the introductory chapters of the available books. But that may be too much, since our student has already understood some parts well. Instead of cutting out all known slices by hand, it is sufficient for our student to select a few slices that he has understood recently and mark them as known. The tool can then infer that he has also understood all their prerequisites, the prerequisites of those, and so on. A subsequent selection of the Add theory scenario for the learning objectives will then omit all slices that are inferred as known.

The reader is invited to explore many of these possibilities on their own at http://www.slicing.de/books/.

4 Outlook

For the future we expect a number of extensions of the knowledge management services described above. For example, the personalized books may be sent to a print-on-demand service.
In fact, this has already been implemented in the project and is being tested in cooperation with a print-on-demand service at the Technical University Chemnitz. However, since the user can always request from the Delivery Tool a small new document that contains only what he actually needs, it will usually be more convenient for him to print these documents himself. A print-on-demand service may be an option only for the production of tailor-made scripts that have to be studied systematically.

One deficiency of the technology described here is that information on the knowledge of the user can only be obtained directly from the user. Because the source material is restricted to preexisting documents, an automated or semi-automated system to assess the knowledge of the user would have to be implemented separately. This goes beyond the project. However, the Delivery Tool of the project is equipped with an open interface that permits external assessment systems, after passing a security check, to modify the user models. This interface has been successfully tested with the assessment system of the e-learning platform WebCT. Thus Slicing Book Technology can bridge the current gap between the large content base available in approved textbooks and the new e-learning services.

For a wide deployment, the current architecture will have to be extended.

Instead of a central database of materials, we envisage a distributed system in which several sites offer their material and negotiate automatically about conditions of delivery within a framework of contracts between the rights owners (including the possibility of free delivery), and in which the knowledge management systems of the individual sites are specialized in certain fields and communicate with each other to find the most appropriate knowledge for their human clients. Mathematics has the privilege of investigating these possibilities first.

References

[1] I. Dahn: Symbiose von Buch und Internet, Proc. Learntec 2000, Karlsruhe 2000, pp. 551-558
[2] I. Dahn: Automatic Textbook Construction and Web Delivery in the 21st Century, J. of Structural Learning and Intelligent Systems, vol. 14(4), 2001, pp. 401-413
[3] H. Wolter, I. Dahn: Analysis Individuell, Springer-Verlag, Heidelberg 2000