Interaction Matrix Model for Language Production

Steven Gibson
Cal State University, Northridge
March 12, 2011

Abstract

The Congruent Interaction Matrix (CIM) model is being formulated to represent knowledge updating and language production. The line of research taken in this paper is limited to a set of questions restricted to concepts and actions that can be used in modeling human language behavior. The CIM model introduced in this paper proposes virtual structures represented as matrices. The theoretical and practical value of developing this framework and set of algorithms is discussed, with the aim of creating tools useful for modeling human communication interactions. Possible future research studies and applications are suggested. The development of these tools could have implications for human and machine communication analysis and production.

Keywords: knowledge representation, model, framework, communication, automated reasoning

Paper submission to ICAI 2011, 2011 World Congress in Computer Science
Contact: Steven Gibson, steven.gibson.737@my.csun.edu
1. Introduction

Computer generation of human language is now in common use. Improvements in computer language production may produce more diverse output through the application of new concepts and algorithms. The construction of a Congruent Interaction Matrix (CIM) virtual model could be useful in representing communication behaviors.

The CIM model is composed of a data structure, information and a set of rules to mediate interactions. The structure is implemented as matrices composed of cells that can hold information capturing individual interaction details. These components, in the form of cells, also represent ways to recall stored meaning. The matrix components can link to each other based on content and meaning. Each CIM component may contain a rule that, when actuated, makes that component part of the single active matrix. During communication interactions the active matrix results from the rules being actuated. In this way a model can be built in which individual rules interact during simulated or real interactions. The matrix registers the results of an interaction when it is reorganized by the actuation and matching of components.

This report explains the design and implementation of software that can model an aspect of language use. The software postulates that human linguistic activities can be treated as matrices that can be matched, recalled and replaced. The model represents the activity of matrices in the production of language behaviors. During a simulated run the model represents the interpretation of sensory data and the responding language behavior. The model is proposed to offer a realistic approach to language production. The software demonstrates matrix interactions used in knowledge production and language production.

2. Research Question

There would be value in building a mathematical or algorithmic model for human speech production.
This model would need to represent how people choose words and sentences in response to input. Aspects of the model also need to demonstrate how to generate new language or new knowledge. This system for modeling language production should be testable with simulations or mathematical modeling. It seems likely that humans experience knowledge updating in a way that can be modeled by computers [1]. Inspiration for this approach to language comes from data-driven approaches to language generation such as VINC [2].

3. Interaction Matrix

The CIM model can be described in different ways. This section gives some descriptions of the mathematical and algorithmic model in the Congruent Interaction Matrix. The key aspects of the model are the data structure approach to interaction components and how interactions actuate new components. This model assumes that individual communication interactions use stored data and can undergo knowledge updating and change during their lifetime. The aspects of the Interaction Matrix model include a collection of data points, response cues and programmed components. Statistical rules as well as single cases should be equally valid with this model. We first look at individual cases, but the model may also hold for groups. Because every individual is unique, variation is normal. Interaction matrix properties shift because individuals have built-in knowledge and respond to new input through language. There is a natural variation in the internal knowledge of individuals and their responses. In each individual case the internal state contains the components and any other knowledge relevant for engaging in verbal behavior.

A complete Interaction Matrix process will involve perception, conceptualization, component lookup, parsing and speech production. This is intended to model language behavior in the world. The process for a language actor consists of a two-way association between input components (individual words and other components) and internal components (words, meanings, connections). Each matched association generates a score. When an actor needs to process components, she looks up all possible component matches, orders them and picks the one with the best score for generation of a new matrix. Thus the resulting matrix uses the components with the highest score as elements in the active matrix. New meaning is defined in the process through the choice of components.

3.1 Spreadsheet Representation

One way to visualize this model is the spreadsheet metaphor [3]. The active matrix can be equated to a column in a spreadsheet, while future potential matrices can be seen as additional columns. Each component that is part of the communication can be represented as a cell in the column. The active matrix is updated in response to a match between the input and the potential matrices.

Table 1: Sample CIM spreadsheet format.

                   Active         To Merge    1st Potential
Match Words        hello          Hello       goodbye
Output Words       hello friend   Hello       goodbye friend
Emotion            happy          joy         fear
Relationship       friend         enemy       stranger
Status             superior       inferior    equal
Face               smile          frown       neutral
Countdown timer    2              3           2
Congruence Value   na             2           na

The input column is not shown in Table 1. The input models what the individual perceives and constructs from sensory input. The components in the input are what is matched against the other columns to produce a new active column. Additional potential columns would exist to the right of the 1st potential column, so there would be a series of 1st, 2nd, 3rd, ..., Nth potential columns.
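The column-and-cell structure of Table 1 and the idea of picking the best-scoring match can be illustrated with a minimal sketch. It is written in Python for brevity, not in the Erlang of the project's software, and it is not taken from that software. The names Column, congruence and best_match are hypothetical, and the scoring rule (count the rows where the input equals the stored cell, ignoring case) is an assumption, since the paper does not state how the Congruence Value is computed. The sample input is likewise invented, chosen so that the "To Merge" column scores 2 as in Table 1.

```python
from dataclasses import dataclass

# Row labels taken from Table 1. Treating these rows as the cells that are
# compared against the input is an assumption of this sketch.
COMPONENT_ROWS = ("match_words", "emotion", "relationship", "status", "face")


@dataclass
class Column:
    """One CIM column: a set of component cells plus its countdown timer."""
    name: str
    cells: dict
    countdown: int = 0


def congruence(input_cells, column):
    """Hypothetical score: number of rows where input and column agree."""
    score = 0
    for row in COMPONENT_ROWS:
        seen = input_cells.get(row)
        stored = column.cells.get(row)
        if seen is not None and stored is not None and seen.lower() == stored.lower():
            score += 1
    return score


def best_match(input_cells, potentials):
    """Return (column, score) for the highest nonzero score, else (None, 0).

    Ties go to the column that appears earlier in the list.
    """
    scored = [(congruence(input_cells, c), i, c) for i, c in enumerate(potentials)]
    scored = [s for s in scored if s[0] > 0]
    if not scored:
        return None, 0
    score, _, column = max(scored, key=lambda s: (s[0], -s[1]))
    return column, score


# Cell values copied from Table 1.
to_merge = Column("to_merge", {
    "match_words": "Hello", "output_words": "Hello", "emotion": "joy",
    "relationship": "enemy", "status": "inferior", "face": "frown",
}, countdown=3)
first_potential = Column("first_potential", {
    "match_words": "goodbye", "output_words": "goodbye friend", "emotion": "fear",
    "relationship": "stranger", "status": "equal", "face": "neutral",
}, countdown=2)

# Invented input column (the paper does not show one).
observed = {"match_words": "hello", "emotion": "joy"}

winner, score = best_match(observed, [first_potential, to_merge])
print(winner.name, score)
```

Under these assumptions the "To Merge" column wins because two of its cells agree with the input, while the 1st potential column agrees on none.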
3.2 Pseudocode Representation

Here is a high-level description of one iteration of a simulated run of the computer version of the model:

    process input from user interface                                   (1)
    begin timer stopwatch based on active column value                  (2)
    perform search on 1st column in queue                               (3)
        until at least one element matches
        continue search for additional matches
        return success if at least one match occurred
    repeat search on succeeding columns until out of columns
        or timer runs down                                              (4)
    when timer runs out                                                 (5)
        if at least one successful column was returned, make it active
        if no successful column returned, do not change active
    if active was changed, return sentence                              (6)
    prepare to process new input                                        (1)

This pseudocode version does not preserve the execution order or data structures of the implementation.

3.3 Text Description

The software system is broken down into modules that handle aspects of the task. Modules process the submitted cues and send out requests for matches. Each possible match is returned by a module and kept in a buffer until the stopclock ticks down. Better matches continue to be sent to the buffer until the timer runs down. Once the timer runs down, a kill message is sent to all remaining modules to stop the searching. Each possible match that reaches the buffer is compared by quality with the previous match of the series. The better match is retained while the other is discarded. After the timer runs out, the successful match is moved to the top of the queue for subsequent searches.

4. Applications

This model may serve in A.I. applications to aid in language research. Another use for this model is to examine knowledge production [4]. We can track effective decision selection. Modeling existing language production methods can help in understanding human language use. Styles and practices that arise from using and maintaining an adequate communication system can be examined. Perhaps steps can be taken toward a whole new research program for linguistics. This may aid in explaining the characteristics of actors and the interactions between them. It may also help to model and produce possible concepts in language evolution.
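As a companion to the iteration outlined in Section 3.2 and the buffer behavior described in Section 3.3, the following minimal sketch walks through one timed search. It is written in Python and runs sequentially, whereas the project's Erlang software uses concurrent modules and a kill message, so it should be read as an illustration under stated assumptions and not as the author's implementation. The names count_matches and run_iteration are hypothetical, and so are three choices the paper leaves open: one column search costs one timer tick, a match is scored by counting equal cells, and the winning column stays in the queue after moving to its top. The third queue entry is an invented column that is not in Table 1.

```python
def count_matches(input_cells, column):
    """Hypothetical score: rows where input and column hold the same value."""
    return sum(
        1
        for row, value in input_cells.items()
        if str(column.get(row, "")).lower() == str(value).lower()
    )


def run_iteration(input_cells, active, queue):
    """Run steps (1) to (6) once; return (active, queue, sentence or None)."""
    ticks_left = active["countdown"]       # step 2: timer set from active column
    buffer = None                          # best (score, column) seen so far
    for column in queue:                   # steps 3 and 4: search columns in order
        if ticks_left <= 0:                # timer ran down: stop searching
            break
        ticks_left -= 1                    # assumption: one search costs one tick
        score = count_matches(input_cells, column)
        if score > 0 and (buffer is None or score > buffer[0]):
            buffer = (score, column)       # keep the better match, drop the other
    if buffer is None:                     # step 5: no success, active unchanged
        return active, queue, None
    winner = buffer[1]
    # The successful match moves to the top of the queue for later searches.
    new_queue = [winner] + [c for c in queue if c is not winner]
    return winner, new_queue, winner["output_words"]   # step 6: return sentence


# Active and first two queue columns use values from Table 1.
active = {"match_words": "hello", "output_words": "hello friend",
          "emotion": "happy", "countdown": 2}
queue = [
    {"match_words": "goodbye", "output_words": "goodbye friend",
     "emotion": "fear", "countdown": 2},
    {"match_words": "Hello", "output_words": "Hello",
     "emotion": "joy", "countdown": 3},
    # Invented extra column; the two-tick timer expires before it is searched.
    {"match_words": "hello", "output_words": "hello again",
     "emotion": "joy", "countdown": 2},
]
observed = {"match_words": "hello", "emotion": "joy"}   # invented input

new_active, new_queue, sentence = run_iteration(observed, active, queue)
print(sentence)
```

With a countdown of 2 on the active column, only the first two queue entries are searched. The second one matches on both input rows, becomes the new active column, and supplies the returned sentence.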
5. Computer Simulation

Part of the development of this model included a software component to aid the feasibility study of language production. The computer simulation is coded in Erlang. The system takes input that includes a representation of emotional cues. The design concerns included concurrent searches for matches, input of data and output of results. Erlang was the language chosen to construct the model. Erlang is a general-purpose concurrent functional programming language, with strict evaluation, single assignment, and dynamic typing [5]. The software developed is called ERDlanguage and operates through a web interface to produce sentences in response to input. It consists of a multi-module server backend that interacts with a web-browser-based interface for human input and computer output.

5.1 Screen

Figure 1: Sample data screen.
5.2 Software

The software should run anywhere Erlang can run, including Linux, Windows and Mac desktops and laptops. To download the software from this project visit: http://sourceforge.net/projects/erdialog/files/erdlanguage/
An online version is at: http://stevengibson.org:8101

6. Discussion

In this paper a new model (CIM) has been advanced, constructed with a mathematical and algorithmic structure. After a summary and boundary framework for the model were provided, likely testbeds were identified where the model can be implemented and utilized. A working implementation has been produced in the form of a software product. Some possible benefits of developing this model are listed. Some aspects of a demonstration have also been offered through the application of an example case. It is possible that this model could be applied to human language production and A.I. applications.

The Congruent Interaction Matrix model could be tested in a heuristic manner while providing theoretically meaningful test results. Future CIM models could be produced in evolutionary ways similar to how linguistics describes language change. The hope is to develop one more explanatory framework for modeling language production, and in this way to capture a richer range of human communication interaction.

Part of this model assumes that the study of human communication interaction requires building models based on actors whose internal states reflect a more diverse complexity of interaction components that can be modeled. Experiments need to include techniques and technologies for language simulation that help develop corrections to the CIM model. The plan is not to implement CIM in one frozen state but in versions and releases that adapt continuously to the environment of human or artificial actors.
The most significant contribution of this work will likely be in the field of communication studies, where queries can be carried out in modeling human verbal dialog. Natural language production by computer has involved the development of rules and data sets that produce meaningful output. While software development has achieved success, more effort is in order. The rule and data set approach has produced inconsistent results and does not address the need to account for metalinguistic judgments, including pragmatic and contextual variation in language production [2]. The present software offers a first step in supporting language production. One part of the process of constructing natural-language-like outputs from human-like linguistic inputs is shown here. A line of future research would involve the input and processing of valid human-originated sentences to generate the language outputs.

References

[1] S. Gibson, Process algebra modeling of human communication, in H. R. Arabnia, D. de la Fuente, and J. A. Olivas, editors, pp. 685-688, CSREA Press, July 2009.
[2] M. Levison and G. Lessard, A system for natural language sentence generation, Computers and the Humanities, vol. 26, no. 1, pp. 43-58, 1992.
[3] S. Powell, K. Baker, and B. Lawson, A critical review of the literature on spreadsheet errors, Decision Support Systems, vol. 46, pp. 128-138, Dec. 2008.
[4] S. O. Hansson, A survey of non-prioritized belief revision, Erkenntnis (1975-), vol. 50, no. 2/3, pp. 413-427, 1999.
[5] J. Armstrong, Erlang, Commun. ACM, vol. 53, pp. 68-75, September 2010.