Effect of Foreign Language on Text Transcription Performance: Finns Writing English

Poika Isokoski and Timo Linden
TAUCHI, Department of Computer Sciences
FIN-33014 University of Tampere, Finland
{Poika.Isokoski, Timo.Linden}@uta.fi

In Proceedings of NordiCHI 2004, 109-112, ACM Press, 2004.

This copy is posted with permission from the ACM and may not be redistributed. The definitive copy of the paper can be downloaded from the ACM Digital Library and located using the following DOI: http://doi.acm.org/10.1145/1028014.1028032

ACM COPYRIGHT NOTICE. Copyright 2004 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.

ABSTRACT

To promote inter-study comparability it is desirable to standardize experimental procedures in text entry experiments. This includes standardizing the language. The current trend is to use English. To clarify the implications of using English in non-English speaking countries, we measured text entry performance with a QWERTY keyboard for 16 participants transcribing phrases in two languages. The languages were Finnish, the first language of the participants, and English, in which the participants had considerable skill. English language entry was about 16% slower than Finnish. The participants also made more errors when transcribing English.

Author Keywords

Text entry, foreign language, keyboard

ACM Classification Keywords

H.5.2. Information interfaces and presentation: User Interfaces - Input Devices and Strategies.

INTRODUCTION

Text entry experiments are undertaken all over the world. In the past, experimental methods have varied, and this complicates between-study comparisons. Efforts are underway to streamline and standardize text entry experiments. In particular, phrase sets and experimental procedures published by Soukoreff and MacKenzie [6, 9, 10, 11] are gaining popularity.

For text entry researchers in non-English speaking countries this development is a challenge. If English phrases are to be used, we must decide how to design our experiments in a meaningful way. We can recruit native English speakers as test subjects or focus exclusively on issues specific to particular languages. Alternatively, we can use English and non-native English speakers in experiments and attempt to account for results that occur due to the use of a foreign language. If there are no language effects, all is well.
However, there is a feeling in the research community that such effects are present. Because of this, further research is needed.

Having ourselves produced a body of work using participants for whom English is a foreign language [2, 3, 4], we are very interested in the effects that the language might have. We have mostly measured lower text entry rates than reported elsewhere (compare, for example, the QWERTY soft keyboard results in [2] and [8]). Note that we are not the only ones faced with the language problem. Other researchers have reported results on non-native English speakers writing in English [1, 5].

Standardizing experimental procedures and materials is generally speaking a good idea. In some situations it is an equally good idea to deviate from standards. For example, when testing systems for the Chinese market, performance with the English language is likely to be irrelevant. Unfortunately, the choice is not always as clear; there are many languages that are written with approximately the same alphabet as English. In these cases using English instead of the local language is a reasonable option for improving inter-study comparability.

Inter-study comparability of text entry results is becoming increasingly important. The number of text entry experiments published annually is increasing. A practitioner in the field wants to know which text entry systems to support in a product. If the performance results cannot be compared, it is questionable whether typical experiments contribute anything of practical value. Within-study comparisons can offer good internal validity regardless of the language. However, from a practical viewpoint they are a very slow way to find the best system.

Our goal was to experimentally verify that the use of the participants' first language produces different results than using English as a foreign language. While such a language effect seems highly probable, it has not, to our knowledge, been measured.
In addition to verifying the existence of a language effect, we also wanted to know how large it is. There may be a wealth of information available on this issue from the era when transcription typing was a popular profession. However, the experimental procedures and the measures that we use today are different from those used in the typewriter era [10]. Furthermore, we wanted to see results measured with the same software that we have used in other experiments, to ensure that issues such as the manner of presenting the text and the text itself would not interfere with conclusions regarding the results of our earlier experiments.

METHOD

Participants

Sixteen unpaid volunteers (10 male, 6 female), all staff and students of our university, participated in the experiment. The mean age of the participants was 27.4 years (range 23-45, SD=5.6). All were native speakers of Finnish and had studied English in school for several years (mean=8.6, range 7-9 years). All but three used written or spoken English in their daily work. All participants had several years of experience in typing with the Finnish QWERTY layout. All in all, the participants represented the kind of mix in English and typing skills that we have previously used in our text entry experiments. Similar populations presumably exist at other sites that are involved in text entry research.

Task

The task was to transcribe as many phrases as possible in five minutes. The phrases were displayed one at a time as shown in Figure 1. A new phrase was displayed after the participant pressed the enter key at the end of the previous phrase. To avoid ill effects of premature presses of the enter key, the enter key functioned only when the transcribed phrase was at most one character shorter than the presented phrase. The participants were instructed to transcribe the phrases as fast as possible while making as few errors as possible. Errors could be corrected by erasing text with the backspace key and re-typing it. As shown in Figure 1, the presented text was visible during the transcription. According to the results of Soukoreff and MacKenzie [10], this slows down text entry slightly in comparison to hiding the presented phrase during text entry.
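
The enter-key rule described in the Task section can be sketched in a few lines. This is only an illustration in Python, not the Java software used in the experiment; the function name enter_key_enabled is hypothetical.

```python
def enter_key_enabled(presented: str, transcribed: str) -> bool:
    """Return True if pressing Enter should advance to the next phrase.

    Rule from the Task description: the transcribed phrase must be no
    shorter than the presented phrase minus one character.
    (Hypothetical helper; the original software may differ in detail.)
    """
    return len(transcribed) >= len(presented) - 1


# A premature Enter press after only a few characters is ignored.
print(enter_key_enabled("The quick brown fox.", "The qu"))
# A phrase that is one character short is still accepted.
print(enter_key_enabled("The quick brown fox.", "The quick brown fox"))
```

Note that the rule only compares lengths. It does not check the content of the transcription, so uncorrected errors can still be submitted and are counted later by the error measures.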
Apparatus

The software used for presenting the phrases and collecting the keystroke data was the same that we used in an earlier experiment [4]. It was written in Java and run under Microsoft Windows using the Java runtime environment delivered with the J2SE software development kit by Sun Microsystems.

The software selected the phrases randomly from a set of 500 phrases published by Soukoreff and MacKenzie [6]. Because entering only lower case letters is not considered representative of typical text entry [7, 12], we had previously [4] modified the phrase set by adding upper case characters and punctuation where appropriate. For this experiment we additionally translated the whole phrase set into Finnish. We made no attempt at copying all aspects of the phrase set. Thus, instead of a faithful copy, the translation produced a phrase set in Finnish that shares most of the thematic and semantic content of the original set (footnote 1). The change of language did not change the average length of the phrases much. The Finnish phrases are on average 0.26 characters shorter than the English. Assuming that the phrase set is large enough, and that the translator used typical Finnish vocabulary, the resulting set is as close to typical Finnish as possible. So far we have not verified this formally.

[Figure 1. The task display.]

The traditional straight Finnish QWERTY keyboard layout was used. Some of the participants preferred split design keyboards, but we required them to use the traditional layout. This may have affected their performance. Because all participants had at some point used the straight layout, we used it rather than the split design. Letting the participants use their preferred design would have been possible, but some participants reported that they used both designs, making personal preference difficult to verify. Overall, we did not consider the keyboard design to be a significant issue in this experiment.
Procedure and Design

Each participant performed four five-minute blocks of the task in one session. Both languages were used twice to see if significant learning took place and, if so, whether it was stronger with one language than the other. The order of presentation of the languages was balanced between subjects so that eight participants began with English (E) and eight with Finnish (F). Over the four blocks the presentation order was either EFEF or FEFE.

Our dependent measures were text entry rate (in words per minute), errors left in the phrases (measured as the minimum string distance [9]; see footnote 2), and keystrokes per character. The independent within-subject variables were the language (English or Finnish) and block number within language (1 or 2). Additionally, the presentation order was included as a between-subjects variable. A 2x2x2 (language x block x order) mixed model analysis of variance was used to test the effect of the independent variables on each of the dependent measures separately.

Footnote 1: For example, some phrases were conveniently expressed in one compound word ("a problem with the engine" translates to "moottoriongelma"). The text entry task in these two cases is different, but both phrases are typical of their respective languages.

RESULTS AND DISCUSSION

Learning

We did not observe a significant change between the first and the second block in either language in any of our dependent measures. However, it is well known that text entry performance improves with practice even when experienced users are being tested. We most likely did not observe this because our experiment was too short.

[Figure 2. Text entry rate in the four blocks (boxplot; y-axis: WPM; blocks: English 1, Finnish 1, English 2, Finnish 2).]

Text entry rate

We measured text entry rate in words per minute, where one word equals five characters (including punctuation and spaces). Participants were allowed to take pauses after pressing the pause button (see Figure 1), and there was a natural pause between phrases because the participants had to read the new phrase or a significant portion of it before they began typing. All these pauses and the first character following each pause were excluded from the text entry rate calculations.

As expected, the language did have an effect on the text entry rate (F(1,14) = 35, p < 0.001). Average text entry rate with Finnish was 49.7 wpm. English was about 16% slower at 41.8 wpm. A boxplot of the text entry rates for the four blocks is shown in Figure 2.

Error rate

For error rate measurement we utilized the MSD/KSPC methodology of MacKenzie and Soukoreff [9] rather than the newer unified metric [10].
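
The three dependent measures (text entry rate, MSD, and KSPC) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the software used in the experiment. All function names are hypothetical. The caller is assumed to have already removed pauses and the first character after each pause from the character count and the timing. Dividing MSD by the longer of the two strings is an assumption, since the text does not state the denominator. Which keystrokes count towards KSPC is likewise an assumption.

```python
def wpm(chars_entered: int, active_seconds: float) -> float:
    """Words per minute, where one word equals five characters
    (punctuation and spaces included). Both arguments are assumed to
    exclude pauses and the first character after each pause."""
    return (chars_entered / 5.0) / (active_seconds / 60.0)


def msd(a: str, b: str) -> int:
    """Minimum string distance: the minimum number of character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def msd_per_char(presented: str, transcribed: str) -> float:
    """MSD normalized by the longer string (assumed denominator)."""
    longest = max(len(presented), len(transcribed))
    if longest == 0:
        raise ValueError("both strings are empty")
    return msd(presented, transcribed) / longest


def kspc(keystrokes: int, transcribed: str) -> float:
    """Keystrokes per character of the final transcribed text.
    Equals 1.0 when no extra keystrokes were used for corrections."""
    if not transcribed:
        raise ValueError("transcribed text is empty")
    return keystrokes / len(transcribed)


# One missing letter left uncorrected gives an MSD of 1.
print(msd("the quick brown fox", "the quick brwn fox"))

# Relative slowdown from the mean rates reported in the text.
finnish_wpm, english_wpm = 49.7, 41.8
slowdown = (finnish_wpm - english_wpm) / finnish_wpm
print(f"{slowdown:.1%}")  # prints 15.9%, i.e. about 16%
```

The last two lines only check the arithmetic behind the "about 16% slower" figure; they use the reported means, not raw data.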
We considered this simpler method sufficient, because we needed only a rough overall measure of error rate to see if participants made a different speed-accuracy trade-off with different languages.

Error corrections during the transcription task were allowed. This led to two kinds of errors: those that were corrected and those that were not corrected. The minimum string distance (MSD) was used to count the errors that were not corrected. The keystrokes per character (KSPC) measure reflects the number of keystrokes used for correcting errors. If there are no extra keystrokes, KSPC is equal to one. If some errors are corrected, KSPC is greater than one.

Footnote 2: Minimum string distance between two strings is equal to the minimum number of character insertions, deletions, and substitutions needed to transform one string to the other.

Minimum String Distance

The language did have a statistically significant effect on the number of errors left in the transcribed phrases (F(1,14) = 5.8, p < 0.05). Figure 3 shows the number of errors per character. The foreign language (English) exhibited higher MSDs. On average the MSD per character for English was 0.0044 and for Finnish it was 0.0025.

[Figure 3. Average MSD in the four blocks (y-axis: MSD per character; blocks: English 1, Finnish 1, English 2, Finnish 2).]

KSPC

The language also had a statistically significant effect on the number of keystrokes per character (F(1,14) = 4.7, p < 0.05). Like MSD, the KSPC data in Figure 4 show slightly higher values with English. The overall average KSPC was 1.114 for English and 1.098 for Finnish.

[Figure 4. Average KSPC in the four blocks (y-axis: KSPC; blocks: English 1, English 2, Finnish 1, Finnish 2).]

Summary

In summary, we found no statistically significant difference between the first and second blocks with a given language. Also, the presentation order of the languages did not have a statistically significant effect on any of the measures listed above. The interactions of language, training, and order were all non-significant (at the p < 0.05 level). Thus, of those factors tested, only language had a statistically significant effect. This effect was present in all three of our performance measures.

CONCLUSIONS AND FUTURE WORK

The language used in a text entry experiment does have an effect on performance. Native speakers of Finnish with good skills in English were about 16% slower when transcribing English. However, we cannot recommend 16% as a universal conversion factor, for several reasons. Firstly, the effect of language may be different when the overall speed of text entry is different. Secondly, the different cognitive load inherent in different text entry methods may intensify the difficulties with the foreign language. Thirdly, different languages may interact in different ways. Languages that are close to each other in sentence and word structure may exhibit different performance differences than languages that are not similar. Fourthly, the foreign language also produced more errors, meaning that at the same error rate English entry would presumably have been even slower. The contribution of each of these factors to the performance penalty associated with a foreign language should be clarified in future work.

The languages and participant pool in this experiment were the same as those we have used in our earlier experiments. Therefore we suspect that the effect of the language may have been similar in those experiments.

Ultimately, if inter-study comparability of text entry experiments is desired, we need a thorough understanding of the effects of languages and language skills. It may be possible to find correction coefficients that can be utilized to improve inter-study comparability of the results measured in different languages.
Until precise coefficients are found, it is probably safe to assume that results measured with non-native speakers of English under-estimate the performance of native speakers.

ACKNOWLEDGMENTS

This work was supported by the Academy of Finland (grant 73987) and by Tampere Graduate School in Information Science and Engineering.

REFERENCES

1. Goldstein, M., Book, R., Alsiö, G., and Tessa, S., Non-Keyboard QWERTY Touch Typing: A Portable Input Interface for the Mobile User, Proceedings of CHI 99, 32-39, ACM Press, 1999.
2. Isokoski, P., Performance of Menu-Augmented Soft Keyboards, CHI2004, Human Factors In Computing Systems, CHI Letters, 6(1), 423-430, ACM Press, 2004.
3. Isokoski, P. and Raisamo, R., Device independent text input: A rationale and an example, Proceedings of the Working Conference on Advanced Visual Interfaces AVI2000, 76-83, ACM Press, 2000.
4. Isokoski, P. and Raisamo, R., Quikwriting as a Multi-Device Text Entry Method, Proceedings of NordiCHI 2004, 105-108, ACM Press, 2004.
5. Költringer, T. and Grechening, T., Comparing the Immediate Usability of Graffiti 2 and Virtual Keyboard, CHI2004 Extended Abstracts, 1175-1178, ACM Press, 2004.
6. MacKenzie, I. S., and Soukoreff, R. W., Phrase sets for evaluating text entry techniques, CHI2003 Extended Abstracts, 754-755, ACM Press, 2003.
7. MacKenzie, I. S., and Soukoreff, R. W., Text Entry for Mobile Computing: Models and Methods, Theory and Practice, Human-Computer Interaction, 17(2&3), 147-198, Lawrence Erlbaum Associates, 2002.
8. MacKenzie, I. S., and Zhang, S. Z., The Design and Evaluation of a High-Performance Soft Keyboard, Proceedings of CHI 99, 25-31, ACM Press, 1999.
9. Soukoreff, R. W., and MacKenzie, I. S., Measuring errors in text entry tasks: An application of the Levenshtein string distance statistic, CHI2001 Extended Abstracts, 319-320, ACM Press, 2001.
10. Soukoreff, R. W., and MacKenzie, I. S., Metrics for text entry research: An evaluation of MSD and KSPC, and a new unified error metric, CHI 2003, Human Factors In Computing Systems, CHI Letters, 5(1), 113-120, ACM Press, 2003.
11. Soukoreff, R. W., and MacKenzie, I. S., Recent Developments in Text Entry Error Rate Measurement, CHI2004 Extended Abstracts, 1425-1428, ACM Press, 2004.
12. Zhai, S., Hunter, M., and Smith, B. A., Performance Optimization of Virtual Keyboards, Human-Computer Interaction, 17(2&3), 229-269, Lawrence Erlbaum Associates, 2002.