A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique

Hiromi Ishizaki (1), Susan C. Herring (2), Yasuhiro Takishima (1)
(1) KDDI R&D Laboratories, Inc.; (2) Indiana University Bloomington

Abstract

We introduce an efficient coding system for dynamic topic analysis (DTA), a computer-mediated discourse analysis technique that codes and visualizes topical development over time in online discussions. Our system provides three main functionalities: intuitive coding with a touch screen interface, automated inter-rater agreement computation, and visualization of the coding results. Using the system, we conducted a preliminary DTA of 28,131 user comments from the popular music distribution sites SoundCloud and Last.fm. The analysis shows that most SoundCloud and Last.fm comments are narrowly on-topic and prompt-focused, in contrast to discussions in other social media, which exhibit more topical elaboration.

Keywords: Computer-mediated discourse analysis; dynamic topic analysis; Android application
doi: 10.9776/16452
Copyright: Copyright is held by the authors.
Contact: ishizaki@kddilabs.jp; herring@indiana.edu; takisima@kddilabs.jp

1 Introduction

The rapid growth of social media services allows many users to communicate with each other via online networks. These services produce large amounts of authentic communication data that can be mined to analyze user behavior and to identify hidden user demands for services. The system design presented here is intended to facilitate such efforts.

Computer-mediated discourse analysis (CMDA) (Herring, 2004), an approach based in linguistics, was developed to analyze behavior that takes place through online communication services. Dynamic topic analysis (DTA) (Herring, 2003) is a CMDA technique specifically designed to analyze how a discussion or conversation evolves over time by focusing on transitions between topical units.

Studies have employed DTA to analyze online user behavior in political discussion (Stromer-Galley & Martinson, 2009), dyadic exchanges on Twitter (Honeycutt & Herring, 2009), text chat during multiplayer online games (Herring et al., 2009), and comments posted to online music distribution sites (Ishizaki et al., 2013). VisualDTA (Herring & Kurtz, 2006) is a tool that visually represents the flow and coherence of online conversations. Using VisualDTA, Herring (2013) identified a shift in topic development patterns in computer-mediated communication over time, from a step-wise pattern in early chat and discussion forums to a prompt-focused pattern in recent media-sharing and social network sites.

DTA requires manual coding of the data. Messages are broken down into topical propositions, and each proposition is coded for its relation to the previous proposition (if any) to which it relates, as on-topic, a parallel shift, or a break. Parallel shifts are further coded for their semantic distance from the previous proposition on a scale from 1 to 3. (By convention, on-topic propositions are assigned a semantic distance of 0, and breaks are assigned a semantic distance of 4.) A proposition is coded as "on-topic" when it expresses a simple reaction to, elaboration upon, or continuation of the same topic, or provides an expected response to a question. A "parallel shift" expresses movement of the conversation onto new ground that is related to what came before. A "break" indicates a non-sequitur or abrupt topic change, unrelated to anything that came before. As for the semantic distance of parallel shifts, a smaller number means that the relation between the proposition and what it relates to is immediately obvious, whereas a larger number means that the relation is less obvious but ultimately understandable.

Different coders may understand the relations and semantic distances between propositions differently; therefore, a check of inter-rater agreement is generally required to establish the reliability of the analysis. Independent coders code the same data, compare their codes item by item, and discuss disagreements, repeating the process until an acceptable level of agreement is obtained. Coding is traditionally done in a text editor or spreadsheet application, and it is time-consuming work, especially for large datasets.
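The coding scheme described in this section can be made concrete with a small record type. The following Python sketch is illustrative only and is not the authors' implementation: the names `Relation`, `CodedProposition`, `prop_id`, and `relates_to` are hypothetical, and the only rule it enforces is the convention stated above (distance 0 for on-topic, 1 to 3 for parallel shifts, 4 for breaks).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    """The three DTA relation types named in the text."""
    ON_TOPIC = "on-topic"
    PARALLEL_SHIFT = "parallel shift"
    BREAK = "break"


# Semantic distances permitted for each relation type, per the stated convention.
ALLOWED_DISTANCES = {
    Relation.ON_TOPIC: {0},
    Relation.PARALLEL_SHIFT: {1, 2, 3},
    Relation.BREAK: {4},
}


@dataclass(frozen=True)
class CodedProposition:
    prop_id: str               # hypothetical identifier, e.g. "12a"
    relates_to: Optional[str]  # id of the earlier proposition it relates to, if any
    relation: Relation
    distance: int              # semantic distance on the 0-4 scale

    def __post_init__(self) -> None:
        # Reject codes that contradict the distance convention.
        if self.distance not in ALLOWED_DISTANCES[self.relation]:
            raise ValueError(
                f"distance {self.distance} is not valid for {self.relation.value}"
            )
```

Validating the distance at construction time means an inconsistent pair such as "on-topic, distance 3" is caught when it is entered rather than during later analysis.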

2 The Coding System

We implemented a coding system for DTA that provides a touch screen interface, automated inter-rater reliability computation, and visualization of the coding results.

For the touch screen interface, we implemented the coding application on an Android tablet (Figure 1). The interface enables coders to apply the DTA coding scheme intuitively without a keyboard, simply by tapping to select the previous proposition, topic relation type, and semantic distance from pull-down menus. On the left side of the tablet, a visualization window lets coders see their coding results instantly. The prototype follows the input/output format of VisualDTA so that it remains fully compatible with that tool. After the person administering the coding prepares the input dataset, the coding application downloads the data from a server via a Wi-Fi network. If no network is available, the data can instead be loaded from a microSD card inserted in the tablet.

Figure 1. Screenshot of the coding application on an Android tablet. (Callouts: DTA visualization window; touch and select coding value from pull-down menu.)

We also implemented an automated inter-rater reliability computation function, which lets the analyst compile coded data from the tablets and compute reliability based on measures of inter-rater agreement. In order to enhance the reliability of an analysis, we implemented five major measures: percent agreement, Holsti's coefficient, Scott's pi, Cohen's kappa, and Krippendorff's alpha (Holsti, 1969; Krippendorff, 2004). The analyst can select a measure depending on the nature of the sample data and the coding situation. Since the system can easily detect and show disagreements on coded propositions, the coders can discuss them and refine their coding results immediately.

Finally, we implemented server-based visualization functions. The system generates turn-taking and DTA visualizations from a coded dataset; the graphical results can be shown as web pages. Figure 2 shows an example of DTA results for the SoundCloud commenting data. The left side of the figure shows the DTA visualization, and the right side shows the graphical turn-taking diagram. Additional information, such as speech act category, can be coded for each unit and displayed in the diagram, as shown in Figure 2. In order to keep plots and lines from overlapping, we modified the original DTA visualization to display a curved line with a different color for each user to show the flow of conversation. In the turn-taking diagram on the right, the x-axis shows user IDs, and the y-axis shows time. Orange boxes are comments, and the white boxes inside them are topical propositions.
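Three of the agreement measures named in this section can be sketched for the two-coder case as follows. This is a minimal illustration using the standard textbook formulas, not the system's actual code; the function names are hypothetical, and it assumes both coders labeled the same units in the same order. Holsti's coefficient and Krippendorff's alpha are omitted for brevity.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of units on which the two coders assigned the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must label the same non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement using each coder's own code distribution."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement: product of the two coders' marginal proportions.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:  # degenerate case: both coders used a single identical code
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def scotts_pi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement using the pooled code distribution."""
    p_o = percent_agreement(a, b)
    n = len(a)
    pooled = Counter(a) + Counter(b)
    # Expected agreement: squared proportions of the pooled distribution.
    p_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

Cohen's kappa and Scott's pi differ only in how expected chance agreement is estimated, which is one reason an analyst might prefer one over the other depending on how similarly the coders distribute their codes.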

Figure 2. An example of DTA results for SoundCloud comments (DTA visualization and turn taking)

3 Preliminary Data Analysis

3.1 Data Sample and Coding

We applied DTA to a dataset we collected from two music distribution sites: Last.fm and SoundCloud. Last.fm is a distribution/streaming platform that functions as an Internet radio-based social network site. A feature of SoundCloud is that it allows users to insert a comment at a specific point in time of the track. We refer to these as "timed comments." Text comments can also be posted below the waveform; we refer to these as "regular comments." The two modes of commenting are illustrated in Figure 3.

For our preliminary analysis, we collected 58 music entries from the "house" and "pop" music genres on SoundCloud in October 2012. Each entry had between 100 and 1000 comments. We then collected all the entries from Last.fm that included the same songs as the SoundCloud sample (11 entries). As data for analysis, we extracted all 28,131 comments posted by 17,074 users on SoundCloud and Last.fm. We divided the comments into utterances based on sentence-final punctuation, resulting in 53,268 utterances in the dataset. A structural utterance roughly corresponds to a topical proposition.

Following initial training, two coders independently assigned DTA codes to 524 randomly selected propositions from SoundCloud and compared their codes; any issues that arose were resolved through discussion. The two coders then independently coded each utterance in the dataset, and we extracted the coded utterances on which both coders agreed. In the end, 51,928 coded utterances were analyzed.

3.2 Results

From the coding results, we found that most SoundCloud and Last.fm comments tend to refer back to the initial prompt: the song or its creator. Figure 4 shows an example visualization of dynamic topic transitions on SoundCloud and Last.fm. The comments mostly respond to the initial prompt of the song (proposition 0), expressing reactions to it or to the artist. In contrast, the propositions in the timed comments on SoundCloud are more likely to respond to previous propositions, as shown by the diagonal lines in Figure 5. In timed comments, propositions are connected to subsequent propositions via on-topic reactions and sometimes rather tenuously connected parallel shifts (e.g., propositions 6, 7, and 12b); the sequence also has two breaks (propositions 5 and 12a), which are comments unrelated to anything that came before.
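The two data-preparation steps described in Section 3.1, splitting comments into utterances and keeping only the utterances on which both coders agreed, might look roughly like the Python sketch below. The exact segmentation rule is not given in the text, so the pattern here (a run of `.`, `!`, or `?` followed by whitespace ends an utterance) is an assumption, as are the function names and the dictionary layout of the coded data.

```python
import re
from typing import Dict, List, Tuple

# Assumed rule: split at whitespace that follows sentence-final punctuation.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_utterances(comment: str) -> List[str]:
    """Split one comment into utterances at sentence-final punctuation."""
    parts = _SENTENCE_BOUNDARY.split(comment.strip())
    return [p for p in parts if p]


def agreed_codes(
    coder_a: Dict[str, str], coder_b: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return the codes both coders agree on, plus the ids they disagree on.

    Each input maps a hypothetical utterance id to the code one coder assigned.
    """
    agreed: Dict[str, str] = {}
    disputed: List[str] = []
    for uid, code in coder_a.items():
        if coder_b.get(uid) == code:
            agreed[uid] = code
        else:
            disputed.append(uid)
    return agreed, disputed
```

A trailing fragment without final punctuation is kept as its own utterance, since informal comments often omit it; the list of disputed ids is what coders would review when discussing disagreements.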

Figure 3. The SoundCloud interface, with two timed comments expanded in the waveform. Regular comments appear below the waveform on the left.

Figure 4. Visualization of dynamic topic transitions (first 20 propositions of Last.fm entry ID 1090 and SoundCloud entry ID 1116 regular comments)

Figure 5. Visualization of dynamic topic transitions (SoundCloud entry ID 1149, timed comments)

4 Conclusion

In this work, we implemented a coding system for dynamic topic analysis in order to provide a more efficient way to code large amounts of data using the DTA technique. We illustrated the results of DTA applied to a large amount of comment data collected from SoundCloud and Last.fm, as coded on an Android tablet with a touch screen interface. The analysis shows that SoundCloud and Last.fm propositions tend to remain narrowly on-topic and prompt-focused, as Herring (2013) observed for social media communication, and that the topical patterns of regular and timed SoundCloud comments differ. In future work, we plan to conduct quantitative analysis based on the results of our visualization system and to use the system to analyze other CMC datasets.

5 References

Herring, S. C. (2003). Dynamic topic analysis of synchronous chat. New research for new media: Innovative research methodologies symposium working papers and readings.

Herring, S. C. (2004). Computer-mediated discourse analysis: An approach to researching online behavior. In S. A. Barab, R. Kling, & J. H. Gray (Eds.), Designing for virtual communities in the service of learning (pp. 338-376). New York: Cambridge University Press.

Herring, S. C. (2013). Discourse in Web 2.0: Familiar, reconfigured, and emergent. In D. Tannen & A. M. Tester (Eds.), Georgetown University Round Table on Languages and Linguistics 2011: Discourse 2.0: Language and new media (pp. 1-25). Washington, DC: Georgetown University Press.

Herring, S. C., & Kurtz, A. J. (2006). Visualizing dynamic topic analysis. In Proceedings of CHI, 1-6. New York: ACM.

Herring, S. C., Kutz, D. O., Paolillo, J. C., & Zelenkauskaite, A. (2009). Fast talking, fast shooting: Text chat in an online first-person game. In Proceedings of the 42nd Hawaii International Conference on System Sciences (pp. 1-10). Los Alamitos, CA: IEEE Press.

Holsti, O. R. (1969). Content analysis for the social sciences and humanities. Reading, MA: Addison-Wesley.

Honeycutt, C., & Herring, S. C. (2009). Beyond microblogging: Conversation and collaboration via Twitter. In Proceedings of the 42nd Hawaii International Conference on System Sciences. Los Alamitos, CA: IEEE Press.

Ishizaki, H., Herring, S. C., Hattori, G., Ono, C., & Takishima, Y. (2013). A computer-mediated discourse analysis of user commenting behavior on an online music distribution site. Forum on Information Technology 2013, Tottori, Japan, 12(3), 47-52.

Ishizaki, H., Herring, S. C., Hattori, G., & Takishima, Y. (2015). Understanding user behavior on online music distribution sites: A discourse approach. Proceedings of iConference 2015.

Krippendorff, K. (2004). Reliability in content analysis. Human Communication Research, 30(3), 411-433.

Stromer-Galley, J., & Martinson, A. M. (2009). Coherence in political computer-mediated communication: Analyzing topic relevance and drift in chat. Discourse & Communication, 3(2), 195-216.