ONLINE FORUM THREAD RETRIEVAL USING DATA FUSION AMEER TAWFIK ABDULLAH ALBAHEM


ONLINE FORUM THREAD RETRIEVAL USING DATA FUSION

AMEER TAWFIK ABDULLAH ALBAHEM

A thesis submitted in fulfilment of the requirements for the award of the degree of Master of Science (Computer Science)

Faculty of Computing
Universiti Teknologi Malaysia

SEPTEMBER 2013

To my wife and parents

ACKNOWLEDGEMENT

First of all, all praise to Allah for giving me the strength and the patience to complete this task.

My supervisor, Prof Naomie Salim, thank you for being a family member rather than an academic advisor. Your unlimited support in various aspects of my study has been a cornerstone of the success of this research.

Part of the thesis would not have been completed without the valuable advice of Jangwon Seo at the Center for Intelligent Information Retrieval, University of Massachusetts, Amherst. The corpus developed by Sumit Bhatia from the Pennsylvania State University made the thesis experiments possible. Thank you, Sumit; your collaboration is very much appreciated.

This work would not have seen the light without the scholarship provided by the Yemeni Ministry of Higher Education and Scientific Research. In addition, I would like to thank the Malaysian Ministry of Higher Education (MOHE) and the Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) for sponsoring the publication of this research.

It has become routine to thank family members last; nevertheless, my parents, family members and friends, thank you for your support. My wife, I am speechless when it comes to thanking you. Your support, care and sacrifice have been beyond what an ordinary person would do for his partner. What you have given me is immeasurable. Thank you!

Ameer Tawfik Albaham, Malaysia

ABSTRACT

Online forums empower people to seek and share information via discussion threads. However, finding threads that satisfy a user's information need is a daunting task due to information overload. In addition, traditional retrieval techniques do not suit the unique structure of threads: thread retrieval returns threads, whereas traditional retrieval techniques return text messages. A few thread representations have been proposed to address this problem, and in some of them aggregating evidence of query relevance is an essential step. This thesis proposes several data fusion techniques to aggregate evidence of relevance within and across thread representations, and it makes three contributions.

Firstly, this work adapts the Voting Model from the expert finding task to thread retrieval. The adapted Voting Model approaches thread retrieval as a voting process. It ranks a list of messages, then groups the messages by their parent threads, and treats each ranked message as a vote supporting the relevance of its parent thread. To rank the parent threads, a data fusion technique aggregates evidence from each thread's ranked messages.

Secondly, this study proposes two extensions of the Voting Model: the Top K and Balanced Top K voting models. The Top K model aggregates evidence from only the top K ranked messages of each thread. The Balanced Top K model adds a number of artificial ranked messages to compensate for the difference if a thread has fewer than K ranked messages (a padding step). Experiments with these voting models and thirteen data fusion methods reveal that summing the relevance scores of the top K ranked messages from each thread, with the padding step, outperforms the state of the art on all measures on two datasets.

The third contribution of this thesis is multi-representation thread retrieval using data fusion techniques.
In contrast to the Voting Model, data fusion methods were used here to fuse several ranked lists of threads instead of a single ranked list of messages. The thread lists were generated by five retrieval methods based on various thread representations; the Voting Model is one of them. The first three methods treat a message as the unit of indexing, while the latter two use the thread title and the concatenation of the thread's message texts, respectively. A thorough evaluation of the performance of data fusion techniques in fusing various combinations of thread representations was conducted. The experimental results show that building multi-representation thread retrieval on either the sum of relevance scores, or the sum of relevance scores multiplied by the number of retrieving methods, improves performance and outperforms all individual representations.
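
The Voting Model and its Top K and Balanced Top K extensions, as described in the abstract, can be illustrated with a minimal Python sketch. It assumes the sum of relevance scores as the fusion method. The function name `rank_threads`, the tuple layout of the ranked messages, and the `pad_score` parameter are hypothetical; the abstract does not state what score the artificial padding messages receive, so the caller supplies it here. The example scores are invented and negative, loosely imitating log-probability retrieval scores, which is also an assumption.

```python
from collections import defaultdict


def rank_threads(ranked_messages, k=None, pad_score=None):
    """Rank threads by summing the scores of their ranked messages.

    ranked_messages: iterable of (message_id, thread_id, score) tuples.
    k: if given, only the k best-scoring messages of each thread vote (Top K).
    pad_score: if given together with k, a thread with fewer than k ranked
        messages receives artificial votes with this score (Balanced Top K).
        How to choose this score is an assumption left to the caller.
    """
    # Group the ranked messages by parent thread: each message is one vote.
    votes = defaultdict(list)
    for _message_id, thread_id, score in ranked_messages:
        votes[thread_id].append(score)

    thread_scores = {}
    for thread_id, scores in votes.items():
        scores = sorted(scores, reverse=True)
        if k is not None:
            scores = scores[:k]  # keep only the top k votes
            if pad_score is not None:
                # Padding step: add artificial votes up to k.
                scores = scores + [pad_score] * (k - len(scores))
        thread_scores[thread_id] = sum(scores)

    # Best thread first.
    return sorted(thread_scores.items(), key=lambda item: item[1], reverse=True)


# Invented example: thread t1 has three ranked messages, thread t2 has one.
messages = [
    ("m1", "t1", -1.0),
    ("m2", "t1", -1.5),
    ("m3", "t1", -2.0),
    ("m4", "t2", -1.25),
]

all_votes = rank_threads(messages)                      # every message votes
top_k = rank_threads(messages, k=2)                     # Top K
balanced = rank_threads(messages, k=2, pad_score=-2.0)  # Balanced Top K
```

In this toy data the padding step changes the order. Thread t2 has a single ranked message, so it outranks t1 when scores are summed without padding; once it is padded to K = 2 with an artificial vote, its sum falls below that of t1.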

ABSTRAK

Online forums allow users to seek and share information through discussion threads. However, finding discussion threads is not an easy task because of information overload. In addition, traditional retrieval techniques do not suit the unique structure of discussion threads, because thread retrieval returns threads whereas traditional retrieval techniques return text messages. Several representations have been proposed, and aggregating evidence of query relevance is an essential step. This thesis proposes several data fusion techniques to aggregate evidence of relevance for discussion thread representations. This thesis has three contributions.

First, this work adapts the Voting Model from the expert finding task to discussion thread retrieval. The adapted Voting Model approaches thread retrieval as a voting process. It ranks a list of messages and then groups the messages by their parent threads; it also treats each ranked message as a vote supporting the relevance of its parent thread. To rank the parent threads, a data fusion technique aggregates evidence from the threads' messages.

Second, this study proposes two extensions of the Voting Model: the Top K and Balanced Top K voting models. The Top K model aggregates evidence from only the top K messages. The Balanced Top K model adds a number of ranked messages to compensate for the difference if a thread has fewer than K top-ranked messages (the padding step). Experiments with the Voting Model and 13 data fusion methods show that summing the scores of the top K messages from each thread, with the padding step, outperforms current methods on all measures on two datasets.
The third contribution of this thesis is multi-representation thread retrieval using data fusion techniques. In contrast to the Voting Model, data fusion methods were used to fuse several lists of threads rather than a single list of messages. The thread lists were generated by five retrieval models based on various representations, the Voting Model among them. The first three methods treat a message as the unit of indexing, while the last two use the title and the concatenation of the thread's message texts. A thorough evaluation of fusing various combinations of thread representations was conducted. The experimental results show that using the sum of relevance scores, or the sum of relevance scores multiplied by the number of retrieving methods, to build multi-representation thread retrieval can improve performance and outperform all individual representations.
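
The two fusion rules named in the abstract for multi-representation retrieval, the sum of relevance scores and that sum multiplied by the number of retrieving methods, are commonly called CombSUM and CombMNZ in the data fusion literature. The following Python sketch shows how they could combine several ranked lists of threads. The function name `fuse_thread_lists`, the dictionary layout, the run names, and the example scores are hypothetical, and the sketch assumes the scores of the different retrieval methods are already on a comparable scale; the abstract does not say how, or whether, scores are normalised.

```python
from collections import defaultdict


def fuse_thread_lists(ranked_lists, method="sum"):
    """Fuse several ranked lists of threads into one ranking.

    ranked_lists: list of dicts, one per retrieval method, each mapping
        thread_id -> relevance score (assumed comparable across methods).
    method: "sum" adds the scores of a thread across lists;
        "mnz" multiplies that sum by the number of lists retrieving it.
    """
    if method not in ("sum", "mnz"):
        raise ValueError("method must be 'sum' or 'mnz'")

    totals = defaultdict(float)  # summed score per thread
    hits = defaultdict(int)      # number of lists that retrieved the thread
    for scores in ranked_lists:
        for thread_id, score in scores.items():
            totals[thread_id] += score
            hits[thread_id] += 1

    fused = {}
    for thread_id, total in totals.items():
        fused[thread_id] = total * hits[thread_id] if method == "mnz" else total

    # Best thread first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Invented example: three thread lists from three hypothetical representations.
runs = [
    {"t1": 0.25, "t2": 1.0},   # e.g. a message-based method
    {"t1": 0.25},              # e.g. a title-based method
    {"t1": 0.25, "t3": 0.5},   # e.g. a concatenated-text method
]

by_sum = fuse_thread_lists(runs, method="sum")
by_mnz = fuse_thread_lists(runs, method="mnz")
```

In this toy data the two rules disagree. Summing favours t2, which one method scores highly, whereas multiplying by the number of retrieving methods favours t1, which all three methods retrieve.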