Data Fusion and Bias
Performance evaluation of various data fusion methods

İlker Nadi Bozkurt, Computer Engineering Department, Bilkent University, Ankara, Turkey
Hayrettin Gürkök, Computer Engineering Department, Bilkent University, Ankara, Turkey
Eyüp Serdar Ayaz, Computer Engineering Department, Bilkent University, Ankara, Turkey

Abstract

Data fusion is the combination of the results of independent searches on a document collection into a single output result set. It has been shown that this can greatly improve retrieval effectiveness over that of the individual results. This paper compares some major data fusion and system selection techniques by experimenting on TREC ad hoc collections.

Keywords: data fusion; metasearch; information retrieval; rank aggregation; performance evaluation

I. INTRODUCTION

Data fusion (metasearch) is the term usually applied to techniques that combine the final results of a number of search engines in order to improve retrieval. Briefly, a metasearch engine takes as input the n ranked lists produced by n search engines in response to a given query. It then computes a single ranked list as output, which is usually an improvement over any of the input lists.

Metasearch offers significant advantages to information retrieval. First, it improves upon the performance of individual search engines, because different retrieval methods often return very different irrelevant documents but many of the same relevant documents. Second, metasearch provides more consistent and reliable performance than individual search engines. Since metasearch aggregates the advice of several systems, it does not reflect the tendency of a single system, resulting in a more reliable search system [1].

A reference software component architecture of a metasearch engine is illustrated in Figure 1 [2]. The numbers on the edges indicate the sequence of actions for a query to be processed. According to this illustration, the functionality of each software component and the interactions among these components are as follows:

Database Selector: the search engines to be fused are selected using a system selection method
Query Dispatcher: the queries are submitted to the selected search engines
Document Selector: the documents to be used from each search engine are determined
Results Merger: the results of the search engines are merged using metasearch algorithms

Fig. 1: Metasearch software component architecture

In this paper, we deal with the Database Selector and Results Merger components. The rest are used as is from the TREC results.

We can classify the metasearch algorithms mentioned throughout this paper by the data they require: whether they need relevance scores or only ranks. See Figure 2.

Ranks only: Reciprocal Rank, Borda-fuse, Condorcet-fuse
Relevance scores: CombMNZ, CombANZ, CombSUM

Fig. 2: Some fusion methods according to the data they require

Just as there are different fusion methods, there are different options for the system selection process. In this work we concentrate on the all, best and bias selection methods.
The organization of the paper is as follows. In Section II, we briefly mention related work in the area of data fusion. In Section III, we describe in detail the data fusion and system selection methods used in this work. In Section IV, we explain our experimental design in terms of the data sets and measures used. We present the experimental results and comparisons in Section V. Finally, we conclude the paper in Section VI.

II. RELATED WORK

A number of fusion techniques based on normalised scores were proposed by Fox and Shaw [2]. These techniques use relevance scores from different systems. Among these, CombSUM and CombMNZ were shown to achieve the best performance and have been used in subsequent research.

Aslam and Montague compared the fusion process to a democratic election in which there are voters (the search engines) and many candidates (the documents). They achieved positive results with adapted versions of two voting algorithms designed for that setting. Borda-fuse [3] awards a score to each document depending on its position in each input result set, with its final score being the sum of these. Condorcet-fuse [4] ranks documents based on pairwise comparisons: a document is ranked above another if it appears above it in more input result sets. These two methods, together with the Reciprocal Rank method, use only ranks rather than relevance scores.

In addition to fusion methods, there has been some work on system selection methods. Mowshowitz and Kawaguchi proposed selecting systems whose query responses differ from the norm so that more refined results can be achieved [5].

III. DATA FUSION AND SYSTEM SELECTION METHODS

A. Data fusion methods

1) CombSUM
CombSUM uses the sum of the relevance scores assigned by each system as the fused relevance score.
2) CombMNZ
CombMNZ defines the fused relevance score as the sum of the relevance scores given by each system (that is, CombSUM), multiplied by the number of systems that returned the document.

3) CombANZ
CombANZ is similar to CombMNZ except that, instead of multiplying, we divide CombSUM by the number of systems that returned the document. In other words, it returns the average relevance score.

We can use the following general formula to calculate CombMNZ, CombANZ and CombSUM. Let n denote the number of systems that returned document i and rel_{ij} the relevance score of document i given by system j:

$\mathrm{score}(i) = n^{p} \sum_{j} rel_{ij}$

where p = 0, 1 and -1 for CombSUM, CombMNZ and CombANZ respectively.

4) Reciprocal rank fusion
In this approach, the retrieval systems determine the rank positions. When a document is returned by more than one system, the inverses of its rank positions are summed, since documents returned by more than one retrieval system might be more likely to be relevant. Systems that do not rank a document are skipped. The following equation shows the computation of the rank score of document i from its position pos_{ij} in each system j (j = 1...n) that returned it:

$r(i) = \dfrac{1}{\sum_{j} 1/pos_{ij}}$

First, the rank position score of each document to be combined is evaluated; then the documents are sorted by these scores in non-decreasing order [6].

Example: Suppose that we have four different retrieval systems A, B, C, and D with a document collection composed of documents a, b, c, d, e, f, and g. Let us assume that for a given query their results are ranked as follows:

A = (b, d, c, a)
B = (a, b, c, f, g)
C = (c, a, f, e, b, d)
D = (a, d, g, f)

Now we compute the rank position score of each document:

r(a) = 1/(1/4+1/1+1/2+1/1) = 0.36
r(b) = 1/(1/1+1/2+1/5) = 0.59
r(c) = 1/(1/3+1/3+1/1) = 0.60
r(d) = 1/(1/2+1/6+1/2) = 0.86
r(e) = 1/(1/4) = 4.00
r(f) = 1/(1/4+1/3+1/4) = 1.20
r(g) = 1/(1/5+1/3) = 1.88
Sorting the scores in non-decreasing order, the final ranked list of documents is:

a > b > c > d > f > g > e

i.e., a is the document with the highest rank (the topmost document).

5) Borda-fuse
This method is taken from the social theory of voting and applied to data fusion. The highest ranked candidate (in an n-way vote) gets n votes and each subsequent candidate gets one vote less (so number two gets n-1, number three gets n-2, and so on). If there are candidates left unranked by a voter, the remaining points are divided evenly among the unranked candidates. Then, for each candidate, all the votes are added up and the candidate with the highest number of votes wins the election.

Example: Consider the example given for reciprocal rank fusion again. The Borda count (BC) of a document i is computed by summing its Borda count values in the individual systems (BC_A in system A, etc.):

BC(i) = BC_A(i) + BC_B(i) + BC_C(i) + BC_D(i)

There are seven documents, so the top document of each system gets 7 points. The BC of each document is:

BC(a) = 4 + 7 + 6 + 7 = 24
BC(b) = 7 + 6 + 3 + 2 = 18
BC(c) = 5 + 5 + 7 + 2 = 19
BC(d) = 6 + 1.5 + 2 + 6 = 15.5
BC(e) = 2 + 1.5 + 4 + 2 = 9.5
BC(f) = 2 + 4 + 5 + 4 = 15
BC(g) = 2 + 3 + 1 + 5 = 11

Sorting the scores in non-increasing order, the final ranked list of documents is:

a > c > b > d > f > g > e

6) Condorcet-fuse
This is another method taken from the social theory of voting. In the Condorcet election method, voters rank the candidates in order of preference. The vote counting procedure then takes into account each preference of each voter for one candidate over another. The Condorcet voting algorithm is a majoritarian method that specifies the winner as the candidate that beats each of the other candidates in a pairwise comparison.

Example: Again, consider the previous example, but this time let the ranks be given as:

A: a > c = b > g
B: b > a > c > d > f > e > g
C: a = b > c > f > g > e
D: c > e > d

Note that, for instance, in system A documents b and c have the same original rank.
In the first stage, we use an N x N matrix for the pairwise comparison, where N is the number of candidates. Each non-diagonal entry (i, j) of the matrix shows the votes of i over j (i.e., cell [a, b] shows the number of wins, losses, and ties of document a against document b, in that order). When counting the votes of a system, a document that is not retrieved by that system loses to all documents retrieved by it.

      a        b        c        d        e        f        g
a     -        1, 1, 2  3, 1, 0  3, 1, 0  3, 1, 0  3, 0, 1  3, 0, 1
b     1, 1, 2  -        2, 1, 1  3, 1, 0  3, 1, 0  3, 0, 1  3, 0, 1
c     1, 3, 0  1, 2, 1  -        4, 0, 0  4, 0, 0  4, 0, 0  4, 0, 0
d     1, 3, 0  1, 3, 0  0, 4, 0  -        1, 2, 1  2, 1, 1  2, 1, 1
e     1, 3, 0  1, 3, 0  0, 4, 0  2, 1, 1  -        1, 2, 1  2, 2, 0
f     0, 3, 1  0, 3, 1  0, 4, 0  1, 2, 1  2, 1, 1  -        2, 1, 1
g     0, 3, 1  0, 3, 1  0, 4, 0  1, 2, 1  2, 2, 0  1, 2, 1  -

Table 1: Pairwise wins, losses and ties in Condorcet

After that, we determine the pairwise winners. Each complementary pair is compared; the winner receives one point in its win column and the loser receives one point in its lose column. If the simulated pairwise election is a tie, both receive one point in the tie column.

      Win  Lose  Tie
a     5    0     1
b     5    0     1
c     4    2     0
d     2    4     0
e     1    4     1
f     2    4     0
g     0    5     1

Table 2: Total wins, losses and ties in Condorcet

To rank the documents we use their win and lose values. If one document has more wins than another, it is ranked higher. If their win counts are equal, we consider their lose counts, and the document with the smaller lose count is ranked higher. If both the win and lose counts are equal, the documents are tied. The final ranking of the documents in the example is:

a = b > c > d = f > e > g
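The three rank-only methods above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function names are made up for the example, exact fractions are used so that the reciprocal rank and Borda values can be compared with the worked examples, and the Condorcet part applies the unretrieved-document rule literally as stated in the text, so its counts on the tied example need not coincide with every printed cell of Table 1.

```python
from fractions import Fraction
from itertools import combinations

# Ranked lists from the reciprocal rank / Borda-fuse example, best first.
RUNS = [
    ["b", "d", "c", "a"],            # system A
    ["a", "b", "c", "f", "g"],       # system B
    ["c", "a", "f", "e", "b", "d"],  # system C
    ["a", "d", "g", "f"],            # system D
]
# Ranks from the Condorcet example; each inner list is a group of tied documents.
TIED_RUNS = [
    [["a"], ["b", "c"], ["g"]],
    [["b"], ["a"], ["c"], ["d"], ["f"], ["e"], ["g"]],
    [["a", "b"], ["c"], ["f"], ["g"], ["e"]],
    [["c"], ["e"], ["d"]],
]
DOCS = list("abcdefg")


def reciprocal_rank(runs):
    """r(i) = 1 / sum_j (1 / position of i in run j); smaller is better."""
    inverse_sum = {}
    for run in runs:
        for pos, doc in enumerate(run, start=1):
            inverse_sum[doc] = inverse_sum.get(doc, 0) + Fraction(1, pos)
    return {doc: 1 / total for doc, total in inverse_sum.items()}


def borda(runs, docs):
    """Top document gets len(docs) points; unranked ones share the leftover."""
    n = len(docs)
    score = {doc: Fraction(0) for doc in docs}
    for run in runs:
        for pos, doc in enumerate(run):
            score[doc] += n - pos
        unranked = [doc for doc in docs if doc not in run]
        leftover = sum(range(1, len(unranked) + 1))
        for doc in unranked:
            score[doc] += Fraction(leftover, len(unranked))
    return score


def pairwise(tied_runs, x, y):
    """(wins, losses, ties) of x against y, counted over all systems."""
    wins = losses = ties = 0
    for run in tied_runs:
        level = {doc: i for i, group in enumerate(run) for doc in group}
        # A document the system did not retrieve sits below all retrieved ones.
        lx, ly = level.get(x, len(run)), level.get(y, len(run))
        wins += lx < ly
        losses += lx > ly
        ties += lx == ly
    return wins, losses, ties


def condorcet(tied_runs, docs):
    """Order by pairwise contests won (descending), then lost (ascending)."""
    won = dict.fromkeys(docs, 0)
    lost = dict.fromkeys(docs, 0)
    for x, y in combinations(docs, 2):
        w, l, _ = pairwise(tied_runs, x, y)
        if w != l:
            winner, loser = (x, y) if w > l else (y, x)
            won[winner] += 1
            lost[loser] += 1
    return sorted(docs, key=lambda d: (-won[d], lost[d]))


rr = reciprocal_rank(RUNS)
bc = borda(RUNS, DOCS)
print(sorted(rr, key=rr.get))                # reciprocal rank order, best first
print(sorted(bc, key=bc.get, reverse=True))  # Borda-fuse order, best first
print(pairwise(TIED_RUNS, "a", "b"))         # cell [a, b] of Table 1
```

The first two printed lists can be compared with the final rankings of the reciprocal rank and Borda-fuse examples. Note that `sorted` breaks ties by input order, whereas the text breaks them randomly.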
In our implementation, ties are broken randomly: documents a and b are assigned ranks 1 and 2 in random order, and documents d and f are assigned ranks 4 and 5 in random order.

Condorcet Paradox: In a paradoxical case, there is an equivalence class of winners, and one is unable to pick the top winner or to rank the members of the class. A commonly used example is the following:

A: a > b > c
B: b > c > a
C: c > a > b

If the documents in such a cycle are considered tied, the problem is resolved.

B. System selection methods

1) Normal
All systems are selected to be used in the fusion process.

2) Best
Only the top performing systems are selected to be used in data fusion. One common way is to select a number of systems that yield high MAP when evaluated against the TREC qrels.

3) Bias
The systems that behave differently from the norm (the majority of all systems used in fusion) are selected. The motivation behind this approach is that using such systems eliminates ordinary systems from data fusion, which could provide better discrimination among documents and systems.

To compute the bias of a particular system, we first calculate the similarity between the response vector of the norm and that of the retrieval system using a metric, e.g., their dot product divided by the product of their lengths (the cosine similarity measure). The bias value is obtained by subtracting this similarity value from 1. The similarity function for vectors v and w is:

$s(v, w) = \dfrac{\sum_{i} v_i w_i}{\sqrt{\sum_{i} v_i^2 \cdot \sum_{i} w_i^2}}$

The bias between these two vectors is defined as follows [5]:

$B(v, w) = 1 - s(v, w)$

Two variants of the bias calculation exist: one ignores the order of the documents in the retrieved set and the other does not. To ignore position, the frequency of document occurrence is used to calculate bias. To take the order of documents into account, we may increment the frequency of a document by m/i, where m is the number of positions and i is the position of the document in the retrieved result set.
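The similarity and bias definitions above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function names are invented for the example, m is assumed to be the length of each individual result list, and the norm vector is taken to be the sum of the systems' response vectors, as in the worked example of this section. The vectors in the usage lines are the order-ignoring response vectors X_A, X_B and X of that example.

```python
from math import sqrt


def similarity(v, w):
    """Cosine similarity: dot product over the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / sqrt(sum(a * a for a in v) * sum(b * b for b in w))


def bias(v, norm):
    """B(v, w) = 1 - s(v, w)."""
    return 1 - similarity(v, norm)


def response_vector(results_per_query, docs, use_order=False):
    """Accumulate one entry per document over all queries of one system."""
    totals = dict.fromkeys(docs, 0.0)
    for results in results_per_query:
        m = len(results)  # assumption: m is the length of this result list
        for i, doc in enumerate(results, start=1):
            # Order-aware variant adds m/i; the plain variant counts occurrences.
            totals[doc] += m / i if use_order else 1
    return [totals[d] for d in docs]


# Order-ignoring response vectors of the worked example (documents a..g).
x_a = [2, 4, 2, 3, 1, 2, 0]
x_b = [2, 4, 1, 3, 1, 3, 1]
x_norm = [a + b for a, b in zip(x_a, x_b)]

print(round(bias(x_a, x_norm), 4))  # roughly 0.0097
print(round(bias(x_b, x_norm), 4))  # roughly 0.009
```

Both systems are very close to the norm here, which is unsurprising when the norm is built from only two systems.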
Since users usually look only at the highest ranked documents, considering order in the bias calculation gains importance [6].

Example: Suppose that we use two hypothetical retrieval systems A and B to define the norm, and that three queries are processed by each retrieval system. The documents retrieved by A and B for the three queries are as follows (the first row corresponds to the first query, etc.):

[Table of retrieved documents not recoverable]

Then the seven distinct documents retrieved by either A or B are a, b, c, d, e, f, and g, and the response vectors for A, B and the norm are, respectively:

X_A = (2, 4, 2, 3, 1, 2, 0)
X_B = (2, 4, 1, 3, 1, 3, 1)
X = (4, 8, 3, 6, 2, 5, 1)

The similarity of X_A to X is:

$s(X_A, X) = 76 / \sqrt{38 \cdot 155} \approx 0.9903$

and that of X_B to X is:

$s(X_B, X) = 79 / \sqrt{41 \cdot 155} \approx 0.9910$

So the bias values for each system are:

Bias(A) = 1 - 0.9903 = 0.0097
Bias(B) = 1 - 0.9910 = 0.0090

If we repeat the calculations by taking the order of documents into account, the response vectors are:

X_A = (11/2, 9, 11/2, 11/3, 1, 3, 0)
X_B = (8, 7, 4, 16/3, 1, 25/6, 1)
X = (27/2, 16, 19/2, 9, 2, 43/6, 1)

Then the bias values are found in the same way as:

Bias(A) = 1 - 0.9867 = 0.0133
Bias(B) = 1 - 0.9876 = 0.0124

IV. DATA SETS AND MEASURES

We used the ad hoc tracks of TREC-3, -4, -5 and -7 [7]. Table 3 gives the number of runs for each TREC used in this track and in our experiments.
Table 3: Number of TREC runs (columns: TREC, Runs)

We used mean non-interpolated average precision (MAP) to evaluate systems. Precision is the proportion of the retrieved documents that are relevant. Average precision for a single topic is the average of the precision values obtained after each relevant document is retrieved, using zero as the precision for relevant documents that are not retrieved. For multiple topics (queries), we used the mean of these average precisions.

All experiments were done on a Linux system with an Intel Core 2 Duo processor and 4 GB of RAM. None of the steps took more than a few seconds to yield a result, so we were able to test various pool depths and pseudorel percentages easily to find the optimum values.

V. EXPERIMENTAL RESULTS

In our first set of experiments we examined the effect of different pool depths and pseudorel percentages. We tested this using the TREC-7 dataset with the Borda-fuse and CombMNZ methods. A pool depth range of 30 to 150 and a pseudorel percentage range of 0.1 to 0.7 were examined. The following figures show the results:

Fig. 3: Comparison of pool depth and pseudorel percentage with Best system selection

Fig. 4: Comparison of pool depth and pseudorel percentage with Normal system selection

The figures show that as we increase the pool depth and pseudorel percentage, the MAP score of the fused results improves. As we increase the pool depth and pseudorel percentage, if relevant documents continue to appear higher ranked than non-relevant documents, the average precision of the fused system will improve. This is clearly the case for TREC-7 up to a pool depth of 150 and a pseudorel percentage of 0.7. We did not examine whether these values are also optimal for the other TRECs we used, but we nevertheless used them in the rest of the experiments, as we do not expect much performance degradation if they are not optimal.

The following tables show our experimental results for the Borda count, CombMNZ, CombANZ and CombSUM fusion methods with the Best, Bias and Normal system selection approaches on TREC-3, -4, -5 and -7. Note that we used a different number of systems in each TREC for testing the bias concept, because the number of systems that participated in each TREC is different and we have to use a certain percentage of the number of systems with bias to get meaningful results.

In the following tables the maximum entry in each column is shown in bold, indicating the best system selection method for that data fusion method in the corresponding TREC. The maximum entry of each row is underlined, indicating the best data fusion method for that system selection method in the corresponding TREC.

Table 4: TREC-3 results (columns: BORDA, COMBMNZ, COMBANZ, COMBSUM; rows: BEST, BEST, BIAS, BIAS, NORMAL)
Table 5: TREC-4 results (columns: BORDA, COMBMNZ, COMBANZ, COMBSUM; rows: BEST, BEST, BIAS, BIAS, NORMAL)

Table 6: TREC-5 results (columns: BORDA, COMBMNZ, COMBANZ, COMBSUM; rows: BEST, BEST, BIAS, BIAS, NORMAL)

Table 7: TREC-7 results (columns: BORDA, COMBMNZ, COMBANZ, COMBSUM; rows: BEST, BEST, BIAS, BIAS, NORMAL)

The above tables allow both a comparison of the system selection methods for each data fusion method and a comparison of the data fusion methods for each system selection method.

When we examine the columns of the above tables, we see that for all but one of the test results, best system selection yields better results than normal system selection and bias-based system selection. This is an expected result, because the best systems have high relevant overlap, and when relevant overlap is higher than nonrelevant overlap, data fusion yields a performance improvement, as conjectured by Lee [8].

Further examination of the columns shows that using the bias concept in system selection improves over normal system selection on TREC-5 and TREC-7. On TREC-3, normal system selection yields better results than bias system selection, but a further examination of the TREC-3 results shows that normal system selection is competitive even against best system selection. One reason for this may be that all input systems participating in TREC-3 produce similar results. On TREC-4 we see unexpected results, almost the opposite of everything we expected. This may be the reason it was not used in the experiments in [6]. Also, in [3], [9] and [10], TREC-3 and TREC-5 data are used in the experiments but TREC-4 data are not.

To see which data fusion method performs best, we examine the rows of the above tables. The results show that, except for TREC-4, CombMNZ yields the best results. The second best is almost always CombSUM. These results agree with previous results in the literature (e.g., see [3]), where CombMNZ is a very competitive data fusion method.
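One way to see why the three score-based methods can rank the same documents differently is to apply the general formula from Section III.A, score(i) = n^p Σ_j rel_ij, to a small input. The sketch below is only an illustration: the function name `comb` and the relevance scores are hypothetical and are not taken from the TREC runs used in the experiments.

```python
def comb(score_lists, p):
    """score(i) = n**p * sum_j rel_ij, with n = number of systems returning i.

    p = 0 gives CombSUM, p = 1 gives CombMNZ and p = -1 gives CombANZ.
    """
    total, hits = {}, {}
    for scores in score_lists:
        for doc, rel in scores.items():
            total[doc] = total.get(doc, 0.0) + rel
            hits[doc] = hits.get(doc, 0) + 1
    return {doc: total[doc] * hits[doc] ** p for doc in total}


# Hypothetical normalised scores from three systems (not taken from the paper).
systems = [
    {"d1": 0.9, "d2": 0.4},
    {"d1": 0.5, "d3": 0.8},
    {"d1": 0.4, "d2": 0.6},
]
for name, p in [("CombSUM", 0), ("CombMNZ", 1), ("CombANZ", -1)]:
    fused = comb(systems, p)
    print(name, sorted(fused, key=fused.get, reverse=True))
```

With these made-up scores, CombSUM and CombMNZ both put d1 (returned by all three systems) first, while CombANZ puts d3 first because averaging removes the reward for being returned by several systems.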
To see the effectiveness of our fusion methods, we compare their MAP scores under normal system selection against those of the top and the median systems of the TREC under consideration. The MAP values for all systems are given in Table 8, and Figure 5 illustrates the performance of our fusion methods. The performance of CombMNZ and CombSUM is comparable to that of the top system in TREC-5 and TREC-7. CombANZ and Borda-fuse never outperform the other systems in any TREC. The median system is worse than the best fusion methods (CombMNZ and CombSUM) except for TREC-4.

Table 8: MAP values of different systems with normal system selection (rows: TOP SYSTEM, MEDIAN SYSTEM, BORDA-FUSE, COMBANZ, COMBMNZ, COMBSUM; columns: TREC-3, TREC-4, TREC-5, TREC-7)

Fig. 5: MAP graph of various systems with normal system selection (MAP per TREC for TOP SYSTEM, MEDIAN SYSTEM, BORDA-FUSE, COMBANZ, COMBMNZ, COMBSUM)
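Since all comparisons in this section rest on MAP, the measure defined in Section IV can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used for the experiments: the function names, the topic identifiers and the tiny run and judgment sets are hypothetical.

```python
def average_precision(ranked, relevant):
    """Non-interpolated AP: mean precision at each relevant document's rank.

    Relevant documents that are never retrieved contribute a precision of 0.
    """
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of the per-topic AP values; both arguments are keyed by topic."""
    ap_values = [average_precision(runs.get(t, []), qrels[t]) for t in qrels]
    return sum(ap_values) / len(ap_values)


# Hypothetical two-topic run and relevance judgments (not TREC data).
runs = {"t1": ["a", "b", "c", "d"], "t2": ["x", "y"]}
qrels = {"t1": {"a", "c"}, "t2": {"y", "z"}}
print(mean_average_precision(runs, qrels))
```

For topic t1 the relevant documents appear at ranks 1 and 3, giving (1/1 + 2/3)/2; for t2 only one of the two relevant documents is retrieved, at rank 2, giving (1/2 + 0)/2.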
VI. CONCLUSION

In this paper, we evaluated and compared system selection and merging methods used in data fusion. For system selection, we considered the effectiveness of ranking when selecting all of the systems, some of the best performing systems, and the systems that behave differently from the majority (i.e., biased systems). Experiments show that the superior system selection method is best selection. We demonstrate that the use of biased systems improves retrieval effectiveness: in some cases bias selection outperforms normal selection, and it usually yields MAP scores close to those of best selection.

Among data fusion methods, CombSUM and CombMNZ perform much better than any other method. However, these two methods require relevance scores from the systems. In cases where we only have the ordering but not the original scores, these methods cannot be applied, and we may have to use methods that rely only on ranking information, such as the Condorcet method [6] and Borda count.

REFERENCES

[1] M. Montague and J. A. Aslam. Condorcet fusion for improved retrieval. In Proceedings of the 11th International Conference on Information and Knowledge Management.
[2] E. A. Fox and J. A. Shaw. Combination of multiple searches. In Proceedings of the 2nd Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication.
[3] J. A. Aslam and M. Montague. Models for metasearch. In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA. ACM Press.
[4] M. Montague and J. A. Aslam. Relevance score normalization for metasearch. In CIKM '01: Proceedings of the Tenth International Conference on Information and Knowledge Management, New York, NY, USA. ACM Press.
[5] A. Mowshowitz and A. Kawaguchi. Measuring search engine bias. Information Processing and Management: an International Journal, v.41 n.5, September.
[6] R. Nuray and F. Can. Automatic ranking of information retrieval systems using data fusion. Information Processing and Management: an International Journal, v.42 n.3, May.
[7] Text REtrieval Conference (TREC) Home Page.
[8] J. H. Lee. Analyses of multiple evidence combination. In Proceedings of the 20th Annual ACM-SIGIR, 1995.
[9] M. Montague and J. A. Aslam. Metasearch consistency. In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA. ACM Press.
[10] D. Lillis, F. Toolan, R. Collier, and J. Dunnion. ProbFuse: a probabilistic approach to data fusion. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA. ACM Press.
More informationAre You Ready? Simplify Fractions
SKILL 10 Simplify Fractions Teaching Skill 10 Objective Write a fraction in simplest form. Review the definition of simplest form with students. Ask: Is 3 written in simplest form? Why 7 or why not? (Yes,
More informationStatewide Framework Document for:
Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance
More informationMontana Content Standards for Mathematics Grade 3. Montana Content Standards for Mathematical Practices and Mathematics Content Adopted November 2011
Montana Content Standards for Mathematics Grade 3 Montana Content Standards for Mathematical Practices and Mathematics Content Adopted November 2011 Contents Standards for Mathematical Practice: Grade
More informationIntroduction to Causal Inference. Problem Set 1. Required Problems
Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not
More informationPerformance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database
Journal of Computer and Communications, 2016, 4, 79-89 Published Online August 2016 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2016.410009 Performance Analysis of Optimized
More informationUnderstanding and Interpreting the NRC s Data-Based Assessment of Research-Doctorate Programs in the United States (2010)
Understanding and Interpreting the NRC s Data-Based Assessment of Research-Doctorate Programs in the United States (2010) Jaxk Reeves, SCC Director Kim Love-Myers, SCC Associate Director Presented at UGA
More informationGCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education
GCSE Mathematics B (Linear) Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education Mark Scheme for November 2014 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge
More informationData Fusion Models in WSNs: Comparison and Analysis
Proceedings of 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1) Data Fusion s in WSNs: Comparison and Analysis Marwah M Almasri, and Khaled M Elleithy, Senior Member,
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More informationMathematics Scoring Guide for Sample Test 2005
Mathematics Scoring Guide for Sample Test 2005 Grade 4 Contents Strand and Performance Indicator Map with Answer Key...................... 2 Holistic Rubrics.......................................................
More informationNotes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1
Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial
More informationCSC200: Lecture 4. Allan Borodin
CSC200: Lecture 4 Allan Borodin 1 / 22 Announcements My apologies for the tutorial room mixup on Wednesday. The room SS 1088 is only reserved for Fridays and I forgot that. My office hours: Tuesdays 2-4
More informationHow to read a Paper ISMLL. Dr. Josif Grabocka, Carlotta Schatten
How to read a Paper ISMLL Dr. Josif Grabocka, Carlotta Schatten Hildesheim, April 2017 1 / 30 Outline How to read a paper Finding additional material Hildesheim, April 2017 2 / 30 How to read a paper How
More informationChapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard
Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.
More informationA Comparison of Charter Schools and Traditional Public Schools in Idaho
A Comparison of Charter Schools and Traditional Public Schools in Idaho Dale Ballou Bettie Teasley Tim Zeidner Vanderbilt University August, 2006 Abstract We investigate the effectiveness of Idaho charter
More informationTeam Formation for Generalized Tasks in Expertise Social Networks
IEEE International Conference on Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust Team Formation for Generalized Tasks in Expertise Social Networks Cheng-Te Li Graduate
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationA simulated annealing and hill-climbing algorithm for the traveling tournament problem
European Journal of Operational Research xxx (2005) xxx xxx Discrete Optimization A simulated annealing and hill-climbing algorithm for the traveling tournament problem A. Lim a, B. Rodrigues b, *, X.
More informationFunctional Skills Mathematics Level 2 assessment
Functional Skills Mathematics Level 2 assessment www.cityandguilds.com September 2015 Version 1.0 Marking scheme ONLINE V2 Level 2 Sample Paper 4 Mark Represent Analyse Interpret Open Fixed S1Q1 3 3 0
More informationTest Effort Estimation Using Neural Network
J. Software Engineering & Applications, 2010, 3: 331-340 doi:10.4236/jsea.2010.34038 Published Online April 2010 (http://www.scirp.org/journal/jsea) 331 Chintala Abhishek*, Veginati Pavan Kumar, Harish
More informationTruth Inference in Crowdsourcing: Is the Problem Solved?
Truth Inference in Crowdsourcing: Is the Problem Solved? Yudian Zheng, Guoliang Li #, Yuanbing Li #, Caihua Shan, Reynold Cheng # Department of Computer Science, Tsinghua University Department of Computer
More informationWE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT
WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationAttributed Social Network Embedding
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, MAY 2017 1 Attributed Social Network Embedding arxiv:1705.04969v1 [cs.si] 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua Abstract Embedding
More informationFragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing
Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing D. Indhumathi Research Scholar Department of Information Technology
More informationOhio s Learning Standards-Clear Learning Targets
Ohio s Learning Standards-Clear Learning Targets Math Grade 1 Use addition and subtraction within 20 to solve word problems involving situations of 1.OA.1 adding to, taking from, putting together, taking
More informationLecture 2: Quantifiers and Approximation
Lecture 2: Quantifiers and Approximation Case study: Most vs More than half Jakub Szymanik Outline Number Sense Approximate Number Sense Approximating most Superlative Meaning of most What About Counting?
More informationAP Statistics Summer Assignment 17-18
AP Statistics Summer Assignment 17-18 Welcome to AP Statistics. This course will be unlike any other math class you have ever taken before! Before taking this course you will need to be competent in basic
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationArtificial Neural Networks written examination
1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14
More informationCOMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS
COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS L. Descalço 1, Paula Carvalho 1, J.P. Cruz 1, Paula Oliveira 1, Dina Seabra 2 1 Departamento de Matemática, Universidade de Aveiro (PORTUGAL)
More informationOrganizational Knowledge Distribution: An Experimental Evaluation
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 24 Proceedings Americas Conference on Information Systems (AMCIS) 12-31-24 : An Experimental Evaluation Surendra Sarnikar University
More informationProbability and Statistics Curriculum Pacing Guide
Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods
More informationChapters 1-5 Cumulative Assessment AP Statistics November 2008 Gillespie, Block 4
Chapters 1-5 Cumulative Assessment AP Statistics Name: November 2008 Gillespie, Block 4 Part I: Multiple Choice This portion of the test will determine 60% of your overall test grade. Each question is
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationAn Effective Framework for Fast Expert Mining in Collaboration Networks: A Group-Oriented and Cost-Based Method
Farhadi F, Sorkhi M, Hashemi S et al. An effective framework for fast expert mining in collaboration networks: A grouporiented and cost-based method. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(3): 577
More informationMaths Games Resource Kit - Sample Teaching Problem Solving
Teaching Problem Solving This sample is an extract from the first 2015 contest resource kit. The full kit contains additional example questions and solution methods. Rationale and Syllabus Outcomes Learning
More informationAn Investigation into Team-Based Planning
An Investigation into Team-Based Planning Dionysis Kalofonos and Timothy J. Norman Computing Science Department University of Aberdeen {dkalofon,tnorman}@csd.abdn.ac.uk Abstract Models of plan formation
More informationVariations of the Similarity Function of TextRank for Automated Summarization
Variations of the Similarity Function of TextRank for Automated Summarization Federico Barrios 1, Federico López 1, Luis Argerich 1, Rosita Wachenchauzer 12 1 Facultad de Ingeniería, Universidad de Buenos
More informationClassroom Connections Examining the Intersection of the Standards for Mathematical Content and the Standards for Mathematical Practice
Classroom Connections Examining the Intersection of the Standards for Mathematical Content and the Standards for Mathematical Practice Title: Considering Coordinate Geometry Common Core State Standards
More informationCarnegie Mellon University Department of Computer Science /615 - Database Applications C. Faloutsos & A. Pavlo, Spring 2014.
Carnegie Mellon University Department of Computer Science 15-415/615 - Database Applications C. Faloutsos & A. Pavlo, Spring 2014 Homework 2 IMPORTANT - what to hand in: Please submit your answers in hard
More informationA study of speaker adaptation for DNN-based speech synthesis
A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationConversation Starters: Using Spatial Context to Initiate Dialogue in First Person Perspective Games
Conversation Starters: Using Spatial Context to Initiate Dialogue in First Person Perspective Games David B. Christian, Mark O. Riedl and R. Michael Young Liquid Narrative Group Computer Science Department
More informationSouth Carolina English Language Arts
South Carolina English Language Arts A S O F J U N E 2 0, 2 0 1 0, T H I S S TAT E H A D A D O P T E D T H E CO M M O N CO R E S TAT E S TA N DA R D S. DOCUMENTS REVIEWED South Carolina Academic Content
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationFinding Translations in Scanned Book Collections
Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University
More informationMaximizing Learning Through Course Alignment and Experience with Different Types of Knowledge
Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February
More informationClass-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification
Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,
More informationMath 121 Fundamentals of Mathematics I
I. Course Description: Math 121 Fundamentals of Mathematics I Math 121 is a general course in the fundamentals of mathematics. It includes a study of concepts of numbers and fundamental operations with
More informationAlgebra 2- Semester 2 Review
Name Block Date Algebra 2- Semester 2 Review Non-Calculator 5.4 1. Consider the function f x 1 x 2. a) Describe the transformation of the graph of y 1 x. b) Identify the asymptotes. c) What is the domain
More informationSchool of Innovative Technologies and Engineering
School of Innovative Technologies and Engineering Department of Applied Mathematical Sciences Proficiency Course in MATLAB COURSE DOCUMENT VERSION 1.0 PCMv1.0 July 2012 University of Technology, Mauritius
More informationAxiom 2013 Team Description Paper
Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association
More informationProcedia - Social and Behavioral Sciences 226 ( 2016 ) 27 34
Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 226 ( 2016 ) 27 34 29th World Congress International Project Management Association (IPMA) 2015, IPMA WC
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More information