Team Formation for Generalized Tasks in Expertise Social Networks


IEEE International Conference on Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust

Team Formation for Generalized Tasks in Expertise Social Networks

Cheng-Te Li, Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan, d @csie.ntu.edu.tw
Man-Kwan Shan, Department of Computer Science, National Chengchi University, Taipei, Taiwan, mkshan@cs.nccu.edu.tw

Abstract

Given an expertise social network and a task consisting of a set of required skills, the team formation problem aims at finding a team of experts who not only satisfy the requirements of the given task but also communicate with one another effectively. To solve this problem, Lappas et al. [9] proposed the Enhanced-Steiner algorithm. In this work, we generalize the problem by associating each required skill with a specific number of experts. We propose three approaches to form an effective team for a generalized task. First, we extend the Enhanced-Steiner algorithm to a generalized version for generalized tasks. Second, we devise a density-based measure to improve the effectiveness of the team. Third, we present a novel grouping-based method that condenses the expertise information into a group graph according to the required skills. This group graph not only drastically reduces the search space but also avoids redundant communication costs and irrelevant individuals when compiling team members. Experimental results on the DBLP dataset show that the teams found by our methods perform well in both effectiveness and efficiency.

Keywords: Team Formation; Social Network; Generalized Tasks; Expertise Networks

I. INTRODUCTION

Team formation is essential in the field of organization theory. In an organization, a successful project relies not only on the expertise of the participating members but also on the communication and collaboration between them.
In other words, if we wish to form a team of experts for a given task that consists of some required skills, it is critical to find a set of persons whose professional skills satisfy the given task and who are able to communicate effectively with one another.

Given an expertise social network, team formation aims to find a crew of experts for a given task consisting of some required skills. An expertise social network consists of a pool of candidates, each of whom is an expert in some skills. In addition, based on previous collaborations between experts, each edge carries a weight indicating the communication cost between its two endpoints. The team formation problem is therefore to select experts from these candidates who meet the given task while keeping the total communication cost among the selected experts as low as possible.

For example, assume that a project leader aims to organize a team for a task with four required skills $R = \{s_1, s_2, s_3, s_4\}$. There are six candidates $P = \{1, 2, 3, 4, 5, 6\}$. Each candidate $i$ is an expert in a set of skills $X_i$, where $X_1 = \{s_4\}$, $X_2 = \{s_1\}$, $X_3 = \{s_1, s_3\}$, $X_4 = \{s_1, s_4\}$, $X_5 = \{s_2\}$, and $X_6 = \{s_3\}$. Assume also that a social network exists among these experts, as shown in Figure 1. In Figure 1, bold lines indicate previous collaborations between two candidates, and the weights on edges stand for the communication costs between them. If the communication cost is not considered, four teams qualify for the task: $T_1 = \{1, 2, 3, 5\}$, $T_2 = \{1, 2, 5, 6\}$, $T_3 = \{3, 4, 5\}$, and $T_4 = \{4, 5, 6\}$. However, if the communication cost is considered and measured by the cost of the minimum spanning tree, $T_4 = \{4, 5, 6\}$ is the best. Both $T_1$ and $T_2$ consist of two disconnected components. The communication cost of $T_4$ is 0.3, while that of $T_3$ is 0.6.

Figure 1. (a) An expertise social network.
(b) An enhanced graph, where black nodes are experts and white nodes are skills.

The work of Lappas et al. [9] is the first attempt to exploit the expertise social network together with the communication cost to organize a team for a given task. However, the task they tackle is a basic task, which consists of a set of skills without considering the number of experts required for each skill. In real-life situations, it is likely that more than one expert is demanded for some skills of a given task. For example, project leaders or team organizers are likely to allocate a certain number of specialists to each required skill in order to form an effective team for a task.

To meet this real-life requirement, we propose to generalize the problem by associating a specific number with each required skill. Specifically, the expected team must satisfy the following: (1) its members possess all the required skills of the given task, (2) for each required skill, the team contains at least the specified number of experts, and (3) the total communication cost among the members of the team is as low as possible. We refer to tasks that designate a number of experts for each required skill as generalized tasks. For example, in Figure 1, if a given task requires a team with two experts in skill $s_1$, one expert in skill $s_3$, and one expert in skill $s_4$, then $T_5 = \{3, 4\}$ is the best team with the lowest cost.

Lappas et al. [9] proved that the team formation problem is NP-hard. By defining the communication cost as the total weight of the minimum spanning tree that connects the selected experts, they proposed two approximation methods, the Cover-Steiner and Enhanced-Steiner algorithms, to solve the team formation problem for basic tasks.

In this paper, we investigate the team formation problem for generalized tasks. Given an expertise network and a task consisting of required skills, where each skill is associated with a specific number of experts, our goal is to find a team of experts that satisfies the skill requirements while keeping the communication cost among the selected experts as low as possible. Considering both the effectiveness of the formed team and the efficiency of the team formation process, we propose three novel approaches for generalized tasks based on the Enhanced-Steiner algorithm. First, we extend the Enhanced-Steiner algorithm to a generalized version for generalized tasks. Second, rather than picking a seed node randomly as in the original Enhanced-Steiner algorithm, we consider the potential interactions among the experts surrounding the required skills and propose a density-based measure for selecting the seed node more effectively. Third, we present a novel grouping-based approach to find the team for generalized tasks. Our method starts by aggregating the experts in the expertise network into a group graph according to the required skills. To satisfy the required skills with their specific numbers, we devise a Role-Composition algorithm that extracts the final team subgraph by connecting experts according to their different roles among groups.

Finally, our methods are evaluated on communication cost, cardinality of team members, and the number of inter-mediators (i.e., experts who have none of the required skills but are included to ensure the connectivity of communications). We also investigate the time efficiency of the proposed methods.
The remainder of this paper is organized as follows. In Section II, we review related work. The problem definition and notation are described in Section III. In Section IV, we present the generalized Enhanced-Steiner algorithm for generalized tasks. We then propose the density-based measure to improve the generalized Enhanced-Steiner algorithm in Section V. In Section VI, we propose the grouping-based team formation method for generalized tasks. Section VII presents experimental results, and Section VIII concludes the paper.

II. RELATED WORKS

Previous work related to our approach can be categorized into team formation and connection subgraph discovery.

Team Formation. The team formation problem has mainly been tackled in the field of Operations Research. Wi et al. [14] solve team formation by transforming it into an integer programming problem that finds an optimal match between individuals and requirements. Fitzpatrick et al. [6] evaluate individuals' drive and temperament to investigate the quality of a team. Chen et al. [2] estimate experts' interpersonal attributes from a psychological view to arrange a team. While the above three studies work purely from the viewpoint of psychology, others consider the underlying interactions among the experts in a team. Gaston et al. [7] study the potential relationship between diverse network structures among experts and the performance of a team. However, they neither treat it as a computational problem nor actually find the team. Cheatham et al. [1] simply collect the neighboring individuals surrounding each skill in a social-concept graph to form the team, but they ignore the communication cost among individuals entirely. Lappas et al. [9], as described in Section I, are the first to solve the team formation problem with consideration of communication cost. They proposed two approximation methods, the Cover-Steiner and Enhanced-Steiner algorithms, to solve the team formation problem for basic tasks.
Experimental results showed that the Enhanced-Steiner algorithm outperforms the Cover-Steiner algorithm. The Enhanced-Steiner algorithm is a two-step algorithm. The first step constructs an enhanced graph, which augments the expertise social network by adding skill nodes and connecting each skill node to every individual node who is an expert in that skill. An example is shown in Figure 1. The second step searches for the solution by finding a Steiner tree in the enhanced graph. Given a graph $H = (V_H, E_H)$ and a required set of vertices $V_R \subseteq V_H$, a Steiner tree is a connected, acyclic subgraph of $H$ that spans all vertices of $V_R$ with minimum cost. Many algorithms exist for finding a Steiner tree. Lappas et al. [9] proposed the greedy heuristic shown in Algorithm 0. The algorithm starts by selecting a skill node randomly from the enhanced graph (line 2). Each round then finds the skill node that has the minimum distance to the set of nodes already added to the solution (line 4). All nodes along the shortest path from this skill node to the current solution are added to the solution set (lines 5 and 6).

Algorithm 0. Enhanced-Steiner for basic tasks.
Input: $G = (V, E)$, $V = \{1, \ldots, n\}$; the skill sets $\{X_1, \ldots, X_n\}$ of individuals; a task $R = \{s_1, \ldots, s_q\}$.
Output: Team $V' \subseteq V$ and its induced subgraph $G[V']$.
1: $H = (V_H, E_H) \leftarrow$ EnhancedGraph($G$, $R$).
2: $V' \leftarrow v$, where $v$ is a random node from $V_R$.
3: while $(V_R \setminus V') \neq \emptyset$ do
4:     $v^* \leftarrow \arg\min_{u \in R \setminus V'} dist(u, V')$ in $H$.
5:     if $Path(v^*, V') \neq \emptyset$ then
6:         $V' \leftarrow V' \cup \{Path(v^*, V')\}$.
7: $V' \leftarrow V' \setminus \{s_1, \ldots, s_q\}$.

Connection Subgraph Discovery. Given a set of query nodes, the problem of connection subgraph discovery is to find a subgraph that best connects the query nodes. Its objective is similar to team formation, except that nodes are not associated with sets of skills. Faloutsos et al. [5] are the first to find the connection subgraph for two given nodes.
The best-known solution is random walk with restart (RWR) [10], which measures the proximity between nodes. Across variations of the input form, including AND/OR constraints [11], interactive feedback from users [12], and queries given as a small graph describing the desired relationships between entity types [13], RWR has been shown to extract effective connection subgraphs. More recently, a Steiner-tree-based approximation algorithm, STAR [8], was proposed to find the connection

subgraph in multi-relational graphs. Moreover, Cheng et al. considered community structures with the modularity measure for connection subgraph finding [4]. They also proposed a correlation index to find the groups that the query nodes belong to, as well as the best connection among these groups [3].

III. PROBLEM DEFINITION

[Definition 3.1] Let $A = \{a_1, \ldots, a_m\}$ be a universe of $m$ skills. An expertise social network is an undirected, weighted graph $G = (V, E)$. Each node $i$ in $V = \{1, \ldots, n\}$ is an individual who possesses a set of skills $X_i \subseteq A$. Each edge $(i, j)$ in $E$ captures the interaction between two individuals. The weight on each edge $(i, j)$ stands for the collaboration cost between individuals $i$ and $j$. Note that a low (high) edge weight represents a low (high) collaboration cost, that is, close (weak) collaboration between the two nodes. For example, in a co-authorship network, the more publications two persons co-author, the lower the weight on their edge.

[Definition 3.2] A generalized task $R = (S, K)$ is a set of required skills, $\{(s_i, k_i) \mid \forall i,\ 1 \le i \le q,\ s_i \in A,\ k_i \text{ is an integer}\}$, where $k_i$ denotes the required number of experts in skill $s_i$ and $q$ is the number of required skills.

[Definition 3.3] Given a graph $G = (V, E)$ and $V' \subseteq V$, the communication cost of $V'$, denoted by $CC(V')$, is defined as the sum of the edge weights of the minimum spanning tree on the induced subgraph $G[V']$.

Problem Definition: Given an expertise social network $G = (V, E)$ and a generalized task $R$, the team formation problem for generalized tasks is to find a set of individuals $V' \subseteq V$, which forms an induced subgraph $G[V']$, such that
(1) $\forall (s_i, k_i) \in R,\ s_i \in \bigcup_{j \in V'} X_j$,
(2) $\forall (s_i, k_i) \in R,\ k_i \le |\{ j \mid j \in V' \text{ and } s_i \in X_j \}|$, and
(3) the communication cost $CC(V')$ is minimized.

IV. GENERALIZED ENHANCED-STEINER ALGORITHM

The original Enhanced-Steiner algorithm [9] was proposed to solve the team formation problem for basic tasks. In this section, we generalize this algorithm for generalized tasks.
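To make the objective of Section III concrete before the algorithm is described, the following minimal sketch computes $CC(V')$ as the minimum-spanning-tree weight of the induced subgraph and checks condition (2) of the problem definition (which implies condition (1) whenever every $k_i \ge 1$). The skill sets are those of the introductory example; the edge weights in `W` are hypothetical and are not the weights of Figure 1, and the function names `communication_cost` and `satisfies` are illustrative only. Returning infinity for a disconnected team is an assumption of this sketch.

```python
import math

def communication_cost(team, weights):
    """CC(V'): total weight of a minimum spanning tree of the induced
    subgraph G[V'], computed with a simple version of Prim's algorithm.
    Returns math.inf if G[V'] is disconnected (an assumption of this sketch)."""
    team = set(team)
    if len(team) <= 1:
        return 0.0
    # Adjacency restricted to edges with both endpoints in the team.
    adj = {v: {} for v in team}
    for (a, b), w in weights.items():
        if a in team and b in team:
            adj[a][b] = w
            adj[b][a] = w
    in_tree = {next(iter(team))}
    total = 0.0
    while len(in_tree) < len(team):
        best = None  # cheapest edge leaving the current tree
        for u in in_tree:
            for v, w in adj[u].items():
                if v not in in_tree and (best is None or w < best[0]):
                    best = (w, v)
        if best is None:
            return math.inf
        total += best[0]
        in_tree.add(best[1])
    return total

def satisfies(team, skills, task):
    """Condition (2): every required skill s_i has at least k_i experts in the team."""
    return all(sum(1 for j in team if s in skills[j]) >= k
               for s, k in task.items())

# Skill sets X_1..X_6 from the introductory example.
X = {1: {"s4"}, 2: {"s1"}, 3: {"s1", "s3"}, 4: {"s1", "s4"}, 5: {"s2"}, 6: {"s3"}}
# Hypothetical edge weights, chosen only for illustration.
W = {(1, 2): 0.5, (3, 4): 0.5, (4, 5): 0.25, (5, 6): 0.25}
TASK = {"s1": 2, "s3": 1, "s4": 1}

print(satisfies({3, 4}, X, TASK), communication_cost({3, 4}, W))
print(communication_cost({1, 2, 3, 5}, W))  # disconnected under these weights
```

With these assumed weights, the team $\{3, 4\}$ meets the generalized task of the introduction, while $\{4, 5, 6\}$ does not because it contains only one expert in $s_1$.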
To describe this algorithm, we first give some definitions.

[Definition 4.1] Given two nodes $i, j \in V$, the distance $dist(i, j)$ between $i$ and $j$ is the sum of the edge weights along their shortest path. Meanwhile, $path(i, j)$ is the sequence of nodes along their shortest path.

[Definition 4.2] The distance between a node $i$ and a set of nodes $V'$ is defined as $dist(i, V') = \min_{j \in V'} dist(i, j)$. Likewise, $path(i, V')$ is the set of nodes along the shortest path from $i$ to the node $j \in V'$ that attains this minimum.

Based on these two definitions, Algorithm 1 incrementally finds and adds the selected team members to the solution set. Two sets of nodes, $U$ and $V'$, are maintained. $U$ contains the skill nodes whose required number of experts has not yet been met, while $V'$ is the current solution set, which contains the selected individual nodes. The algorithm repeats until the number of experts for each required skill is sufficient (line 4). In each round, a skill node from $U$ is chosen: the node $v^*$ that has the minimum distance to $V'$ (line 6). Then the number of required experts with respect to the skill node $v^*$ is decreased by one (line 8). Moreover, the number of required experts with respect to each skill of the individual nodes added along the shortest path is decreased by one as well (lines 9 and 10). All the nodes along the shortest path from $v^*$ to $V'$ are added to the solution set $V'$ (line 11).

Algorithm 1. Generalized Enhanced-Steiner for generalized tasks.
Input: $G = (V, E)$; the skill sets $\{X_1, \ldots, X_n\}$ of individuals; a task $R = (S, K) = \{(s_i, k_i) \mid 1 \le i \le q\}$.
Output: Team $V' \subseteq V$ and its induced subgraph $G[V']$.
1: $H = (V_H, E_H) \leftarrow$ EnhancedGraph($G$, $S$).
2: $V' \leftarrow v$, where $v$ is a random node from $V_R$.
3: $k_v = k_v - 1$, where $v$ is a skill node.
4: while $U \neq \emptyset$ do
5:     for each $s_j \in V'$ do if $k_j \le 0$ then $U = S \setminus s_j$
6:     $v^* \leftarrow \arg\min_{u \in U} dist(u, V')$ in $H$
7:     if $path(v^*, U) \neq \emptyset$ then
8:         $k_{v^*} = k_{v^*} - 1$
9:         for each $w$, $w \in X_j$, $j \in path(v^*, V')$ & $j \in V$ do
10:            $k_w = k_w - 1$
11:        $V' \leftarrow V' \cup path(v^*, V')$
12:        $E_H = E_H \setminus \{EdgesInPath(v^*, V') \setminus E\}$
13: $V' \leftarrow V' \setminus \{s_1, \ldots, s_q\}$.

V. IMPROVEMENT OF GENERALIZED ENHANCED-STEINER

Both the original Enhanced-Steiner algorithm and our generalized algorithm start by selecting a skill node randomly from the enhanced graph. In this section, instead of picking the seed node randomly, we improve the generalized algorithm by selecting a seed node based on the neighborhood structure of the skill nodes. This is motivated by the observation that the higher the neighborhood density of a skill node, the more likely that node is to lie at a small distance from other nodes when the enhanced graph is traversed. To this end, we propose the ε-neighborhood density, which is adapted from the clustering coefficient, through the following definitions.

[Definition 5.1] Given two nodes $v, w \in V$, the length $l(v, w)$ between them is defined as the number of edges in the shortest path between $v$ and $w$.

[Definition 5.2] The ε-neighborhood of a node $v \in V$ is the set of nodes $N_\varepsilon(v) = \{ w_i \mid 1 \le l(v, w_i) \le \varepsilon,\ w_i \in V \}$.

[Definition 5.3] Given a set of nodes $U \subseteq V$, the density of $U$ is defined as

$$Density(U) = \frac{2\,|\{ e_{ij} \mid e_{ij} \in E,\ v_i, v_j \in U \}|}{|U|\,(|U| - 1)}.$$

In other words, the density of $U$ is the ratio of the number of edges between pairs of nodes in $U$ to the maximum possible number of such edges.

[Definition 5.4] The ε-neighborhood density of a node $v \in V$ is defined as $Density(N_\varepsilon(v))$, where $N_\varepsilon(v)$ is the ε-neighborhood of $v$.
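Definitions 5.1 to 5.4 can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the graph is given as an adjacency dictionary, computes the ε-neighborhood by breadth-first search over hop counts, and treats the density of a set with fewer than two nodes as 0 (the definition leaves that case undefined). The helper names and the small graph `ADJ`, in which `s1` and `s2` stand for skill nodes, are hypothetical.

```python
from collections import deque

def build_adj(edges):
    """Undirected adjacency dictionary from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def eps_neighborhood(adj, v, eps):
    """N_eps(v): all nodes w with 1 <= l(v, w) <= eps, where l counts edges."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == eps:
            continue  # do not expand beyond eps hops
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return {w for w in dist if w != v}

def density(adj, nodes):
    """Density(U) = 2 * |edges inside U| / (|U| * (|U| - 1))."""
    n = len(nodes)
    if n < 2:
        return 0.0  # assumption: the definition is undefined for |U| < 2
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal = sum(1 for u in nodes for w in adj[u] if w in nodes) // 2
    return 2 * internal / (n * (n - 1))

def densest_seed(adj, skill_nodes, eps=2):
    """Skill node with the highest eps-neighborhood density."""
    return max(skill_nodes,
               key=lambda s: density(adj, eps_neighborhood(adj, s, eps)))

# Hypothetical enhanced graph: experts a, b, c, d and skill nodes s1, s2.
ADJ = build_adj([("a", "b"), ("b", "c"), ("a", "c"),
                 ("s1", "a"), ("s2", "c"), ("s2", "d")])
print(densest_seed(ADJ, ["s1", "s2"]))
```

In this toy graph the 2-neighborhood of `s1` is the triangle a, b, c, so `s1` has the denser neighborhood and would be chosen as the seed.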

Therefore, line 2 of Algorithm 1 is improved by selecting the skill node with the highest density. Let us take Figure 1 as an example again: the 2-neighborhood densities of $s_1$, $s_2$, $s_3$, and $s_4$ are 0.33, 1.0, 0.67, and 0.33 respectively. Starting from $s_2$ ($s_3$, $s_4$), the solution is $\{4, 5, 6\}$ with communication cost 0.3 (0.3, 0.4); starting from $s_1$, the resulting team is $\{3, 4, 5\}$ with cost 0.6.

VI. GROUPING-BASED TEAM FORMATION

Although the generalized Enhanced-Steiner algorithm suffices to find teams for generalized tasks, it suffers from the following problem. When the required task consists of many skills, or when the expertise social network contains many individuals and interactions, the greedy search in the generalized enhanced graph takes much more time. In this section, we propose a grouping-based team formation method for generalized tasks.

The idea of our grouping method is to aggregate the raw expertise network into an abstract structure, the group graph, which records only the relevant individuals and the potential interactions among groups for the required skills. The group graph helps to reduce the search space when finding the team with the Enhanced-Steiner algorithm. Moreover, it can guide the graph traversal to avoid redundant communication cost and to decrease the cardinality of the compiled team.

Our grouping-based team formation for generalized tasks is a four-stage method. The first stage is skill-based individual grouping, which collects individuals into groups according to the required skills. The second stage constructs a group graph, in which links capture the interactions of individuals among groups. The third stage applies the Enhanced-Steiner algorithm to the constructed group graph to find an effective subgraph of groups for the required skills. Finally, a role composition method organizes the team so that it satisfies both the required skills and the corresponding numbers of experts.

A. Skill-based Individual Grouping

The first step is to group the individuals in the expertise network according to the required skills. A group with respect to one of the required skills, say $s_i$, is a connected subgraph in which each individual node is skilled in $s_i$. Figure 2 shows an example of grouping for the required skills $s_1$, $s_2$, and $s_3$. Each group is surrounded by a dotted circle. It can be observed that there are two groups corresponding to skill $s_1$. These two groups are separate components since they have no interactions. Besides, node i belongs to two groups because i is skilled in both $s_1$ and $s_2$. Node m does not belong to any group, because m is not skilled in any of the three required skills, and it is not considered in further processing. In general, most individuals are skilled in more than one skill, so groups tend to overlap one another. This overlap offers a potential for reducing the cardinality of the team. Moreover, grouping helps to find the team members more efficiently.

B. Group Graph Construction

We have now aggregated individuals into groups based on the required skills, but the underlying interactions among groups are not yet modeled. These interactions are essential for finding effective communication between groups with respect to the required skills. Observing Figure 2 in detail, there are three kinds of relationships between groups. First, two groups overlap in node i, since individual i is skilled in both $s_1$ and $s_2$. Second, the group with respect to skill $s_1$ (left) interacts directly with the group with respect to skill $s_2$ through the collaborations between individuals d and f and between d and e. The same relationship exists between the group with respect to skill $s_1$ (right) and the group with respect to skill $s_3$.
Third, while one of the relationships between the group with respect to skill $s_2$ and the group with respect to $s_3$ is a direct communication, the other relationship between them is an indirect communication through an inter-mediator q.

Figure 2. An Example of Skill-based Individual Grouping.

If we treated these three cases as equal relationships, the closeness of two groups (i.e., overlapped, direct, or indirect) would be ignored, which could lead to ineffective teams. Therefore, we further associate each group interaction with a weight that encodes the communication cost between the groups. These costs not only reflect the correlation between different required skills but also guide the search for an effective combination of groups. By integrating groups of individuals, interactions between groups, and weights on relationships, we construct a group graph as an index structure that represents the abstracted information about the required skills in the expertise social network. We now formally define the group graph, group nodes, and group links.

[Definition 6.1] A group graph $H = (V_H, E_H)$ is a weighted graph constructed according to the required skills from the expertise social network $G = (V, E)$, where $V_H$ is a finite set of group nodes, $E_H \subseteq V_H \times V_H$ is a finite set of group links, and each edge $(grp_{s_i}, grp_{s_j}) \in E_H$ is associated with a weight $w^H_{ij}$.

[Definition 6.2] Group nodes are defined according to the required skills. For a required skill $s \in S$, a group node $grp_s \in V_H$ contains a set of nodes $V'(grp_s) \subseteq V$ in $G$ and must satisfy (1) $\forall u \in V'(grp_s),\ s \in X_u$, and (2) the nodes in $V'(grp_s)$ form an induced connected subgraph $G[V'(grp_s)]$.

[Definition 6.3] A group link $e_H \in E_H$ is defined on two group nodes $grp_{s_i}$ and $grp_{s_j}$ in the group graph $H$. The corresponding induced subgraphs of these two group nodes, $G[V'(grp_{s_i})]$ and $G[V'(grp_{s_j})]$, need to be reachable from one another.
Note that the two induced subgraphs joined by a group link may be overlapping, directly connected, or indirectly connected in $G$.
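The grouping of Definition 6.2 can be illustrated as follows. This is a minimal sketch under one explicit assumption: each group is taken to be a maximal connected component of the subgraph induced by the individuals who have the skill, which matches the description of Figure 2 but is not spelled out in the definition. The function name `skill_groups` and the small network below are hypothetical.

```python
def skill_groups(adj, skills, required):
    """For each required skill s, return the connected components of the
    subgraph of G induced by the individuals skilled in s.
    Assumption: a group is a *maximal* connected component."""
    groups = {}
    for s in required:
        experts = {v for v in adj if s in skills.get(v, set())}
        seen, components = set(), []
        for start in sorted(experts):  # sorted only for deterministic output
            if start in seen:
                continue
            component, stack = set(), [start]
            seen.add(start)
            while stack:
                u = stack.pop()
                component.add(u)
                # Walk only along edges that stay inside the expert set.
                for w in adj[u]:
                    if w in experts and w not in seen:
                        seen.add(w)
                        stack.append(w)
            components.append(component)
        groups[s] = components
    return groups

# Hypothetical network: path a - b - c - m - d, where m has no required skill.
ADJ_G = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "m"}, "m": {"c", "d"}, "d": {"m"}}
SKILLS = {"a": {"s1"}, "b": {"s1", "s2"}, "c": {"s2"}, "d": {"s1"}}

GROUPS = skill_groups(ADJ_G, SKILLS, ["s1", "s2"])
print(GROUPS)
```

Here skill `s1` yields two separate groups, node `b` belongs to a group of `s1` and a group of `s2` (an overlap), and node `m` belongs to no group, mirroring the three situations described for Figure 2.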

A group graph may be regarded as a super-level graph of the expertise network, in which each group is a super-node standing for a set of individual nodes. To encode the interactions between groups into the group graph, we associate each group link with a weight. The weight on a group link is calculated from the communication costs between the individuals of the two groups in the original expertise network $G$. Given two groups, each of which contains several individuals in $G$, we employ the distance measure of single-link hierarchical clustering to compute the weight. Specifically, for a group link $e_H = (grp_{s_i}, grp_{s_j})$, the weight of $e_H$ is defined as the minimum shortest distance between individuals in $grp_{s_i}$ and $grp_{s_j}$ in the expertise graph $G = (V, E)$:

$$weight(e_H) = \min_{u \in V'(grp_{s_i}),\ v \in V'(grp_{s_j})} dist_G(u, v),$$

where $V'(grp_{s_i}) \subseteq V$, $V'(grp_{s_j}) \subseteq V$, $e_H \in E_H$, and $dist_G(u, v)$ is the shortest distance between nodes $u$ and $v$ in the expertise network $G$. Moreover, we need to keep track of the mapping between each group link and its corresponding minimum shortest path. We denote this mapping as $MSP(e_H) = path(u, v)$. Figure 3 shows the group graph for the expertise network in Figure 2. A zero weight indicates that two groups overlap. The MSP of the group link $e_H = (s_2, s_3)$ is the path containing edges (h, q) and (q, p) in $G$ of Figure 2.

Figure 3. (a) Group Graph Construction from Figure 2. (b) The effective subgraph of (a).

C. Applying Enhanced-Steiner Algorithm

The constructed group graph provides two merits that allow us to find an effective connection among the groups for the required skills efficiently. First, the group graph reduces the search space of the expertise network to a well-condensed form, which increases the efficiency and scalability of team formation. Second, since we minimize the costs among groups by shortest paths, the group graph can guide the greedy graph search to avoid redundant costs and to decrease the number of team members and inter-mediators. We apply the Enhanced-Steiner algorithm to the group graph. As in the original and generalized Enhanced-Steiner algorithms, an enhanced graph is constructed by adding the required skills as nodes and connecting them to the group nodes that possess the corresponding skills. By adopting Algorithm 1, we derive a subgraph that connects groups with the required skills at minimum communication cost. We denote this effective subgraph of groups by $H[V'_H]$, where $V'_H \subseteq V_H$. Continuing the example of Figure 3(a), its effective subgraph of groups is shown in Figure 3(b).

D. The Role Composition Method

So far we have found the connection subgraph of groups that satisfies all required skills with minimum communication cost between groups. However, we are still at the group level and have not yet considered the specified number of experts for each required skill.

In this final stage, we present a role composition method to decode the connection subgraph of groups $H[V'_H]$ and to find the team members who communicate effectively. The rationale of our role composition method is that the members of the final team $V'$ act in different functional roles in their communication network $G[V']$. An inter-mediator, who does not have any of the required skills, acts as the mediator between two skilled groups. For example, in Figure 4, the individual q acts as an inter-mediator between skilled groups $s_2$ and $s_3$. Connectors coordinate people between different skilled groups. The connectors can be individuals who are skilled in multiple skills, or individuals who communicate directly with an inter-mediator. For example, in Figure 4, the individual i is a connector who is skilled in both $s_1$ and $s_2$. Both h and p are also connectors, since they communicate directly with inter-mediator q. Intra-mediators are individuals who communicate directly with more than one connector. In Figure 4, the individual g is an intra-mediator connecting the two connectors i and h within the skilled group $s_2$. The other individuals, who belong to a single skilled group, are regarded as collaborators. Hence the individuals f, l, k, n, and o are collaborators. Recall that we have recorded the mapping from group links to the corresponding minimum shortest paths between two groups, so these roles can be found easily.

Figure 4. The final team of Figure 2 by Role Composition method.

Algorithm 2. Role Composition.
Input: An effective subgraph of groups $H[V'_H] = (V'_H, E'_H)$; the mapping MSP from group links to minimum shortest paths.
Output: Team $V' \subseteq V$ and its induced subgraph $G[V']$.
1: for each $e_H \in E'_H$ do
2:     $V' \leftarrow V' \cup MSP(e_H)$. // add connectors and inter-mediators
3:     Update $k_i$ and $k_j$ of end vertices $grp_{s_i}$ and $grp_{s_j}$ of $e_H$.
4: for each pair $\langle x, y \rangle$ of nodes in $V'$ do
5:     if $x \in grp_{s_i}$ and $y \in grp_{s_i}$, where $grp_{s_i} \in V'_H$ then
6:         // $G[grp_{s_i}]$ is the induced graph of individuals in $grp_{s_i}$.
7:         $V' \leftarrow V' \cup \{Path(x, y) \text{ in } G[grp_{s_i}]\}$. // add intra-mediators
8:         Update $k_i$ of $s_i$.
9: while $\exists$ a skill $s_i$ whose specified number $k_i$ is not met do
10:    $v^* = \arg\min_{u \in grp_{s_i} \setminus V',\ grp_{s_i} \in V'_H} \{dist(u, V') \text{ in } G\}$.
11:    $V' \leftarrow V' \cup v^*$. // add collaborators
12:    Update $k_i$ of $s_i$.

Based on the different roles observed within and among groups, including connectors, inter-mediators, intra-mediators, and collaborators, the role composition algorithm (Algorithm 2) reports the resulting team for the given generalized task. We first find the connectors and inter-mediators (lines 1-3). We then find the intra-mediators by connecting the connectors within a group (lines 4-8). Finally, we check the

specified number of experts for each required skill (line 9). If the number for any required skill is not met, we add more collaborators by shortest paths in the corresponding groups until the given generalized task is satisfied (lines 10-12). Figure 4 shows the resulting team subgraph for the generalized task $\{\langle s_1, 2\rangle, \langle s_2, 4\rangle, \langle s_3, 3\rangle\}$, in which the highlighted nodes and edges compose the final subgraph.

VII. EVALUATIONS

In this section, we evaluate and compare the performance of our proposed methods for finding teams for generalized tasks. We show that our proposed improvement and the new method not only provide good effectiveness in terms of communication cost, cardinality of the team, and number of inter-mediators, but can also be executed efficiently in terms of run time.

A. The DBLP Dataset

We use the DBLP bibliography database as our expertise data. We use the snapshot of December 30, 2008, restricted to data mining related conferences (including KDD, ICDM, SDM, PAKDD, PKDD, ICML, CIKM, WWW, and SIGIR). We construct the expertise social network from co-authorships. The set of skilled persons consists of authors who have co-authored at least three papers. The skill set $X_i$ of each author $i$ consists of the terms that occur in at least four titles of the papers he or she published. Two authors are connected in the network if they co-authored at least three papers. Totally there are 5482 authors, skills, and edges. The weights on edges are computed by $w(i, j) = 1 / |P_i \cap P_j|$, where $P_i$ is the set of papers of $i$.

B. Experiment Design

We conduct experiments to demonstrate the effectiveness, including the communication cost and the cardinality of the team, of four methods: the generalized Enhanced-Steiner algorithm with random seed, the generalized Enhanced-Steiner algorithm with high density seed, the grouping-based approach with random seed, and the grouping-based approach with high density seed.
Instead of picking skills randomly from the entire skill set [9], we take each paper as one task and the corresponding terms as the required skills. Papers in four years (i.e., ) are used, and the numbers of required skill sets are 1467, 1712, 1755, and 1824 respectively.

For generalized tasks, on the other hand, we report the communication cost, the cardinality of the team, the number of inter-mediators, and the run-time efficiency. Since the collected DBLP dataset contains no information about the specified numbers for skills (i.e., teams), we generate synthetic data for generalized tasks. Every generated task is controlled by two parameters: (1) $t$, the number of required skills in the task, and (2) $r$, a fixed ratio that determines the specified number of experts for a skill. Specifically, we randomly pick $t$ required skills from the terms appearing in paper titles. If a term occurs $F$ times in all paper titles, we round off $F \cdot r$ to obtain the specified number for that skill. For the results shown in the following, we set $t \in \{2, 4, \ldots, 20\}$ and $r = 0.02$. For every $(t, r)$, we generate 100 random generalized tasks and report the average results obtained by the different methods.

Figure 5. (a) Average communication cost, (b) average cardinality of teams, and (c) average number of inter-mediators.

C. Experimental Results

Generalized Tasks. Figure 5 compares the effectiveness and efficiency of our proposed algorithms for generalized tasks. In Figure 5, Generalized+Random, Generalized+Density, Grouping+Random, and Grouping+Density denote the generalized Enhanced-Steiner algorithm with random seed, the generalized Enhanced-Steiner algorithm with high density seed, the grouping-based approach with random seed, and the grouping-based approach with high density seed respectively. Figures 5(a) and 5(b) show that our proposed grouping-based method outperforms our generalized algorithm on both communication cost and the cardinality of the team.
The advantage of the grouping-based method becomes more pronounced as the number of required skills grows. This is because grouping not only reduces the costs among individuals with the same skill but also minimizes the inter-mediators between groups, as shown in Figure 5(c). Moreover, for the grouping-based method, the number of inter-mediators does not rise as the number of required skills increases. In summary, for generalized tasks, our proposed grouping-based method finds more effective teams.

Table 1. The manually selected generalized tasks, in which skills are either from multiple disciplinary research fields or irrelevant topics.

ID   Generalized Task (# of required skills = 6)
A    {visualization=3, gene=2, graph=5, video=2, convex=2, crawler=1}
B    {music=1, biological=2, translation=6, image=4, kernel=3, coding=2}
C    {biological=3, editing=1, entropy=2, security=3, language=8, video=3}
D    {speech=3, interactive=7, query=3, social=1, optimization=2, security=2}

Figure 6. (a) Average communication cost, (b) average cardinality of teams, and (c) average number of inter-mediators by using our generalized and grouping methods w/o density for basic tasks.

Figure 7. (a) Communication cost, (b) cardinality of the team, and (c) the number of inter-mediators for selected generalized tasks listed in Table 1.

Basic Tasks

Basic tasks are special cases of generalized tasks. We can therefore use the proposed generalized Enhanced-Steiner algorithm and the grouping-based approach to find teams of experts for basic tasks. Figure 6 compares the effectiveness of the Enhanced-Steiner algorithm and the proposed approaches. Since the experimental results in [9] showed that Enhanced-Steiner performs better than the Cover-Steiner algorithm, we compare our proposed algorithms only with Enhanced-Steiner. In Figure 6, Lappas denotes the Enhanced-Steiner algorithm proposed by Lappas et al., which starts with a random seed. Lappas+Density denotes the modified version of the Enhanced-Steiner algorithm, which starts with a high density seed.
The results in Figures 6(a) and 6(b) demonstrate that the Lappas and the generalized approaches perform better than the grouping-based method on both communication cost and the cardinality of the formed team. This is possibly due to the inter-mediators.

For basic tasks, only one individual is needed for each required skill. The proposed generalized approaches employ more inter-mediators as shortcuts to achieve good results. The grouping-based approaches, on the other hand, minimize the number of inter-mediators and need longer pathways to meet the tasks, as shown in Figure 6(c). In short, for basic tasks, our density-based Enhanced-Steiner method obtains better effectiveness.

Moreover, we demonstrate that our proposed grouping-based method can handle skills scattered across diverse research fields or irrelevant topics. We manually compile four generalized tasks, each with 6 required skills, as listed in Table 1. The performance on communication cost, cardinality, and number of inter-mediators is shown in Figure 7, where Avg. is the value at # required skills = 6 under the corresponding measure in Figure 7. The grouping-based method generally outperforms the generalized algorithm. This indicates that grouping is capable of finding teams of experts effectively for skills distributed in highly diverse fields.

Figure 8. Average execution time in seconds when using the proposed generalized and grouping-based methods w/o high density seed.

Efficiency

Figure 8 shows the time efficiency of our generalized and grouping-based methods. The grouping-based method generally outperforms the generalized one, except when the number of required skills is low (i.e., below 6). As the number of required skills grows, the average execution time (in seconds) of the grouping-based method grows linearly and slowly, while that of the generalized one rises very quickly. This is attributed to the grouping, which produces the group graph and drastically reduces the search space. In summary, our grouping-based method can find effective teams for generalized tasks in an efficient manner.

VIII. CONCLUSIONS

We tackle the problem of finding teams of experts for generalized tasks, which consist of a set of required skills, each associated with a specified number of experts. To form an effective team efficiently, we modify the Enhanced-Steiner algorithm to deal with generalized tasks. Moreover, rather than picking the seed node randomly, we propose a density-based method to pick a good seed. More importantly, we propose a grouping-based approach with a role composition method to further improve the effectiveness and efficiency of the team formation process. The experimental results show that our proposed algorithms form effective teams for generalized tasks efficiently, where effectiveness is evaluated by the communication cost, the cardinality of the team, and the number of inter-mediators.

REFERENCES

[1] M. Cheatham and K. Cleereman. Application of Social Network Analysis to Collaborative Team Formation. In Proc. of Intl. Symposium on Collaborative Technologies and Systems.
[2] S. J. Chen and L. Lin. Modeling Team Member Characteristics for the Formation of a Multifunctional Team in Concurrent Engineering. IEEE Transactions on Engineering Management, 51(2).
[3] J. Cheng, Y. Ke, and W. Ng. Efficient Processing of Group-Oriented Connection Queries in a Large Graph. In Proc. of ACM Conference on Information and Knowledge Management (CIKM '09).
[4] J. Cheng, Y. Ke, W. Ng, and J. X. Yu. Context-Aware Object Connection Discovery in Large Graphs. In Proc. of IEEE Intl. Conference on Data Engineering (ICDE '09).
[5] C. Faloutsos, K. S. McCurley, and A. Tomkins. Fast Discovery of Connection Subgraph. In Proc. of ACM Intl. Conference on Knowledge Discovery and Data Mining (KDD '04).
[6] E. L. Fitzpatrick and R. G. Askin. Forming Effective Worker Teams with Multi-functional Skill Requirements. Computers & Industrial Engineering, 48(3).
[7] M. Gaston, J. Simmons, and M. desJardins. Adapting Network Structures for Efficient Team Formation. In Proc. of the AAAI Fall Symposium on Artificial Multi-agent Learning.
[8] G. Kasneci, M. Ramanath, M. Sozio, F. M. Suchanek, and G. Weikum. STAR: Steiner-Tree Approximation in Relationship Graphs. In Proc. of IEEE Intl. Conference on Data Engineering (ICDE '09).
[9] T. Lappas, K. Liu, and E. Terzi. Finding a Team of Experts in Social Networks. In Proc. of ACM Intl. Conference on Knowledge Discovery and Data Mining (KDD '09).
[10] H. Tong, C. Faloutsos, and J.-Y. Pan. Fast Random Walk with Restart and Its Application. In Proc. of IEEE Intl. Conference on Data Mining (ICDM '06).
[11] H. Tong and C. Faloutsos. Center-piece Subgraph: Problem Definition and Fast Solution. In Proc. of ACM Intl. Conference on Knowledge Discovery and Data Mining (KDD '06).
[12] H. Tong, H. Qu, H. Jamjoom, and C. Faloutsos. iPoG: Fast Interactive Proximity Querying on Graphs. In Proc. of ACM Intl. Conference on Information and Knowledge Management (CIKM '09).
[13] H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad. Fast Best-effort Pattern Matching in Large Attributed Graphs. In Proc. of ACM Intl. Conference on Knowledge Discovery and Data Mining (KDD '07).
[14] H. Wi, S. Oh, J. Mun, and M. Jung. A Team Formation Model Based on Knowledge and Collaboration. Expert Systems with Applications, 36(5).


More information

A simulated annealing and hill-climbing algorithm for the traveling tournament problem

A simulated annealing and hill-climbing algorithm for the traveling tournament problem European Journal of Operational Research xxx (2005) xxx xxx Discrete Optimization A simulated annealing and hill-climbing algorithm for the traveling tournament problem A. Lim a, B. Rodrigues b, *, X.

More information

Seminar - Organic Computing

Seminar - Organic Computing Seminar - Organic Computing Self-Organisation of OC-Systems Markus Franke 25.01.2006 Typeset by FoilTEX Timetable 1. Overview 2. Characteristics of SO-Systems 3. Concern with Nature 4. Design-Concepts

More information

CSL465/603 - Machine Learning

CSL465/603 - Machine Learning CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am

More information

A Case-Based Approach To Imitation Learning in Robotic Agents

A Case-Based Approach To Imitation Learning in Robotic Agents A Case-Based Approach To Imitation Learning in Robotic Agents Tesca Fitzgerald, Ashok Goel School of Interactive Computing Georgia Institute of Technology, Atlanta, GA 30332, USA {tesca.fitzgerald,goel}@cc.gatech.edu

More information

MYCIN. The MYCIN Task

MYCIN. The MYCIN Task MYCIN Developed at Stanford University in 1972 Regarded as the first true expert system Assists physicians in the treatment of blood infections Many revisions and extensions over the years The MYCIN Task

More information

Learning Methods for Fuzzy Systems

Learning Methods for Fuzzy Systems Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

More information

A Study of Metacognitive Awareness of Non-English Majors in L2 Listening

A Study of Metacognitive Awareness of Non-English Majors in L2 Listening ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 4, No. 3, pp. 504-510, May 2013 Manufactured in Finland. doi:10.4304/jltr.4.3.504-510 A Study of Metacognitive Awareness of Non-English Majors

More information

Mining Association Rules in Student s Assessment Data

Mining Association Rules in Student s Assessment Data www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

More information

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.

More information

(Sub)Gradient Descent

(Sub)Gradient Descent (Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

More information

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

AP Statistics Summer Assignment 17-18

AP Statistics Summer Assignment 17-18 AP Statistics Summer Assignment 17-18 Welcome to AP Statistics. This course will be unlike any other math class you have ever taken before! Before taking this course you will need to be competent in basic

More information

TIMSS ADVANCED 2015 USER GUIDE FOR THE INTERNATIONAL DATABASE. Pierre Foy

TIMSS ADVANCED 2015 USER GUIDE FOR THE INTERNATIONAL DATABASE. Pierre Foy TIMSS ADVANCED 2015 USER GUIDE FOR THE INTERNATIONAL DATABASE Pierre Foy TIMSS Advanced 2015 orks User Guide for the International Database Pierre Foy Contributors: Victoria A.S. Centurino, Kerry E. Cotter,

More information

Comment-based Multi-View Clustering of Web 2.0 Items

Comment-based Multi-View Clustering of Web 2.0 Items Comment-based Multi-View Clustering of Web 2.0 Items Xiangnan He 1 Min-Yen Kan 1 Peichu Xie 2 Xiao Chen 3 1 School of Computing, National University of Singapore 2 Department of Mathematics, National University

More information

Transfer Learning Action Models by Measuring the Similarity of Different Domains

Transfer Learning Action Models by Measuring the Similarity of Different Domains Transfer Learning Action Models by Measuring the Similarity of Different Domains Hankui Zhuo 1, Qiang Yang 2, and Lei Li 1 1 Software Research Institute, Sun Yat-sen University, Guangzhou, China. zhuohank@gmail.com,lnslilei@mail.sysu.edu.cn

More information

An Online Handwriting Recognition System For Turkish

An Online Handwriting Recognition System For Turkish An Online Handwriting Recognition System For Turkish Esra Vural, Hakan Erdogan, Kemal Oflazer, Berrin Yanikoglu Sabanci University, Tuzla, Istanbul, Turkey 34956 ABSTRACT Despite recent developments in

More information

The Evolution of Random Phenomena

The Evolution of Random Phenomena The Evolution of Random Phenomena A Look at Markov Chains Glen Wang glenw@uchicago.edu Splash! Chicago: Winter Cascade 2012 Lecture 1: What is Randomness? What is randomness? Can you think of some examples

More information

Discriminative Learning of Beam-Search Heuristics for Planning

Discriminative Learning of Beam-Search Heuristics for Planning Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University

More information

Term Weighting based on Document Revision History

Term Weighting based on Document Revision History Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465

More information

TD(λ) and Q-Learning Based Ludo Players

TD(λ) and Q-Learning Based Ludo Players TD(λ) and Q-Learning Based Ludo Players Majed Alhajry, Faisal Alvi, Member, IEEE and Moataz Ahmed Abstract Reinforcement learning is a popular machine learning technique whose inherent self-learning ability

More information

Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the SAT

Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the SAT The Journal of Technology, Learning, and Assessment Volume 6, Number 6 February 2008 Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the

More information

Alpha provides an overall measure of the internal reliability of the test. The Coefficient Alphas for the STEP are:

Alpha provides an overall measure of the internal reliability of the test. The Coefficient Alphas for the STEP are: Every individual is unique. From the way we look to how we behave, speak, and act, we all do it differently. We also have our own unique methods of learning. Once those methods are identified, it can make

More information

Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C

Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Using and applying mathematics objectives (Problem solving, Communicating and Reasoning) Select the maths to use in some classroom

More information