An Effective Framework for Fast Expert Mining in Collaboration Networks: A Group-Oriented and Cost-Based Method


Farhadi F, Sorkhi M, Hashemi S et al. An effective framework for fast expert mining in collaboration networks: A group-oriented and cost-based method. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(3): May DOI /s

Farnoush Farhadi, Maryam Sorkhi, Sattar Hashemi, and Ali Hamzeh

School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran

{farnoush.farhadi, m.sorkhi.iust}@gmail.com; s hashemi@shirazu.ac.ir; ali@cse.shirazu.ac.ir

Received September 1, 2011; revised February 13,

Abstract The growth of social networks in modern information systems has enabled experts to collaborate at an unprecedented scale. Given a task and a graph of experts in which each expert possesses some skills, we aim to find an effective team of experts who are able to accomplish the task. Team selection should take into account how effectively the team members collaborate to perform the task as well as how efficient the team assignment is, given that each expert must have the minimum required skill level. Here, we generalize the problem in several respects. First, a method is provided to determine the skill level of each expert based on his/her skills and collaboration with neighbors. Second, the graph is aggregated into a set of skilled expert groups that are strongly correlated in their skills, together with the best connections among them. Considering groups significantly reduces the search space and also keeps redundant communication costs and team cardinality from growing while team members are assigned. Third, the existing RarestFirst algorithm is extended to a more general version, and finally the cost definition is customized to improve the efficiency of the selected team. Experiments on the DBLP co-authorship graph show that the proposed framework performs well in practice in terms of both efficiency and effectiveness.
Keywords expert team, social network, team formation

1 Introduction

Team formation is an essential process in project management. The skill levels of the experts in a social network, and how effectively they collaborate and communicate with each other, are important factors in making a selected team effective and efficient. Social networks among people are common in real-life scenarios. General-purpose networks such as Facebook, MySpace and Twitter, and professional networks such as LinkedIn and Xing, can be used as a source for connecting individuals who form a team for a project. In this case, the related graph indicates that people in the same community or department can collaborate more easily than those who work in different divisions. Choosing individuals according to their previous collaborations and skillfulness levels can therefore be useful for effective team formation.

In this paper we study the problem of finding a team of experts who accomplish a given task effectively and efficiently. Suppose that there is a set of candidates $X = \{1, \ldots, n\}$, where each candidate $i$ is an expert with a set of skills and the skillfulness level of individual $i$ in each skill $s_i$ is specified. The skill grades of the selected individuals can affect the efficiency and quality of the project. The individuals are assumed to be organized as a weighted, undirected graph $G(X, E)$. The edge weights in $G$ represent interaction between individuals and serve as a measure of how closely the corresponding individuals can communicate and collaborate. Given a task $T$ that requires some skills $s_i$, our aim is to find a team of individuals $X' \subseteq X$ in which each skill in $T$ is covered by at least the specified required number of experts. Lappas et al. [1] have proved that the team formation problem is NP-hard.
They propose two approximation methods, the RarestFirst and Enhanced Steiner algorithms, which define the communication cost as the diameter and as the sum of the weights of the minimum spanning tree, respectively, without considering skill grading or a specified required number of experts for each required skill. They concentrate on only these two instantiations of the communication cost because they consider these criteria practical, simple and intuitive.

Regular Paper. Springer Science + Business Media, LLC & Science Press, China

J. Comput. Sci. & Technol., May 2012, Vol.27, No.3

C. Li et al. [2] propose a grouping graph method that condenses the expertise information into a grouped graph according to the required skills. They extend the Enhanced Steiner algorithm of Lappas et al. [1] to a Generalized Enhanced Steiner algorithm for generalized tasks, in which each required skill is associated with a specified number of experts.

We investigate the team formation problem for generalized tasks based on the RarestFirst algorithm of Lappas et al. [1], together with a method for calculating the skill grade of experts. Our Skill Grading method is combined with a new, adapted definition of communication cost that optimizes both the distance and the skill level of individuals as an instantiation of efficiency in the selected team. Given an expert social network, the goal is to form a team of experts for a given task that includes some required skills. Beyond this basic setting, we define a generalized algorithm in which the required number of experts is specified for each skill in the given task. Each expert has some skills with different skillfulness levels, which are calculated according to an adapted similarity measure described in detail in Subsection 3.3.1.

Real-life projects involve tasks in which each skill must be performed by a specific number of experts. For example, Fig.1 shows an expert social network in which every node represents an expert and the undirected weighted edges indicate previous communication between the corresponding nodes. Suppose we have a task $T = \{(s_1, n_1), (s_2, n_2), (s_3, n_3)\}$ in which every required skill $s_i$ needs a specific number of experts $n_i$ to fulfill $T$. When a team is built for this task with the RarestFirst algorithm of Lappas et al. [1], a qualified team must include $\{f, e, l, k\}$, with diameter = 0.5.
But if the given task is defined as $T = \{(s_1, 2), (s_2, 4), (s_3, 2)\}$, then the best team for the generalized task must be $\{n, g, f, e, l, k, m\}$, with diameter = 0.8.

Fig.1. Example of an expert social network; the collaboration between individuals is shown as an undirected and weighted graph.

Specifically, to generalize the basic problem in this scenario, the expert team must satisfy the following: 1) every required skill $s_i$ must be covered by at least its corresponding required number of experts $n_i$; 2) all the required skills in the given task must be accomplished by the selected team; and 3) the total communication cost among the selected individuals must be minimized as much as possible.

An overview of our framework is shown in Fig.2. Given a social network, we first calculate the skillfulness level of individuals with respect to all required tasks. Next, we extract groups of candidates who share a common skill. Taking the cost into consideration, the redundant groups are filtered and the appropriate groups of candidates for satisfying each required skill are identified. The final step applies the Generalized Diameter algorithm to find the qualified candidates. By combining the Generalized Diameter algorithm with an adapted cost definition, we design a framework that is able to find an effective and efficient team of experts relevant to the given task.

Fig.2. Overview of our proposed framework.

The main contributions of this paper are as follows. First, we present a new method to calculate the skillfulness degree of each expert in a given social network, based on a combination of previous experience and the ability to collaborate with neighbors with respect to each specified skill. Second, we propose a group-based method that aggregates the expert network into a grouped graph, which helps to reduce the search space and to avoid extra communication costs and unrelated individuals.
Third, we present a more general version of the RarestFirst algorithm for general tasks, combined with a customized cost definition that optimizes both the diameter and the skillfulness of team members. Note that the important criteria for graph grouping are: 1) finding a group of experts who share their skills and 2) ensuring that each expert in the group collaborates tightly with all other members of the group. Grouping in this way reduces the search space, avoids redundant communication costs and, ultimately, supplies the most suitable candidate team members to the Generalized Diameter algorithm. The experimental results show that our approach, as well as the algorithm, works well in practice. We also investigate the efficiency and effectiveness of the proposed method.

The rest of the paper is organized as follows. Section 2 gives a brief review of related prior work in the fields of team formation, skill grading and graph grouping. Section 3 introduces the proposed framework: the problem definition and preliminaries are given first, followed by our proposed methods, namely Skill Grading, the Skilled Grouping method and the Generalized Diameter algorithm with its customized cost definition. The experimental results on the DBLP dataset are presented in Section 4, and the conclusion is stated in Section 5.

2 Related Work

Previous work related to our method can be categorized into three groups.

2.1 Team Formation

In recent years, there has been increasing research interest in the team formation problem, for example [3-6]. These studies address the team formation problem by transforming it into an integer programming problem. Simulated annealing [3], branch-and-cut [6], and genetic algorithms [5] are used to find an optimal match between individuals and requirements. Chen et al. [4] use a psychological test to form a team by estimating the individuals' interpersonal relationship attributes and their personalities. In [7], Fitzpatrick et al. evaluate individuals' drive and temperament to achieve a high-quality team. Although these studies are grounded in psychology, they do not pay attention to the graph structure among individuals.
The correlation between different graph structures and the performance of a team has been shown by Gaston et al. [8], but they do not consider the computational problem of forming a team. Kautz et al. [9] present a referral web tool for modeling social networks to assist in locating experts and in evaluation tasks. Cheatham et al. [10] take the structure of the social network into account by collecting the neighbors surrounding each skill in a social-concept graph, but they do not pay attention to the communication cost among individuals. Backstrom et al. [11] provide useful information about different aspects of team formation in social networks.

The team formation problem in social networks was first solved with consideration of the communication cost by Lappas et al. [1], as described in Section 1. They propose a diameter-based algorithm, RarestFirst, to solve the team formation problem for basic tasks. The algorithm first computes the support set of each required skill $s_i$ in task $T$, then finds the skill $s_{rare}$ with the lowest-cardinality support set $S(s_{rare})$. For each candidate in $S(s_{rare})$, the algorithm forms a subgraph by exploring the closest connected individuals in the other support sets $\{S(s_i) \mid s_i \in T \text{ and } s_i \neq s_{rare}\}$. Finally, among all constructed subgraphs, RarestFirst selects the one with the smallest diameter. The worst-case running time of the RarestFirst algorithm is $O(|S(s_{rare})| \times n) = O(n^2)$, where $n = |V|$ is the number of individuals in graph $G(V, E)$. The RarestFirst algorithm of Lappas et al. [1] for basic tasks is shown in Algorithm 1.

Algorithm 1. RarestFirst Algorithm for Basic Tasks [1]
Input: Graph $G(V, E)$; individuals' skill sets $\{X_1, \ldots, X_n\}$ and task $T$.
Output: Team $V' \subseteq V$ and its induced subgraph $G[V']$.
1. For every $s_i \in T$ do
2.     $S(s_i) = \{ind \mid s_i \in X_{ind}\}$
3. $s_{rare} \leftarrow \arg\min_{s_i \in T} |S(s_i)|$
4. For every $ind \in S(s_{rare})$ do
5.     For $s_i \in T$ and $s_i \neq s_{rare}$ do
6.         $R_{is} \leftarrow d(ind, S(s_i))$
7.     $R_{ind} \leftarrow \max_{s_i} R_{is}$
8. $ind^* \leftarrow \arg\min R_{ind}$
9. $V' = ind^* \cup \{path(ind^*, S(s_i)) \mid s_i \in T\}$

To specify the required number of individuals for each required skill in the given task, C. Li et al. [2] provided a generalized version of the basic Enhanced Steiner algorithm in [1]. In this paper we offer a Generalized Diameter algorithm that generalizes the basic RarestFirst algorithm in [1] described above.

2.2 Skill Grading

Matching individuals to task requirements according to their skillfulness levels has been studied in many communities. The match is determined by evaluating the residual amount of the skill sets that is or is not satisfied by the combination of individuals under consideration. To formulate these applications, Zakarian et al. [6] and Boon et al. [12] handle the problem in a fuzzy manner. Zakarian et al. [6] propose an analytical model for selecting multi-functional teams that prioritizes team members based on task requirements. Boon et al. [12] enhance the quality of the team by combining the qualities of the candidates with the functional requirements in order to assign the right individuals to the right teams. In another study, to measure the suitability of an individual's skill set for performing a task, Korwin et al. [13] present an algorithm that considers the candidates' skills and uses fuzzy set theory to evaluate the compatibility of skill sets. In [14], the authors form qualified, attributed teams in which each team satisfies at least the minimum requirements and the required skills for a specific task. In that study, a fuzzy mathematical programming model is formulated and a solution algorithm based on simulated annealing is proposed for building an effective and qualified attributed team. N. Liu et al. [15] measure the similarity between users based on the correlation between their rankings of the items rather than the rating values, and propose a new collaborative filtering algorithm that ranks items based on the preferences of similar users. Candillier et al. [16] give guidelines for, and discuss, combining traditional similarity measures to design a new weighted similarity criterion in the field of recommender systems.

Here, we use a Skill Grading method to obtain an effective and efficient team based on an existing weighted social network and the communication cost. To obtain a specific weighted similarity measure, we propose a formula that combines the Pearson and Jaccard measures to benefit from their complementarity. Pearson correlation corresponds to the cosine of the deviations from the mean and considers only the common attributes. Jaccard similarity, in contrast, measures the overlap between the skills of two individuals but does not take into account the different use of the rating scale by different individuals. Combining these two traditional measures by a simple product therefore exploits the advantages of both.

2.3 Finding Good Connection Subgraphs

There are several studies on finding a good connection for given query individuals in a large graph.
Given a set of individuals, the problem is to find a small connected subgraph with the best connection between the query individuals. Faloutsos et al. [17] provide a method to find a good connection subgraph for knowledge discovery in large social network graphs. Tong et al. [18] use the random walk with restart (RWR) method to measure the proximity of two nodes, and several approaches build on RWR. The CenterPiece subgraph work in [19] supports a whole family of AND/OR constraint queries, [20] proposes an efficient method (refer to ipog-b) for bipartite graphs, and [21] applies RWR and the CenterPiece subgraph idea in the proposed Graph X-Ray method to find the subgraphs that match a user query pattern. Finding close relationships among a given set of multiple entities is an important building block for many search, ranking, and analysis tasks. Kasneci et al. [22] present a new approximation algorithm, STAR, for relationship queries over large relationship graphs. Cheng et al. [23] use the modularity method to improve the efficiency of objective connection discovery by breaking large graphs into much smaller communities, and in [24] define a new correlation group method that finds the groups to which the query nodes belong while considering the best connection among these groups.

3 Proposed Framework: Overview

3.1 Problem Definition

Let us first define the graded variant of the team formation problem as Problem 1.

Problem 1.
Given the social network $G(V, E)$, the graded skill sets of individuals, and a task $T$ containing required skills $s_i$ and minimum required skill levels, the team formation problem is to find a set of experts $V' \subseteq V$ who provide the best cover of the required skills by the individuals' skills and form a subgraph $G[V']$ such that: 1) each required skill $s_i$ is covered by at least one individual whose degree of skillfulness in that skill is higher than or equal to the minimum required level; 2) all the required skills in the given task are accomplished by the team; and 3) the total communication cost among the selected individuals is minimized as much as possible. This cost is the diameter communication cost of $V'$, denoted by $CC(V')$.

So, we first present the graded variant of the RarestFirst algorithm. A new method for scaling the nodes' abilities is proposed in this paper: the degree of skillfulness of individuals is modeled by a formulation called Skill Grading. In this setting, a minimum required level of each skill for completing the task is stated explicitly. Similarly, for each individual $i$ with skill $s_i$, the level of skillfulness with respect to $s_i$ is specified. Only individuals whose skillfulness level is higher than or equal to the minimum required level can then compete to cover the given skills in task $T$. The basic version of this team formation problem was solved by Lappas et al. [1]; we describe its graded version, Skill Grading, in Subsection 3.3.1.

Next, a skilled-group method is applied to aggregate the expert graph into a grouped graph that keeps only the individuals related to the required skills. It filters out redundant candidates and protects the algorithm from extra communication cost. This phase is done offline.
We discuss the skilled-group method in detail in Subsection 3.3.2.

In order to generalize the graded variant of the team formation problem to generalized tasks, we define Problem 2.

Problem 2. Given the graded social network $G(V, E)$, the graded skill sets of individuals, and a generalized task $T$ containing required skills $s_i$, each associated with a specific required number of experts $n_i$ and a corresponding minimum required skill level $m_i$, the team formation problem for generalized tasks is to find a set of experts $V' \subseteq V$ who provide the best cover of the required skills by the individuals' skills and form a subgraph $G[V']$ such that: 1) each candidate $ind$ has at least the minimum skillfulness level $m_i$ in order to compete with other candidates for accomplishing skill $s_i$; 2) each required skill $s_i$ is covered by at least its required number of experts $n_i$, that is, $\forall (s_i, n_i) \in T$, $n_i \le |\{j \mid j \in V' \text{ and } s_i \in X_j\}|$; 3) all the required skills in the given task are accomplished by the team; and 4) $CC(V')$ is minimized as much as possible.

Furthermore, to optimize the diameter communication cost, we redefine it as a combined measure of distance and skillfulness level, so that both the skillfulness and the collaboration potential of the target individuals are considered. The generalized version is presented in Subsection 3.3.3 and the adapted cost definition in Subsection 3.3.4.

3.2 Preliminaries

A team formation problem includes experts, a weighted social network structure, and a set of skills $S$. Assume a set of $n$ individuals, $X = \{1, \ldots, n\}$, and a universe of $m$ skills, $S = \{s_1, s_2, \ldots, s_m\}$. The relationship between these two sets is expressed by the set of skills covered by each individual, $X_i \subseteq S$. If $s_j \in X_i$, individual $i$ has skill $s_j$; otherwise individual $i$ does not have skill $s_j$. We often refer to the skill set of an individual in our algorithm. A subset of individuals $X' \subseteq X$ has skill $s_i$ if at least one individual in $X'$ has skill $s_i$. A task $T \subseteq S$ is simply the subset of skills required to accomplish the project.
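The coverage conditions of Problem 2 can be illustrated with a minimal Python sketch. The data layout (a dictionary of skill levels per expert, and a task that maps each skill to a pair $(n_i, m_i)$), the function names `support_set` and `covers_task`, and the toy values are assumptions made for illustration only; they are not part of the original formulation, and the communication cost $CC(V')$ is not considered here.

```python
from typing import Dict, Set, Tuple

# Assumed layout: expert -> {skill: skillfulness level}.
Skills = Dict[str, Dict[str, float]]
# Assumed layout of a generalized graded task: skill -> (n_i, m_i).
Task = Dict[str, Tuple[int, float]]


def support_set(skills: Skills, skill: str, min_level: float = 0.0) -> Set[str]:
    """Experts who hold `skill` with a level of at least `min_level`."""
    return {expert for expert, levels in skills.items()
            if skill in levels and levels[skill] >= min_level}


def covers_task(team: Set[str], skills: Skills, task: Task) -> bool:
    """Check the coverage conditions 1)-3) of Problem 2 for a candidate team."""
    for skill, (n_required, m_required) in task.items():
        qualified_members = support_set(skills, skill, m_required) & team
        if len(qualified_members) < n_required:
            return False
    return True


# Hypothetical toy data, not taken from the paper.
skills = {
    "a": {"s1": 0.9, "s2": 0.4},
    "b": {"s1": 0.6},
    "c": {"s2": 0.8},
    "d": {"s2": 0.7, "s3": 0.5},
}
task = {"s1": (2, 0.5), "s2": (2, 0.6), "s3": (1, 0.5)}

print(sorted(support_set(skills, "s2", 0.6)))
print(covers_task({"a", "b", "c", "d"}, skills, task))
print(covers_task({"a", "b", "c"}, skills, task))
```

In this toy instance, expert `a` holds $s_2$ but falls below the assumed minimum level of 0.6, so `a` does not count toward the two experts required for $s_2$.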
The social network is modeled as an undirected and weighted graph $G(V, E)$, where the vertices $v \in V$ are experts and the edges represent the collaboration cost in joint activities. The edge weight $w$ describes the distance between two experts: frequent collaboration between two experts is represented by a small edge weight, and rare collaboration by a large one. An unconnected pair of nodes is assigned an edge with a heavy weight, indicating dissimilarity between the corresponding nodes. Since the graph $G$ is assumed to be connected, every disconnected original graph can be transformed into a connected one by simply adding very high-weight edges between every pair of nodes that belong to different connected components. This very high weight is a number larger than the sum of all pairwise shortest paths in each connected component.

The graph distance between two nodes $i, j \in V$ is the weight of the shortest path between them in $G$, denoted by $d(i, j)$. Let $path(i, j)$ be the set of nodes along their shortest path. The distance between a node $i \in V$ and a set of nodes $V' \subseteq V$ is defined by $d(i, V') = \min_{j \in V'} d(i, j)$, and $path(i, V')$ is defined as the set of nodes along the shortest path from $i$ to the node $j \in V'$ for which $d(i, j) = \min_{i' \in V'} d(i, i')$.

A generalized task $T = (S, N)$ is a set of required skills, $\{(s_i, n_i) \mid \forall i,\ 1 \le i \le q,\ s_i \in S,\ n_i \text{ is an integer}\}$, where $n_i$ denotes the required number of experts for $s_i$ and $q$ is the number of required skills needed to complete the task $T$.

3.3 Proposed Framework: Details

In this subsection, we provide the details of our proposed framework. As mentioned before, the method has four basic modules. In the first, Skill Grading, a graded variant of the skill sets is calculated. The Skilled Grouping module groups all correlated and related individuals who share common skills in $G$ and filters out the redundant candidates.
The Generalized Diameter module gives a generalized version of the problem for generalized tasks, combined with a modified cost definition.

3.3.1 Skill Grading

The goal of this subsection is to find a solution for scaling the skill abilities of individuals. We believe that a skill grading model for experts in a social network should not rely only on their skill profiles. The main idea of our method is a formulation that considers both the experts' skill information and their level of teamwork ability in a collaborative network. We focus on two well-known traditional similarity metrics that have attracted extensive research in the recommender systems community, Pearson and Jaccard (see [16, 25-28]). Pearson correlation corresponds to the cosine of the deviations from the mean and considers only the attributes in common. Jaccard similarity, on the other hand, measures the overlap between the attributes of two individuals but does not take into account the different use of the rating scale by different users. In our proposed Skill Grading algorithm, these two traditional measures are combined by a simple product to benefit from the advantages of both. The combined measure lies between 0 and 1.

In this adapted formulation, to determine the skillfulness level of each individual $i$ in each skill $s_i$, the similarity between that individual and each of his/her neighbors $j \in N_i$ with respect to $s_i$ is calculated. In (1) and (2), $P_i^{s_i}$ (resp. $P_j^{s_i}$) is the number of papers published by individual $i$ (resp. $j$) with respect to skill $s_i$, and $|SkillSet(i)|$ in (2) is the size of the skill set of individual $i$. Note that in our setting, this formulation should be computed offline as a preprocessing phase. The formulas are as follows:

$$\text{Skill-Grading}(i, s_i) = \sum_{j \in N_i} \left\{ \frac{|P_i^{s_i} \cap P_j^{s_i}|}{|P_i^{s_i} \cup P_j^{s_i}|} \times \frac{(P_i^{s_i} - \bar{P}_i)(P_j^{s_i} - \bar{P}_j)}{\sqrt{\sum_{j \in N_i} (P_i^{s_i} - \bar{P}_i)^2}\ \sqrt{\sum_{j \in N_i} (P_j^{s_i} - \bar{P}_j)^2}} \right\} \tag{1}$$

$$\bar{P}_i = \frac{\sum_{s \in SkillSet(i)} P_i^{s}}{|SkillSet(i)|} \tag{2}$$

3.3.2 Skilled Grouping

We discuss an efficient grouping scheme that not only decomposes a graph into a set of highly correlated communities, but also retains the interconnection edges between the groups. Although the RarestFirst algorithm in [1] finds a team for a given task $T$, assigning proper individuals to the required skills may take much more exploration time when the task needs many skills or when the expert graph includes many interactions. The Skilled Grouping method takes the given graph $G(V, E)$ and the individuals' skill sets and outputs groups that contain only related individuals with the best connections for assignment to the task. Motivated by the above, to derive an effective subgraph of related individuals, we present our three-stage Skilled Grouping method for generalized tasks, shown in Algorithm 2.

Algorithm 2. Skilled Grouping Algorithm
Input: Graph $G(V, E)$; individuals' skill sets $\{X_1, \ldots, X_n\}$ and task $T$.
Output: A well-condensed induced subgraph of $G[V']$ in grouped form.
1. $Grp \leftarrow$ Expert grouping in the expertise network based on the required skills in $T$.
2. $G(Grp, E') \leftarrow$ Modeling the construction of the underlying interconnections among groups.
3. $G[Grp'] \leftarrow$ Applying RarestFirst to $G(Grp, E')$.

The algorithm first aggregates the nodes according to the required skills, considering the connections among nodes, and outputs $Grp$, a set of groups that act as abstract nodes. These groups contain related individuals with the best connections and can be used for discovery instead of the original graph. A group with respect to a required skill $s_i$, say $g_{s_i}$, is a strongly connected component: a maximal group of mutually reachable nodes in which each node belongs to the support set of $s_i$ in a qualified manner. Fig.3 shows an example of skilled grouping with respect to each of the required skills $s_1$, $s_2$ and $s_3$. Each group is surrounded by a dashed curve. As Fig.3 illustrates, there are two connected groups for $s_1$, $\{b, c\}$ and $\{i, j, k, m\}$. Node $k$ belongs to connected groups of both $s_1$ and $s_2$. Like $k$, node $n$ is qualified in both $s_2$ and $s_3$.

Fig.3. Skilled grouping of the expert graph shown in Fig.1; circles illustrate individuals, directed lines show correlation among nodes, and dashed curves are interpreted as groups according to the required skills.

Since most experts possess more than one required skill, the groups may overlap with each other. The Skilled Grouping method exploits this overlap to decrease the size of the selected groups by modeling the nodes and interactions in the grouped graph as an abstract structure in step 2. In this structure, each group is represented as an abstract node, and the interaction between two groups is associated with a weighted edge $e_A$ that encodes the communication cost between the corresponding groups. The weight is calculated as in (3), where $k \in g_{s_i}$, $l \in g_{s_j}$, and $distance_G(k, l)$ is computed by Dijkstra's shortest path between $k$ and $l$ in the original graph $G$.

$$Weight(e_A) = \min\{distance_G(k, l)\} \tag{3}$$

This property can decrease the cardinality of the selected team and helps to find the final team members effectively. A node that does not belong to any group is ignored, unless it lies on the interconnecting path between two groups. For example, in Fig.3, node $q$ can be ignored while node $o$ must be considered. Recall that the mapping from each group edge to the corresponding minimum shortest path between the two groups must be retained.
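Steps 1 and 2 of Algorithm 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is undirected, so a plain depth-first search for connected components among the qualified supporters of a skill stands in for Tarjan's algorithm; shortest paths are recomputed with Dijkstra rather than read from a precomputed table; and `abstract_edge` keeps only the two endpoints of the minimum path instead of the full node mapping. The function names and the toy graph are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected: graph[u][v] == graph[v][u]


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph: Graph, src: str) -> Dict[str, float]:
    """Shortest-path distance from `src` to every reachable node."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def skill_groups(graph: Graph, supporters: Set[str]) -> List[Set[str]]:
    """Step 1: connected components of the subgraph induced by the supporters."""
    groups: List[Set[str]] = []
    unvisited = set(supporters)
    for start in sorted(supporters):
        if start not in unvisited:
            continue
        group: Set[str] = set()
        stack = [start]
        while stack:
            u = stack.pop()
            if u in group:
                continue
            group.add(u)
            stack.extend(v for v in graph[u]
                         if v in supporters and v not in group)
        unvisited -= group
        groups.append(group)
    return groups


def abstract_edge(graph: Graph, g1: Set[str],
                  g2: Set[str]) -> Tuple[float, Tuple[str, str]]:
    """Step 2: Weight(e_A) as in (3), plus the endpoints that realize it."""
    best = (float("inf"), ("", ""))
    for k in sorted(g1):
        dist = dijkstra(graph, k)
        for l in sorted(g2):
            if dist.get(l, float("inf")) < best[0]:
                best = (dist[l], (k, l))
    return best


# Hypothetical toy graph (weights are not those of Fig.3).
graph = build_graph([("b", "c", 0.25), ("c", "o", 0.125), ("o", "d", 0.125),
                     ("d", "e", 0.5), ("c", "e", 1.0), ("e", "m", 0.25)])
s1_groups = skill_groups(graph, {"b", "c", "m"})  # assumed supporters of s1
s2_groups = skill_groups(graph, {"d", "e"})       # assumed supporters of s2
print(s1_groups, s2_groups)
print(abstract_edge(graph, s1_groups[0], s2_groups[0]))
```

Two overlapping groups share at least one node, so `abstract_edge` returns a weight of zero for them, which agrees with the zero-weight edges used for overlapped groups. Node `o` is not a supporter of either skill here but lies on the minimum path between the two groups, which is why such nodes cannot simply be discarded.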
Hence, the interactions among groups can be defined by the abstract edges. For example, in Fig.3, $s_1$ (left) and $s_2$ (top) are connected through $\{c, p\}$ with weight 0.15 and through $\{c, o, d\}$ with weight 0.1. Fig.4(a) shows the grouped graph for the original graph $G$ in Fig.3. Group edges between overlapping groups are given zero weight. The abstract graph allows us to efficiently find an effective induced subgraph of the constructed groups that covers the required skills. Graph grouping reduces the search space to a well-condensed abstract structure, which provides efficiency and scalability. By modeling the underlying interconnections among groups through minimizing the cost of the interconnecting paths, graph grouping yields a reduced graph that protects the proposed method from extra search and irrelevant individuals, and it also reduces the team cardinality.

In stage 3, we apply the RarestFirst algorithm to the abstract graph $G(Grp, E')$ to find an effective subgroup of experts. This yields an effective subset $Grp' \subseteq Grp$, consisting of connected groups of experts that cover the required skills, and its induced subgraph $G[Grp']$ with minimum diameter. Continuing the example of Fig.4(a), its effective subgraph is shown in Fig.4(b).

Fig.4. (a) Abstract group graph built from Fig.3. (b) Effective subgraph of the abstract graph of Fig.4(a).

Forming the components in step 1 with Tarjan's algorithm [29] is of order $O(q \times (|V| + |E|))$, where $q$ is the number of required skills in task $T$, which is often very small and is at most 20 in all the experiments studied in this paper. Since we assume that all-pairs shortest paths have been precomputed and stored in a hash table, the computational complexity of modeling the abstract graph of groups in step 2 is of order $O(|Grp| \times |E'|)$, where $|E'|$ is the number of captured interactions among the groups resulting from step 1 and $|Grp|$ is the number of groups, which is often more than $q$ and in the worst case equals $n$. The worst-case analysis of step 3 gives an estimate of $O(|Grp|^2)$. Therefore, the worst-case computational complexity of the proposed Skilled Grouping algorithm is of order $O(n^2)$.

Up to this point, as Fig.4(b) shows, we have extracted the connected subgraph of groups that covers the required skills with the smallest diameter, together with the individuals on the interconnecting paths between groups. Recall that we have recorded the mapping from the original graph $G(V, E)$ to the effective abstract subgraph $G[Grp']$. Hence, the subgraph of $G(V, E)$ containing the related experts and effective communication can be extracted easily. Fig.5 shows the reduced form of the original graph, which is the input to the Generalized Diameter algorithm that reports a qualified team for given generalized tasks.

Fig.5. Final graph as an input for the Generalized Diameter algorithm.

3.3.3 Generalized Diameter Algorithm

A basic diameter-based approach, the RarestFirst algorithm in [1], was proposed to solve the team formation problem for basic tasks. In this subsection, we turn our attention to an improved greedy and incremental method that generalizes it to generalized tasks, the Generalized Diameter algorithm presented in Algorithm 3. In general, it takes a graph $G(V, E)$ with $|V| = n$, the skill sets of the individuals, and a general task $T$ consisting of a set of required skills $s_i$, the required number of individuals $n_i$ for each skill $s_i$, and the minimum skillfulness level $m_i$ for performing $s_i$. Note that the reduced output subgraph of the Skilled Grouping algorithm is used as the input graph for the Generalized Diameter algorithm. In this case, $|V|$ is very small in comparison with that of the original graph. Nevertheless, we use the same notation and preliminaries as in the previous algorithms to emphasize the scalability and generality of the proposed algorithm even on large graphs. The algorithm works as follows.
It first computes the support set $S(s_i)$ for every skill $s_i \in T$ and then picks the skill $s_{rare} \in T$ whose support set $S(s_{rare})$ has the lowest cardinality, as formulated in (4):

$$S(s_{rare}) = \{ind \mid s_{rare} \in X_{ind}\}, \qquad s_{rare} = \arg\min_{s_i \in T} |S(s_i)|. \tag{4}$$

For every individual $ind$ belonging to $S(s_{rare})$, Algorithm 3 first checks whether the skillfulness level of $ind$ regarding skill $s_{rare}$, $skill(ind, s_{rare})$, is greater than or equal to $m_{rare}$ (line 3). If this initial condition is satisfied, the algorithm starts to construct a subgraph $V'$ for $ind$ incrementally: it reads the skill set and, for each skill $s_i$ ($s_i \in T$ and $s_i \neq s_{rare}$), finds its support set (line 8). To cover the required number of experts $n_i$ for each $s_i$, at each round the individual $t'$, the supporter of $s_i$ that holds the $l$-th place in the array of minimum distances to $ind$, is picked and the nodes along the corresponding shortest path are added to $V'$ (line 16); then the number of required experts for skill $s_i$ is decreased by one (line 17).

Moreover, the path is checked to determine whether other required skills $w$ are covered by the nodes along the added shortest path. If another required skill is covered by an individual $j$ that has already been added, the number of required experts for skill $w$ is reduced by one and the skill set of individual $j$ is updated, so that the algorithm does not review the ability of $j$ with respect to skill $w$ again in later rounds (lines 18-20). The incremental construction of subgraph $V'$ terminates when all the required skills and the minimum number of experts for each skill are satisfied. Then, the diameter of subgraph $V'$ is calculated as $R_{ind}$ (line 21). Finally, among all subgraphs $V'$ constructed from $S(s_{rare})$, the algorithm picks the one that leads to the smallest-diameter subgraph and extracts the corresponding nodes as the final selected team (lines 22-23). By this computation, we guarantee that at least the required number of experts for each required skill in the given task $T$ is selected, in an effective and condensed structure. Fig.6 shows the effective team for $T = \{(s_1, 2), (s_2, 4), (s_3, 2)\}$, based on the Skilled Grouping and Generalized Diameter algorithms.

Algorithm 3. Generalized Diameter Algorithm
Input: $G(V, E)$, the skill set $skill = \{X_1, \ldots, X_n\}$ of individuals; task $T = (S, N, M) = \{(s_i, n_i, m_i) \mid 1 \le i \le q\}$.
Output: Team $V' \subseteq V$ and its induced subgraph $G[V']$.
1. Calculate $s_{rare}$ and its support set $S(s_{rare})$
2. For every $ind \in S(s_{rare})$ do
3.   If $skill(ind, s_{rare}) \ge m_{rare}$
4.     $V' \leftarrow ind$
5.     Read $skill$
6.     For $s_i \in T$ and $s_i \neq s_{rare}$ do
7.       $l = 0$
8.       $Supp \leftarrow S(s_i)$
9.       For $t \in Supp$ do
10.        If $skill(t, s_i) \ge m_i$
11.          $dist \leftarrow d(ind, t)$
12.      While ($n_i > 0$)
13.        $l = l + 1$
14.        $t' \leftarrow$ $l$-th $\arg\min_t dist$
15.        $R_{ind}^{s_i} \leftarrow d(ind, t')$
16.        $V' \leftarrow V' \cup path(ind, t')$
17.        $n_i \leftarrow n_i - 1$
18.        For $w \in X_j$, $j \in path(ind, t')$ & $j$ $S(s_i)$
19.          $n_w \leftarrow n_w - 1$
20.          Remove $w$ from skill set of $j$
21.    $R_{ind} \leftarrow \max_{s_i} R_{ind}^{s_i}$
22. $ind^* \leftarrow \arg\min_{ind} R_{ind}$
23. $V' \leftarrow ind^* \cup \{path(ind^*, S(s_i)) \mid s_i \in T\}$

Fig.6. Effective team based on Skilled Grouping and Generalized Diameter algorithms for the graph shown in Fig.1.

We assume that all-pairs shortest paths have been pre-calculated and stored in a hash table. Then, for the graph $G(V, E)$, the total running time of Generalized Diameter is $O(|S(s_{rare})| \cdot n \cdot (\delta \cdot |path(ind, t')|))$. In the worst case, $|S(s_{rare})|$ and $\delta \cdot |path(ind, t')|$ are each of order $O(n)$, since $\delta$, the maximum required number of experts over the required skills, is a constant. Therefore, the worst-case running time of the Generalized Diameter algorithm is $O(n^3)$. In practice, however, $|S(s_{rare})|$ and $|path(ind, t')|$ are much smaller than $n$, so the total running time of the proposed implementation is much less than this worst-case bound. The worst-case running time is prohibitive for large graphs, but recall that the efficient Skilled Grouping method proposed above reduces the search space, so the Generalized Diameter algorithm only explores an effective subgroup of the original graph.
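The steps of Algorithm 3 can be illustrated with a minimal Python sketch. It is a simplified reading of the procedure, not the authors' implementation: the function name `generalized_diameter`, the dictionary layout of `dist`, `path`, `skills` and `task`, and the `credit` helper are all assumptions made for illustration. The sketch also treats $s_{rare}$ like any other skill once the seed has been chosen, and it counts every qualified node on an added path toward any skill that is still needed, which is one possible interpretation of lines 18-20.

```python
import math


def generalized_diameter(dist, path, skills, task):
    """Pick a team around the best seed expert (simplified sketch).

    dist[u][v] : precomputed shortest-path distance between u and v
    path[u][v] : list of nodes on that shortest path, endpoints included
    skills[u]  : dict skill -> level; every node needs an entry (may be {})
    task[s]    : (n_s, m_s) = required number of experts, minimum level
    Returns (team, radius), or (None, inf) if no seed can satisfy the task.
    """
    # Support set S(s): every expert that holds skill s at any level.
    support = {s: [v for v in skills if s in skills[v]] for s in task}
    # Rarest skill: the one with the smallest support set.
    s_rare = min(task, key=lambda s: len(support[s]))

    best_team, best_radius = None, math.inf
    for seed in support[s_rare]:
        if skills[seed][s_rare] < task[s_rare][1]:
            continue  # seed does not reach the minimum level m_rare

        need = {s: n for s, (n, _) in task.items()}  # experts still missing
        credited = set()  # (expert, skill) pairs already counted
        team = set()
        radius = 0.0

        def credit(nodes):
            # Add nodes to the team and count every still-needed skill they
            # hold at a sufficient level (our reading of lines 18-20).
            for j in nodes:
                team.add(j)
                for w, (_, m_w) in task.items():
                    qualified = w in skills[j] and skills[j][w] >= m_w
                    if need[w] > 0 and qualified and (j, w) not in credited:
                        credited.add((j, w))
                        need[w] -= 1

        credit([seed])
        for s, (_, m_s) in task.items():
            # Qualified, reachable supporters of s, nearest to the seed first.
            ranked = sorted(
                (t for t in support[s]
                 if skills[t][s] >= m_s and t in dist[seed]),
                key=lambda t: dist[seed][t],
            )
            for t in ranked:
                if need[s] <= 0:
                    break
                if (t, s) in credited:
                    continue  # already counted, e.g. via an earlier path
                radius = max(radius, dist[seed][t])
                credit(path[seed][t])

        # Keep the seed whose farthest selected expert is closest.
        if all(n <= 0 for n in need.values()) and radius < best_radius:
            best_team, best_radius = team, radius
    return best_team, best_radius
```

Because the distance lookup is passed in as plain data, the same sketch works with any pairwise cost, as long as `dist` and `path` are computed consistently. The radius used here is the largest seed-to-expert distance, as in line 21, rather than the exact diameter of the induced subgraph.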
However, this running time is much higher than that of RarestFirst, $O(|S(s_{rare})| \cdot n) = O(n^2)$. It should be noted, though, that unlike RarestFirst, the Generalized Diameter algorithm rechecks the added path in each iteration to determine whether other required skills are already covered. This improvement spares the Generalized Diameter algorithm extra exploration for skills that are already satisfied and leads to a subgraph with smaller cardinality and diameter, while still running efficiently.

Cost Modification

In this study, we have concentrated on diameter as a criterion of the effectiveness of the selected team. The diameter is the longest shortest path between any pair of nodes in the graph. When Dijkstra's algorithm is used to compute the distance between nodes, the diameter computation relies only on the physical distance. Since the effectiveness of a team is a relatively general term, we can generalize its meaning by redefining the magnitude of the diameter as an instantiation of the communication cost. Unlike Lappas et al. [1], we optimize the diameter as a measure of both the distance and the skillfulness level of team members. Accordingly, a new definition of distance cost is presented. The cost of the path followed from individual $i$ to individual $j$ for covering the required skill $s_i$ is composed of two components: the former captures the skillfulness of $j$ regarding skill $s_i$ and the latter the measurable distance between $i$ and $j$. Together, these two components assess the importance and effectiveness of the team members. A balance parameter $\varepsilon$ controls the relative importance of the skillfulness and distance components. The cost is formulated as:

$$Cost(i, j)_{s_i} = \big[\varepsilon \, (1 - \text{Skill Grading}(j, s_i))\big] + \Big[(1 - \varepsilon) \, \frac{d(i, j)}{num}\Big], \tag{5}$$

where $d(i, j)$ is calculated using Dijkstra's shortest path between $i$ and $j$, and $num$ is the number of nodes along that shortest path, which is used to normalize the cost. The balance parameter $\varepsilon$ is set to 0.2 in our experiments. We apply this cost definition in our Generalized Diameter algorithm in place of $d(i, j)$ wherever a distance calculation is needed.

4 Experimental Evaluation

In this section, we perform an experimental analysis of our proposed framework.
The goal of the analysis is twofold: to quantitatively evaluate and compare the performance of the different algorithms for the team formation problem over basic and general tasks and understand when each algorithm gives a better result; and to qualitatively compare the algorithms by carrying out a case study.

4.1 Dataset

For our experiments, we use the DBLP database as a benchmark dataset, which is publicly available from the DBLP portal. The snapshot of the dataset was taken on April 12, 2006 and covers papers published in database (DB), data mining (DM), artificial intelligence (AI), and theory (T) conferences. These areas are chosen to cover diverse fields and include 19 venues: {SIGMOD, VLDB, ICDE, ICDT, EDBT, PODS, WWW, KDD, SDM, PKDD, ICDM, ICML, ECML, COLT, UAI, SODA, FOCS, STOC and STACS}. Such diversity is expected, since the team formation problem often needs to find a team of experts in a large, diverse social network. We refer to the set of selected papers as the DBLP dataset.

We build the expert social network from the co-authorship graph. To collect skilled authors as experts, authors with fewer than three papers in the DBLP dataset are pruned away to make the expert selection process meaningful. The skill set $X_i$ of each author $i$ consists of the terms that appear in at least two titles of his/her papers in the co-authorship DBLP dataset. For skill extraction, we use the terms extracted from Bibsonomy tag tools in order to avoid noisy tags. In this corpus, each paper is assigned tags that are descriptive of its nature. Two authors are connected in the network if they have co-authored at least two papers. The weights on edges are computed as:

$$w(i, j) = 1 - \frac{|P_i \cap P_j|}{|P_i \cup P_j|}, \tag{6}$$

where $P_i$ (resp. $P_j$) is the set of papers published by $i$ (resp. $j$).
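As a small illustration of the edge weight in (6), the following Python sketch computes the Jaccard distance between two authors' paper sets and builds the weighted co-authorship edges under the rule that two authors are linked only if they share at least two papers. The function names `jaccard_distance` and `coauthor_edges` and the input layout (a dictionary from author to a set of paper identifiers) are assumptions for this example, not part of the original framework.

```python
from typing import Dict, Hashable, Set, Tuple


def jaccard_distance(papers_i: Set[Hashable], papers_j: Set[Hashable]) -> float:
    """w(i, j) = 1 - |P_i & P_j| / |P_i | P_j|, as in (6)."""
    union = papers_i | papers_j
    if not union:
        raise ValueError("both authors have empty paper sets")
    return 1.0 - len(papers_i & papers_j) / len(union)


def coauthor_edges(
    papers_by_author: Dict[str, Set[Hashable]],
    min_shared: int = 2,
) -> Dict[Tuple[str, str], float]:
    """Weighted edges between authors sharing at least `min_shared` papers."""
    edges = {}
    authors = sorted(papers_by_author)
    for idx, a in enumerate(authors):
        for b in authors[idx + 1:]:
            shared = papers_by_author[a] & papers_by_author[b]
            if len(shared) >= min_shared:
                edges[(a, b)] = jaccard_distance(
                    papers_by_author[a], papers_by_author[b]
                )
    return edges
```

For instance, authors with paper sets {1, 2, 3} and {2, 3, 4} share two of four distinct papers, so their edge weight is 1 - 2/4 = 0.5. The quadratic pair loop is only meant for small inputs; on a full DBLP snapshot one would more likely iterate over the author lists of each paper.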
The graph distance between two nodes in graph $G_{dblp}$ is computed using the shortest path distance, as described in Subsection 3.2. Weights on edges represent the pair-wise Jaccard distance between each pair of nodes in our connected graph. Totally, there are authors, distinct skills and edges.

4.2 Algorithms

We consider eight different algorithms for two versions of the team formation problem, the basic and the general version. More specifically, we have the following algorithms:

RarestFirst: Algorithm 1, the original algorithm for the team formation problem with basic tasks, presented by Lappas et al. [1] and mentioned in Subsection 2.1.

GS: the Generalized Enhanced Steiner algorithm, a generalized version of the team formation problem for general tasks, presented by Li et al. [2].

GSG: the Generalized Enhanced Steiner algorithm associated with a grouping-based approach, presented by Li et al. [2].

GD: the Generalized Diameter algorithm, a generalized version of the team formation problem for general tasks, described in detail in the Generalized Diameter Algorithm subsection.

GDG: the Generalized Diameter algorithm with the Skilled Grouping method. It first applies the grouping method to reduce the search space and then applies the Generalized Diameter algorithm.

GDGG: the Generalized Diameter algorithm with the Skill Grading and Skilled Grouping methods, a graded variant and generalized version of the team formation problem for general tasks. It first determines the skillfulness level of experts (see Subsection 3.3.1), then exploits the Skilled Grouping approach and finally applies the Generalized Diameter algorithm.

GDCG: the Generalized Diameter algorithm with the new cost definition explained in the Cost Modification subsection and the Skill Grading method, a graded variant and generalized version of the team formation problem for general tasks. It first determines the skillfulness level of experts and then applies the Generalized Diameter algorithm with the adapted cost definition.

GDCGG: our proposed framework presented in this paper, which combines the Generalized Diameter algorithm with the Skill Grading and Skilled Grouping methods and the new cost definition.

4.3 Experiments Design

In this subsection, we design the experiments that evaluate the performance of our proposed framework for generalized tasks in terms of communication cost, cardinality of the selected team, and efficiency. Each generated task $T = (S, N, M) = \{(s_i, n_i, m_i) \mid 1 \le i \le q\}$ is characterized by three parameters: 1) $q$, the number of required skills in task $T$; 2) $f$, a fixed ratio that determines the required number of experts for each skill in task $T$; and 3) $c$, a fixed ratio that determines the minimum skillfulness level for each required skill in task $T$. Specifically, a task $T$ is generated as follows. First, $q$ skills are picked randomly from the terms appearing in published papers belonging to the 19 conferences described in Subsection 4.1.
In all experiments reported in this subsection, we use $q \in \{2, 4, \ldots, 20\}$. If a skill $s_i$ is supported by $F$ individuals, the required number of experts for performing that skill, $n_i$, is obtained by rounding $F \cdot f$. Likewise, to determine the competence level $m_i$ for selecting related experts, we find $C$, the maximum skillfulness level among all supporters of skill $s_i$, and round $C \cdot c$ to obtain the minimum level required for accomplishing skill $s_i$. For every tuple $(q, f, c)$, we generate 100 random general tasks for all algorithms and take the average performance achieved by the different methods. For all runs, we set $f$ and $c$ to 0.02 and 0.5, respectively. We consider these to be reasonable values that give a thorough picture of a task. Recall that for basic tasks, which are special cases of the generalized tasks, we generate task $T$ without considering the parameters $N$ and $M$.

4.4 Quantitative Evaluation

The goal of this subsection is to study how the proposed framework and the other algorithms perform with respect to communication cost (the measure of effectiveness), team cardinality, and efficiency, and to compare them against each other.

Fig.7(a) compares the average communication cost and team cardinality of the algorithms on the same set of basic tasks, where at least one expert is to be selected for each required skill. The following observations emerge from the analysis of Fig.7(a). First, according to the communication cost measure, all the algorithms that apply a grouping method perform better than those that do not (with the exception of GSG). As the number of required skills grows, the improvement in communication cost of the algorithms associated with a grouping method becomes more considerable. This is because exploiting overlapping groups and the new definition of cost, based on skillfulness and distance, helps to minimize the diameter of the selected team. The GSG algorithm, on the other hand, intuitively tends to minimize the minimum spanning tree. Hence, in the team reported by GSG, the diameter, as an instantiation of communication cost and effectiveness, is larger than that of the teams suggested by the diameter-based algorithms.

Second, as the number of required skills grows, the RarestFirst algorithm tends to construct relatively large teams. Notably, using grouping methods in the diameter-based algorithms as well as in the GS algorithm also leads to a large increase in team cardinality, which is explained by the fact that the grouping-based methods aim to select subgroups of nodes with more overlap. This limitation causes an upward trend in team cardinality. Compared to GD and the other diameter-based algorithms associated with a grouping method, GS generally finds small teams. This is because the diameter-based algorithms tend to minimize the diameter of the selected team, which is less likely to be affected by adding new individuals. In contrast, the GS algorithm is based on the minimum spanning tree measure, which always increases when a new individual is added to the team.

Furthermore, to better quantify the difference between the diameter-based algorithms in terms of team cardinality, we compare them against each other. As can be seen in Fig.7(b), compared to the RarestFirst algorithm, the Generalized Diameter algorithm generally achieves a dramatic decrease in the size of the selected team. This is because, in the Generalized Diameter algorithm, when each skill is covered, the team under construction is checked to see whether the visited individuals satisfy other required skills. This improvement leads to a remarkable reduction in size and communication cost. Since the size of the selected team can be crucial, especially for the expense of the project, it has been included and compared in our experimental evaluations.

Fig.7. Average communication cost and cardinality of the teams reported by RarestFirst, GS, GSG, GD, GDG, GDGG, GDCG and GDCGG algorithms for basic tasks. (a) Average communication cost. (b) Average cardinality.

Now let us concentrate on the average performance of our algorithms for generalized tasks. The following main observations can be drawn from the analysis of Fig.8. First, the results in Figs. 8(a) and 8(b) show that algorithms associated with the Skilled Grouping method (e.g., GDG vs GD and GDCGG vs GDCG) outperform those that do not apply any grouping method in terms of communication cost and cardinality. This is explained by the fact that the Skilled Grouping method tends to select subgroups of nodes with more overlap. By taking advantage of the overlap, the algorithm aims to select individuals who are skilled in multiple skills. This implies that the group-based algorithms are capable of finding more effective teams with smaller cardinality. Second, as seen in Figs. 8(a) and 8(b), the performance of GDGG and GDCG is worse than that of the GDG and GDCGG algorithms, respectively. This is due to the necessity, in the Skill Grading based algorithms, of finding only qualified individuals who satisfy a minimum level requirement. Using the group-based method leads to an improvement in terms of effectiveness and team cardinality. Third, compared to the basic version of the team formation problem, algorithms associated with a grouping method obtain better performance for generalized tasks in terms of communication cost (shown in Fig.8(b)). Notice that the RarestFirst algorithm is a basic version of the team formation problem. Hence, in Fig.8 and Fig.9, we evaluate the experiments for general tasks without taking the RarestFirst algorithm into consideration.

Fig.8. Average communication cost and cardinality of the teams reported by GS, GSG, GD, GDG, GDGG, GDCG and GDCGG algorithms for the general tasks. (a) Average communication cost. (b) Average cardinality.

Fig.9. Effective team based on Skilled Grouping and Generalized Diameter algorithms for graph shown in Fig.1.

Fig.9 shows the efficiency evaluation of the different algorithms on the DBLP dataset. We find the following. First, as seen in Fig.9, all the algorithms associated with the Skilled Grouping method outperform the other algorithms. The mean running time (in seconds) of GSG, GDG, GDGG, GDCG and GDCGG climbs gradually as the number of required skills grows. This is because the Skilled Grouping method reduces the search space to a well-condensed abstract structure, which provides efficiency and scalability. Also, the GDCGG algorithm provides a significant improvement in terms of efficiency compared with GDCG.

In summary, our proposed framework, GDCGG, is capable of forming an expert team effectively. The analysis of the experiments implies that GDCGG is the method of choice, offering promising performance on communication cost, cardinality of the selected team and time efficiency for both basic and general tasks.

4.5 Qualitative Evidence

In the previous subsection, we investigated how the proposed framework performs under different conditions and how it compares with other methods. We now perform a case study to understand how reasonable and applicable the algorithm is in practice. We consider the experts in DBLP that were described in Subsection 4.1 and evaluate our algorithms on 10 distinct tasks. Each task includes some required skills, which are defined by extracting the terms appearing in the title of an already published paper.
The papers are selected from the Most Cited Computer Science Articles list, maintained by CiteSeerX (citeseerx.ist.psu.edu/stats/articles). Thus, 10 tasks are formed by choosing the top-10 cited papers from the list that were published in the set of 10 conferences covered by the DBLP dataset. We make 10 generalized tasks, each one including some required skills and, for each skill, a specific number of experts and a competence level. The titles of the top-10 papers and the corresponding tasks are listed in Table 1.

Table 1. List of the Top-10 Most Cited Papers According to CiteSeerX and the 10 Generalized Tasks

| ID | Paper Title | Task |
|----|-------------|------|
| 1 | The anatomy of a large-scale hypertextual Web search engine | {(anatomy, 1, 0.2), (largescale, 1, 0.4), (hypertextual, 1, 0.3), (web, 16, 0.5), (search, 4, 0.4)} |
| 2 | Fast algorithms for mining association rules | {(associationrules, 2, 0.3), (rulemining, 2, 0.5)} |
| 3 | Mining association rules between sets of items in large databases | {(mining, 2, 0.5), (associationrules, 2, 0.3), (sets, 2, 0.3), (items, 1, 0.2), (database, 7, 0.5)} |
| 4 | Text categorization with support vector machines: Learning with many relevant features | {(text, 2, 0.5), (categorization, 1, 0.4), (svm, 1, 0.5), (learning, 14, 0.5), (relevant, 1, 0.2), (features, 1, 0.3)} |
| 5 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data | {(conditional, 1, 0.1), (random, 2, 0.5), (fields, 1, 0.2), (probabilistic, 3, 0.5), (models, 3, 0.5), (segmenting, 1, 0.2), (labeling, 1, 0.3), (sequence, 1, 0.4), (data, 15, 0.5)} |
| 6 | Mining frequent patterns without candidate generation | {(mining, 6, 0.5), (frequent, 2, 0.5), (patterns, 2, 0.4), (candidate, 1, 0.3), (generation, 1, 0.2)} |
| 7 | A survey of approaches to automatic schema matching | {(survey, 1, 0.3), (approaches, 4, 0.5), (automatic, 2, 0.5), (schema, 1, 0.4), (matching, 1, 0.3)} |
| 8 | Automatic subspace clustering of high dimensional data for data mining applications | {(automatic, 2, 0.5), (subspace, 1, 0.3), (clustering, 4, 0.5), (high dimensional, 1, 0.5), (data, 15, 0.5), (mining, 6, 0.5)} |
| 9 | Models and issues in data stream systems | {(models, 3, 0.5), (issues, 2, 0.5), (datastream, 1, 0.5), (systems, 3, 0.5)} |
| 10 | NiagaraCQ: A scalable continuous query system for Internet databases | {(scalable, 1, 0.5), (continuous, 2, 0.3), (query, 6, 0.5), (system, 3, 0.5), (internet, 3, 0.5), (database, 7, 0.5)} |

Note: Skills are extracted from the titles of the corresponding papers. The required number of experts and the minimum skillfulness level for each skill are calculated according to the description in Subsection 4.3.

We illustrate the ability of our proposed framework to handle skills dispersed across diverse and unrelated fields. We manually run our algorithms, requesting teams of experts that cover the required skills. The performance of the resulting teams in terms of communication cost and cardinality is shown in Fig.10. It can be observed that our proposed framework, GDCGG, returns better teams for more diverse lists of skills. For example, as the number of required skills grows, GDCGG reports better performance in terms of communication cost and cardinality. This is reasonable, since the Skilled Grouping method applied by GDCGG is capable of effectively finding experts who cover required skills in diverse fields. As mentioned in Subsection 4.4, since the RarestFirst algorithm is a basic version of the team formation problem, we analyze the quality of the teams returned by the algorithms without taking the RarestFirst algorithm into consideration.

Fig.10. Communication cost and cardinality of the teams reported by GD, GDG, GDCG and GDCGG algorithms for selected generalized tasks listed in Table 1. (a) Communication cost. (b) Cardinality.

5 Conclusions and Future Work

In this paper, we have studied the problem of finding an optimized expert team that can satisfy a specific generalized task with a minimum level requirement, while minimizing the communication cost among the team members. To build a graded team, we first present a method to scale the skillfulness of candidates in a social network. Next, our framework applies a graph grouping method to prevent extra communication cost and the assignment of unrelated experts. As a generalized version for generalized tasks, we present Algorithm 3, the Generalized Diameter algorithm, associated with an adapted definition of communication cost. We report our experiments on the DBLP dataset (its 2006 snapshot of co-authored papers in some specific fields of study). The experimental results show that an effective team can be found efficiently while supporting the specific generalized task and minimizing the communication cost. The framework achieves reasonable performance in comparison with the other approaches.

In our setting, it is supposed that a task (project) requires a certain set of skills to be accomplished, without considering the special importance and priority that each skill might have for the completion of the task (project). So, one possible way to continue improving the team formation problem would be its hierarchical variant. In such a variant, the position of each required skill can be modeled by means of a hierarchical structure to find subgraphs that match a given query pattern.

References

[1] Lappas T, Liu K, Terzi E. Finding a team of experts in social networks. In Proc. the 15th ACM Int. Conference on Knowledge Discovery and Data Mining, June 28-July 1, 2009, pp
[2] Li C, Shan M. Team formation for generalized tasks in expertise social networks. In Proc. the 2nd SocialCom/PASSAT, Aug. 2010, pp
[3] Baykasoglu A, Dereli T, Das S. Project team selection using fuzzy optimization approach. Cybernetics and Systems, 2007, 38(2):
[4] Chen S J, Lin L. Modeling team member characteristics for the formation of a multifunctional team in concurrent engineering. IEEE Transactions on Engineering Management, 2004, 51(2):
[5] Wi H, Oh S, Mun J, Jung M. A team formation model based on knowledge and collaboration. Expert Syst. Appl., July 2009, 36(5):
[6] Zakarian A, Kusiak A. Forming teams: An analytical approach. IIE Transactions, 1999, 31(1):
[7] Fitzpatrick E L, Askin R G. Forming effective worker teams with multi-functional skill requirements. Journal of Computers & Industrial Engineering, May 2005, 48(3):
[8] Gaston M, Simmons J, DesJardins M. Adapting network structures for efficient team formation. In Proc. the AAMAS 2004 Workshop on Learning and Evolution in Agent-Based Systems, July
[9] Kautz H, Selman B, Shah M. The hidden Web. In Proc. the 13th Conference on Uncertainty in Artificial Intelligence, Aug. 1997, pp
[10] Cheatham M, Cleereman K. Application of social network analysis to collaborative team formation. In Proc. the Int. Symposium on Collaborative Technologies and Systems, May 2006, pp
[11] Backstrom L, Huttenlocher D, Kleinberg J, Lan X. Group formation in large social networks: Membership, growth and evolution. In Proc. the 12th ACM Int. Conference on Knowledge Discovery and Data Mining, Aug. 2006, pp
[12] Boon B H, Sierksma G. Team formation: Matching quality supply and quality demand. European Journal of Operational Research, 2003, 148(3):
[13] Korvin A D, Shipley M F, Kleyle R. Utilizing fuzzy compatibility of skill sets for team selection in multi-phase projects. Journal of Engineering and Technology Management, September 2002, 19(3-4):
[14] Dereli T, Baykasoglu A, Das G S. Fuzzy quality-team formation for value added auditing: A case study. Journal of Engineering and Technology Management, December 2007, 24(4):
[15] Liu N N, Yang Q. EigenRank: A ranking-oriented approach to collaborative filtering. In Proc. the 31st SIGIR, July 2008, pp
[16] Candillier L, Meyer F, Fessant F. Designing specific weighted similarity measures to improve collaborative filtering systems. In Proc. the 8th IEEE Int. Conference on ICDM, Dec. 2008, pp
[17] Faloutsos C, McCurley K S, Tomkins A. Fast discovery of connection subgraph. In Proc. the 10th ACM Int. Conference on Knowledge Discovery and Data Mining, Aug. 2004, pp
[18] Tong H, Faloutsos C, Pan J Y. Fast random walk with restart and its application. In Proc. the 6th IEEE Int. Conference on Data Mining, Dec. 2006, pp
[19] Tong H, Faloutsos C. Center-Piece subgraph: Problem definition and fast solution. In Proc. the 12th ACM Int. Conference on Knowledge Discovery and Data Mining, Aug. 2006, pp
[20] Tong H, Qu H, Jamjoom H, Faloutsos C. iPoG: Fast interactive proximity querying on graphs. In Proc. the 18th ACM Int. Conference on Information and Knowledge Management, Nov. 2009, pp
[21] Tong H, Faloutsos C, Gallagher B, Eliassi-Rad T. Fast best-effort pattern matching in large attributed graphs. In Proc. the 13th ACM Int. Conference on Knowledge Discovery and Data Mining, Aug. 2007, pp
[22] Kasneci G, Ramanath M, Sozio M, Suchanek F M, Weikum G. STAR: Steiner-tree approximation in relationship graphs. In Proc. the 25th IEEE Int. Conference on Data Engineering, March 29-April 2, 2009, pp
[23] Cheng J, Ke Y, Ng W, Yu J X. Context-aware object connection discovery in large graphs. In Proc. the IEEE Int. Conference on Data Engineering, 2009, pp
[24] Cheng J, Ke Y, Ng W. Efficient processing of group-oriented connection queries in large graph. In Proc. the 18th ACM Int. Conference on Information and Knowledge Management, Nov. 2009, pp
[25] Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J. GroupLens: An open architecture for collaborative filtering of netnews. In Proc. the Conference on Computer Supported Cooperative Work, Oct. 1994, pp
[26] Karypis G. Evaluation of item-based top-n recommendation algorithms. In Proc. the 10th ACM Int. Conference on Information and Knowledge Management, Nov. 2001, pp
[27] Linden G, Smith B, York J. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 2003, 7(1):
[28] Deshpande M, Karypis G. Item-based top-n recommendation algorithms. ACM Transactions on Information Systems, 2004, 22(1):
[29] Tarjan R E. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1972, 1(2):

Farnoush Farhadi received her B.Sc. degree from Shahid Beheshti University, Iran in 2005, majoring in software engineering. She is currently a graduate student of artificial intelligence in the School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran. She has been a member of the soft computing group of her school since September 2008, working with Professors Sattar Hashemi and Ali Hamzeh. Her research interests are in large-scale data mining for graphs, query answering, task assignment and social networks. Her current work focuses on team formation in collaboration networks with regard to team performance.
Maryam Sorkhi received her B.Sc. degree from Iran University of Science and Technology, Iran in Currently, she is a Master candidate in the School of Electrical and Computer Engineering, Shiraz University, Iran. She is a member of the soft computing group since September Her research interests include data mining, social network analysis and machine learning.

Sattar Hashemi received his B.Sc. degree in computer engineering from Isfahan University of Technology, in 1998 and his M.Sc. and Ph.D. degrees in computer science from the Iran University of Science and Technology in conjunction with Monash University, Australia, in He is currently an assistant professor in the Electrical and Computer Engineering School, Shiraz University, Shiraz, Iran. His research interests include data stream mining, social networks, game theory and adversarial learning.

Ali Hamzeh received his M.Sc. and B.Sc. degrees in computer engineering from Shiraz University in 2002 and 2000 and his Ph.D. degree in artificial intelligence from Iran University of Science and Technology, He is currently an assistant professor of artificial intelligence in the Electrical and Computer Engineering School, Shiraz University, Shiraz, Iran. His research interests include recommender systems, social networks, evolutionary algorithms and game theory.


More information

Mining Association Rules in Student s Assessment Data

Mining Association Rules in Student s Assessment Data www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

University of Groningen. Systemen, planning, netwerken Bosman, Aart

University of Groningen. Systemen, planning, netwerken Bosman, Aart University of Groningen Systemen, planning, netwerken Bosman, Aart IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Reinforcement Learning by Comparing Immediate Reward

Reinforcement Learning by Comparing Immediate Reward Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Transfer Learning Action Models by Measuring the Similarity of Different Domains

Transfer Learning Action Models by Measuring the Similarity of Different Domains Transfer Learning Action Models by Measuring the Similarity of Different Domains Hankui Zhuo 1, Qiang Yang 2, and Lei Li 1 1 Software Research Institute, Sun Yat-sen University, Guangzhou, China. zhuohank@gmail.com,lnslilei@mail.sysu.edu.cn

More information

Given a directed graph G =(N A), where N is a set of m nodes and A. destination node, implying a direction for ow to follow. Arcs have limitations

Given a directed graph G =(N A), where N is a set of m nodes and A. destination node, implying a direction for ow to follow. Arcs have limitations 4 Interior point algorithms for network ow problems Mauricio G.C. Resende AT&T Bell Laboratories, Murray Hill, NJ 07974-2070 USA Panos M. Pardalos The University of Florida, Gainesville, FL 32611-6595

More information

Mining Student Evolution Using Associative Classification and Clustering

Mining Student Evolution Using Associative Classification and Clustering Mining Student Evolution Using Associative Classification and Clustering 19 Mining Student Evolution Using Associative Classification and Clustering Kifaya S. Qaddoum, Faculty of Information, Technology

More information

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

More information

A student diagnosing and evaluation system for laboratory-based academic exercises

A student diagnosing and evaluation system for laboratory-based academic exercises A student diagnosing and evaluation system for laboratory-based academic exercises Maria Samarakou, Emmanouil Fylladitakis and Pantelis Prentakis Technological Educational Institute (T.E.I.) of Athens

More information

NCEO Technical Report 27

NCEO Technical Report 27 Home About Publications Special Topics Presentations State Policies Accommodations Bibliography Teleconferences Tools Related Sites Interpreting Trends in the Performance of Special Education Students

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data Kurt VanLehn 1, Kenneth R. Koedinger 2, Alida Skogsholm 2, Adaeze Nwaigwe 2, Robert G.M. Hausmann 1, Anders Weinstein

More information

On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

More information

The Good Judgment Project: A large scale test of different methods of combining expert predictions

The Good Judgment Project: A large scale test of different methods of combining expert predictions The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1 Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial

More information

Discriminative Learning of Beam-Search Heuristics for Planning

Discriminative Learning of Beam-Search Heuristics for Planning Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

Probability and Statistics Curriculum Pacing Guide

Probability and Statistics Curriculum Pacing Guide Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods

More information

Seminar - Organic Computing

Seminar - Organic Computing Seminar - Organic Computing Self-Organisation of OC-Systems Markus Franke 25.01.2006 Typeset by FoilTEX Timetable 1. Overview 2. Characteristics of SO-Systems 3. Concern with Nature 4. Design-Concepts

More information

Attributed Social Network Embedding

Attributed Social Network Embedding JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, MAY 2017 1 Attributed Social Network Embedding arxiv:1705.04969v1 [cs.si] 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua Abstract Embedding

More information

Reducing Features to Improve Bug Prediction

Reducing Features to Improve Bug Prediction Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

ISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM

ISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Proceedings of 28 ISFA 28 International Symposium on Flexible Automation Atlanta, GA, USA June 23-26, 28 ISFA28U_12 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Amit Gil, Helman Stern, Yael Edan, and

More information

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic

More information

BMBF Project ROBUKOM: Robust Communication Networks

BMBF Project ROBUKOM: Robust Communication Networks BMBF Project ROBUKOM: Robust Communication Networks Arie M.C.A. Koster Christoph Helmberg Andreas Bley Martin Grötschel Thomas Bauschert supported by BMBF grant 03MS616A: ROBUKOM Robust Communication Networks,

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

(Sub)Gradient Descent

(Sub)Gradient Descent (Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

More information

CHAPTER 4: REIMBURSEMENT STRATEGIES 24

CHAPTER 4: REIMBURSEMENT STRATEGIES 24 CHAPTER 4: REIMBURSEMENT STRATEGIES 24 INTRODUCTION Once state level policymakers have decided to implement and pay for CSR, one issue they face is simply how to calculate the reimbursements to districts

More information

USER ADAPTATION IN E-LEARNING ENVIRONMENTS

USER ADAPTATION IN E-LEARNING ENVIRONMENTS USER ADAPTATION IN E-LEARNING ENVIRONMENTS Paraskevi Tzouveli Image, Video and Multimedia Systems Laboratory School of Electrical and Computer Engineering National Technical University of Athens tpar@image.

More information

Pedagogical Content Knowledge for Teaching Primary Mathematics: A Case Study of Two Teachers

Pedagogical Content Knowledge for Teaching Primary Mathematics: A Case Study of Two Teachers Pedagogical Content Knowledge for Teaching Primary Mathematics: A Case Study of Two Teachers Monica Baker University of Melbourne mbaker@huntingtower.vic.edu.au Helen Chick University of Melbourne h.chick@unimelb.edu.au

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas

P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas Exploiting Distance Learning Methods and Multimediaenhanced instructional content to support IT Curricula in Greek Technological Educational Institutes P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou,

More information

BADM 641 (sec. 7D1) (on-line) Decision Analysis August 16 October 6, 2017 CRN: 83777

BADM 641 (sec. 7D1) (on-line) Decision Analysis August 16 October 6, 2017 CRN: 83777 BADM 641 (sec. 7D1) (on-line) Decision Analysis August 16 October 6, 2017 CRN: 83777 SEMESTER: Fall 2017 INSTRUCTOR: Jack Fuller, Ph.D. OFFICE: 108 Business and Economics Building, West Virginia University,

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

A simulated annealing and hill-climbing algorithm for the traveling tournament problem

A simulated annealing and hill-climbing algorithm for the traveling tournament problem European Journal of Operational Research xxx (2005) xxx xxx Discrete Optimization A simulated annealing and hill-climbing algorithm for the traveling tournament problem A. Lim a, B. Rodrigues b, *, X.

More information

Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing

Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing Fragment Analysis and Test Case Generation using F- Measure for Adaptive Random Testing and Partitioned Block based Adaptive Random Testing D. Indhumathi Research Scholar Department of Information Technology

More information

A Note on Structuring Employability Skills for Accounting Students

A Note on Structuring Employability Skills for Accounting Students A Note on Structuring Employability Skills for Accounting Students Jon Warwick and Anna Howard School of Business, London South Bank University Correspondence Address Jon Warwick, School of Business, London

More information

A Comparison of Standard and Interval Association Rules

A Comparison of Standard and Interval Association Rules A Comparison of Standard and Association Rules Choh Man Teng cmteng@ai.uwf.edu Institute for Human and Machine Cognition University of West Florida 4 South Alcaniz Street, Pensacola FL 325, USA Abstract

More information

Alpha provides an overall measure of the internal reliability of the test. The Coefficient Alphas for the STEP are:

Alpha provides an overall measure of the internal reliability of the test. The Coefficient Alphas for the STEP are: Every individual is unique. From the way we look to how we behave, speak, and act, we all do it differently. We also have our own unique methods of learning. Once those methods are identified, it can make

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

The Use of Statistical, Computational and Modelling Tools in Higher Learning Institutions: A Case Study of the University of Dodoma

The Use of Statistical, Computational and Modelling Tools in Higher Learning Institutions: A Case Study of the University of Dodoma International Journal of Computer Applications (975 8887) The Use of Statistical, Computational and Modelling Tools in Higher Learning Institutions: A Case Study of the University of Dodoma Gilbert M.

More information

Axiom 2013 Team Description Paper

Axiom 2013 Team Description Paper Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Radius STEM Readiness TM

Radius STEM Readiness TM Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and

More information

A Case-Based Approach To Imitation Learning in Robotic Agents

A Case-Based Approach To Imitation Learning in Robotic Agents A Case-Based Approach To Imitation Learning in Robotic Agents Tesca Fitzgerald, Ashok Goel School of Interactive Computing Georgia Institute of Technology, Atlanta, GA 30332, USA {tesca.fitzgerald,goel}@cc.gatech.edu

More information

Visual CP Representation of Knowledge

Visual CP Representation of Knowledge Visual CP Representation of Knowledge Heather D. Pfeiffer and Roger T. Hartley Department of Computer Science New Mexico State University Las Cruces, NM 88003-8001, USA email: hdp@cs.nmsu.edu and rth@cs.nmsu.edu

More information

Disambiguation of Thai Personal Name from Online News Articles

Disambiguation of Thai Personal Name from Online News Articles Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online

More information

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

More information

Data Modeling and Databases II Entity-Relationship (ER) Model. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases II Entity-Relationship (ER) Model. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases II Entity-Relationship (ER) Model Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database design Information Requirements Requirements Engineering

More information

Build on students informal understanding of sharing and proportionality to develop initial fraction concepts.

Build on students informal understanding of sharing and proportionality to develop initial fraction concepts. Recommendation 1 Build on students informal understanding of sharing and proportionality to develop initial fraction concepts. Students come to kindergarten with a rudimentary understanding of basic fraction

More information

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Marek Jaszuk, Teresa Mroczek, and Barbara Fryc University of Information Technology and Management, ul. Sucharskiego

More information

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Paper #3 Five Q-to-survey approaches: did they work? Job van Exel

More information

Efficient Online Summarization of Microblogging Streams

Efficient Online Summarization of Microblogging Streams Efficient Online Summarization of Microblogging Streams Andrei Olariu Faculty of Mathematics and Computer Science University of Bucharest andrei@olariu.org Abstract The large amounts of data generated

More information

This scope and sequence assumes 160 days for instruction, divided among 15 units.

This scope and sequence assumes 160 days for instruction, divided among 15 units. In previous grades, students learned strategies for multiplication and division, developed understanding of structure of the place value system, and applied understanding of fractions to addition and subtraction

More information

COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS

COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS L. Descalço 1, Paula Carvalho 1, J.P. Cruz 1, Paula Oliveira 1, Dina Seabra 2 1 Departamento de Matemática, Universidade de Aveiro (PORTUGAL)

More information

The Importance of Social Network Structure in the Open Source Software Developer Community

The Importance of Social Network Structure in the Open Source Software Developer Community The Importance of Social Network Structure in the Open Source Software Developer Community Matthew Van Antwerp Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556

More information

A Pipelined Approach for Iterative Software Process Model

A Pipelined Approach for Iterative Software Process Model A Pipelined Approach for Iterative Software Process Model Ms.Prasanthi E R, Ms.Aparna Rathi, Ms.Vardhani J P, Mr.Vivek Krishna Electronics and Radar Development Establishment C V Raman Nagar, Bangalore-560093,

More information

Grade 6: Correlated to AGS Basic Math Skills

Grade 6: Correlated to AGS Basic Math Skills Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and

More information

Lecture 10: Reinforcement Learning

Lecture 10: Reinforcement Learning Lecture 1: Reinforcement Learning Cognitive Systems II - Machine Learning SS 25 Part III: Learning Programs and Strategies Q Learning, Dynamic Programming Lecture 1: Reinforcement Learning p. Motivation

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina

More information

Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses

Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Thomas F.C. Woodhall Masters Candidate in Civil Engineering Queen s University at Kingston,

More information

Modeling user preferences and norms in context-aware systems

Modeling user preferences and norms in context-aware systems Modeling user preferences and norms in context-aware systems Jonas Nilsson, Cecilia Lindmark Jonas Nilsson, Cecilia Lindmark VT 2016 Bachelor's thesis for Computer Science, 15 hp Supervisor: Juan Carlos

More information

Evidence for Reliability, Validity and Learning Effectiveness

Evidence for Reliability, Validity and Learning Effectiveness PEARSON EDUCATION Evidence for Reliability, Validity and Learning Effectiveness Introduction Pearson Knowledge Technologies has conducted a large number and wide variety of reliability and validity studies

More information

GACE Computer Science Assessment Test at a Glance

GACE Computer Science Assessment Test at a Glance GACE Computer Science Assessment Test at a Glance Updated May 2017 See the GACE Computer Science Assessment Study Companion for practice questions and preparation resources. Assessment Name Computer Science

More information

Matrices, Compression, Learning Curves: formulation, and the GROUPNTEACH algorithms

Matrices, Compression, Learning Curves: formulation, and the GROUPNTEACH algorithms Matrices, Compression, Learning Curves: formulation, and the GROUPNTEACH algorithms Bryan Hooi 1, Hyun Ah Song 1, Evangelos Papalexakis 1, Rakesh Agrawal 2, and Christos Faloutsos 1 1 Carnegie Mellon University,

More information

arxiv: v1 [math.at] 10 Jan 2016

arxiv: v1 [math.at] 10 Jan 2016 THE ALGEBRAIC ATIYAH-HIRZEBRUCH SPECTRAL SEQUENCE OF REAL PROJECTIVE SPECTRA arxiv:1601.02185v1 [math.at] 10 Jan 2016 GUOZHEN WANG AND ZHOULI XU Abstract. In this note, we use Curtis s algorithm and the

More information

Introduction to Simulation

Introduction to Simulation Introduction to Simulation Spring 2010 Dr. Louis Luangkesorn University of Pittsburgh January 19, 2010 Dr. Louis Luangkesorn ( University of Pittsburgh ) Introduction to Simulation January 19, 2010 1 /

More information

Applications of data mining algorithms to analysis of medical data

Applications of data mining algorithms to analysis of medical data Master Thesis Software Engineering Thesis no: MSE-2007:20 August 2007 Applications of data mining algorithms to analysis of medical data Dariusz Matyja School of Engineering Blekinge Institute of Technology

More information

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and

More information

BENCHMARK TREND COMPARISON REPORT:

BENCHMARK TREND COMPARISON REPORT: National Survey of Student Engagement (NSSE) BENCHMARK TREND COMPARISON REPORT: CARNEGIE PEER INSTITUTIONS, 2003-2011 PREPARED BY: ANGEL A. SANCHEZ, DIRECTOR KELLI PAYNE, ADMINISTRATIVE ANALYST/ SPECIALIST

More information

Finding Your Friends and Following Them to Where You Are

Finding Your Friends and Following Them to Where You Are Finding Your Friends and Following Them to Where You Are Adam Sadilek Dept. of Computer Science University of Rochester Rochester, NY, USA sadilek@cs.rochester.edu Henry Kautz Dept. of Computer Science

More information

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,

More information
