Dynamics of Galois Lattices: The case of epistemic communities


1 Dynamics of Galois Lattices: The case of epistemic communities. Camille Roth & Paul Bourgine, CREA (Centre de Recherche en Épistémologie Appliquée), CNRS / Ecole Polytechnique, Paris, France. Sunbelt XXV, Redondo Beach, CA, USA, Feb 16-20th 2005

2 Objective: Describe communities of knowledge, in particular scientific communities, and their taxonomy, e.g. trends/subfields within a paradigm. Epistemic community = group of agents who share a common set of topics, concerns, problems, and who share a common goal of knowledge creation (Haas (1992), Cowan et al. (2000), Dupouet et al. (2001)). Definition used here: «an epistemic community is the largest set of agents that share a given concept set»

3 Formal framework: Definitions. Consider the bipartite graph of a relation R between agents and concepts. Intent of an agent set S: all concepts used by every agent in S. Epistemic group: pair (S, C), where C is the intent of S. Epistemic community (based on a concept set C): the maximal epistemic group based on C. Dual notion: the extent of a concept set C is the set of all agents who use every concept in C. Examples: ({A,B,C,E}, {McB}), ({B,C}, {McB,EmG}).
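The two operators can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy relation reconstructed from the examples on these slides; the original figure is not reproduced here, so the real relation may contain more agents and concepts. The names `R`, `intent`, and `extent` are illustrative and not taken from the authors' implementation.

```python
# Toy relation R (an assumption): agent -> set of concepts the agent uses.
R = {
    "A": {"McB"},
    "B": {"McB", "EmG"},
    "C": {"McB", "EmG"},
    "D": {"EmG"},
    "E": {"McB"},
}

def intent(agents, relation):
    """Concepts used by every agent in `agents`."""
    agents = list(agents)
    if not agents:
        # Convention: the empty agent set shares every concept.
        return set().union(*relation.values())
    return set.intersection(*(relation[a] for a in agents))

def extent(concepts, relation):
    """Agents who use every concept in `concepts`."""
    wanted = set(concepts)
    return {agent for agent, used in relation.items() if wanted <= used}

print(sorted(intent({"B", "C"}, R)))  # ['EmG', 'McB']
print(sorted(extent({"McB"}, R)))     # ['A', 'B', 'C', 'E']
```

Both calls reproduce the example pairs given on the slide: {B,C} shares {McB,EmG}, and {McB} is used by {A,B,C,E}.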

4 Formal framework. Good news: the extent of the intent of an agent set yields its epistemic community. E.g. from {C,D}, whose intent is {EmG}, whose extent is {B,C,D}, we get the epistemic community ({B,C,D}, {EmG}). Problem: there may be many such communities.
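The closure step (take the intent, then the extent of that intent) can be written directly. The sketch below assumes the same toy relation as the slide examples suggest; `R` and `epistemic_community` are hypothetical names used only for illustration.

```python
# Toy relation R (an assumption): agent -> set of concepts the agent uses.
R = {
    "A": {"McB"},
    "B": {"McB", "EmG"},
    "C": {"McB", "EmG"},
    "D": {"EmG"},
    "E": {"McB"},
}

def intent(agents, relation):
    """Concepts used by every agent in `agents`."""
    agents = list(agents)
    if not agents:
        return set().union(*relation.values())
    return set.intersection(*(relation[a] for a in agents))

def extent(concepts, relation):
    """Agents who use every concept in `concepts`."""
    wanted = set(concepts)
    return {agent for agent, used in relation.items() if wanted <= used}

def epistemic_community(agents, relation):
    """Closed couple (S, C) generated by an agent set: C = intent, S = extent of C."""
    concepts = intent(agents, relation)
    return extent(concepts, relation), concepts

community, shared = epistemic_community({"C", "D"}, R)
print(sorted(community), sorted(shared))  # ['B', 'C', 'D'] ['EmG']
```

Applying the function to its own result changes nothing, which is what makes the couple closed: the intent of {B,C,D} is again {EmG}.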

5 Formal framework

6 Categorization. Hypotheses on scientific communities: they are structured (i) into fields, with common concerns, and (ii) hierarchically, through generalization/specialization relations. We need a categorization method that allows overlap. The Galois lattice is the ordered set of all epistemic communities (closed couples), provided with the natural partial order on sets.
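For a toy relation, the whole set of closed couples can be enumerated by brute force and ordered by inclusion of agent sets. The sketch below is only an illustration under assumptions: the relation is reconstructed from the slide examples, the function names are hypothetical, and closing every agent subset is exponential in the number of agents, so it is not a practical lattice algorithm and not necessarily the one used by the authors.

```python
from itertools import combinations

# Toy relation R (an assumption): agent -> set of concepts the agent uses.
R = {
    "A": {"McB"},
    "B": {"McB", "EmG"},
    "C": {"McB", "EmG"},
    "D": {"EmG"},
    "E": {"McB"},
}

def intent(agents, relation):
    agents = list(agents)
    if not agents:
        return set().union(*relation.values())
    return set.intersection(*(relation[a] for a in agents))

def extent(concepts, relation):
    wanted = set(concepts)
    return {agent for agent, used in relation.items() if wanted <= used}

def closed_couples(relation):
    """All (agent set, concept set) closed couples, found by closing every agent subset."""
    agents = sorted(relation)
    found = set()
    for k in range(len(agents) + 1):
        for subset in combinations(agents, k):
            concepts = intent(subset, relation)
            found.add((frozenset(extent(concepts, relation)), frozenset(concepts)))
    return found

def is_more_general(x, y):
    """Partial order: x is above y when x's agent set contains y's."""
    return x[0] >= y[0]

couples = sorted(closed_couples(R), key=lambda sc: -len(sc[0]))
for agents, concepts in couples:
    print(sorted(agents), sorted(concepts))
```

On this toy relation the enumeration yields four closed couples: the full agent set with an empty concept set at the top, ({A,B,C,E}, {McB}) and ({B,C,D}, {EmG}) in the middle, and ({B,C}, {McB,EmG}) at the bottom. The two middle couples overlap on B and C, which is the kind of overlap a partition-based categorization would not allow.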

7 Categorization [Figure: lattice diagram with a «basic-level» marked; axis labels: more general / more specific]

8 Closed couple relevance & empirical results. Try to find a relevant level of generality/precision for the closed sets so that the lattice is manageable. Given the assumptions, first criterion = fields = agent set size. Very poor linguistic assumptions: small stop-word list, basic lemmatization, no contextual processing, no handling of homonymy, synonymy, syllepsis, or nominal groups. Computation of the lattice for a relation from MedLine data on zebrafish (6 years).

9 Empirical results on «zebrafish» community: density of closed sets against extension sizes (author sets) as a proportion of agents of the whole community (200 authors) (1800 concepts)
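The quantity plotted on this slide can be computed from the extents of the closed couples alone. The sketch below assumes a hypothetical list of extents from a toy relation (not the zebrafish data) and an illustrative function name, `size_distribution`.

```python
from collections import Counter

# Hypothetical extents (agent sets) of the closed couples of a toy relation.
EXTENTS = [
    frozenset("ABCDE"),
    frozenset("ABCE"),
    frozenset("BCD"),
    frozenset("BC"),
]
N_AGENTS = 5  # size of the whole toy community

def size_distribution(extents, n_agents):
    """Map relative extent size -> share of closed sets having that size."""
    counts = Counter(len(e) for e in extents)
    total = len(extents)
    return {size / n_agents: count / total for size, count in sorted(counts.items())}

for proportion, density in size_distribution(EXTENTS, N_AGENTS).items():
    print(f"{proportion:.2f} of agents: {density:.2f} of closed sets")
```

On real data the keys would normally be binned before plotting, since exact extent sizes are sparse for large communities.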

10 Empirical results Large ECs: remarkable stylized fact of the data. Partial real lattice successfully checked by domain experts:

11 Selection: Improve selection criteria, since agent set size is: (1) Over-selective: large yet less significant sets. Additional criterion: ratio between set and superset sizes. (2) Under-selective: small yet significant sets. Additional criterion: distance from the top.
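Both additional criteria can be derived from the inclusion order on extents. The sketch below is one possible reading, not the authors' definition: it assumes "superset" means a direct parent in the lattice (taking the smallest one when there are several) and "distance from the top" means the shortest path from the full agent set. The extents and function names are hypothetical.

```python
from collections import deque

# Hypothetical extents (agent sets) of the closed couples of a toy relation.
EXTENTS = [
    frozenset("ABCDE"),
    frozenset("ABCE"),
    frozenset("BCD"),
    frozenset("BC"),
]

def parents(node, extents):
    """Direct supersets: strict supersets with no closed set strictly in between."""
    above = [e for e in extents if node < e]
    return [p for p in above if not any(node < q < p for q in above)]

def size_ratio(node, extents):
    """|node| / |smallest direct parent| (assumed reading); None for the top."""
    direct = parents(node, extents)
    if not direct:
        return None
    return len(node) / min(len(p) for p in direct)

def distance_from_top(extents):
    """Shortest number of parent-child steps from the top to each closed set."""
    top = max(extents, key=len)  # the full agent set is always closed
    dist = {top: 0}
    queue = deque([top])
    while queue:
        current = queue.popleft()
        for e in extents:
            if e not in dist and current in parents(e, extents):
                dist[e] = dist[current] + 1
                queue.append(e)
    return dist

depths = distance_from_top(EXTENTS)
for e in EXTENTS:
    print(sorted(e), size_ratio(e, EXTENTS), depths[e])
```

Recomputing `parents` inside the loop keeps the sketch short but is quadratic per node; a real implementation would build the parent-child graph once.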

12 Selection and dynamics. Three 6-year periods: 90-95, and Selection on 70 words. Booming community: from 1000 authors at the end of 1995, to 3700 in 1999 and 9700 by 2004. Selection criteria: (1) catch large communities: size / distance × number of attributes; (2) catch isolated communities: size / distance × number of sons.

13 Dynamic Partial Lattice

14 Thanks. To be continued on