Computational methods for identifying and analyzing academic genealogy graphs
The impact that academics have on science has often been analyzed from the perspective of scientific publications. Few studies quantify or qualify the training of human resources as an integral part of assessing academic performance, that is, the production of new scientists through academic mentoring. To this end, several areas of knowledge have created databases of academics and their mentoring relationships, using academic genealogy to document and organize this information as graphs. However, existing academic genealogy databases suffer from redundant, missing, and imprecise information. Moreover, few studies have examined, globally and over time, the academic influence among researchers and their fields of knowledge.
In this work, we developed and applied computational methods for identifying and analyzing academic genealogy graphs obtained from an information source that records advisor-advisee relationships.
We developed strategies to identify academic genealogy graphs using vertex and edge disambiguation techniques. The resulting graphs were analyzed from different perspectives, such as the growth of their hierarchical structures, the academic impact of ancestors on their descendants, and careers according to the roles played by academics.
To identify the academic genealogy graphs, we mined 6.3 million résumés from the Lattes Platform, which resulted in a graph with 1.2 million vertices and 1.4 million edges representing the holders of master's and doctoral degrees who work or have worked in Brazilian science. The disambiguation techniques reduced the amount of noise in this graph.
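As a rough illustration of what vertex and edge disambiguation can look like on advisor-advisee records, the sketch below merges name variants into a single vertex and collapses repeated relationships into a single edge. It is a minimal sketch under stated assumptions, not the method used in this work: the name-normalization rule, the function names (`normalize_name`, `build_genealogy`, `descendants`), and the sample names are all hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace.

    Assumption: a naive key for merging name variants into one vertex.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def build_genealogy(records):
    """Build a mapping advisor -> set of advisees from (advisor, advisee) pairs.

    Names with the same normalized key become one vertex, and repeated
    advisor-advisee pairs collapse into a single edge.
    """
    graph = defaultdict(set)
    for advisor, advisee in records:
        a, b = normalize_name(advisor), normalize_name(advisee)
        if a != b:  # drop self-loops produced by noisy records
            graph[a].add(b)
    return graph


def descendants(graph, root):
    """Return every vertex reachable from root, i.e. all academic descendants."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical records with a duplicated edge and a self-loop.
records = [
    ("José Silva", "Ana Souza"),
    ("Jose  SILVA", "Ana Souza"),
    ("Ana Souza", "Rui Lima"),
    ("Rui Lima", "Rui Lima"),
]
graph = build_genealogy(records)
lineage = descendants(graph, normalize_name("José Silva"))
```

Matching on normalized names alone would merge distinct people who share a name, so a realistic pipeline would need additional evidence (for example, institutions or dates) before merging vertices.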
The analyses revealed patterns in the graph as a whole, as well as in the subgraphs that represent the major areas of mentoring and the academic lineages in Computer Science and Information Science. We identified the actors who play or have played a fundamental role in the Brazilian context, and generated knowledge to support the development of academic training indicators.
The main scientific contributions of this doctoral work are: (i) the documentation and analysis of the academic influence exerted by ancestors on descendants and between fields of knowledge, (ii) a method for identifying academic genealogy graphs that can be replicated on other databases, and (iii) a population growth model capable of tracking the academic career of scientists throughout the training and mentoring processes. Finally, the main technological contribution of this doctoral work is the design and public release of the Brazilian academic genealogy graph on the platform called Acácia (http://plataforma-acacia.org).