Prediction in Science of Science: Explanations of Machines for Predictions of Future Impact of Young Scientists
Bibliometric indicators have been widely used by governments, government agencies, and other actors to measure the performance of researchers. They guide tenure decisions and serve as criteria for career progression, research funding, and the selection of editorial board members, among other applications. The rationale for using performance-based indicators as decision-support tools in these contexts is the predictive capability that they supposedly carry.
For this range of applications, the principal concern is an appraisee's potential for future impact. Alternative indicators (e.g., the future h-index) seem to have a clear advantage over traditional indicators. However, the diverse preferences found in models that predict future indicators, together with the need for explanations of their decisions, have hindered their use in real applications, mainly in contexts where reasoned assessments are required. This thesis focuses on how to overcome this barrier. In an attempt to increase the reliability of such models, we propose novel interpretable models and compare their accuracy and the explanations for their decisions against those of other machine learning and analytic models. We found that these models are reliable in estimating a researcher’s future impact. On the one hand, our tests revealed that machine learning algorithms unintentionally discriminated against people. On the other hand, they also showed that explanations for model decisions alleviated this obstacle. As expected, our experiments show that the future performance of junior researchers can, to a large extent, be predicted, even with limited data. In general, the reasons that the models give for their decisions are reasonable. However, the final decision must always rest with human beings, because every prediction carries some risk.
Additionally, we propose the Q for journals, a novel measure that complements traditional journal impact measures. Its main advantage over other measures is that it is non-cumulative and produces an immutable ranking.