Clustered Spearman Markowitz: A Feature Selection Method for Credit Scoring Model
Innovation in feature selection methods is essential for improving the prediction systems adopted by financial institutions, especially in financial services involving credit risk analysis and the investment decisions necessary for loan approval. Careful feature selection not only enhances the performance of predictive models but also reduces the financial costs associated with unnecessary data acquisition. In this context, this work presents the Clustered Spearman Markowitz (CSM), a novel method that combines Spearman rank correlation, used to group similar features, with Markowitz's asset allocation theory, used to determine the weights of these groups (clusters). The accuracy of the models generated by each cluster is interpreted as return and its variance as risk. The risk-return relationship associated with each group of features is then used to select only the features truly necessary for fitting the credit scoring model. The central goal of the CSM is to eliminate irrelevant or redundant features, contributing to more efficient credit scoring prediction models. The validity and efficiency of the method were demonstrated through extensive simulation tests and an application to a widely used real dataset, the Lending Club Loan Data. The CSM was compared with standard techniques and recent methodologies and showed superior accuracy and F1 score, together with a significant reduction in the number of selected features. These results establish it as an important contribution to innovation in credit risk analysis and predictive modeling for financial institutions.
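The pipeline described above (group features by Spearman correlation, weight the groups with a Markowitz-style risk-return trade-off, keep the groups that earn enough weight) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the greedy clustering rule and its 0.8 threshold, the simplification that cluster returns are uncorrelated (so the tangency weights with a zero risk-free rate reduce to mean divided by variance), the 0.5 selection cutoff, and all function names. The feature values and per-fold accuracies are made-up toy numbers, not results from the paper.

```python
from statistics import mean, variance


def ranks(values):
    """Average ranks (tied values share the mean of their positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for pos in range(start, end + 1):
            result[order[pos]] = shared
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0  # constant column: no correlation


def cluster_features(columns, threshold=0.8):
    """Assumed greedy rule: a feature joins the first cluster whose first
    member has |rho| >= threshold with it, otherwise it starts a new cluster."""
    clusters = []
    for name, col in columns.items():
        for cluster in clusters:
            if abs(spearman(col, columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def markowitz_weights(fold_accuracies, eps=1e-12):
    """Return = mean accuracy, risk = variance of accuracy across folds.
    Assumes uncorrelated clusters, so weights are proportional to mean / variance."""
    scores = {c: mean(a) / (variance(a) + eps) for c, a in fold_accuracies.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def select_features(clusters, weights, min_weight):
    """Keep every feature of each cluster whose weight reaches min_weight (assumed rule)."""
    return [f for i, c in enumerate(clusters) if weights[i] >= min_weight for f in c]


# Toy data, invented for illustration only.
columns = {
    "income": [10, 20, 30, 40, 50, 60],
    "income_log": [1.0, 1.3, 1.5, 1.6, 1.7, 1.8],  # monotone in income
    "age": [30, 25, 45, 22, 50, 28],
}
# Hypothetical per-fold accuracies of a model fitted on each cluster's features.
fold_accuracies = {0: [0.80, 0.82, 0.78, 0.80], 1: [0.60, 0.70, 0.50, 0.60]}

clusters = cluster_features(columns)
weights = markowitz_weights(fold_accuracies)
selected = select_features(clusters, weights, min_weight=0.5)
print(clusters, weights, selected)
```

In this toy run the two income features fall into one cluster because they are monotonically related, and that cluster receives most of the weight because its accuracy is both higher and more stable across folds. A fuller treatment would estimate the covariance of accuracies between clusters and solve the corresponding portfolio optimization rather than assume independence.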