Feature Selection in Graph Attributed Data
Graphs are fundamental data structures for representing relationships and interactions
among entities. Representing data as graphs helps uncover complex relationships and
patterns that models focusing on individual data points may miss.
Attributed graphs extend this capability by associating features with the vertices or
edges of the graph. In complex real-world problems, however, datasets often involve
numerous features. Feature selection is a critical technique for identifying a relevant
subset of features for specific tasks such as classification, prediction, or anomaly
detection. The size and complexity of these datasets, though, increase the computational
cost of feature selection. Furthermore, adequate feature selection methods for attributed
graphs are scarce, which leads to suboptimal outcomes in various data analysis tasks. This
study addresses this
challenge by framing feature selection in attributed graph data as a graph similarity
problem. We evaluated the applicability of our graph-based feature selection approach
through three case studies using a graph constructed from Brazilian census data. The
first case study focused on identifying key census features associated with CO2 emissions in
Brazil. The second aimed to uncover census-derived socio-economic determinants of
homicide rates in Brazil. The third explored the key features influencing voting patterns
in the Brazilian presidential runoff election.