Stance Detection and Automated Labeling of Twitter Users for Analysis of Controversial Political Events
Social media platforms have become important venues for expressing and debating political ideas in today's society. They allow users to share opinions, comments, and critiques on political subjects, generating vast amounts of data. Analyzing this data with computational and statistical methods can help measure public opinion on these subjects, providing valuable insights for government institutions, organizations, and researchers. Stance detection is the task of automatically identifying the position of individuals or groups toward specific topics using text analysis and machine learning techniques, which makes it an essential tool for understanding preferences and opinions. However, existing methods in the literature often rely on manual labeling and fail to label most of the users participating in a discussion. Moreover, many existing models address each topic in isolation, disregarding potential interdependencies between topics, which can be significant in various applications and can change over time.
Considering these limitations, the main objective of this thesis was to develop and evaluate an automated computational method for detecting the stance of Twitter users on controversial political topics discussed in Brazil and labeling them accordingly. The method was designed to require minimal human intervention, to label users regardless of their level of participation in the discussion, to operate over a specific period, and to account for potential interdependence between topics.
The developed method combined unsupervised and minimally supervised computational approaches and took into account social factors such as homophily and network structure. It consisted of three main steps: (i) adapting an unsupervised stance detection technique to group Brazilian Portuguese-speaking Twitter users into clusters on a controversial and polarized topic; (ii) automatically labeling hundreds of thousands of individual users by combining label assignment with valence score calculation; and (iii) measuring levels of engagement and balance to characterize the behavior of the labeled users during the events.
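The abstract does not define the valence score used in step (ii), so the following is only a minimal sketch under stated assumptions. It uses one common formulation from the polarization literature: a user's interactions (for example, retweets) with accounts already assigned to group A or group B are normalized by each group's total, and the score V(u) = 2·rA / (rA + rB) − 1 lies in [−1, 1]. The seed groups are assumed to come from the clustering step, and the function names, the group labels "A"/"B", and the threshold `tau` are all hypothetical, not the thesis's actual parameters.

```python
from collections import Counter

def valence_scores(interactions, group_of):
    """Hypothetical sketch of a valence score.

    interactions: iterable of (user, target) pairs, e.g. user retweeted target.
    group_of: dict mapping already-clustered targets to "A" or "B" (assumed
    to come from the unsupervised clustering step).
    Returns {user: score in [-1, 1]}; +1 means interactions only with A.
    """
    counts = {"A": Counter(), "B": Counter()}
    totals = {"A": 0, "B": 0}
    for user, target in interactions:
        group = group_of.get(target)
        if group is None:  # ignore interactions with unclustered accounts
            continue
        counts[group][user] += 1
        totals[group] += 1

    scores = {}
    for user in set(counts["A"]) | set(counts["B"]):
        # Normalize by group totals so a larger group does not dominate.
        r_a = counts["A"][user] / totals["A"] if totals["A"] else 0.0
        r_b = counts["B"][user] / totals["B"] if totals["B"] else 0.0
        scores[user] = 2 * r_a / (r_a + r_b) - 1  # r_a + r_b > 0 here
    return scores

def assign_labels(scores, tau=0.5):
    """Label users whose score is beyond a hypothetical threshold tau."""
    labels = {}
    for user, v in scores.items():
        if v >= tau:
            labels[user] = "A"
        elif v <= -tau:
            labels[user] = "B"
        else:
            labels[user] = None  # ambiguous: left unlabeled
    return labels

interactions = [("u1", "a"), ("u1", "a"), ("u2", "b"), ("u3", "a"), ("u3", "b")]
scores = valence_scores(interactions, {"a": "A", "b": "B"})
print(assign_labels(scores))
```

Normalizing by group totals is the design choice that keeps the score comparable when one side of a polarized debate is much more active than the other; users near zero are left unlabeled rather than forced into a group.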
The method was successfully applied and evaluated on three politically controversial topics: the COVID-19 Parliamentary Inquiry, distrust in the security of electronic voting machines, and the 2022 Brazilian presidential elections. The results showed that the proposed method achieved high coverage, assigning labels to over 90% of the users in the evaluated datasets. Furthermore, by analyzing the temporal dimension of the collected data and the users' stances over time, it was also possible to characterize the behavior of individuals in each opposing group.