Using LinkedIn data and graph modeling to understand the profile of student groups
This dissertation leverages LinkedIn data to explore the career trajectories of university students and alumni. Taking ethical and legal concerns into account, the study employs cost-effective web scraping and data parsing techniques to acquire LinkedIn data. Because job titles are not standardized across profiles, a manual labeling step and a machine learning classification model are used to standardize the titles of more than 35,000 jobs.
A graph model is then developed to capture temporal trends that current occupational mobility network models cannot represent. The results report the acquisition of 9,245 profiles, a coverage ratio of 74.4%, demonstrating the viability of the proposed ethical scraping method. The machine learning approach to job title standardization achieved performance similar to private, paid solutions, with 71% precision, accuracy, and recall.
Finally, constructing a graph model enabled the identification of emerging career trends and transition patterns among students and alumni of UFABC. The graph model and an online visualization tool are available on the author's GitHub.
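As a rough illustration of what a temporal occupational mobility graph can look like, the sketch below counts job-to-job transitions and keys each edge by the year in which the new job started. It is a minimal sketch under stated assumptions, not the dissertation's actual model: the sample profiles, the job titles, and the function names `build_temporal_edges` and `transitions_in_year` are all hypothetical.

```python
from collections import Counter

# Hypothetical career histories (not real data): each profile is a list of
# (standardized_job_title, start_year) pairs, sorted by start year.
profiles = [
    [("Intern", 2015), ("Data Analyst", 2016), ("Data Scientist", 2019)],
    [("Intern", 2016), ("Software Engineer", 2017)],
    [("Data Analyst", 2017), ("Data Scientist", 2019)],
]


def build_temporal_edges(profiles):
    """Count transitions as (source_job, target_job, year) -> frequency.

    The year is the start year of the target job, so the same pair of
    jobs produces separate edges for separate years.
    """
    edges = Counter()
    for history in profiles:
        # Pair each job with the one that follows it in the same profile.
        for (src, _), (dst, year) in zip(history, history[1:]):
            edges[(src, dst, year)] += 1
    return edges


def transitions_in_year(edges, year):
    """Return the transition counts for a single year as a static snapshot."""
    return {(s, d): n for (s, d, y), n in edges.items() if y == year}


edges = build_temporal_edges(profiles)
by_2019 = transitions_in_year(edges, 2019)
```

Keeping the year on each edge is what separates this from a static mobility network, where all transitions between two occupations collapse into a single weight. Comparing yearly snapshots is one simple way to see a transition becoming more or less common over time.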