Algorithm for scraping public data related to physical activity information for creating a database and its analysis
In this work, we report the generation of a public dataset containing data on multiple sports practiced by long-distance runners. Using web scraping techniques, we extracted information on 37 sports, covering the months of 2019 and 2020, from the Strava platform. In total, 14,644,391 activities were extracted from 37,595 athletes from all over the world. We focused on the analysis of the running data, at both the individual and group levels. Additionally, we compared weekly training volumes with average marathon finishing times. To assess how runner training was affected by the COVID-19 pandemic, we restricted the dataset to 10,703,690 running activities by 36,412 athletes. In 2020, compared to 2019, there was a 7% decrease in running training volume and a 7% decrease in the number of runners. We also observed large variations in these variables throughout 2020, reaching a 35% reduction in running volume in September 2020.
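The aggregation behind figures such as the weekly training volume and the year-over-year change in running volume can be illustrated with a minimal sketch. It is not the pipeline used in this work. The record layout `(athlete_id, activity_date, sport, distance_km)`, the sport label `"Run"`, the function names, and the toy values are all assumptions made for illustration and do not come from the dataset.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records: (athlete_id, activity_date, sport, distance_km).
# The values are made up and do not reproduce any figure from the dataset.
activities = [
    ("a1", date(2019, 9, 2), "Run", 10.0),
    ("a1", date(2019, 9, 4), "Run", 10.0),
    ("a2", date(2019, 9, 3), "Ride", 40.0),
    ("a1", date(2020, 9, 1), "Run", 15.0),
]


def weekly_running_volume(records):
    """Sum running distance (km) per (athlete, ISO year, ISO week)."""
    totals = defaultdict(float)
    for athlete, day, sport, km in records:
        if sport == "Run":  # assumed label for running activities
            iso_year, iso_week, _weekday = day.isocalendar()
            totals[(athlete, iso_year, iso_week)] += km
    return dict(totals)


def yearly_running_volume(records):
    """Sum running distance (km) per calendar year over all athletes."""
    totals = defaultdict(float)
    for _athlete, day, sport, km in records:
        if sport == "Run":
            totals[day.year] += km
    return dict(totals)


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = decrease)."""
    return 100.0 * (new - old) / old


weekly = weekly_running_volume(activities)
yearly = yearly_running_volume(activities)
change = percent_change(yearly[2019], yearly[2020])
print(f"Running volume change 2019 -> 2020: {change:.1f}%")
```

Grouping by ISO week keeps weeks that straddle a month boundary intact, which matters when weekly volume is the unit of comparison. A monthly comparison, such as one restricted to September, would group on `(day.year, day.month)` instead.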