Automatic medical specialties classification of scientific articles about COVID-19 in Brazilian Portuguese
This work studies the automatic classification of scientific articles from a Brazilian Portuguese corpus about COVID-19 according to their medical specialties. The corpus was previously extracted from the PubMed database using Natural Language Processing (NLP) methods and manually annotated for medical specialties. The annotation was based on indicators such as article titles, journal names, keywords, and the vocabulary of abstracts. Classifiers that perform well can reduce the manual annotation effort and may help keep the corpus up to date. Five classifiers were tested on texts from the most frequent specialties. The Support Vector Classifier (SVC) and eXtreme Gradient Boosting (XGBoost) achieved the best results.
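
To make the task concrete, the following is a minimal sketch of classifying short texts by medical specialty. It assumes a bag-of-words multinomial Naive Bayes baseline with Laplace smoothing, written in plain Python. It is not one of the classifiers evaluated in this work, and it does not reproduce the SVC or XGBoost setups. The training sentences, the two specialty labels, and the function names (`tokenize`, `train`, `predict`) are invented for illustration and do not come from the corpus.

```python
import math
from collections import Counter, defaultdict

# Invented toy examples (NOT taken from the corpus): (text, specialty label).
TRAIN = [
    ("ventilação mecânica em pacientes com pneumonia grave", "pneumologia"),
    ("função pulmonar após pneumonia por covid", "pneumologia"),
    ("miocardite e arritmia em pacientes com covid", "cardiologia"),
    ("risco cardiovascular e infarto durante a pandemia", "cardiologia"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "arritmia após infarto"))
```

In practice, the annotated abstracts would replace the toy list, and the baseline would be swapped for the classifiers under comparison, with evaluation on held-out texts rather than on hand-picked queries.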