Automatic Detection of Complex Expressions in Portuguese-Language Health Domain Texts
This research addresses the automatic detection of complex expressions in Portuguese-language texts in the health domain. Detecting these lexical elements is the first stage of the automatic lexical simplification process. The use of such expressions tends to raise the level of textual complexity, hindering access to knowledge for a portion of readers who are not specialists in the domain. To this end, the research compared corpus samples from two textual genres, one journalistic and the other scientific. Analyzing different genres made it possible to observe the features specific to each textual set and those common to both, as well as to quantify the presence of certain lexical traits in these contexts. It also contributed to building a biomedical corpus, referred to in this research as Covid-19 UFABC. With the support of the NLTK toolkit, texts about the coronavirus from these genres were submitted to text comparators. Preparing the corpus samples also yields features that will later feed artificial intelligence algorithms in the automatic detection of complex expressions.