Graduate Programs Portal (UFABC)

SIGAA - Sistema Integrado de Gestão de Atividades Acadêmicas

PPGCCM - GRADUATE PROGRAM IN COMPUTER SCIENCE
FUNDAÇÃO UNIVERSIDADE FEDERAL DO ABC
Phone/Extension: Unavailable
E-mail: poscomp@ufabc.edu.br
http://propg.ufabc.edu.br/ppgccm

QUALIFYING Exam Committee: CRISTIANO OLIVEIRA GONÇALVES

A MASTER'S QUALIFYING exam committee was registered by the program.
STUDENT: CRISTIANO OLIVEIRA GONÇALVES
DATE: 10/02/2021
TIME: 14:00
LOCATION: Online
TITLE:

Judicial sentence representation for clustering

PAGES: 76
BROAD AREA: Exact and Earth Sciences
AREA: Computer Science
ABSTRACT:

The digitization of documents in the Brazilian judicial sector facilitates access to information of public interest. However, to gather useful metrics from this growing repository, documents must be organized in a way that facilitates the retrieval of relevant information, and machine learning techniques can reduce the human effort of organizing a large corpus. In this work, we analyze different machine learning techniques with regard to how closely the associations they draw between legal terms match those made by human experts. To this end, we built a dataset of 40,009 documents extracted from the e-Saj website. The Word2Vec, FastText, and GloVe techniques were then trained on these documents, and the resulting models were compared with counterparts trained on general-domain Portuguese. The Legal Thesaurus of the Portuguese Language served as the reference for specialist knowledge. Preliminary experiments show that FastText produced the models whose term associations most closely resemble those in the Thesaurus, and that models trained on general-domain Portuguese performed better in most term categories, although the difference is small in some categories. These preliminary results suggest that enlarging the legal corpus is a promising way to surpass the performance of the general-domain models, even if the legal corpus remains smaller than the general-domain one.
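
The abstract does not say which metric was used to compare model associations with the Thesaurus. As a rough illustration only, the sketch below assumes a simple precision-at-k measure: for each thesaurus term, it checks how many of the k nearest neighbors in the embedding space (by cosine similarity) are listed as related terms. The function names, the toy vectors, and the toy thesaurus entries are all hypothetical and are not taken from the work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(term, embeddings, k):
    """Return the k terms most similar to `term`, excluding the term itself."""
    query = embeddings[term]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != term
    ]
    # Sort by descending similarity; break ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def thesaurus_agreement(embeddings, thesaurus, k=2):
    """Mean fraction of each term's k nearest neighbors that the thesaurus lists as related.

    This is an assumed metric (precision at k), not the one used in the work.
    """
    scores = []
    for term, related in thesaurus.items():
        if term not in embeddings:
            continue  # skip thesaurus terms the model has no vector for
        neighbors = nearest_terms(term, embeddings, k)
        hits = sum(1 for n in neighbors if n in related)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0


# Made-up 3-dimensional vectors standing in for a trained model's output.
toy_embeddings = {
    "habeas corpus": [0.9, 0.1, 0.0],
    "prisão": [0.8, 0.2, 0.1],
    "liberdade": [0.7, 0.3, 0.0],
    "contrato": [0.0, 0.9, 0.4],
    "locação": [0.1, 0.8, 0.5],
}

# Made-up thesaurus relations (term -> set of related terms).
toy_thesaurus = {
    "habeas corpus": {"prisão", "liberdade"},
    "contrato": {"locação"},
}

score = thesaurus_agreement(toy_embeddings, toy_thesaurus, k=2)
print(f"agreement@2 = {score:.2f}")
```

Under this assumed metric, running the same function over vectors from a legal-domain model and from a general-domain model would give two scores that can be compared per term category.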

COMMITTEE MEMBERS:
Chair - Internal to the Program - 2376122 - THIAGO FERREIRA COVOES
Full Member - Examiner Internal to the Program - 1673092 - RONALDO CRISTIANO PRATI
Full Member - Examiner External to the Institution - NÁDIA FÉLIX FELIPE DA SILVA - UFG
Alternate Member - Examiner Internal to the Program - 1934625 - JESUS PASCUAL MENA CHALCO

Notice registered on: 06/01/2021 11:20