PPGCCM GRADUATE PROGRAM IN COMPUTER SCIENCE FUNDAÇÃO UNIVERSIDADE FEDERAL DO ABC Phone: 11 4996-8337 http://propg.ufabc.edu.br/ppgccm

QUALIFYING EXAM committee: LUCAS KENZO KUROKAWA

A MASTER'S QUALIFYING EXAM committee was registered by the program.
STUDENT: LUCAS KENZO KUROKAWA
DATE: 13/06/2022
TIME: 19:00
LOCATION: Google Meet
TITLE:

Weak Supervision for Named Entity Recognition in Brazilian Legal Documents


PAGES: 61
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
ABSTRACT:

The use of modern machine learning models in real applications is mainly limited by the lack of annotated data. These models require a considerable amount of annotated data, which is rarely available. Such data are often unavailable because the task is highly specific: the data were either never recorded, or were recorded but cannot be released because of their content. Thus, it is common to resort to manual data annotation, which for specific domains requires a Subject-Matter Expert (SME). In this method, the SME analyzes and annotates each data point individually, a slow and repetitive process. For this reason, weak supervision has gained relevance as a way to speed up annotation. The SME creates a set of programmable labeling functions that vote on a label for each data point. These votes are aggregated by a probabilistic model called the Label Model, generating the annotated dataset. This dataset is used to train an End Model that performs the task in real applications. In the legal domain, the growing digitization of legal proceedings has made a large amount of unannotated data available. Annotating this data makes it possible to create several AI tools for real applications, aiming to modernize processes that still consist of repetitive manual work. One tool created from manual annotation and now available is the legal named entity recognizer LeNER-Br. It identifies entities of jurisprudence (references to other legal cases) and legislation (references to laws), in addition to entities of person, time, place, and organization. This work aims to verify whether simulating the annotation of the LeNER-Br dataset with weak supervision achieves an F1-score competitive with that of manual annotation.
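
As a rough illustration of the weak supervision pipeline described in the abstract, the Python sketch below lets a few labeling functions vote on a label for each token. It is not the method used in the work: the labeling functions, the label names, and the example sentence are all hypothetical, and a simple majority vote stands in for the probabilistic Label Model.

```python
import re
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical labeling functions; the rules and label names are illustrative only.
def lf_law_keyword(token):
    # Words such as "Lei" or "Decreto" often start a reference to a law.
    return "LEGISLATION" if token in {"Lei", "Decreto"} else ABSTAIN

def lf_year(token):
    # Four-digit years are treated as time expressions.
    return "TIME" if re.fullmatch(r"(19|20)\d{2}", token) else ABSTAIN

def lf_court_acronym(token):
    # A tiny, assumed gazetteer of court acronyms.
    return "ORGANIZATION" if token in {"STF", "STJ"} else ABSTAIN

LABELING_FUNCTIONS = [lf_law_keyword, lf_year, lf_court_acronym]

def majority_vote(votes):
    """Aggregate votes for one token; simplified stand-in for a Label Model."""
    counted = Counter(v for v in votes if v is not ABSTAIN)
    if not counted:
        return "O"  # no function voted: token is outside any entity
    return counted.most_common(1)[0][0]

def weak_label(tokens):
    """Return one aggregated label per token."""
    return [majority_vote([lf(tok) for lf in LABELING_FUNCTIONS]) for tok in tokens]

tokens = ["O", "STF", "aplicou", "a", "Lei", "em", "2019"]
labels = weak_label(tokens)
for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```

The aggregated labels would then serve as training data for the End Model. A probabilistic Label Model differs from majority vote in that it also estimates how accurate each labeling function is and weights its votes accordingly.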


COMMITTEE MEMBERS:
Chair - Internal to the Program - 334.489.048-48 - THIAGO FERREIRA COVOES - NOT INFORMED
Full Member - Examiner Internal to the Program - 3008017 - DENIS GUSTAVO FANTINATO
Full Member - Examiner External to the Institution - NÁDIA FÉLIX FELIPE DA SILVA - UFG
Alternate Member - Examiner External to the Program - 2364326 - ALEXANDRE DONIZETI ALVES
Alternate Member - Examiner External to the Institution - ANDERSON DA SILVA SOARES - UFG
Notice registered on: 10/05/2022 21:06