PPGCCM GRADUATE PROGRAM IN COMPUTER SCIENCE FUNDAÇÃO UNIVERSIDADE FEDERAL DO ABC Phone: 11 4996-8337 http://propg.ufabc.edu.br/ppgccm

DEFENSE committee: CLARISSA SIMOYAMA DAVID

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: CLARISSA SIMOYAMA DAVID
DATE: 04/06/2020
TIME: 14:00
LOCATION: remote participation + https://conferenciaweb.rnp.br/webconf/debora-29
TITLE:
Detection of implicit structures in textual data using hard clustering

PAGES: 50
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
ABSTRACT:

With the great increase in the availability of data from several areas, there is growing interest in searching for patterns in datasets. Those patterns can be used to perform tasks such as clustering and classification. The Machine Learning (ML) research area offers several algorithms for these tasks. However, some data sources bring unnecessary variables (or features) that can compromise the quality of the extracted patterns and can, for example, impair classification tasks by lowering the accuracy obtained by the classifier. In this work, a representation of textual data is proposed that incorporates the rates of occurrence of words associated with their syntactic functions, obtained with Natural Language Processing (NLP) tasks such as POS-Tagging. Based on this data structure, it is proposed to assign importance to clusters of these features in order to represent the texts. First, using Unsupervised Learning, the words are hard-clustered with the K-means algorithm, reducing the complexity of the dataset without losing important information; after the ideal number of clusters is defined, weights are assigned to the word clusters. Then, with the Supervised Learning approach, classification is applied to the texts, initially with the previously tagged words as features, followed by a step that optimizes the feature weights with the aid of a population-based optimization algorithm. The results show that this data structure, combined with the approach of assigning weights to the features, yielded a significant improvement in accuracy in the classification task.
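As a rough illustration of the pipeline described in the abstract, the sketch below hard-clusters words with a plain K-means loop and then represents a text by weighted word-cluster counts. It is not the author's implementation: the toy `word_rates` values, the deterministic initialization, the helper names `kmeans` and `text_features`, and the hand-picked weights are all assumptions made for illustration. In the thesis the weights are tuned by a population-based optimization algorithm, which is not shown here.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means. Hard clustering: each point gets exactly one label."""
    # Assumption: initialize centroids with the first k points (deterministic).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: word -> (rate of use as NOUN, rate of use as VERB),
# standing in for occurrence rates derived from POS tagging.
word_rates = {
    "dog": (0.9, 0.1),
    "run": (0.2, 0.8),
    "cat": (1.0, 0.0),
    "jump": (0.1, 0.9),
}
words = list(word_rates)
labels = kmeans([word_rates[w] for w in words], k=2)
word_cluster = dict(zip(words, labels))


def text_features(tokens, word_cluster, weights):
    """Represent a text as weighted counts of the word clusters it contains."""
    counts = Counter(word_cluster[t] for t in tokens if t in word_cluster)
    return [weights[c] * counts.get(c, 0) for c in range(len(weights))]


# Assumption: weights fixed by hand; the thesis optimizes them instead.
features = text_features(["dog", "cat", "run"], word_cluster, weights=[1.0, 0.5])
```

Replacing individual words with cluster-level counts is what reduces the number of features, and the per-cluster weights are the quantities a later optimization step would adjust to improve classifier accuracy.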


COMMITTEE MEMBERS:
Chair - Internal to the Program - 1918407 - DEBORA MARIA ROSSI DE MEDEIROS
Full Member - Examiner Internal to the Program - 1722875 - DAVID CORREA MARTINS JUNIOR
Full Member - Examiner External to the Institution - MARCIO BASGALUPP - UNIFESP
Full Member - Examiner External to the Institution - ANDRÉ CARLOS PONCE DE LEON FERREIRA DE CARVALHO - USP
Alternate Member - Examiner Internal to the Program - 1934625 - JESUS PASCUAL MENA CHALCO
Notice registered on: 21/05/2020 09:45