Strategies for combining multiple text representations to improve document classification
Machine learning and natural language processing algorithms can learn patterns in texts and extract information from them. However, they depend on how the texts are represented computationally for processing. This becomes a problem when the method used to represent the documents cannot capture all the characteristics contained in the text. Multiview learning, applied to text classification, addresses this by exploiting the information contained in different ways of representing a document, on the assumption that each representation captures one or more of the document's characteristics. Efficiently exploiting the complementary information between views nevertheless remains a challenge for multiview learning, in terms of both computational efficiency and capacity to acquire information. Combinatorial optimization algorithms make it possible to identify combinations of views that increase the accuracy of text classification. This raises two questions: what makes different views complementary, and how can two views together contribute to an increase in accuracy? The present work therefore proposes strategies for identifying combinations of multiple views, with the aim of improving the accuracy of text classification.