Active learning for constrained clustering
Interest in semi-supervised learning has grown due to the high cost of labelling data for
analysis. At the same time, Active Learning (AL) aims to minimize the cost of creating
labelled datasets by identifying which data are most relevant to learning, given what is
already available. This project analyses the combination of AL with semi-supervised
learning, specifically constrained clustering, a type of learning that does not rely on
class labels for the objects. Instead, the only information available is whether certain
pairs of objects must be in the same group or in different groups. In some applications,
obtaining such constraints costs less than obtaining class labels, since a constraint
carries less information than a label. Initially, we
will evaluate the combination of different AL approaches with Gaussian Mixture Models.
Additionally, we will develop a case study on the plankton classification problem.
Despite its high labelling cost, this problem has received little attention in the context
of AL. The objective of this case study is to evaluate the proposed methods in a real
application.
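
To make the idea of querying pairwise constraints concrete, the following is a minimal
sketch and not the method proposed in this project. It assumes that a mixture model has
already produced soft assignments (responsibilities) for each object, and that the two
objects in a pair are assigned to components independently, which is a simplification.
Under these assumptions it selects the pair whose same-group probability is closest to
0.5 and asks an oracle whether the pair is a must-link or a cannot-link. All function
names, the responsibility values, and the oracle labels are hypothetical.

```python
from itertools import combinations


def same_cluster_probability(r_i, r_j):
    """Probability that two objects share a component.

    Assumes (for illustration only) that the two assignments are independent.
    """
    return sum(a * b for a, b in zip(r_i, r_j))


def most_uncertain_pair(responsibilities):
    """Return the index pair whose same-cluster probability is closest to 0.5."""
    pairs = combinations(range(len(responsibilities)), 2)
    return min(
        pairs,
        key=lambda p: abs(
            same_cluster_probability(responsibilities[p[0]], responsibilities[p[1]])
            - 0.5
        ),
    )


def answer_query(pair, oracle_labels):
    """Simulated oracle: reveals only whether the pair shares a group, not the labels."""
    i, j = pair
    return "must-link" if oracle_labels[i] == oracle_labels[j] else "cannot-link"


# Hypothetical responsibilities from a two-component mixture model.
responsibilities = [
    [0.95, 0.05],
    [0.90, 0.10],
    [0.60, 0.40],
    [0.05, 0.95],
]
# Hypothetical ground truth, known to the oracle and hidden from the learner.
oracle_labels = ["copepod", "copepod", "diatom", "diatom"]

pair = most_uncertain_pair(responsibilities)
constraint = answer_query(pair, oracle_labels)
print(pair, constraint)
```

In practice, the responsibilities could come from a fitted model, for example the output
of `predict_proba` in scikit-learn's `GaussianMixture`, and the returned constraint would
be fed back into a constrained clustering step before the next query is chosen.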