Grouping of instances into equivalence classes to deal with the curse of dimensionality in gene network inference
The inference of gene interaction networks from expression profiles is a central and still open problem in systems biology. Several mathematical, statistical and computational techniques have been developed to model, infer and simulate gene regulation mechanisms; this work focuses on the inference problem. Our proposal continues the research conducted during the master's degree, which studied gene network inference based on feature selection (selecting the subset of genes that best predicts the behavior of a given target from their temporal mRNA expression) and proposed alternatives to increase statistical estimation power in the typical situation where the set of samples with gene expression profiles is very limited and has high dimensionality (number of genes). More concretely, during the master's degree we proposed methods to alleviate the curse of dimensionality in Boolean network inference through partitions of the Boolean lattice induced by a linear combination of the predictor gene values (predictor instances). Each value of the linear combination determines an equivalence class of predictor instances. In this work, the instance grouping problem is reformulated as a search problem in the partition lattice, and we devise search strategies in this lattice based on prior information (e.g., gene networks tend to be composed mostly of linear and canalizing functions) in order to examine a potentially relevant subspace of partitions while remaining computationally efficient. Preliminary results indicate that the developed methods, especially the one that searches for canalizing functions, infer competitive networks in terms of both topology and the gene expression dynamics generated by the inferred networks. The main advantage of these methods is their superior generalization capacity when predicting the next system state from randomly chosen initial states that are not in the training set.
In addition, we developed a method that transfers the supervised learning obtained from the inference of randomly generated (synthetic) networks in order to estimate the correct dimension of the predictor gene set for each target gene. This strategy benefits all gene network inference methods considered, including the original method, which does not group instances.
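
As an illustration of the grouping idea described above, the following minimal Python sketch partitions the Boolean instances of a predictor set into equivalence classes according to the value of a linear combination, and then counts target outcomes per class rather than per instance. The function names, the unit weights and the toy samples are assumptions made for this example only; they are not taken from the work itself and do not reproduce its actual inference method or search strategy.

```python
from collections import defaultdict
from itertools import product


def linear_value(instance, weights):
    """Value of the linear combination for one Boolean predictor instance."""
    return sum(w * x for w, x in zip(weights, instance))


def group_instances(k, weights):
    """Partition all 2**k Boolean instances by their linear combination value.

    Each distinct value defines one equivalence class of instances.
    """
    classes = defaultdict(list)
    for instance in product((0, 1), repeat=k):
        classes[linear_value(instance, weights)].append(instance)
    return dict(classes)


def class_counts(samples, weights):
    """Count target outcomes (0 or 1) per equivalence class.

    samples: iterable of (instance, target) pairs (hypothetical toy format).
    Returns {class value: (count of target 0, count of target 1)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for instance, target in samples:
        counts[linear_value(instance, weights)][target] += 1
    return {value: tuple(c) for value, c in counts.items()}


# Three predictors with unit weights (an assumed choice): the 8 instances
# collapse into classes indexed by the number of active predictors.
classes = group_instances(3, (1, 1, 1))

# Toy samples: two distinct instances fall into the same class and
# therefore pool their observations.
toy_samples = [((0, 1, 0), 1), ((1, 0, 0), 1), ((1, 1, 1), 0)]
pooled = class_counts(toy_samples, (1, 1, 1))
```

Because observations are pooled per class, fewer conditional probabilities have to be estimated from the same number of samples, which is the sense in which grouping can ease the estimation problem when samples are scarce.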