Computational Approaches for Analyzing Structured Data in Biological Systematics and Biogeography
The biological sciences use connected acyclic graphs, or trees, to represent evolutionary relationships and biogeographical history. These trees may be built from genetic, molecular, morphological, geological, behavioral, or geographical evidence. In large-scope studies, especially those covering many biological groups or a complex biogeographical history, manual data processing becomes infeasible, costly in both time and money, and error-prone. Evolutionary biologists already have established methods for processing their data, but applying these methods manually to large data sets has proven impractical, which is a significant obstacle to understanding the data at hand. Although some approaches for processing structured data are available, they are either outdated or hard to use (for example, command-line tools). This thesis first seeks to understand the philosophical foundations of computer science in the context of biological and evolutionary problems and then, based on this understanding, aims to facilitate evolutionary biologists' work by automating the workflow they follow rather than changing it. We present four computational approaches. Three of them build a combined MRP-matrix (matrix representation with parsimony) from input trees given in a parenthetical format: “Generating combined MRP-matrices”, “Generating Topographic-Units MRP-matrix”, and “Generating Combined Areagram MRP-matrix”. The fourth, “interactive Phylogenetic trees Comparison (iPhyloC)”, uses information visualization to facilitate the comparison of phylogenetic trees. All four approaches were developed in close collaboration with domain experts and are encapsulated in easy-to-use web-based frameworks that are freely available online. Testing shows that the approaches are reliable, easy to use, and deliver correct results. They help evolutionary biologists focus on deriving results from the data at hand rather than spending time processing it.
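To make the idea of a combined MRP-matrix concrete, the following is a minimal sketch of standard matrix representation with parsimony coding. It is not the implementation developed in this thesis. It assumes that the parenthetical input is plain Newick without branch lengths, quoted labels, or internal node labels, and the function names `parse_newick`, `clades`, and `mrp_matrix` are hypothetical. Each non-root internal node of each input tree contributes one column: `1` for taxa inside that clade, `0` for other taxa of the same tree, and `?` for taxa absent from that tree. The all-zero outgroup row that is often added in practice is omitted here.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested lists of taxon names.

    Assumption: no branch lengths, quoted labels, or internal node labels.
    """
    text = "".join(text.split()).rstrip(";")  # drop whitespace and final ";"
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


def clades(tree):
    """Return (all leaves of `tree`, leaf sets of its non-root internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
        if not isinstance(child, str):
            found.append(child_leaves)  # the child itself is an internal node
    return leaves, found


def mrp_matrix(newick_trees):
    """Build a combined MRP matrix as a dict mapping taxon -> character string."""
    columns = []  # one (taxa of source tree, clade) pair per matrix column
    all_taxa = set()
    for text in newick_trees:
        taxa, found = clades(parse_newick(text))
        all_taxa |= taxa
        columns.extend((taxa, clade) for clade in found)

    matrix = {}
    for taxon in sorted(all_taxa):
        row = ""
        for taxa, clade in columns:
            if taxon in clade:
                row += "1"  # taxon belongs to the clade
            elif taxon in taxa:
                row += "0"  # taxon is in the tree but outside the clade
            else:
                row += "?"  # taxon is missing from this source tree
        matrix[taxon] = row
    return matrix


if __name__ == "__main__":
    for taxon, row in mrp_matrix(["((A,B),C);", "((A,C),D);"]).items():
        print(taxon, row)
```

In this example the two input trees share only some taxa, so taxon `B` is coded `?` in the column derived from the second tree and taxon `D` is coded `?` in the column derived from the first.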