I'm very excited to have been accepted for this year's Google Summer of Code (GSOC). In recent days I have been busy preparing my master's thesis and defense, so this news is a welcome stress reliever. The project I'm going to work on is "Phylogenetics in Biopython: Filling in the gaps", which aims to implement several phylogenetics algorithms for Biopython. I believe it will be an exciting coding experience.

Getting to Know GSOC

I first came across GSOC on the BioJava homepage while trying to use BioJava for my own bioinformatics work. Because I assumed most of the applicants and BioJava contributors came from a computer science background, I never had the courage to apply. Last September, I got the chance to meet Professor Allen and Karen when they were visiting our lab. Karen told us more about GSOC and about NESCent, which had been a mentoring organization for several years. This finally inspired me to apply for GSOC this year.


The application went through NESCent's Phyloinformatics Summer of Code. I originally wanted to apply for the project "Discovering links to ToLWeb content from a tree in the Open Tree of Life's software system". That project builds on several existing Java projects and also needs some knowledge of HTML, XML, JavaScript and Python. As my first programming language is Java and I know the other related languages and techniques, it would have been a good fit for me. After the Biopython projects were added, however, I found the current project more suitable, because most of the algorithms in it are already implemented in BlastGraph, a program I wrote in Java, so I'm very familiar with them. Also, the former project already had another applicant, while this one had none. As each project can take only one student and each student can work on only one project, it seemed better to avoid the competition so that everyone would have a higher chance of being selected. Another major reason for choosing this project is that I want to improve my Python programming skills, since I have used Python far less than Java.

Project Description

As the name implies, this project is to implement some phylogenetic algorithms that are currently absent from Biopython's Bio.Phylo package. The package already provides some basic phylogenetics functions, such as tree operations, parsers for Newick, Nexus and PhyloXML, and wrappers for PhyML, RAxML and PAML. However, some important components still need to be filled in to better support phylogenetic workflows: simple tree construction algorithms, consensus tree searching, tree comparison and visualization. In this project, I will focus on the first two: tree construction and consensus tree searching. The tree construction part includes three algorithms: UPGMA, Neighbor Joining, and Maximum Parsimony. The consensus tree part includes another three: Strict, Majority-rule and Adams Consensus. By the end of the project, there should be two separate modules in Bio.Phylo providing these algorithms.
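To give a feel for the simplest of these algorithms, here is a minimal sketch of UPGMA in plain Python. It is only an illustration, not the planned Biopython module: the function name `upgma`, the nested-tuple tree, and the `frozenset`-keyed distance dictionary are all assumptions made for this example, and branch lengths are left out to keep it short.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree as nested tuples.

    names: list of taxon labels (hypothetical input format)
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b
    """
    # Map each current cluster (a label or a nested tuple) to its leaf count.
    clusters = {name: 1 for name in names}
    d = dict(dist)  # work on a copy so the caller's distances stay untouched
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # The distance from the merged cluster to every other cluster is
        # the average of the old distances, weighted by cluster size.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy distances between four made-up taxa.
example = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("AD"): 10,
    frozenset("BC"): 6, frozenset("BD"): 10, frozenset("CD"): 10,
}
print(upgma(["A", "B", "C", "D"], example))
# prints ('D', ('C', ('A', 'B')))
```

A and B are joined first because they are the closest pair, then C, and finally D. A real module would also need to track branch lengths and return a proper tree object rather than tuples.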

Work for the Next Two Weeks

The coding period starts on June 17. During the next two weeks, I will read the related source code in Biopython and try to design draft modules for the two parts.