First week summary

Last week, I designed the TreeConstruction module and implemented the DistanceMatrix and DistanceCalculator classes, almost exactly as planned. From the original DistanceMatrix design, I extracted a Matrix base class so that it can be reused for scoring matrices or extended later.
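To make the idea of the Matrix base class concrete, here is a minimal sketch of a symmetric matrix stored in lower-triangular form, with DistanceMatrix as a subclass that keeps the diagonal at zero. This is not the module's actual code: the storage layout, the `_index`/`_cell` helpers and the exact `insert` semantics are assumptions chosen to match the operations in the usage demo of this post.

```python
class Matrix(object):
    """Hypothetical sketch: a symmetric matrix stored in lower-triangular form.

    self.matrix[i] holds the values of names[i] against names[0..i],
    so self.matrix[i][i] is the diagonal entry.
    """

    def __init__(self, names, matrix=None):
        self.names = list(names)
        if matrix is None:
            # default to an all-zero matrix
            matrix = [[0] * (i + 1) for i in range(len(self.names))]
        self.matrix = matrix

    def _index(self, key):
        # accept either an element name or an integer index
        return key if isinstance(key, int) else self.names.index(key)

    def _cell(self, pair):
        i, j = (self._index(k) for k in pair)
        # only the lower triangle is stored, so put the larger index first
        return (i, j) if i >= j else (j, i)

    def __getitem__(self, pair):
        i, j = self._cell(pair)
        return self.matrix[i][j]

    def __setitem__(self, pair, value):
        i, j = self._cell(pair)
        self.matrix[i][j] = value

    def __delitem__(self, key):
        i = self._index(key)
        # remove row i, then column i from every row below it
        del self.matrix[i]
        for row in self.matrix[i:]:
            del row[i]
        del self.names[i]

    def insert(self, name, values, index=None):
        """Insert `name` at `index`; `values` gives its value against every
        element of the enlarged matrix, in the new order (itself included)."""
        if index is None:
            index = len(self.names)
        self.names.insert(index, name)
        # the new row covers names[0..index]
        self.matrix.insert(index, list(values[:index + 1]))
        # every row below it gains one column at position `index`
        for offset, row in enumerate(self.matrix[index + 1:]):
            row.insert(index, values[index + 1 + offset])

    def __len__(self):
        return len(self.names)


class DistanceMatrix(Matrix):
    """A Matrix whose diagonal (distance of an element to itself) is zero."""

    def __init__(self, names, matrix=None):
        Matrix.__init__(self, names, matrix)
        for i in range(len(self.names)):
            self.matrix[i][i] = 0
```

Storing only the lower triangle halves the memory of a symmetric matrix, and keeping the indexing logic in the base class means a scoring-matrix subclass would get lookup, deletion and insertion for free.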

Usage demo:

from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceMatrix
from Bio.Phylo.TreeConstruction import DistanceCalculator

# read a multiple sequence alignment
alignment = AlignIO.read('msa.phy', 'phylip')
# construct a distance calculator from the alignment and the name of a scoring matrix
# (DNA: identity, blastn, trans; protein: blosum40/62/90, pam90/120/250)
calculator = DistanceCalculator(alignment, 'identity')
# compute the distance matrix
dm = calculator.get_distance()
# print the distance matrix in lower-triangular format
print(dm)
# get the distance between sequences 'Alpha' and 'Beta' (the ids of the SeqRecords in the alignment)
print(dm['Alpha', 'Beta'])
# delete an element from the distance matrix
del dm['Alpha']
# insert an element at position 1, together with its list of distances
dm.insert('Alpha', [1, 0, 2, 4], 1)
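Roughly, the 'identity' option compares every pair of aligned sequences position by position. The sketch below assumes the identity distance is one minus the fraction of aligned positions where the two sequences carry the same character, with gaps compared like any other character. It works on plain (id, sequence) pairs rather than on the real AlignIO alignment and DistanceCalculator classes, and `identity_distance` and `distance_table` are hypothetical names.

```python
def identity_distance(seq1, seq2):
    """Hypothetical: 1 minus the fraction of identical aligned positions."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(seq1, seq2) if a == b)
    return 1 - matches / float(len(seq1))


def distance_table(records):
    """Pairwise distances for (id, sequence) pairs in lower-triangular form:
    row i holds the distances of record i to records 0..i."""
    ids = [rid for rid, _ in records]
    rows = []
    for i, (_, s1) in enumerate(records):
        rows.append([identity_distance(s1, s2) for _, s2 in records[:i + 1]])
    return ids, rows


print(distance_table([("Alpha", "ACGT"), ("Beta", "ACGA"), ("Gamma", "TTTT")]))
```

For example, "ACGT" and "ACGA" agree at 3 of 4 positions, giving a distance of 0.25. The other scoring matrices would replace the simple equality test with a per-pair score.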

I also wrote a unit test for these classes in the Tests directory.

I hope to get some feedback to improve my Python coding.

Plan for this week

Implement the UPGMA and NJ (neighbor joining) algorithms. This should be straightforward, since I have implemented both of them in Java before.
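For reference, UPGMA repeatedly merges the closest pair of clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The following is a minimal Python sketch of that loop, not the planned implementation and not a port of my Java code: distances live in a plain dict keyed by `frozenset` pairs of cluster labels instead of a DistanceMatrix, the result is a Newick string, and the name `upgma`, the input format and the tie-breaking (first minimum found) are assumptions.

```python
def upgma(names, dist):
    """Hypothetical UPGMA sketch.

    names: list of leaf names
    dist:  dict mapping frozenset((a, b)) -> distance between leaves a and b
    Returns a Newick string with branch lengths.
    """
    size = {n: 1 for n in names}        # number of leaves in each cluster
    height = {n: 0.0 for n in names}    # height of each cluster's root
    d = dict(dist)  # work on a copy so the caller's dict is not modified
    while len(size) > 1:
        # the closest pair of current clusters
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        # UPGMA places the new node at half the pairwise distance
        h = d[pair] / 2.0
        new = "(%s:%g,%s:%g)" % (a, h - height[a], b, h - height[b])
        height[new] = h
        # size-weighted average distance from the new cluster to the others
        for c in size:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    d[frozenset((a, c))] * size[a] + d[frozenset((b, c))] * size[b]
                ) / float(size[a] + size[b])
        # drop every entry that still involves a or b
        for key in [k for k in d if a in k or b in k]:
            del d[key]
        size[new] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
print(upgma(["A", "B", "C"], dist))  # ((A:1,B:1):2,C:3);
```

NJ follows the same delete-two, insert-one pattern but picks the pair that minimizes its Q criterion instead of the raw distance, so both algorithms rely on the matrix deletion and insertion operations.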

Problems

Both algorithms repeatedly delete and insert elements in the DistanceMatrix object. If they do this in place, the caller's DistanceMatrix is modified, and any later operation on it after running either algorithm may fail unexpectedly. One solution is to make a deep copy of the DistanceMatrix object (with `copy.deepcopy`) at the beginning of each algorithm, at the cost of a little speed.
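Here is a small sketch of why the copy has to be deep rather than shallow, assuming the DistanceMatrix keeps its values as a list of row lists. The matrix is modeled with plain lists, and `remove_element` and `run_algorithm` are hypothetical stand-ins for one merge step and for the algorithm entry point.

```python
import copy


def remove_element(names, rows, i):
    """Delete element i from a lower-triangular matrix in place,
    the kind of edit UPGMA and NJ make at every merge step."""
    del names[i]
    del rows[i]
    for row in rows[i:]:
        del row[i]


def run_algorithm(names, rows):
    """Hypothetical algorithm entry point that only edits deep copies."""
    names, rows = copy.deepcopy(names), copy.deepcopy(rows)
    remove_element(names, rows, 0)
    return names, rows


names = ["Alpha", "Beta", "Gamma"]
rows = [[0], [1, 0], [2, 3, 0]]

# With a deep copy, the caller's matrix survives unchanged.
print(run_algorithm(names, rows))  # (['Beta', 'Gamma'], [[0], [3, 0]])
print(rows)                        # [[0], [1, 0], [2, 3, 0]]

# A shallow copy is not enough: the inner row lists are shared.
shallow = list(rows)
remove_element(list(names), shallow, 0)
print(rows)                        # [[0], [0], [3, 0]]
```

The extra cost is one copy of the matrix per call, which is small next to the repeated pair searches inside the algorithms.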