First week summary
Last week I designed the TreeConstruction module and implemented the
DistanceCalculator classes, almost exactly as planned. From the original
DistanceMatrix plan, I extracted a
Matrix base class so that it can be reused for scoring matrices or extended later.
    from Bio import AlignIO
    from Bio.Phylo.TreeConstruction import DistanceMatrix
    from Bio.Phylo.TreeConstruction import DistanceCalculator

    # get a multiple alignment
    alignment = AlignIO.read('msa.phy', 'phylip')

    # construct a distance calculator from the alignment and the given
    # scoring matrix name
    # (DNA: identity, blastn, trans; Protein: blosum40/62/90, pam90/120/250)
    calculator = DistanceCalculator(alignment, 'identity')

    # get the distance matrix
    dm = calculator.get_distance()

    # print the distance matrix in lower triangular format
    print(dm)

    # get the distance from sequence 'Alpha' to 'Beta'
    # (the ids come from the SeqRecords of the MSA object)
    print(dm['Alpha', 'Beta'])

    # delete an element from the distance matrix
    del dm['Alpha']

    # insert an element with its distances at position 1
    dm.insert('Alpha', [1, 0, 2, 4], 1)
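To make the delete and insert operations above more concrete, here is a minimal sketch of how a name-indexed lower triangular matrix could behave. It is not the actual Matrix or DistanceMatrix code: the class name LowerTriangularMatrix, its attributes, and the example values are all assumptions made for illustration.

```python
class LowerTriangularMatrix(object):
    """Hypothetical stand-in for a symmetric, name-indexed matrix."""

    def __init__(self, names, rows):
        # Row i stores the entries (i, 0) .. (i, i), diagonal included.
        self.names = list(names)
        self.rows = [list(row) for row in rows]

    def __getitem__(self, key):
        # key is a (name, name) pair; by symmetry, read from the lower triangle.
        i, j = sorted(self.names.index(name) for name in key)
        return self.rows[j][i]

    def __delitem__(self, name):
        k = self.names.index(name)
        del self.names[k]
        del self.rows[k]
        # Every later row also holds a column for the removed element.
        for row in self.rows[k:]:
            del row[k]

    def insert(self, name, values, index):
        # values lists the distances to every element in the NEW ordering,
        # including the zero self-distance at position `index`.
        self.names.insert(index, name)
        self.rows.insert(index, list(values[:index + 1]))
        # Every later row gains a column at position `index`.
        for offset, row in enumerate(self.rows[index + 1:], start=index + 1):
            row.insert(index, values[offset])


# Made-up distances, for illustration only.
m = LowerTriangularMatrix(
    ['Alpha', 'Beta', 'Gamma', 'Delta'],
    [[0], [1, 0], [2, 3, 0], [4, 5, 6, 0]],
)
alpha_beta = m['Alpha', 'Beta']
del m['Alpha']
rows_after_delete = [list(row) for row in m.rows]
m.insert('Alpha', [1, 0, 2, 4], 1)
```

In this sketch the list passed to insert covers the whole new row and column, which is why it has four entries for a four-element matrix and a zero at the insertion position.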
I also wrote a unit test for it in the Tests directory.
I hope to get feedback that will help me improve my Python coding.
Plan for this week
Implement the UPGMA and NJ algorithms. This should be straightforward, since I have written both of them in Java before.
One operation common to both algorithms is deleting and inserting elements in the DistanceMatrix object. This may cause unexpected errors if the original DistanceMatrix object is used again after either algorithm has run. I think one solution is to use
copy.deepcopy to make a copy of the DistanceMatrix object at the beginning of each algorithm, at the cost of being a little slower.
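The copying idea can be sketched as follows. This is only an illustration under stated assumptions: TinyMatrix and drop_first_element are hypothetical stand-ins for DistanceMatrix and for a tree-building step that mutates its working matrix, not part of the module.

```python
import copy


class TinyMatrix(object):
    """Hypothetical stand-in for DistanceMatrix: names plus nested lists."""

    def __init__(self, names, rows):
        self.names = names
        self.rows = rows


def drop_first_element(dm):
    """Stand-in for an algorithm step that deletes from its working matrix."""
    work = copy.deepcopy(dm)  # work on a private copy, not the caller's object
    del work.names[0]
    del work.rows[0]
    for row in work.rows:
        del row[0]
    return work


original = TinyMatrix(['Alpha', 'Beta', 'Gamma'], [[0], [1, 0], [2, 3, 0]])
reduced = drop_first_element(original)

# A shallow copy would not be enough: it shares the nested lists.
shallow = copy.copy(original)
shares_rows = shallow.rows is original.rows
```

After the call, original still holds all three elements, while reduced holds only Beta and Gamma. The shallow copy shares its rows list with the original, which is why a deep copy is the safer choice here despite the extra cost.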