Bioinformatics I: Bioinformatics Week 6

Coursera Week 5: Phylogenetics
To study phylogenetics, you build visual comparisons between DNA or protein sequences in the form of a tree.

Mutation leads to speciation

There are rooted trees and unrooted trees. Rooted trees, like the ones above, expand from a single point (the common ancestor) in one direction, whereas unrooted trees can branch off in many directions and do not specify a common ancestor.

Homoplasy is when two divergent species share a similar characteristic that they did not inherit from a common ancestor. There are different types of homoplasy.

To conduct phylogenetic experiments you must have good sampling. Your samples need to be homologous, independent, and variants of the original specimen on which the tree is based. Lastly, you need sequence alignments and statistical support for their arrangement on the tree.

There are two tree building methods:
Distance methods

UPGMA
Neighbour Joining: Use a BLOSUM or PAM matrix to compare the sequences, then rate and scale the distance between the species based on their matrix scores.
Good things: They are computationally fast, and a single best tree is found in the end.
Bad things: Sometimes there isn't a single best tree, and these methods do not show the alternatives.
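
To make the distance idea concrete, here is a minimal sketch of UPGMA in Python. It is not from the course: it assumes the sequences are already aligned, and it uses a plain fraction-of-mismatches distance instead of a BLOSUM or PAM score to keep things short. The names `p_distance` and `upgma` are made up for this example.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; returns the tree as nested tuples."""
    # Every cluster (a name or a nested tuple) maps to its number of leaves.
    sizes = {name: 1 for name in seqs}
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Join the two clusters that are currently closest.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair, key=str)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average of the
        # distances to the two clusters that were joined.
        for other in sizes:
            if other in pair:
                continue
            d_a = dist[frozenset((a, other))]
            d_b = dist[frozenset((b, other))]
            total = sizes[a] + sizes[b]
            dist[frozenset((merged, other))] = (d_a * sizes[a] + d_b * sizes[b]) / total
        # Drop every distance that still mentions the old clusters.
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Toy alignment (invented data, not from the course).
example = {
    "A": "AAAAAAAA",
    "B": "AAAAAAAC",
    "C": "AACCCCAA",
    "D": "AACCCCAG",
}
print(upgma(example))
```

The nested tuples stand in for the tree: sequences joined early sit deepest in the nesting. Real tools also keep branch lengths, which this sketch leaves out.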

Character based (discrete) methods

Maximum parsimony
Maximum likelihood: Evaluates the likelihood of the mutations that would have to occur along a candidate phylogenetic tree for each species to arrive at where it currently is. Then it uses statistical analysis to figure out which tree has the highest likelihood and takes that as the best estimate. There are 4 bases, so in an unbiased model each base has a 0.25 probability at any given position. You then raise 0.25 to the power of the number of nucleotides in the sequence you are analyzing. Say the sequence containing the G substitution you are studying is 20 bases long: under this model that exact sequence has a 0.25^20 (roughly 9 x 10^-13) chance of occurring. After that it calculates the chance of this change occurring over time in the same fashion (the process portion).
Advantages: Produces clear results, you can statistically analyze the results you receive, and it also gives you the other likely trees it found.
Disadvantages: It is computationally intensive, which makes it impractical for large datasets.
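
The arithmetic for the unbiased model can be checked with a few lines of Python. This is only the per-sequence probability step, not a full maximum likelihood tree search, and the function names are made up for this example. The log version is there because real programs usually add up logs instead of multiplying many tiny numbers.

```python
import math


def uniform_sequence_probability(length, n_states=4):
    """Probability of one exact sequence when every base is equally likely."""
    return (1 / n_states) ** length


def log_likelihood(site_probabilities):
    """Sum of per-site log probabilities (avoids underflow on long sequences)."""
    return sum(math.log(p) for p in site_probabilities)


p = uniform_sequence_probability(20)
print(f"{p:.3e}")
print(log_likelihood([0.25] * 20))
```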

There is something called bootstrapping, where you resample the columns of your alignment many times, rebuild the phylogenetic tree for each resampled dataset, and then calculate how often certain species are grouped together.

Here we see that A and B have been grouped together 100% of the time (this number is arbitrary) and C and D 75% of the time. So it is very likely that A and B share a more recent common ancestor, and likewise C and D. A value of 70-90% means the relationship is very probable; anything less means it is less probable.
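
A rough sketch of how bootstrap percentages like these could be computed is below. It is a simplification and not the course's method: instead of building a full tree for each replicate, it uses a stand-in (`mutual_nearest_pairs`, a name made up here) that groups two sequences when each is the other's closest match. The resampling step, drawing alignment columns with replacement, is the part that matters.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to make one bootstrap replicate."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def mutual_nearest_pairs(alignment):
    """Stand-in for a tree builder: pairs that are each other's closest sequence."""
    def diff(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))

    # Ties are broken by name so the result is deterministic.
    nearest = {
        a: min((b for b in alignment if b != a), key=lambda b: (diff(a, b), b))
        for a in alignment
    }
    return {frozenset((a, b)) for a, b in nearest.items() if nearest[b] == a}


def bootstrap_support(alignment, replicates=200, seed=0):
    """Percentage of replicates in which each pair ends up grouped together."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(mutual_nearest_pairs(resample_columns(alignment, rng)))
    return {tuple(sorted(g)): 100 * n / replicates for g, n in counts.items()}


# Toy alignment (invented data).
example = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "TGCATGCAGG",
    "D": "TGCATGCAGA",
}
for group, percent in sorted(bootstrap_support(example).items()):
    print(group, f"{percent:.0f}%")
```

In practice the stand-in would be replaced by whichever tree-building method you used for the original tree, and the percentages would be written on the branches.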

Friday, September 2, 2016
