Homologous sequences share a common ancestor. Since they diverged from this ancestor,
both sequences have undergone changes. The number of these changes, and therefore
their degree of similarity, is correlated with the number of generations that have
passed since the two sequences diverged. If sequences are very closely related, they
are likely to be very similar; conversely, if sequences are very similar, they might
be very closely related. If sequences diverged very far in the past, they might be
quite different. In other words, sequences that are highly different might not be
homologous at all. Or they might be homologous, but too divergent for their
relationship to be detected by examining sequence similarity. In the following
example, determine the similarity between genic sequences for proteins that have the
same function but are derived from different organisms. The sequences are human alpha
globin 1, mouse alpha globin 1, and lupine leghemoglobin, a plant-derived
oxygen-binding protein.
- Determine the degree of identity between these two sequences by calculating
the percentage of identical nucleotides:
Hs hba >gcctggggtaaggtcggcgcgcacgctggcgagtatggtgcggaggccctggagagg<
        |||||||| ||| | || |  || | ||  || ||||| || || |||||||| |||
Mm hba >gcctgggggaagattggtggccatggtgctgaatatggagctgaagccctggaaagg<
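Counting the bars by hand is easy to get wrong. Below is a minimal Python sketch of the calculation, assuming a gapless alignment in which position i of one sequence is paired with position i of the other, as in the alignments on this page. The function name `percent_identity` is our own choice for illustration, not part of any library.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which both sequences carry the same residue.

    Assumes a gapless alignment: both strings have the same length and
    position i of seq_a is aligned with position i of seq_b.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    # Compare case-insensitively so 'a' and 'A' count as identical.
    identical = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * identical / len(seq_a)


hs_hba = "gcctggggtaaggtcggcgcgcacgctggcgagtatggtgcggaggccctggagagg"
mm_hba = "gcctgggggaagattggtggccatggtgctgaatatggagctgaagccctggaaagg"
print(f"identity: {percent_identity(hs_hba, mm_hba):.1f}%")
```

The same function works unchanged on amino acid sequences, since it only compares characters.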
- Do you think these two genes share a common ancestor?
- Now calculate the identity between these two genic sequences:
Hs hba >gcgagtatggtgcggaggccctggagaggtgaggctccctcccctgctcc<
          || | |  |||  |    |   | |       |    ||  |  ||
L lhb  >aagaatttaatgcaaatattcctaaaaacacccaccgtttcttcaccttg<
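The '|' line itself can be generated mechanically. This sketch (the helper `match_line` is hypothetical, written only for this example) rebuilds the alignment above and prints its identity next to a rough chance level. It assumes all four bases are equally frequent, in which case two unrelated sequences would still agree at about one position in four, so nucleotide identity should be judged against roughly 25% rather than 0%.

```python
def match_line(seq_a: str, seq_b: str) -> str:
    """Return '|' under identical positions and ' ' elsewhere (gapless alignment)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("|" if x == y else " " for x, y in zip(seq_a.lower(), seq_b.lower()))


hs_hba = "gcgagtatggtgcggaggccctggagaggtgaggctccctcccctgctcc"
l_lhb = "aagaatttaatgcaaatattcctaaaaacacccaccgtttcttcaccttg"

line = match_line(hs_hba, l_lhb)
identity = 100.0 * line.count("|") / len(line)
# Assumption: all four bases equally frequent, so two unrelated
# sequences agree at about 1 position in 4 by chance.
chance = 100.0 / 4

print("Hs hba >" + hs_hba + "<")
print("        " + line)
print("L lhb  >" + l_lhb + "<")
print(f"identity {identity:.1f}% vs. chance level ~{chance:.0f}%")
```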
- Do you think these two genes share a common ancestor?
- How does this alignment of the two proteins encoded by these two genes
change your thinking?
- The occurrence of hemoglobin is not limited to red blood cells. Legume plants such as clover, pea, beans, and
many others are able to synthesize a form of hemoglobin when they enter a symbiosis with nitrogen-fixing bacteria,
Rhizobia. Rhizobia are able to capture atmospheric nitrogen, N2, by reducing it to ammonia, NH3. During the establishment of the symbiosis, the plant host
develops new organs, the root nodules, within which it isolates and houses the bacteria. The plant
receives nitrogenous compounds from the bacterial partner and can therefore grow without nitrogen fertilizer. The
bacterial partner receives carbon compounds from the plant host. The bacterial enzyme that reduces atmospheric nitrogen
is called nitrogenase; it is extremely oxygen-sensitive. On the other hand, nitrogen fixation requires
a large amount of energy in the form of ATP, whose production depends on the availability of oxygen. To resolve this paradox,
legumes synthesize a form of hemoglobin in their nodules, which gives the nodules a pinkish to dark red hue.
This form of hemoglobin is called leghemoglobin since it is synthesized in legumes.
Leghemoglobin binds the free oxygen around the bacteria in the nodules and protects their nitrogenase from
being destroyed. At the same time, it delivers oxygen to the bacterial symbionts, which use it to satisfy their ATP needs.
- Comparing nucleotide sequences does not always give a good idea of the relatedness of two different
functional structures such as hemoglobins. Comparing the proteins gives a much more accurate clue
about their similarity: because the genetic code is redundant, many nucleotide changes do not alter the encoded
amino acid. Therefore, to understand the relatedness of proteins, you have to look not only at the genes
but also at the amino acid sequences.
What is the percentage of identical amino acids in this alignment between mouse and human hemoglobin?
Human alpha globin 1 >GKVGAHAGEYGAEALER<
                      || | |  |||||||||
Mouse hemoglobin     >GKIGGHGAEYGAEALER<
- Amino acids differ from nucleotides in that identity and similarity are
distinguished: amino acids can be grouped according to their physicochemical
properties, such as size, charge, and hydrophobicity (see image,
web page). The image below shows, for example, that leucine and valine
are more similar to each other than either is to histidine.

Thus, amino acid sequence alignments are analyzed in two steps: a) the percentage
of identical amino acids is determined as the % identity; b) the number of identical
amino acids plus the number of substitutions by similar amino acids is determined
and expressed as the % similarity. Groups of similar amino acids are as follows
(as provided by the ClustalW site at the European Bioinformatics Institute, EBI):
Small + hydrophobic + aromatic: A,V,F,P,M,I,L,W;
Acidic: D,E;
Basic: R,H,K;
Hydroxyl + Amine + Basic: S,T,Y,H,C,N,G,Q.
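One way to encode the groups above in Python is as a list of sets. Because histidine (H) is listed in two groups, this sketch treats a pair of different amino acids as similar if any group contains both. `SIMILARITY_GROUPS` and `is_similar` are illustrative names, not part of ClustalW.

```python
# Similarity groups as listed above. Histidine (H) appears in two groups,
# so a pair counts as similar if ANY group contains both residues.
SIMILARITY_GROUPS = [
    set("AVFPMILW"),  # small + hydrophobic + aromatic
    set("DE"),        # acidic
    set("RHK"),       # basic
    set("STYHCNGQ"),  # hydroxyl + amine + basic
]


def is_similar(a: str, b: str) -> bool:
    """True if two different amino acids (one-letter codes) share a group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in group and b in group for group in SIMILARITY_GROUPS)


print(is_similar("L", "V"))  # leucine vs valine   -> True
print(is_similar("L", "H"))  # leucine vs histidine -> False
```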
- How similar are the two sequences if similar amino acids are labeled
with a '+'?
Human alpha globin 1 >GKVGAHAGEYGAEALER<
                      ||+| |  |||||||||
Mouse hemoglobin     >GKIGGHGAEYGAEALER<
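Combining the '|' and '+' rules gives both percentages at once. A minimal sketch, again assuming a gapless alignment and the groups listed above; `compare` and `symbol` are hypothetical helpers written for this example.

```python
from typing import Tuple

# Similarity groups as listed above; H occurs in two of them.
SIMILARITY_GROUPS = [set("AVFPMILW"), set("DE"), set("RHK"), set("STYHCNGQ")]


def symbol(a: str, b: str) -> str:
    """'|' for identical residues, '+' for similar ones, ' ' otherwise."""
    if a == b:
        return "|"
    if any(a in group and b in group for group in SIMILARITY_GROUPS):
        return "+"
    return " "


def compare(seq_a: str, seq_b: str) -> Tuple[str, float, float]:
    """Return (symbol line, % identity, % similarity) for a gapless alignment."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    line = "".join(symbol(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()))
    identical = line.count("|")
    similar = line.count("+")
    identity = 100.0 * identical / len(line)
    similarity = 100.0 * (identical + similar) / len(line)
    return line, identity, similarity


human = "GKVGAHAGEYGAEALER"
mouse = "GKIGGHGAEYGAEALER"
line, ident, simil = compare(human, mouse)
print("Human alpha globin 1 >" + human + "<")
print(" " * 22 + line)
print("Mouse hemoglobin     >" + mouse + "<")
print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

The same function can be applied to the human and legume sequences; % similarity is always at least as high as % identity, because every identical position is also counted as similar.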
- Now determine the degree of identity between the human and the legume
sequences:
Hs Hba >VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPH<
         |       ||  |    |                 |  |  |
L Lghb >ALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSF<
- How similar are the two sequences if similar amino acids are labeled
with a '+'?
Hs Hba >VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPH<
        +|+      ||  |  ++|++  ++  ++  ++  +|  |  |
L Lghb >ALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSF<
- Since proteins are built from 20 different amino acids, one would expect about
5% (1 in 20) identical positions by chance between two unrelated amino acid sequences,
assuming all amino acids are equally frequent. The amino acid sequences of human and
legume hemoglobins are clearly more similar than 5%, so it can reasonably be assumed
that the two sequences derive from a common ancestor; they are homologous. The amount
of similarity between two sequences can also be used to estimate when they split
from each other: the more different two related sequences are, the longer ago
they split.
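The 5% figure assumes that all 20 amino acids are equally frequent. More generally, the chance that two randomly chosen positions match is the sum over amino acids x of freq_a(x) times freq_b(x), which equals 1/20 only for uniform composition. The sketch below computes that expectation for the human and legume sequences and adds a simple shuffle test: how often does a randomly permuted legume sequence reach the observed identity? The names `identity_fraction`, `expected_random_identity` and `shuffle_test` are hypothetical. It is an illustration under stated assumptions: a proper test would re-align every shuffled sequence, because an alignment optimized with gaps inflates the identity of unrelated sequences, so shuffling within a fixed alignment, as done here, overstates the significance.

```python
import random
from typing import Sequence


def identity_fraction(seq_a: str, seq_b: Sequence[str]) -> float:
    """Fraction of aligned positions with identical residues (gapless alignment)."""
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def expected_random_identity(seq_a: str, seq_b: str) -> float:
    """Sum over residues x of freq_a(x) * freq_b(x).

    For 20 equally frequent amino acids: 20 * (1/20) * (1/20) = 1/20 = 5%.
    """
    alphabet = set(seq_a) | set(seq_b)
    return sum(
        (seq_a.count(x) / len(seq_a)) * (seq_b.count(x) / len(seq_b)) for x in alphabet
    )


def shuffle_test(seq_a: str, seq_b: str, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of random permutations of seq_b that reach the observed identity.

    Simplification: shuffled sequences are NOT re-aligned, so this
    underestimates how often unrelated sequences look this similar.
    """
    rng = random.Random(seed)
    observed = identity_fraction(seq_a, seq_b)
    residues = list(seq_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(residues)
        if identity_fraction(seq_a, residues) >= observed:
            hits += 1
    return hits / trials


hs_hba = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPH"
l_lghb = "ALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSF"

print(f"observed identity:          {identity_fraction(hs_hba, l_lghb):.1%}")
print(f"expected from composition:  {expected_random_identity(hs_hba, l_lghb):.1%}")
print(f"shuffles reaching observed: {shuffle_test(hs_hba, l_lghb):.3f}")
```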