The genomes of two humans are about 99.9% identical. Yet the
name 'Human Genome Project', implying the existence of a single
human genome, has always been something of a
misnomer. Every person, with the exception of
identical twins, has a unique genome. Even though two genomes
are roughly 99.9% identical, the remaining difference of 0.1%
amounts to roughly 3,200,000 differences among the 3.2 billion
base pairs comprising each individual's (haploid) genome.
It is precisely these differences, or polymorphisms, that
account for the heritable variation among individuals, including
susceptibility to diseases and responsiveness to
treatments.
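The arithmetic behind the figure of roughly 3,200,000 differences can be checked with a short sketch. The constant and function names below are illustrative only, and the genome size and identity level are simply the round numbers quoted in the text.

```python
from fractions import Fraction

# Round figures quoted in the text; the names are illustrative assumptions.
GENOME_SIZE_BP = 3_200_000_000          # haploid genome size in base pairs
DIFFERING_FRACTION = Fraction(1, 1000)  # 0.1% = 100% - 99.9% identity


def expected_differences(genome_size_bp, differing_fraction):
    """Number of base pairs expected to differ between two genomes."""
    return int(genome_size_bp * differing_fraction)


if __name__ == "__main__":
    n = expected_differences(GENOME_SIZE_BP, DIFFERING_FRACTION)
    print(f"{n:,} differing base pairs")
```

An exact fraction is used instead of a floating-point 0.001 so that the result is an exact integer count.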
Each somatic cell within a human being has two sets of autosomal
chromosomes, one maternal, the other paternal. Thus all of
us carry two versions of each autosome (the situation is
different for the sex chromosomes, X and
Y, and for the genetic information stored on the DNA of the
mitochondria, the "powerplants" within our cells).
The two copies of each genetic locus are called "alleles".
The paternal and the maternal allele of each genetic locus
are usually not identical but differ because of mutations introduced
at these loci during the development of the different lineages
that have given rise to father and mother, respectively.
In addition, the alleles of different genetic loci
within an individual's genome are recombined
during meiosis, so that offspring do not express pure traits
of their parents and grandparents but recombined mixes
thereof.
Differences between alleles that are passed on to offspring
in a Mendelian fashion are called polymorphisms. While polymorphisms
come in a variety of types (e.g. insertions,
deletions, inversions, duplications), academic and industrial research is currently
focusing on the most prevalent form of variation in the human
genome: differences in single nucleotides, or Single Nucleotide
Polymorphisms (SNPs). The pharmaceutical industry in particular
expects that the identification of meaningful SNPs will lead
to breakthroughs in diagnostic tools,
to optimized drug discovery processes, and to the development of "individualized"
drugs.
In the era of molecular biology, genetic analysis is important
for finding out where on the DNA to look for information
relevant to particular phenomena. Some genes have very complex
phenotypic consequences, and we may never be able
to look at a DNA sequence and infer directly that it
regulates some aspect of facial features or mathematical reasoning
ability. Genetic analysis, however, offers a completely independent approach
to determining the location of genes responsible for inherited
traits.
In most organisms, genetics is carried out by breeding specific
pairs of parents and examining the characteristics of their
offspring. Clearly, this approach is not practical in
humans. Instead, inheritance must be analyzed retrospectively
in families: statistical analysis
of the pattern of inheritance takes the place of direct genetic
manipulation to test hypotheses about the genetic mechanism
underlying particular traits in humans.
Linkage is the tendency for two observable genetic traits,
called markers, to be coinherited if they lie near each other
on the same chromosome. To be distinguishable genetically,
markers must occur in more than one form (alleles; e.g. eye
color) in different members of the population. Several factors
make it difficult to identify markers that
are linked to certain phenotypes (e.g. the susceptibility
to a certain disease):
- The size of human families is generally too small to allow
identification of linkage between genes that are closer together
than 10 Mb (if two markers are 1 Mb apart, there is roughly a
1% chance of recombination occurring between them, so, statistically,
it would take a family with about 100 offspring to detect a single
recombination event between two genes 1 Mb apart).
- The low resolution of linkage analysis often requires the
research to be performed on several families. This type of research is most
promising when the families examined belong to populations that
have been somewhat isolated, so that some
degree of relatedness exists even between different families
(e.g. families on islands like
Iceland, Sicily, Corsica; the Azkerbaijkan Jews, etc.). If genetic
analysis is performed on unrelated families, it is very difficult
to detect linkage.
- If both chromosomes of a parent carry the same marker allele
A, there is no way to tell from this marker alone which chromosome
the offspring received. To distinguish
successfully among the four homologous chromosomes originally
carried by the two parents, a large number of closely spaced,
polymorphic markers needs to be available.
- Misdiagnosis may cause individuals to be wrongly included in
or excluded from a linkage analysis, which can obscure the
findings, either by yielding no clear linkage or by suggesting linkage
to markers that have nothing to do with the disease.
- Misattributed paternity may lead a study to falsely assume men to be fathers
(and/or to falsely exclude true fathers).
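The family-size argument in the first point above can be sketched numerically. The sketch assumes the rule of thumb quoted in the text (about 1% recombination per Mb), treats every offspring as one informative meiosis, and uses a simple linear approximation that only makes sense for short distances. All function names are hypothetical.

```python
def recombination_fraction(distance_mb, percent_per_mb=1.0):
    """Approximate recombination fraction for two markers distance_mb apart.

    Assumes the linear rule of thumb from the text (about 1% per Mb),
    capped at 0.5, the value expected for unlinked markers.
    """
    return min(distance_mb * percent_per_mb / 100.0, 0.5)


def offspring_per_recombinant(r):
    """Expected number of offspring needed to see one recombinant."""
    return 1.0 / r


def prob_at_least_one_recombinant(r, n_offspring):
    """Chance that at least one of n_offspring is a recombinant."""
    return 1.0 - (1.0 - r) ** n_offspring


if __name__ == "__main__":
    r = recombination_fraction(1.0)  # markers 1 Mb apart
    print(f"recombination fraction: {r:.2f}")
    print(f"offspring per recombinant: {offspring_per_recombinant(r):.0f}")
    print(f"P(>=1 recombinant in 100): {prob_at_least_one_recombinant(r, 100):.2f}")
```

Under these assumptions, 100 offspring is the expected number needed for one recombinant, but a family of that size would still have only about a 63% chance of showing at least one. This shows how far ordinary human families are from resolving markers 1 Mb apart.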
Recent efforts by the SNP Consortium, the International SNP
Map Working Group, as well as a number of individual companies have
led to the identification of about 1.4 million SNPs. On average,
two haploid human genomes differ in 1 nucleotide per 1330
bp, a rate that is expected to vary somewhat between ethnic
groups.
There are two basic types of SNPs: those within the
coding regions of genes, which are called cSNPs, and those
outside of genes. Previous studies have found that, on average,
a gene contains about four cSNPs. This observation,
together with the estimate of a total of ca. 30,000 genes
in the human genome, indicates the existence of roughly 120,000
cSNPs. Of these, about 60% are estimated to be
synonymous,
i.e. not to lead to changes in amino acids, and 40% are
expected to change an amino acid. These roughly 50,000
non-synonymous
cSNPs, together with an unknown number of regulatory and other
non-coding but functional polymorphisms, comprise the bulk
of common molecular variation with potential phenotypic consequences.
In addition to these causal SNPs, linkage studies try to identify
SNPs which are associated with the inheritance of metabolic
phenotypes, yet are located outside of genes and are not directly
involved in causing changes in phenotype.
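The cSNP estimate above is a back-of-the-envelope calculation, reproduced in the sketch below. The inputs are the round figures from the text (about 30,000 genes, about four cSNPs per gene, a 60/40 split), and the names are illustrative only.

```python
# Round figures from the text; names are illustrative assumptions.
N_GENES = 30_000
CSNPS_PER_GENE = 4
SYNONYMOUS_PERCENT = 60  # share of cSNPs that leave the amino acid unchanged


def estimate_csnps(n_genes, csnps_per_gene, synonymous_percent):
    """Split the estimated number of cSNPs into the two classes."""
    total = n_genes * csnps_per_gene
    synonymous = total * synonymous_percent // 100
    return {
        "total": total,
        "synonymous": synonymous,
        "non_synonymous": total - synonymous,
    }


if __name__ == "__main__":
    for name, count in estimate_csnps(
        N_GENES, CSNPS_PER_GENE, SYNONYMOUS_PERCENT
    ).items():
        print(f"{name}: {count:,}")
```

With these inputs the amino-acid-changing class comes to 48,000, which the text rounds to about 50,000.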
SNPs appear to occur at different frequencies in different kinds of DNA sequence. For
example, an analysis of a 1 Mb interleukin gene cluster on human
chromosome 5 (5q31) in
40 Northern European individuals yielded the following results
(Banerjee et al., 2001, CSHL Genome Meeting):
| Region | Bp sequenced | # SNPs | Frequency (SNPs/bp) |
| Coding regions/exons | 1977 | 4 | 1/494 |
| Introns | 14700 | 40 | 1/367 |
| UTRs | 1105 | 4 | 1/276 |
| Conserved non-coding sequences | 11295 | 28 | 1/403 |
| Intergenic | 13998 | 36 | 1/389 |
| Total | 31778 | 85 | 1/374 |
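The frequency column is simply the number of base pairs sequenced divided by the number of SNPs found. The sketch below recomputes it from the table values. The data structure and function name are illustrative. The Total row is taken as given: the individual rows add up to 43,075 bp and 112 SNPs rather than to the Total row, and the source does not say why.

```python
# (region, bp sequenced, SNPs found, bp per SNP as reported in the table)
TABLE = [
    ("Coding regions/exons", 1977, 4, 494),
    ("Introns", 14700, 40, 367),
    ("UTRs", 1105, 4, 276),
    ("Conserved non-coding sequences", 11295, 28, 403),
    ("Intergenic", 13998, 36, 389),
    ("Total", 31778, 85, 374),
]


def bp_per_snp(bp_sequenced, n_snps):
    """Average number of base pairs per SNP (the '1/x' in the table)."""
    return bp_sequenced / n_snps


if __name__ == "__main__":
    for region, bp, snps, reported in TABLE:
        value = bp_per_snp(bp, snps)
        print(f"{region:32s} 1/{value:.1f} (table: 1/{reported})")
```

Each recomputed value agrees with the reported one to within one base pair, so the small discrepancies look like rounding.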
Because of their genetic make-up, individuals differ in how they
react to disease-causing events, what course and form a disease
may take for them, and how they react to specific medications.
For example, while smoking increases the risk of developing
lung cancer in most people, some heavy smokers live to a very old age. Also,
while most people who suffer from pain can find relief through
codeine treatment, some cannot convert codeine into the
corresponding morphine-like structure and will not experience
effective pain relief when treated with codeine. In
other cases, differential reactions to certain drugs can mean
the difference between healing and suffering or,
often, death.
The sequencing of the human genome has led to the discovery
of a bounty of SNPs; on average, two copies of a human chromosome differ by
one SNP per 1300 nucleotides. This holds for the comparison
of chromosomes between different people as well as for the comparison
of the two copies of each chromosome within each of our body cells.
Several groups are involved in SNP discovery, the
most prominent of which are the SNP Consortium and Celera.
The SNP Consortium consists of several corporations with interests
in pharmaceuticals as well as in informatics
(see
http://snp.cshl.org/about/members.html).
The Consortium works with four major academic centers for
molecular genetics and stores its data in a database
at Cold Spring Harbor Laboratory.