SNP Exploration
The sequencing of the human genome has led to the discovery
of a bounty of SNPs; on average, two equivalent human chromosomes differ at
one SNP per 1,300 nucleotides. Several research institutions are involved in SNP discovery, the
most prominent of which is The SNP Consortium.
The SNP Consortium consists of several corporations with a stake
in pharmaceuticals as well as in informatics, and four major academic centers for
molecular genetics and genomics
(see
http://snp.cshl.org/about/).
SNP Exploration
The following exercises will familiarize you with SNPs, SNP phylogeny, and the SNP database of The SNP Consortium.
- Determine the differences between the following six sequences utilizing Sequence Server:
- agctggctgaatgctatctgcgtcgcgcgaaataaacgtcagcattcgttacatctctctagggc
- agctggctgaatgctatctgcctcgcgcgaaataaacgtcagcattcgttacatttctctagggc
- agctggctgaatgctatctgcgtcgcgcgaaacaaacgtcagcattcgttacatttctctagggc
- agctggctgaatgctatctgcgtcgcgcgaaataaacgtcagcattcgttacatttctctagggc
- agctggctgagtgctatctgcctcgcgcgaaataaacgtcagcattcgttacatttctctagggc
- agctggctgaatgctatctgcgtcgcgcgaaataaacgtcagcgttcgttacatctctctagggc
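If Sequence Server is not at hand, the comparison can also be sketched in a few lines of Python. This is only a minimal illustration, not the course tool: it assumes the six sequences are already aligned and of equal length, and the function name `variable_sites` is made up for this example. It walks through the alignment column by column and reports every position at which the sequences disagree.

```python
# The six sequences from the exercise, in the order listed.
sequences = [
    "agctggctgaatgctatctgcgtcgcgcgaaataaacgtcagcattcgttacatctctctagggc",
    "agctggctgaatgctatctgcctcgcgcgaaataaacgtcagcattcgttacatttctctagggc",
    "agctggctgaatgctatctgcgtcgcgcgaaacaaacgtcagcattcgttacatttctctagggc",
    "agctggctgaatgctatctgcgtcgcgcgaaataaacgtcagcattcgttacatttctctagggc",
    "agctggctgagtgctatctgcctcgcgcgaaataaacgtcagcattcgttacatttctctagggc",
    "agctggctgaatgctatctgcgtcgcgcgaaataaacgtcagcgttcgttacatctctctagggc",
]


def variable_sites(seqs):
    """Map each 1-based variable position to the bases seen there, one per sequence."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    sites = {}
    # zip(*seqs) yields one alignment column at a time.
    for position, column in enumerate(zip(*seqs), start=1):
        if len(set(column)) > 1:
            sites[position] = "".join(column)
    return sites


for position, column in variable_sites(sequences).items():
    print(f"position {position}: {column}")
```

Each printed column lists the base found in sequences 1 through 6 at that position, which is a convenient starting point for the haplotype exercise that follows.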
- Haplotypes are groups of SNPs that occur linked together in an allele. Determine how the following six haplotypes could be derived from each other. Draw a phylogenetic tree, assuming that at each branching point only one nucleotide is altered. Thus, from each branching point two lineages proceed: one carrying the parental haplotype and one carrying the haplotype that contains a change in one nucleotide position.
- --A--G--T--A--C--
- --A--C--T--A--T--
- --A--G--C--A--T--
- --A--G--T--A--T--
- --G--C--T--A--T--
- --A--G--T--G--C--
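One way to check a hand-drawn tree is to list which haplotypes differ from each other at exactly one position, since only those pairs can be joined by a single branching step. The sketch below does just that. It is an illustration under stated assumptions: the haplotypes are written without the dashes and numbered in the order listed, and the helper names `hamming` and `one_step_edges` are invented here. It does not choose a root or a direction of change; that part of the exercise still has to be reasoned out.

```python
from itertools import combinations

# Haplotypes from the exercise, dashes removed, numbered in the order listed.
haplotypes = {
    1: "AGTAC",
    2: "ACTAT",
    3: "AGCAT",
    4: "AGTAT",
    5: "GCTAT",
    6: "AGTGC",
}


def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_edges(haps):
    """Pairs of haplotype numbers that differ at exactly one position."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(haps.items(), 2)
        if hamming(a, b) == 1
    ]


for i, j in one_step_edges(haplotypes):
    print(f"{i} ({haplotypes[i]}) <-> {j} ({haplotypes[j]})")
```

Six haplotypes joined by five one-step links form a tree without loops, so each link corresponds to one branching point in the drawing.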
- The SNP Consortium
Database at Cold Spring Harbor Laboratory contains roughly 1.5 million entries. This file is too big to download, so please do not try it on a course computer!
However, you can view the content of the CSHL SNP database here.
- In order to develop an understanding of the nature of the SNP database it is sufficient to look into small excerpts of the entire database. Below find a number of smaller packages derived from the entire database.
- Select a package and analyze its content.
- How many SNPs does your database excerpt list? How many does the entire database list?
(Hint: look at the last package in the list to answer this question)
- What entries (columns) are provided for each SNP?
- Select a few SNPs and list their genotypes.
(Hint: use the GenBank accession in the database to find out what chromosome the SNPs are
located on.)
- List the different polymorphisms in your package. Which combinations did you find?
- In what percentages do the different polymorphisms occur in your package?
List reasons that may lead to the great variation in frequencies. Hint: think about how the different alleles that have been found for a single locus may have been derived from each other. (Hint: examine the structures of the four different nucleotides in this listing,
this website or
this image)
- Now analyze the entire database! Just kidding! But look at it, anyway.
- Can you think of a computer program that would have made the previous tasks
easier? What features would this program need to have? How about a program
that would allow you to answer the following questions and still be
home in time for dinner?
- One possible computer program can be viewed here. Please have a look at it and try to recognize some of the processes that you utilized during your manual analysis.
- Applying the program, how many different polymorphisms can you identify in the
entire database?
- List all possible different allele combinations.
- Determine the ratios for the occurrence of different polymorphisms
in the entire database.
- Compare and contrast your
results with your previous findings. Why does the database contain a different
number of SNPs than you had estimated previously (see question 1 above)?
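To give a feel for what such a program has to do, here is a minimal Python sketch of the counting step. It is not the program linked above, and it makes assumptions that should be checked against the actual package: each SNP is assumed to be bi-allelic and recorded as a string such as `A/G`, and the sample list is invented purely for illustration. The sketch counts each allele combination, converts the counts to percentages, and separates transitions (purine to purine, or pyrimidine to pyrimidine) from transversions, which is the distinction the hint about nucleotide structures points toward.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def normalize(alleles):
    """Turn 'g/a' into 'A/G' so order and case do not create separate classes."""
    return "/".join(sorted(a.upper() for a in alleles.split("/")))


def is_transition(alleles):
    """True if both alleles are purines or both are pyrimidines (bi-allelic only)."""
    pair = set(normalize(alleles).split("/"))
    return pair <= PURINES or pair <= PYRIMIDINES


def summarize(allele_column):
    """Map each allele combination to (count, percentage of all SNPs)."""
    counts = Counter(normalize(x) for x in allele_column)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


def transition_share(allele_column):
    """Percentage of SNPs in the column that are transitions."""
    hits = sum(is_transition(x) for x in allele_column)
    return 100.0 * hits / len(allele_column)


# Invented sample values, NOT taken from the database.
sample = ["A/G", "C/T", "g/a", "A/C", "T/C", "G/T", "C/G", "A/T"]

for label, (n, pct) in sorted(summarize(sample).items()):
    print(f"{label}: {n} SNPs ({pct:.1f}%)")
print(f"transitions: {transition_share(sample):.1f}%")
```

To use it on a real package, the allele column would first have to be read from the file; how to do that depends on the file layout, which is exactly what the question about the entries provided for each SNP asks you to work out.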
Some more TSC SNP (CSHL)
- Go to SNP Consortium
- Check out 'About'
- Find SNPs in and/or adjacent to Human Cadherin Gene (Heart)
- 'About'; 'Gene Search'; type in search string; 'Search'; check respective box; 'Dump table'; view results
- Find genes around particular SNP
- Choose a SNP ID; open Blast Genome Search; perform search
- Record the Request ID so that you can return to the results later; 'Format'
- In result window select 'Genome View'
- Select record for 'Map element'
- Make sense of result
- Identify SNPs in your sequence
- 'About'; 'Blast'; Paste in sequence
- Select 'View features in this region'
- How many SNPs were identified in your sequence?
SNPs in other databases
Examine the distribution of SNPs over human DNA contigs
- Go to http://www.ensembl.org/
- Explore the 'How do I ...?' section
- Select a chromosome under 'Browse a Chromosome' (e.g. Girls go to Y and guys to X)
- Examine the sequence visually and discuss whether SNPs appear to be randomly
distributed over the genome or not
- Run the cursor over the different feature bars and learn how to navigate
the display by utilizing the information provided in the pop-up windows,
especially the function that lets you zoom in
- Can you identify a correlation between the occurrence of SNPs and GC content or the prevalence of CpG islands?
- Do SNPs appear to be more prevalent in genes or in intergenic regions?
- Examine 10 intergenic and 10 intragenic SNPs in more detail. What alleles
can you identify for each SNP? (Make sure that you record the exact location of each SNP,
so that you can find it again after you have closed the image)
- Select a region that you want to look into further and zoom into the image
until you can identify the nucleotide sequence for this region
Examine the occurrence of SNPs in different types of DNA
sequences
There are two basic types of SNPs: those within the
coding regions of genes, which are called cSNPs, and those
outside of genes. Previous studies have found that, on average,
a gene contains about four cSNPs. This observation,
together with the estimate of a total of ca. 30,000 genes
in the human genome, indicates the existence of roughly 120,000
cSNPs. Of these, about 60% are estimated to be
synonymous,
i.e. to not lead to changes in amino acids, and 40% are
expected to change an amino acid. These roughly 50,000
non-synonymous
cSNPs, together with an unknown number of regulatory and other
non-coding but functional polymorphisms, comprise the bulk
of common molecular variation with potential phenotypic consequences.
In addition to these causal SNPs, linkage studies try to identify
SNPs which are associated with the inheritance of metabolic
phenotypes, yet located outside of genes and not involved
in directly causing changes in phenotype.
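The estimate above is simple arithmetic, and writing it out makes clear that the figure of 50,000 is a rounded value. The short Python sketch below only restates the numbers given in the text; the variable names are chosen here for illustration.

```python
genes = 30_000            # estimated number of human genes (from the text)
csnps_per_gene = 4        # average number of cSNPs per gene (from the text)

total_csnps = genes * csnps_per_gene
silent_csnps = total_csnps * 60 // 100        # do not change an amino acid
aa_changing_csnps = total_csnps * 40 // 100   # change an amino acid

print(f"total cSNPs:          {total_csnps:,}")
print(f"silent cSNPs:         {silent_csnps:,}")
print(f"amino-acid-changing:  {aa_changing_csnps:,}")
```

The 40% share works out to 48,000, which the text rounds to 50,000.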
SNPs appear to occur at different frequencies in different DNA sequences. For
example, an analysis of a 1 Mb Interleukin gene cluster on human
chromosome 5 (5q31), analyzed in
40 individual Northern Europeans, yielded the following results
(Banerjee et al., 2001, CSHL Genome Meeting):
| | Bp sequenced | # SNPs | Frequency |
| Coding regions/exons | 1977 | 4 | 1/494 |
| Introns | 14700 | 40 | 1/367 |
| UTRs | 1105 | 4 | 1/276 |
| Conserved non-coding sequences | 11295 | 28 | 1/403 |
| Intergenic | 13998 | 36 | 1/389 |
| Total | 31778 | 85 | 1/374 |
- Examine the occurrence of
SNPs in different types of DNA sequences. How would you explain the
differences in frequencies? Why do the authors report SNPs at
higher frequencies than the stated average of one SNP per 1,300 nucleotides
between any two equivalent human chromosomes?
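The frequencies in the table can be recomputed from the two raw columns, which also makes it easy to set them against the 1/1300 average. The Python sketch below uses only the numbers reported in the table; the dictionary layout and variable names are choices made for this example. Note that the five category rows add up to 43,075 bp, not the 31,778 bp given in the Total row, so the Total row is reproduced as reported rather than derived.

```python
# (bp sequenced, number of SNPs) per sequence type, as reported in the table.
table = {
    "Coding regions/exons": (1977, 4),
    "Introns": (14700, 40),
    "UTRs": (1105, 4),
    "Conserved non-coding sequences": (11295, 28),
    "Intergenic": (13998, 36),
    "Total": (31778, 85),
}

GENOME_AVERAGE = 1300  # one SNP per 1,300 nucleotides between two chromosomes

# Average spacing between SNPs in each sequence type.
bp_per_snp = {region: bp / snps for region, (bp, snps) in table.items()}

for region, spacing in bp_per_snp.items():
    fold = GENOME_AVERAGE / spacing
    print(f"{region:32s} 1 SNP per {spacing:6.1f} bp "
          f"({fold:.1f}x the density implied by 1/1300)")

# Sum of the category rows, for comparison with the reported Total row.
category_bp = sum(bp for region, (bp, _) in table.items() if region != "Total")
print(f"sum of category rows: {category_bp} bp")
```

When interpreting the fold differences, keep in mind that the table is based on 40 individuals, whereas the 1/1300 figure refers to a comparison of two chromosomes.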
Follow this link to Chapter 2: SNPs in
Biomedicine