Exercise 2 - Characteristics of Coding Sequences (CDS)
Genes provide the building plan for proteins. The genetic code describes the relationship between
nucleotide triplets (codons) and amino acids: 61 of the 64 possible triplets specify the
20 amino acids, and the remaining 3 are stop codons. DNA sequences that are translated into amino acid
sequences are called coding sequences (CDS); all other DNA sequences are collectively called non-coding DNA.
In prokaryotes, genes consist entirely of coding DNA; in eukaryotes, coding DNA is located in exons.
Non-coding DNA in eukaryotes includes introns, the untranslated exon stretches at the 5' and 3' ends
(5'-UTR, 3'-UTR), and promoters. In addition to these "genic" non-coding DNA sequences, there is a
wide array of non-coding DNA located in intergenic regions, centromeres, and telomeres.
This exercise will help you to better understand the concepts of reading frames, open reading
frames (ORFs), and how they can be utilized for the discovery of genes.
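The codon counts mentioned above can be checked with a few lines of Python. This is only a minimal sketch: the names `BASES`, `AMINO_ACIDS`, and `CODON_TABLE` are illustrative, and the amino acid string is the standard genetic code written in the usual T, C, A, G codon order.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every triplet to its one-letter amino acid (or "*").
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(CODON_TABLE.values()) - {"*"}

print("codons:", len(CODON_TABLE))
print("amino acids:", len(amino_acids))
print("stop codons:", stop_codons)
```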
-
Manual ORF detection
Determine for this nucleotide sequence:
- the three possible reading frames;
- the translation (find a genetic code table here and in your binder);
- the four open reading frames;
- the prospective gene and prospective exons.
If this sequence represents the "Watson" strand, what would the "Crick" strand look like?
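After working through the sequence by hand, you may want to check your reasoning with a short script. The sketch below translates a sequence in its three forward reading frames and builds the complementary strand. The sequence `example` is made up for illustration and is not the exercise sequence; the function names are likewise only suggestions.

```python
from itertools import product

# Standard genetic code in T, C, A, G codon order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frames(seq):
    """Translate the sequence starting at its first, second, and third base."""
    return [translate(seq[offset:]) for offset in range(3)]


example = "ATGGCCTAAGT"  # hypothetical sequence, not the one from the exercise
for number, protein in enumerate(three_frames(example), start=1):
    print("frame", number, protein)
print("opposite strand:", reverse_complement(example))
```

Applying `three_frames` to the output of `reverse_complement` gives the three reading frames of the other strand, for six frames in total.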
-
ORFs in exons, introns, intergenic and random DNA
Software to detect ORFs and to translate nucleotide sequences is available at several web sites including
the DNALC Sequence Utilities toolbox. Find ORFs in your assigned set of nucleotide sequences utilizing
the NCBI ORF Finder.
- Paste your sequence into the input window and start the tool by selecting the 'ORFFind' button.
- ORFs are shown in green. What do the six different boxed rows represent?
- What happens if you click on an ORF?
- In which sequences do you find ORFs?
- Does any sequence contain ORFs that extend through the entire sequence?
- Which ones? In what reading frame?
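To see what an ORF search does in principle, the following sketch scans all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon. It is a simplified illustration under that assumption, not a description of how the NCBI tool works internally: other tools may define ORFs differently (for example from stop codon to stop codon), and this sketch does not report ORFs that run off the end of the sequence without a stop. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset):
    """Yield (start, end) of each ATG...stop ORF in one frame.

    Positions are 0-based and half-open, so seq[start:end] is the ORF
    including its stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first start codon opens the ORF
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3  # first in-frame stop closes it
            start = None


def six_frame_orfs(seq):
    """Return (strand, frame, start, end, orf_sequence) for all six frames.

    For the "-" strand, start and end refer to the reverse complement.
    """
    seq = seq.upper()
    results = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(strand_seq, offset):
                results.append(
                    (strand, offset + 1, start, end, strand_seq[start:end])
                )
    return results


# Hypothetical toy sequence with a single short ORF on the forward strand.
for orf in six_frame_orfs("ATGAAATAG"):
    print(orf)
```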
-
Other ORF-analysis software
Run a few sequences through these two tools:
How does the output of these tools differ from that of the NCBI ORF Finder?
Annotate a genome through ORF analysis
Annotate a simple genome by locating its open reading frames.
Detect the genes in this genome
- Paste the genome sequence into the NCBI ORF Finder.
- Select 'ORFFind'.
- On the result page change sensitivity from 100 to 300. Select 'Redraw'.
- Locate the exact start and end points for each predicted ORF by clicking it.
Record the nucleotide positions for each ORF.
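The effect of the threshold can be illustrated with a small sketch. It assumes that the value on the result page acts as a minimum ORF length in nucleotides; this is an assumption, since the exercise only calls it "sensitivity". The ORF intervals below are invented for illustration and do not come from the exercise genome. The sketch also shows how 0-based intervals, including those measured on the reverse strand, can be converted to the 1-based positions you would record.

```python
def to_one_based(strand, start, end, seq_len):
    """Convert a 0-based half-open interval to 1-based inclusive
    positions on the forward strand.

    For the "-" strand, start and end are assumed to have been measured
    on the reverse complement of a sequence of length seq_len.
    """
    if strand == "+":
        return start + 1, end
    return seq_len - end + 1, seq_len - start


def filter_orfs(orfs, seq_len, min_length):
    """Keep ORFs of at least min_length nucleotides (assumed meaning of the threshold)."""
    kept = []
    for strand, start, end in orfs:
        length = end - start
        if length >= min_length:
            first, last = to_one_based(strand, start, end, seq_len)
            kept.append(
                {"strand": strand, "from": first, "to": last, "length": length}
            )
    return kept


# Hypothetical ORF intervals on a hypothetical 1000-nucleotide sequence.
orfs = [("+", 0, 450), ("+", 500, 620), ("-", 10, 340)]
for threshold in (300, 100, 50):
    print(threshold, filter_orfs(orfs, seq_len=1000, min_length=threshold))
```

Lowering the threshold can only add ORFs to the list, which is why shorter genes may appear only at the lower settings.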
Identify the nature of the genes
- Click on an ORF.
- Highlight and copy the nucleotide sequence for one of the ORFs.
- Open a new Word document, paste the sequence into it, and remove any symbols representing amino acids.
- Go to the NCBI BLAST site.
- Select 'Blastn' to perform a search for nucleotide sequences.
- Paste the cleaned sequence into the search window.
- Select 'Blast'.
- Select 'Format'
- Wait for your search results to be displayed. (If this takes too long, you may write down the 'Request ID' and return later to retrieve your results.)
- Check your results and determine what gene(s) they are.
Repeat this search process with another of the ORFs and determine what this gene might be.
What organism is the genome derived from?
Does this map confirm your suspicion? Which genes do the ORFs you found before represent? Which genes did you miss?
Re-run the NCBI ORF Finder, but this time set the sensitivity on the output page to 100 and then to 50. Select 'Redraw' each time and try to detect ORFs for the genes that you were unable to detect before.
Consult these other resources for additional information on retroviruses: