Exercise 4 - Gene Prediction Through Sequence Similarity
If part or all of an un-annotated sequence is already stored in a database, a search for
similar sequences can significantly reduce the effort needed to identify genes in it.
To do this, all or part of the sequence is used as the query for a database search.
Find the gene in this sequence by performing a Blast search on GenBank through NCBI.
Save your results in a word processor, as entire web pages, and/or as screenshots. (Should you experience problems with the program, see the example screenshots of the Blast options, a Blast input, a Blast output, and a GenBank entry for a search 'hit'.)
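For those who prefer to script the search rather than use the web form, the following is a minimal sketch of how the two requests of the NCBI BLAST URL API could be assembled. It only builds the request strings and does not contact NCBI. The choice of `blastn` against the `nt` database, the placeholder sequence, and the helper names `build_put_request` and `build_get_url` are assumptions made for illustration, not part of the exercise.

```python
from urllib.parse import urlencode

# Base address of the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(sequence, program="blastn", database="nt"):
    """Return the URL-encoded form body that submits a search (CMD=Put).

    program and database are assumed defaults; adjust them to your query.
    """
    # Remove whitespace and line breaks so the query is one continuous string.
    query = "".join(sequence.split())
    return urlencode(
        {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    )


def build_get_url(rid, format_type="Text"):
    """Return the URL that fetches the results for a request ID (CMD=Get)."""
    params = urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})
    return BLAST_URL + "?" + params


if __name__ == "__main__":
    # Placeholder sequence, not the exercise clone.
    print(build_put_request("ACGT ACGT\nTTGA"))
    # EXAMPLE_RID stands in for the request ID that NCBI returns after a Put.
    print(build_get_url("EXAMPLE_RID"))
```

The form body would be sent to `BLAST_URL` with a POST request, and the request ID in the response would then be passed to `build_get_url`. Searches can take a while, so a script should wait between polls rather than request results repeatedly.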
Answer the following questions utilizing your notes, the GenBank entries, and appropriate tools from :
What characteristics does this gene have (length, exons, introns, splice sites, promoter, etc.)?
What do the mRNA and amino acid sequences of the gene and its product look like?
Can you identify any 'unusual' features (alternative splicing, genes-in-genes, genes-ad-genes, etc.)?
Can you identify any variations within the gene? (SNPs, insertions, etc.)
Can you find any pseudogenes of your gene?
Can you identify similar genes in the same organism? In other organisms? Are these orthologs or paralogs?
Can you find any references to the experiments that led to the identification of your gene?
Can you find information about the biology of the gene?
Can you identify the protein? What kind of a protein is it? Are there other proteins with similar functions?
Can you identify PCR primers that could be used as markers for your gene?
What does your gene look like on a cytogenetic map?
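The questions about gene structure and about the mRNA and amino acid sequences can be checked by hand from the exon coordinates listed in a GenBank entry. The sketch below shows one way to do this: it derives intron coordinates from exon coordinates, tests each intron for the canonical GT...AG splice sites, joins the exons, and translates the result with the standard genetic code. The toy sequence, the coordinates, and all function names are invented for illustration. Coordinates are assumed to be 1-based and inclusive, as in GenBank feature tables, and the gene is assumed to lie on the forward strand.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def introns_from_exons(exons):
    """Return intron (start, end) pairs lying between sorted exon pairs."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def has_canonical_splice_sites(genomic, intron):
    """True if the intron starts with GT and ends with AG."""
    start, end = intron
    seq = genomic[start - 1:end].upper()
    return seq.startswith("GT") and seq.endswith("AG")


def splice(genomic, exons):
    """Join the exon sequences in order (DNA alphabet, as GenBank shows mRNA)."""
    return "".join(genomic[s - 1:e] for s, e in exons).upper()


def translate(cds):
    """Translate a coding sequence up to, but not including, the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy example: two exons separated by one 11-base intron.
    genomic = "ATGGCC" + "GTAAGTCCCAG" + "AAATGA"
    exons = [(1, 6), (18, 23)]
    for intron in introns_from_exons(exons):
        print(intron, has_canonical_splice_sites(genomic, intron))
    mrna = splice(genomic, exons)
    print(mrna, translate(mrna))
```

Here the coding sequence is assumed to start at the first base of the first exon. A real gene usually has untranslated regions, so the CDS coordinates from the GenBank entry should be used for the translation instead.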
Compare your results with your predictions from the first three exercises in Genes in DNA Sequences.
Redo the search using as query sequences a) only the sequence of the internal exon and
b) an intron sequence from your predictions in the first three exercises from
Genes in DNA Sequences. Do these searches lead to different results than a search with
the entire FASTA clone?
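To prepare the exon-only and intron-only queries, the predicted region has to be cut out of the clone and saved as its own FASTA record. The following is a minimal sketch of that step, assuming a single-record FASTA file and 1-based inclusive coordinates taken from your own predictions. The function names, the header format, and the example sequence are hypothetical.

```python
def read_fasta(text):
    """Parse single-record FASTA text into (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    return lines[0][1:], "".join(lines[1:]).upper()


def subsequence_fasta(header, sequence, start, end, label, width=60):
    """Return positions start..end (1-based, inclusive) as a FASTA record."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    piece = sequence[start - 1:end]
    # Use the first word of the original header as the sequence identifier.
    words = header.split()
    seq_id = words[0] if words else "query"
    # Wrap the sequence into lines of fixed width.
    body = "\n".join(piece[i:i + width] for i in range(0, len(piece), width))
    return f">{seq_id}|{label}:{start}-{end}\n{body}\n"


if __name__ == "__main__":
    # Placeholder record, not the exercise clone.
    header, seq = read_fasta(">clone1 test\nACGTAC\nGTTTGA\n")
    print(subsequence_fasta(header, seq, 3, 8, "exon"))
```

Each record produced this way can be pasted into the Blast input box in place of the full clone. Short queries tend to give less significant hits, which is worth keeping in mind when comparing the three searches.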