A small number of sequence features is usually not enough to reliably
locate genes in genomic sequences. Most sequence annotation
programs therefore combine a variety of different strategies to
annotate genomic sequences successfully and comprehensively.
Sensitivity of gene prediction programs: What fraction of the real exons were correctly identified?
Sensitivity = True Positives divided by sum of True Positives and False Negatives
Sens. = TP / (TP + FN)
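Specificity of gene prediction programs: What fraction of the predicted exons are indeed exons?
Specificity = True Positives divided by sum of True Positives and False Positives
Spec. = TP / (TP + FP)
(Note: specificity as defined here is the quantity called precision or positive predictive value in other fields.)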
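The two formulas above can be illustrated with a minimal Python sketch. It assumes that an exon counts as correctly predicted only if both its start and end coordinates exactly match an annotated exon; other matching criteria (for example, partial overlap) are possible and would give different numbers. The function name exon_level_accuracy and the coordinates are hypothetical and serve only as an illustration.

```python
def exon_level_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the exon level.

    Assumption: exons are (start, end) tuples, and a prediction is a
    true positive only if it matches an annotated exon exactly.
    Specificity follows the definition in the text: TP / (TP + FP).
    """
    true_set = set(true_exons)
    pred_set = set(predicted_exons)

    tp = len(true_set & pred_set)   # predicted and real
    fn = len(true_set - pred_set)   # real but missed
    fp = len(pred_set - true_set)   # predicted but not real

    # Guard against division by zero when a set is empty
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


# Made-up coordinates, for illustration only
annotated = [(100, 200), (300, 450), (600, 700), (900, 1000)]
predicted = [(100, 200), (300, 450), (620, 700), (1200, 1300), (1500, 1600)]

sens, spec = exon_level_accuracy(annotated, predicted)
print(f"Sens. = {sens:.2f}, Spec. = {spec:.2f}")
```

In this made-up example, two of the four annotated exons are recovered (TP = 2, FN = 2) and three of the five predictions match no annotated exon (FP = 3), so Sens. = 2 / 4 = 0.50 and Spec. = 2 / 5 = 0.40.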
The following is a sample of articles that analyze the success rate of gene prediction and genome annotation tools:
In general, automatic exon and gene prediction performs poorly, with a large number of false
positives and missed genes: the sensitivity of automatic gene prediction is between 20% and 70%, the specificity between 5% and 57%.
~50% of exon predictions are false positives
Short exons are poorly predicted
Different programs give conflicting predictions
Accuracy is species-specific
Use of supporting evidence (ESTs, cDNAs, protein homology) dramatically increases
annotation success!