A hidden Markov model (HMM) was used for exon prediction in the DNA of Plasmodium falciparum genes, with a model structure based on the structure of exon regions in the coding sequence (CDS). The objective of this research was to develop a new model structure, based on the CDS structure, for predicting exons in the DNA of Plasmodium falciparum genes using an HMM.
In the model design, one intron region can be found between two exon regions of the CDS, and model states are assigned to these regions. The number of states is extended up to 9 by separating the start codon from the first exon region and the stop codon from the last exon region. The Viterbi algorithm and the backward-forward method for the transition and emission states were used in the training process, and the Viterbi and Baum-Welch algorithms were used in the testing process. The correlation coefficient (CC), taken as the ratio of the estimated state at the model output to the original state at the model input, was used as the performance indicator.
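To illustrate the decoding step, here is a minimal sketch of Viterbi decoding on a toy two-state exon/intron model. It is not the model described above: the two states, all probabilities, and the helper names `viterbi` and `decode` are assumptions chosen for illustration only, whereas the study used between 5 and 9 states with trained parameters.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for an observation sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at position t
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Hypothetical toy parameters (NOT taken from the study)
STATES = ("exon", "intron")
LOG_START = {"exon": math.log(0.5), "intron": math.log(0.5)}
LOG_TRANS = to_log({
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
})
LOG_EMIT = to_log({
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
})

def decode(sequence):
    """Label each nucleotide of a DNA string with its most probable state."""
    return viterbi(sequence, STATES, LOG_START, LOG_TRANS, LOG_EMIT)

if __name__ == "__main__":
    print(decode("GCGCGCGCATATATAT"))
```

Working in log space avoids numerical underflow on long sequences, which matters for gene-length DNA.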
The simulation results showed that the CC values depend on the randomly assigned backward-forward transition state values. The model with 9 states gave the highest average CC values: 0.7289 with the Viterbi algorithm and 0.7166 with the Baum-Welch algorithm. The model with 5 states gave the lowest average CC values: 0.6735 with the Viterbi algorithm and 0.6661 with the Baum-Welch algorithm.
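The text does not give the exact CC formula. As a hedged illustration, the sketch below computes the nucleotide-level correlation coefficient commonly used to evaluate gene finders, comparing predicted state labels with the true labels. The choice of this formula, the function name `correlation_coefficient`, and the label strings are assumptions, not the definition used in the study.

```python
import math

def correlation_coefficient(true_labels, predicted_labels, positive="exon"):
    """Nucleotide-level correlation coefficient between true and predicted labels.

    Assumed definition (not confirmed by the source):
    CC = (TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN))
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # CC is undefined when any marginal count is zero; return 0.0 by convention here
    return (tp * tn - fn * fp) / denom if denom else 0.0

if __name__ == "__main__":
    truth = ["exon", "exon", "intron", "intron"]
    guess = ["exon", "intron", "intron", "intron"]
    print(correlation_coefficient(truth, guess))
```

Under this definition a value of 1 means the predicted labels match the true labels exactly, and values near 0 mean the prediction is no better than chance.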
The new model structure based on the HMM is valid for predicting exons in the DNA of Plasmodium falciparum genes.