Mutation Information and Relative Entropy about RNA Secondary Structure

DOI: 10.1109/icbbe.2011.5780021
Authors: Yi-Tao Yuan, Ying-Fei Sun

With the imminent completion of the Human Genome Project and the rapid growth in the number of complete prokaryotic and eukaryotic genomes, the task of organizing and understanding the generated sequence and structural data is becoming more pressing and demands better, more efficient analysis algorithms. Effective detection of splice sites requires knowledge of the characteristics, dependencies, RNA secondary structure, and relationships of the nucleotides in the region surrounding the splice sites. In this paper, we introduce a new method, based on information theory, for computing the mutation information and relative entropy of RNA secondary structure. Furthermore, we use mutation information and relative entropy theory to explain why adding RNA secondary structure information can improve the accuracy of RNA splice site prediction.

The basic gene structure of higher eukaryotes includes the promoter, start codon, introns, exons, and stop codon. A gene is expressed through a multistage process comprising transcription and translation. Transcription involves initiation, elongation, and termination steps. RNA polymerase, which catalyzes RNA synthesis, binds a special region (the promoter) at the start of the gene and moves along the template, synthesizing RNA, until it reaches a terminator sequence. The mRNA consists of sequences (called exons) that encode the protein product. The gene sequence often includes noncoding regions, called introns, which are removed from the primary transcript during RNA splicing. Finding a gene based on genomic sequence characteristics alone requires that the program find the start codon, all splice sites, and the stop codon. If we were able to predict all splice sites accurately, we would be able to perform highly reliable gene prediction (3). Thus, improving splice site detection methods can directly enhance gene prediction power.
The precise removal of introns from mRNA precursors is mainly defined by the highly conserved sequences near the ends of introns. The 5' boundary of an intron (called the donor site) almost always (in 99.24% of introns (4)) contains the dinucleotide GU, while the 3' boundary (called the acceptor site) contains the dinucleotide AG. However, because these conserved dinucleotides occur commonly throughout genomic sequence, the GU-AG rule is only a necessary condition, not a sufficient one. Therefore, if a splice site detection program relies on the GU-AG rule alone, it will predict many false splice sites. Moreover, it has been reported that RNA secondary structure information can aid splice site prediction in human genes (5). When a combination of sequence and structure information was applied to predict splice sites with a neural network (6) and with Markov models of different orders (7), prediction accuracy improved considerably. In this paper, we introduce a new method, based on information theory, for computing the mutation information and relative entropy of RNA secondary structure. Furthermore, we use mutation information and relative entropy theory to explain why adding RNA secondary structure information can improve the accuracy of RNA splice site prediction.
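Two ideas from the paragraph above can be illustrated with a minimal Python sketch. It is not the authors' method, which this excerpt does not describe. The first part lists every GU and AG occurrence in a toy RNA fragment to show why the GU-AG rule alone yields many candidate sites. The second part computes relative entropy using the standard Kullback-Leibler definition. The sequence, the function names, and the sample symbols are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def candidate_sites(rna, motif):
    """Return every 0-based index at which `motif` starts in `rna`."""
    return [i for i in range(len(rna) - len(motif) + 1)
            if rna.startswith(motif, i)]


def frequencies(symbols):
    """Map each symbol to its relative frequency in `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    `p` and `q` map symbols to probabilities. `q` must assign a
    nonzero probability to every symbol that `p` does.
    """
    return sum(pv * math.log2(pv / q[s]) for s, pv in p.items() if pv > 0)


# Hypothetical toy pre-mRNA fragment (not data from the paper).
rna = "AUGGUAAGUCCAGGUUCAGGCAGUAA"

# Every GU is a donor candidate and every AG an acceptor candidate
# under the GU-AG rule, whether or not it is a real splice site.
donors = candidate_sites(rna, "GU")
acceptors = candidate_sites(rna, "AG")
print("donor candidates:", donors)
print("acceptor candidates:", acceptors)

# Relative entropy between a made-up nucleotide sample at one position
# and a uniform background over A, C, G, U.
observed = frequencies("GGGA")
background = frequencies("ACGU")
print("D(observed || background) in bits:",
      relative_entropy(observed, background))
```

Even in this 26-nucleotide fragment the rule flags several donor and acceptor candidates, which is why additional evidence such as secondary structure is needed to separate true sites from false ones.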