BACKGROUND: MicroRNAs (miRNAs) are short, non-coding
RNA molecules that are directly involved in
post-transcriptional regulation of gene expression. The
mature miRNA sequence binds to more or less specific
target sites on the mRNA. Both their small size and
sequence specificity make the detection of completely
new miRNAs a challenging task. This cannot be based on
sequence information alone, but requires structure
information about the miRNA precursor. Unlike
comparative genomics approaches, ab initio approaches
are able to discover species-specific miRNAs without
known sequence homology.
RESULTS: MiRPred is a novel method for ab initio
prediction of miRNAs by genome scanning that only
relies on (predicted) secondary structure to
distinguish miRNA precursors from other similar-sized
segments of the human genome. We apply a machine
learning technique, called linear genetic programming,
to develop special classifier programs which include
multiple regular expressions (motifs) matched against
the secondary structure sequence. Special attention is
paid to scanning issues. The classifiers are trained on
fixed-length sequences as these occur when shifting a
window in regular steps over a genome region. Various
statistical and empirical evidence is collected to
validate the correctness of and increase confidence in
the predicted structures. Among other things, we
propose a new criterion to select miRNA candidates with
a higher stability of folding that is based on the
number of matching windows around their genome
location. An ensemble of 16 motif-based classifiers
achieves 99.9 percent specificity with sensitivity
remaining at an acceptably high level when requiring
all classifiers to agree on a positive decision. A low
false positive rate is considered more important than a
low false negative rate, when searching larger genome
regions for unknown miRNAs. 117 new miRNAs have been
predicted close to known miRNAs on human chromosome 19.
All candidate structures match the free energy
distribution of miRNA precursors which is significantly
shifted towards lower free energies. We employed a
human EST library and found that around 75 percent of
the candidate sequences are likely to be transcribed,
with around 35 percent located in introns.
CONCLUSION: Our motif finding method is at least
competitive with state-of-the-art feature-based methods
for ab initio miRNA discovery. In doing so, it requires
less previous knowledge about miRNA precursor
structures while programs and motifs allow a more
straightforward interpretation and extraction of the
acquired knowledge.
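The scanning procedure described in RESULTS can be sketched as follows. This is a minimal, hypothetical Python illustration, not MiRPred itself: the published classifiers are evolved linear genetic programs, whereas here each classifier is simply a set of regular-expression motifs that must reach a match count (`min_hits`). The names `MotifClassifier`, `scan`, `cluster_hits`, `min_hits` and `min_windows`, the motif patterns, and the toy folding function in the tests are all assumptions. The caller supplies `fold`, which maps a window sequence to a dot-bracket structure; in practice it might wrap ViennaRNA's Python binding `RNA.fold`, which returns a (structure, energy) pair.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, Tuple


@dataclass
class MotifClassifier:
    """Hypothetical stand-in for one evolved classifier program.

    A window's dot-bracket structure is called positive when at least
    `min_hits` of the regular-expression motifs match somewhere in it.
    """
    motifs: Sequence[str]
    min_hits: int

    def __post_init__(self) -> None:
        self._compiled = [re.compile(m) for m in self.motifs]

    def predict(self, structure: str) -> bool:
        hits = sum(1 for rx in self._compiled if rx.search(structure))
        return hits >= self.min_hits


def windows(seq: str, width: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for fixed-length windows shifted in regular steps."""
    for start in range(0, len(seq) - width + 1, step):
        yield start, seq[start:start + width]


def scan(seq: str, fold: Callable[[str], str],
         ensemble: Sequence[MotifClassifier],
         width: int, step: int) -> List[int]:
    """Return start positions of windows that *every* classifier calls positive."""
    hits: List[int] = []
    for start, window in windows(seq, width, step):
        structure = fold(window)  # each window is folded on its own
        if all(clf.predict(structure) for clf in ensemble):  # unanimous vote
            hits.append(start)
    return hits


def cluster_hits(starts: Sequence[int], step: int,
                 min_windows: int) -> List[Tuple[int, int, int]]:
    """Group positive windows whose starts are exactly `step` apart into loci.

    Returns (first_start, last_start, n_windows) for every run of consecutive
    positive windows with at least `min_windows` members (hypothetical threshold
    standing in for the window-count selection criterion).
    """
    loci: List[Tuple[int, int, int]] = []
    run: List[int] = []
    for s in starts:
        if run and s - run[-1] != step:
            if len(run) >= min_windows:
                loci.append((run[0], run[-1], len(run)))
            run = []
        run.append(s)
    if len(run) >= min_windows:
        loci.append((run[0], run[-1], len(run)))
    return loci
```

For example, with `width=10` and `step=2`, a toy `fold` that returns `((((((....))))))` only for windows containing `GGGG` yields four consecutive positive windows around that site, which `cluster_hits` reports as a single locus whenever `min_windows` is 4 or less. Requiring all classifiers to agree in `scan` reflects the abstract's choice to favour specificity over sensitivity.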
%0 Journal Article
%1 Brameier:2007:BMCbinf
%A Brameier, Markus
%A Wiuf, Carsten
%D 2007
%J BMC Bioinformatics
%K genetic algorithms, genetic programming, linear genetic programming
%P 478
%R 10.1186/1471-2105-8-478
%T Ab initio identification of human microRNAs based on
structure motifs
%U http://www.biomedcentral.com/content/pdf/1471-2105-8-478.pdf
%V 8
%X BACKGROUND: MicroRNAs (miRNAs) are short, non-coding
RNA molecules that are directly involved in
post-transcriptional regulation of gene expression. The
mature miRNA sequence binds to more or less specific
target sites on the mRNA. Both their small size and
sequence specificity make the detection of completely
new miRNAs a challenging task. This cannot be based on
sequence information alone, but requires structure
information about the miRNA precursor. Unlike
comparative genomics approaches, ab initio approaches
are able to discover species-specific miRNAs without
known sequence homology.
RESULTS: MiRPred is a novel method for ab initio
prediction of miRNAs by genome scanning that only
relies on (predicted) secondary structure to
distinguish miRNA precursors from other similar-sized
segments of the human genome. We apply a machine
learning technique, called linear genetic programming,
to develop special classifier programs which include
multiple regular expressions (motifs) matched against
the secondary structure sequence. Special attention is
paid to scanning issues. The classifiers are trained on
fixed-length sequences as these occur when shifting a
window in regular steps over a genome region. Various
statistical and empirical evidence is collected to
validate the correctness of and increase confidence in
the predicted structures. Among other things, we
propose a new criterion to select miRNA candidates with
a higher stability of folding that is based on the
number of matching windows around their genome
location. An ensemble of 16 motif-based classifiers
achieves 99.9 percent specificity with sensitivity
remaining at an acceptably high level when requiring
all classifiers to agree on a positive decision. A low
false positive rate is considered more important than a
low false negative rate, when searching larger genome
regions for unknown miRNAs. 117 new miRNAs have been
predicted close to known miRNAs on human chromosome 19.
All candidate structures match the free energy
distribution of miRNA precursors which is significantly
shifted towards lower free energies. We employed a
human EST library and found that around 75 percent of
the candidate sequences are likely to be transcribed,
with around 35 percent located in introns.
CONCLUSION: Our motif finding method is at least
competitive with state-of-the-art feature-based methods
for ab initio miRNA discovery. In doing so, it requires
less previous knowledge about miRNA precursor
structures while programs and motifs allow a more
straightforward interpretation and extraction of the
acquired knowledge.
@article{Brameier:2007:BMCbinf,
abstract = {BACKGROUND: MicroRNAs (miRNAs) are short, non-coding
RNA molecules that are directly involved in
post-transcriptional regulation of gene expression. The
mature miRNA sequence binds to more or less specific
target sites on the mRNA. Both their small size and
sequence specificity make the detection of completely
new miRNAs a challenging task. This cannot be based on
sequence information alone, but requires structure
information about the miRNA precursor. Unlike
comparative genomics approaches, ab initio approaches
are able to discover species-specific miRNAs without
known sequence homology.
RESULTS: MiRPred is a novel method for ab initio
prediction of miRNAs by genome scanning that only
relies on (predicted) secondary structure to
distinguish miRNA precursors from other similar-sized
segments of the human genome. We apply a machine
learning technique, called linear genetic programming,
to develop special classifier programs which include
multiple regular expressions (motifs) matched against
the secondary structure sequence. Special attention is
paid to scanning issues. The classifiers are trained on
fixed-length sequences as these occur when shifting a
window in regular steps over a genome region. Various
statistical and empirical evidence is collected to
validate the correctness of and increase confidence in
the predicted structures. Among other things, we
propose a new criterion to select miRNA candidates with
a higher stability of folding that is based on the
number of matching windows around their genome
location. An ensemble of 16 motif-based classifiers
achieves 99.9 percent specificity with sensitivity
remaining at an acceptably high level when requiring
all classifiers to agree on a positive decision. A low
false positive rate is considered more important than a
low false negative rate, when searching larger genome
regions for unknown miRNAs. 117 new miRNAs have been
predicted close to known miRNAs on human chromosome 19.
All candidate structures match the free energy
distribution of miRNA precursors which is significantly
shifted towards lower free energies. We employed a
human EST library and found that around 75 percent of
the candidate sequences are likely to be transcribed,
with around 35 percent located in introns.
CONCLUSION: Our motif finding method is at least
competitive with state-of-the-art feature-based methods
for ab initio miRNA discovery. In doing so, it requires
less previous knowledge about miRNA precursor
structures while programs and motifs allow a more
straightforward interpretation and extraction of the
acquired knowledge.},
added-at = {2008-06-19T17:35:00.000+0200},
author = {Brameier, Markus and Wiuf, Carsten},
biburl = {https://www.bibsonomy.org/bibtex/2a4392716a541f0309057a790514472d6/brazovayeye},
doi = {10.1186/1471-2105-8-478},
interhash = {59f64a0c2ac5c43cfe13587eb5d469b6},
intrahash = {a4392716a541f0309057a790514472d6},
journal = {BMC Bioinformatics},
keywords = {genetic algorithms, genetic programming, linear genetic programming},
month = dec,
day = 18,
notes = {PMID: 18088431 [PubMed - indexed for MEDLINE]},
pages = 478,
size = {11 pages},
timestamp = {2008-06-19T17:36:55.000+0200},
title = {Ab initio identification of human {microRNAs} based on
structure motifs},
url = {http://www.biomedcentral.com/content/pdf/1471-2105-8-478.pdf},
volume = 8,
year = 2007
}