Annotation process
The IRESite database is an open project
aiming to gather the results obtained in the IRES research area during the last 20
years. One of the most exciting goals of the project is to rate IRES
segments by their strength. Clearly, not all results are directly
comparable, and some results are not accompanied by sufficient
experimental controls, so the credibility of the results
demonstrating the respective IRES is in some cases low. Unfortunately,
we do not know which IRESs fall into which group. The need for such
rigorous scrutiny arises from the fact that, during the last 5 years, a
number of IRES segments were reported to contain a cryptic promoter,
alternative splicing seriously undermined several results, etc. These
goals are very difficult to achieve. We therefore present here how we
annotate records in IRESite and how we currently think we could "rate"
the IRES segments in the future. We are very much open to discussing the
procedure and the outcomes, preferably on the e-mail list.
The procedure used throughout the annotation is the following:
1. Curator either obtains the sequence from the original author or
reconstructs the sequence of the plasmid in silico. We use the ApE plasmid
editor to manipulate the sequences: to cut the plasmid sequence
and insert the sequence of a cDNA fragment obtained elsewhere (for example
by electronic PCR using the primers described in the article)
into the restriction sites as described in the Materials and Methods of
the respective article.
Alternatively, the sequence data obtained from the original author are
verified to match the description in the article (we have already met
cases where this was not true and the issues had to be resolved by
mutual discussion).
2. Curator normalizes any measured values which quantify the efficiency
of the IRES segments studied to the respective negative and positive
controls. For example, curator converts activity or luminescence units
to percentages by rescaling the measured/reported values of the
putative IRES relative to the positive control
(the positive control has by definition 100% activity). When no positive
control has been used, the activity of the putative IRES is set to 100%
and the activity of the negative control is scaled accordingly.
Consistently scaled values make the collected data easier to navigate
and may also make some results directly
comparable to each other (when the same reporter genes and controls were
used). Unfortunately, the reported reproducibility is sometimes very low.
3. When no numeric data are available from the article or the authors
(e.g. only spots on dot-blots were published), curator uses his/her lab
experience to make a reasonable guess and makes a note in the Remarks
section. Curator tags the record with certain values to emphasize that the
value is a guess.
4. Curator annotates whether the integrity of the transcript
derived from a plasmid has been verified and how exactly. For example,
a Northern blot assay cannot be considered sufficient verification of
transcript integrity because it does not confirm that the molecule was
really full-length (as required) and its sensitivity
is not sufficient either (Han and Zhang, 2002; van Eden et al.,
2004). Currently, we consider only RNase protection assay, RT-PCR and
perhaps a few other sensitive methods not discussed here as sensitive
enough to provide convincing results. Still, depending on the exact
setup of the experiment, there may be exceptions.
5. Special care is given to the promoters used for in vivo
and/or in vitro expression. Further, we also check in the
publications whether the plasmids were shown to be free of cryptic
promoters. Clearly, the risk that a cryptic promoter affected a published
experiment can only be assessed by analysis of complete sequences,
even though obtaining or reconstructing them costs us an enormous amount
of work. We
justify the extra work also by the fact that the sequence data will be used for
other purposes as well (search for rRNA complementarity, transcription
factor binding sites and other promoter determinants) and their value
will not degrade over time.
6. Curator annotates the position of the IRES element within the
underlying mRNA sequence and determines the minimal part of the
IRES required for its activity.
7. Curator records how the reported secondary structure of the IRES was
studied (using what kind of method or combination of methods). For
example, enzymatic, chemical or physical-chemical approaches like
UV-light cross-linking are used (Nishiyama et al., 2003; Shibuya et
al., 2003). Curator re-types the structure using bracket notation into
the database as already mentioned (possible improvement of the
representation itself will be discussed during the project when more
data accumulate).
8. Curator annotates which proteins (ITAFs) modulate IRES activity
in each case. Additionally, proteins directly interacting with the RNA
can be annotated. The solution is general and allows us to annotate
an interaction between any region of the RNA molecule and any protein,
even a barely characterized one (i.e. an interaction of a region
not containing an IRES with some as yet uncharacterized protein can be
recorded as well), and such regions may of course overlap.
9. Curator lists all references used to annotate the respective
database entry.
10. Curator includes additional important data in relevant Remarks
sections and highlights any potential issues.
11. Super-reviewer will verify the annotated entry.
12. Curator contacts the original authors of the publication entered
into the database in the form of plasmid_with_promoter_and_putative_IRES_translationally_characterized records and asks
them to verify the data (before the data are published
or shortly after).
13. Both curator and super-reviewer learn from the feedback from
original author and resolve potential issues.
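The rescaling described in step 2 can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used by IRESite: the function name `normalize_activities` and the sample readings are hypothetical, and the sketch assumes plain linear scaling without any background subtraction.

```python
from typing import Optional


def normalize_activities(
    putative_ires: float,
    negative_control: float,
    positive_control: Optional[float] = None,
) -> dict:
    """Rescale raw reporter readings to percentages.

    Assumption: plain linear scaling, no background subtraction.
    The 100% reference is the positive control when one was measured,
    otherwise the putative IRES itself.
    """
    reference = positive_control if positive_control is not None else putative_ires
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    scaled = {
        "putative_ires": 100.0 * putative_ires / reference,
        "negative_control": 100.0 * negative_control / reference,
    }
    if positive_control is not None:
        scaled["positive_control"] = 100.0
    return scaled


# Hypothetical luminescence readings in arbitrary units.
with_positive = normalize_activities(450.0, 30.0, positive_control=1500.0)
without_positive = normalize_activities(450.0, 30.0)
```

In the first call the putative IRES is expressed relative to the positive control; in the second, where no positive control exists, the putative IRES itself becomes the 100% reference and the negative control is scaled to it.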
Rating IRESs by their function and credibility of the primary experiments
The following is a draft version of rules which could be used in
the future to consistently rate IRES segments by the credibility of the
original experimental results. Such a rating would be extremely
interesting and helpful not only to scientists who wish to
improve the strength of their IRES segments but would also show novice
scientists which "IRESs" should be avoided. Our aim is to show which
IRES segments are best characterized and how. For consistency there
has to be a list of clearly defined rules. Although the list might be
too strict and incomplete, we still believe that without such a list the
rating would be too subjective. We expect the list published below
to be subject to many discussions within our team and with the
public. Currently it merely demonstrates how broad and diverse
the experimental evidence can be and how difficult it will be to make the
rating. It also demonstrates the most difficult task curators
will face: to judge whether the conclusions published by authors are
based on strongly supported evidence from the current perspective. The
list contains several proposed values of the field conclusion,
to be used to tag individual IRES segments to show
whether the IRES presence/activity is, from the current perspective,
well justified by experimental evidence:
- true_IRES
Records tagged by this value are the best characterized IRESs and
the experimental results are convincing. The methods employed to prove
the respective IRES were sensitive enough to form the basis for
convincing interpretation of the results. This value is intended to be
used only for the best studied cases, which should hopefully
withstand even the critique of M. Kozak (Kozak 2001a,b,2003,2005;
Schneider et al., 2001). ;)
Although we do not want to generalize, such experiments, which
ought to be based on a bicistronic mRNA assay, might have utilized e.g.
one or a combination of the following methods to prove that no
functionally monocistronic mRNAs were present in the studied
reporter system:
RT-PCR based method(s) was employed with proper combination of
primers, preferably covering multiple regions of the bicistronic mRNA
5'-RACE method
RNase protection assay quantitatively confirmed integrity of the
transcript (ideally with multiple probes covering several regions)
direct bicistronic RNA transfection demonstrated IRES activity
Dual-plasmid setup was employed to express the bicistronic
mRNA. The first plasmid could regulate expression of bicistronic mRNA
from the second plasmid. The inducibility of the expression (and
therefore of the observed IRES activity) conveniently confirms that
there are no cryptic promoters in the second, bicistronic plasmid
containing the putative IRES in the intercistronic region.
Similarly, a promoter-less plasmid can be used to test for the presence
of cryptic promoters in the putative IRES sequence.
- putative_IRES
Used when any doubts about the integrity of the bicistronic mRNAs exist
(cryptic promoter, splicing, cleavage or degradation issues), for example
when the integrity of the transcripts has only been "studied" by Northern
blot. However, the IRES has been functional in some in
vivo/in vitro studies based on bicistronic constructs.
- possibly_not_IRES
IRESs which we are not convinced about.
- disproved_IRES
Disproved IRESs.
Clearly, additional experimental evidence also has to be taken
into account, such as the presence of the respective mRNA in the polyribosomal
fraction during e.g. poliovirus infection or other stresses, probably
also the effects of cap analogs and rapamycin on the IRES activity,
the effect of strong hairpin structures in certain mRNA
regions, etc.
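To make the draft values concrete, the sketch below encodes them as an enumeration together with one deliberately crude helper. Everything beyond the four value names is an assumption made for illustration: the evidence labels, the name `suggest_conclusion` and the decision rule are hypothetical, the helper only covers IRESs that were functional in bicistronic assays, and the curator's judgement remains decisive.

```python
from enum import Enum


class Conclusion(Enum):
    """Proposed values of the `conclusion` field (names from the draft)."""
    TRUE_IRES = "true_IRES"
    PUTATIVE_IRES = "putative_IRES"
    POSSIBLY_NOT_IRES = "possibly_not_IRES"
    DISPROVED_IRES = "disproved_IRES"


# Hypothetical labels for the kinds of evidence named in the text as able
# to rule out functionally monocistronic mRNAs.
INTEGRITY_METHODS = {
    "RT-PCR",
    "5'-RACE",
    "RNase protection",
    "RNA transfection",
    "dual-plasmid",
    "promoter-less plasmid",
}


def suggest_conclusion(methods: set) -> Conclusion:
    """Suggest a starting tag for an IRES active in a bicistronic assay.

    Assumed rule (illustration only): at least one method from
    INTEGRITY_METHODS makes the record a candidate for true_IRES;
    anything else, e.g. Northern blot alone, stays putative_IRES.
    possibly_not_IRES and disproved_IRES are never suggested here.
    """
    if methods & INTEGRITY_METHODS:
        return Conclusion.TRUE_IRES
    return Conclusion.PUTATIVE_IRES


northern_only = suggest_conclusion({"Northern blot"})
with_rt_pcr = suggest_conclusion({"Northern blot", "RT-PCR"})
```

A real rating would also have to weigh the quality of each experiment (primer placement, number of probes, controls), which a simple set lookup like this cannot capture.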
Determination of IRES boundaries
Not surprisingly, the regions flanking an IRES modulate its
function (Belsham, 1992; Hunt et al., 1993; Ohlmann and
Jackson, 1999). Typically, IRES elements are experimentally
point-mutated or shortened at the ends, or some internal regions/loops are
deleted, and the effect on the IRES activity is evaluated. This is
a very valuable type of information which we want to record in a
convenient way. Therefore, even multiple partially overlapping
regions can be annotated simultaneously as functional or defective
IRES segments.
Needless to say, there is no clear border between functional and
defective IRES segments. We doubt one could ever come up with a clear
rule saying that e.g. IRESs with an activity of at least 85% of the
positive control are considered functional while segments
displaying activities below 85% should be tagged as defective IRES
(the outcome depends not only on the respective IRES used as the positive
control but also on the sequences surrounding the IRES). However, when
a putative IRES sequence has been systematically shortened by deletion,
there is sometimes a sharp drop in the IRES activity after a certain
deletion, and in such cases it is rather intuitive to distinguish
between the functional and defective variant. Still, there is the
problem that discontinuous IRES regions are sometimes reported and
authors refer to modular IRESs. So far we do not see a universal
solution to this problem.
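For a deletion series, the "sharp drop" mentioned above can be located mechanically, as in the following minimal sketch. The function name `sharpest_drop`, the construct names and the percentages are all invented for illustration. The code only finds the largest decrease between consecutive constructs; it does not decide whether that decrease is large enough to separate functional from defective variants, which stays a curator's call.

```python
def sharpest_drop(series):
    """Locate the largest activity decrease in a deletion series.

    `series` is a list of (construct_name, activity_percent) tuples
    ordered from the longest to the shortest construct. Returns
    (index, drop), where `index` points at the first construct after
    the drop. On ties the earliest drop is reported.
    """
    if len(series) < 2:
        raise ValueError("need at least two constructs")
    best = max(
        range(len(series) - 1),
        key=lambda i: series[i][1] - series[i + 1][1],
    )
    drop = series[best][1] - series[best + 1][1]
    return best + 1, drop


# Hypothetical deletion series; names and percentages are invented.
deletions = [
    ("full_length", 100.0),
    ("del_5prime_50", 92.0),
    ("del_5prime_100", 85.0),
    ("del_5prime_150", 12.0),
    ("del_5prime_200", 9.0),
]
index, drop = sharpest_drop(deletions)
last_before_drop = deletions[index - 1][0]
first_after_drop = deletions[index][0]
```

Such a helper does not address discontinuous (modular) IRESs, where activity does not fall monotonically with the length of a single contiguous region.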
Authenticity of the data
We are aware of the risk that we might introduce errors when
reconstructing the sequence data and annotating the region transcribed
into mRNA. Therefore, we always ask authors for their sequence data,
and when we receive them we verify that they match the description
in "Materials and Methods" in the respective publication. Any
discrepancies are either resolved by mutual discussion or highlighted
in Remarks. When we assemble the records (incl. sequences) ourselves,
we still review the resulting records with the original authors so
that any potential misinterpretations can be uncovered.
Keeping track of publications being processed
The process of manual extraction of the primary experimental data
and assembly of primary sequences is very time-consuming and is often
interrupted for various reasons. To organize the partially processed
data effectively, we use Bugzilla software to keep track of such open
issues. Bugzilla is also used to
keep track of pending software and database schema bugs and
improvement proposals. It is not publicly accessible.
Cited literature
Belsham G. (1992) Dual initiation sites of protein synthesis on foot-and-mouth disease virus RNA are selected following internal entry and scanning of ribosomes in vivo. EMBO J. 11: 1106-1110
Han B., Zhang J.-T. (2002) Regulation of Gene Expression by Internal Ribosome Entry Sites or Cryptic Promoters: the eIF4G Story. Mol. Cell. Biol. 22: 7372-7384
Hunt S. L., Kaminski A., Jackson R. J. (1993) The influence of viral coding sequences on the efficiency of internal initiation of translation of cardiovirus RNAs. Virology 197: 801-807
Kozak M. (2001a) New ways of initiating translation in eukaryotes? Mol. Cell. Biol. 21: 1899-1907
Kozak M. (2001b) New ways of initiating translation in eukaryotes? Authors reply. Mol. Cell. Biol. 21: 8241-8246
Kozak M. (2003) Alternative ways to think about mRNA sequences and proteins that appear to promote internal initiation of translation. Gene 318: 1-23
Kozak M. (2005) A second look at cellular mRNA sequences said to function as internal ribosome entry sites. Nucleic Acids Res. 33: 6593-6602
Nishiyama T., Yamamoto H., Shibuya N., Hatakeyama Y., Hachimori Y., Uchiumi T., Nakashima N. (2003) Structural elements in the internal ribosome entry site of Plautia stali intestine virus responsible for binding with ribosomes. Nucleic Acids Res. 31: 2434-2442
Ohlmann T., Jackson R. J. (1999) The properties of chimeric picornavirus IRESes show that discrimination between internal translation initiation sites is influenced by the identity of the IRES and not just the context of the AUG codon. RNA 5: 764-778
Schneider R. et al. (2001) New ways of initiating translation in eukaryotes? Letter to the editor. Mol. Cell. Biol. 21: 8238-8241
Shibuya N., Nishiyama T., Kanamori Y., Saito H., Nakashima N. (2003) Conditional Rather than Absolute Requirements of the Capsid Coding Sequence for Initiation of Methionine-Independent Translation in Plautia stali Intestine Virus. J. Virol. 77: 12002-12010
van Eden M. E., Byrd M. P., Sherrill K. W., Lloyd R. E. (2004) Demonstrating internal ribosome entry sites in eukaryotic mRNAs using stringent RNA test procedures. RNA 10: 720-730