Annotation process
The IRESite database is an open project that aims to gather the results of IRES research from the last 20 years. One of the most exciting goals of the project is to rate IRES segments by their strength. Clearly, not all results are directly comparable, and some are not accompanied by sufficient experimental controls, so the credibility of the results demonstrating the respective IRES is poor in some cases. Unfortunately, we do not know which IRESs fall into which group. Such rigorous scrutiny is necessary because a number of IRES segments were reported during the last 5 years to contain a cryptic promoter, alternative splicing seriously undermined several results, etc. These goals are very difficult to achieve. We therefore present here how we annotate records in IRESite and how we currently think we could "rate" the IRES segments in the future. We are very much open to discussing the procedure and its outcomes, preferably on the e-mail list.

The procedure used throughout the annotation is the following:

1. The curator either obtains the sequence from the original author or reconstructs the sequence of the plasmid on a computer. We use the ApE plasmid editor to manipulate the sequences: to cut the plasmid sequence and insert the sequence of a cDNA fragment obtained elsewhere (for example by electronic PCR using the primers described in the article) into the restriction sites described in the Materials and Methods of the respective article.

Alternatively, sequence data obtained from the original author are checked against the description in the article (we have already encountered cases where they did not match, and the issues had to be resolved by mutual discussion).

2. The curator normalizes any measured values that quantify the efficiency of the IRES segments studied to the respective negative and positive controls. For example, the curator converts activity or luminescence units to percentages by rescaling the measured/reported values of the putative IRES to the positive control (which by definition has 100% activity). When no positive control was used, the activity of the putative IRES is set to 100% and the activity of the negative control is scaled accordingly. Consistently scaled values make the collected data easier to navigate and may also make some results directly comparable to each other (when the same signal genes and controls were used). Unfortunately, the reported reproducibility is sometimes very low.

3. When no numeric data are available from the article or the authors (e.g. only spots on dot-blots were published), the curator uses his/her lab experience to make a reasonable estimate and notes this in the Remarks section. The curator also tags the record with certain values to emphasize that the value is a guess.

4. The curator annotates whether the integrity of the transcript derived from a plasmid has been verified, and how exactly. For example, a Northern blot assay cannot be considered sufficient verification of transcript integrity because it does not confirm that the molecule was really full-length (as required), and its sensitivity is not sufficient either (Han and Zhang, 2002; van Eden et al., 2004). Currently, we consider only the RNase protection assay, RT-PCR and perhaps a few other sensitive methods not discussed here to be sensitive enough to provide convincing results. Still, depending on the exact setup of the experiment, exceptions may arise.

5. Special care is given to the promoters used for in vivo and/or in vitro expression. We also check whether the publications show the plasmids to be free of cryptic promoters. Clearly, the risk that a cryptic promoter affected a published experiment can only be assessed by analysing the complete sequences, even at the enormous cost of the work needed to obtain or reconstruct them. We also justify the extra work by the fact that the sequence data will be used for other purposes as well (searches for rRNA complementarity, transcription factor binding sites and other promoter determinants) and that their value will not degrade over time.

6. The curator annotates the position of the IRES element within the underlying mRNA sequence and determines the minimal part of the IRES required for its activity.

7. The curator records how the reported secondary structure of the IRES was studied (which method or combination of methods was used). For example, enzymatic, chemical or physico-chemical approaches such as UV-light cross-linking are used (Nishiyama et al., 2003; Shibuya et al., 2003). The curator re-types the structure into the database in bracket notation, as already mentioned (possible improvements to the representation itself will be discussed during the project as more data accumulate).

8. The curator annotates which proteins (ITAFs) modulate IRES activity in each case. Additionally, proteins that interact directly with the RNA can be annotated. The solution is general: it allows us to annotate an interaction between any region of an RNA molecule and any protein, even a barely characterized one (i.e. the interaction of a region that contains no IRES with some as yet uncharacterized protein can be recorded as well), and such regions may of course overlap.

9. The curator lists all references used to annotate the respective database entry.

10. The curator includes additional important data in the relevant Remarks sections and highlights any potential issues.

11. A super-reviewer verifies the annotated entry.

12. The curator contacts the original authors of the publication entered into the database as plasmid_with_promoter_and_putative_IRES_translationally_characterized records and asks them to verify the data (before the data are published or shortly afterwards).

13. Both the curator and the super-reviewer learn from the original author's feedback and resolve potential issues.
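
The rescaling described in step 2 can be illustrated with a minimal Python sketch. The function name, argument names and the returned dictionary are hypothetical and are not part of IRESite; the sketch also assumes plain proportional scaling without background subtraction, which the procedure above does not specify.

```python
def normalize_to_percent(putative_ires, negative_control, positive_control=None):
    """Rescale raw reporter readings (e.g. luminescence units) to percentages.

    If a positive control was measured, it defines 100%.
    If not, the putative IRES itself is set to 100% and the
    negative control is scaled to it.
    Assumption: simple proportional scaling, no background subtraction.
    """
    reference = positive_control if positive_control is not None else putative_ires
    if reference <= 0:
        raise ValueError("the reference reading must be positive")

    scale = 100.0 / reference
    result = {
        "putative_ires": putative_ires * scale,
        "negative_control": negative_control * scale,
    }
    if positive_control is not None:
        result["positive_control"] = 100.0
    return result
```

For example, readings of 50 (putative IRES), 5 (negative control) and 200 (positive control) become 25%, 2.5% and 100%. Without a positive control, readings of 40 and 2 become 100% and 5%.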

Rating IRESs by their function and credibility of the primary experiments
The following is a draft of rules which could be used in the future to rate IRES segments consistently by the credibility of the original experimental results. Such a rating would be extremely interesting and helpful, not only to scientists who wish to improve the strength of their IRES segments, but also to novice scientists, who would learn which "IRESs" should be avoided. Our aim is to show which IRES segments are best characterized and how. For consistency there has to be a list of clearly defined rules. Although the list might be too strict and incomplete, we believe that without such a list the rating would be too subjective. We expect the list published below to be subject to many discussions within our team and with the public. Currently it merely demonstrates how broad and diverse the experimental evidence can be and how difficult it will be to make the rating. It also demonstrates the most difficult task curators will face: judging whether the conclusions published by the authors are strongly supported by evidence from the current perspective. The list contains several proposed values of the conclusion field, to be used to tag individual IRES segments and show whether the presence/activity of the IRES is, from the current perspective, well justified by experimental evidence:

  1. true_IRES
    Records tagged with this value are the best characterized IRESs, and the experimental results are convincing. The methods employed to prove the respective IRES were sensitive enough to form the basis for a convincing interpretation of the results. This value is intended to be used only for the best studied cases, which should hopefully withstand even the critique of M. Kozak (Kozak 2001a,b, 2003, 2005; Schneider et al., 2001). ;)

    Although we do not want to generalize, such experiments, which ought to be based on a bicistronic mRNA assay, might for example have used one or a combination of the following methods to prove that no functionally monocistronic mRNAs were present in the reporter system studied:
    An RT-PCR-based method was employed with a proper combination of primers, preferably covering multiple regions of the bicistronic mRNA
    5'-RACE method
    RNase protection assay quantitatively confirmed the integrity of the transcript (ideally with multiple probes covering several regions)
    Direct bicistronic RNA transfection demonstrated IRES activity
    A dual-plasmid setup was employed to express the bicistronic mRNA. The first plasmid could regulate expression of the bicistronic mRNA from the second plasmid. The inducibility of the expression (and therefore of the observed IRES activity) conveniently confirms that there are no cryptic promoters in the second, bicistronic plasmid containing the putative IRES in the intercistronic region.
    Similarly, a promoter-less plasmid can be used to test for the presence of cryptic promoters in the putative IRES sequence.

  2. putative_IRES
    Used when any doubts exist about the integrity of the bicistronic mRNAs (cryptic promoter, splicing, cleavage or degradation issues), for example when the integrity of the transcripts has only been "studied" by Northern blot. Nevertheless, the IRES was functional in some in vivo/in vitro studies based on bicistronic constructs.
  3. possibly_not_IRES
    IRESs which we are not convinced about.
  4. disproved_IRES
    Disproved IRESs.

Clearly, additional experimental evidence also has to be accommodated, such as the presence of the respective mRNA in the polyribosomal fraction during e.g. poliovirus infection or other stresses, probably also the effects of cap analogs and rapamycin on IRES activity, the effect of strong hairpin structures in certain mRNA regions, etc.
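
To make the draft values more concrete, here is a rough Python sketch of how a first-pass suggestion for the conclusion field might be derived from the recorded integrity checks. It is an illustration only: the method labels, the function name and the "one sensitive method suffices" shortcut are assumptions made for this example, the additional evidence mentioned above is ignored, and the final value would always remain the curator's decision.

```python
from enum import Enum


class Conclusion(Enum):
    """The four draft values of the conclusion field."""
    TRUE_IRES = "true_IRES"
    PUTATIVE_IRES = "putative_IRES"
    POSSIBLY_NOT_IRES = "possibly_not_IRES"
    DISPROVED_IRES = "disproved_IRES"


# Hypothetical labels for the sensitive approaches listed under true_IRES.
SENSITIVE_METHODS = frozenset({
    "rt_pcr",
    "5_race",
    "rnase_protection",
    "rna_transfection",
    "dual_plasmid_inducible",
    "promoterless_plasmid",
})


def suggest_conclusion(methods, functional_in_bicistronic_assay):
    """Return a candidate value for the curator to confirm, or None.

    Assumption: one sensitive method is enough to propose true_IRES.
    possibly_not_IRES and disproved_IRES need curator judgment,
    so nothing is suggested when the IRES was not functional.
    """
    if not functional_in_bicistronic_assay:
        return None
    if SENSITIVE_METHODS & set(methods):
        return Conclusion.TRUE_IRES
    # e.g. integrity only "studied" by Northern blot
    return Conclusion.PUTATIVE_IRES
```

With these assumptions, a record whose transcript integrity was checked only by Northern blot is proposed as putative_IRES, while adding an RT-PCR check makes it a true_IRES candidate.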


Determination of IRES boundaries
Not surprisingly, the regions flanking an IRES modulate its function (Belsham, 1992; Hunt et al., 1993; Ohlmann and Jackson, 1999). Typically, IRES elements are experimentally point-mutated or shortened at the ends, or some internal regions/loops are deleted, and the effect on IRES activity is evaluated. This is a very valuable type of information which we want to record in a convenient way. Therefore, even multiple partially overlapping regions can be annotated simultaneously as functional or defective IRES segments.
Needless to say, there is no clear border between functional and defective IRES segments. We doubt one could ever come up with a clear rule saying that e.g. IRESs with an activity of at least 85% of the positive control are considered functional while segments displaying activities below 85% should be tagged as defective (this depends not only on the respective IRES used as the positive control but also on the sequences surrounding the IRES). However, when a putative IRES sequence has been systematically shortened by deletion, there is sometimes a sharp drop in IRES activity after a certain deletion, and in such cases it is rather intuitive to distinguish between the functional and the defective variant. Still, a problem remains: discontinuous IRES regions are sometimes reported, and authors refer to modular IRESs. So far we do not see a universal solution to this problem.
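
The "sharp drop" in a deletion series can be sketched in Python as follows. This is only an aid for flagging candidate boundaries, not a rule: the function name and the fold-change threshold `min_fold` are assumptions for this example, and no universal threshold is implied.

```python
def find_sharp_drop(activities, min_fold=3.0):
    """Locate the first sharp drop in a deletion series.

    `activities` holds the normalized IRES activity (in %) of
    successively shortened constructs, in order of deletion.
    Returns the index of the first construct whose activity is at
    least `min_fold` times lower than that of the previous construct,
    or None if the activity declines only gradually.
    `min_fold` is an arbitrary, hypothetical threshold.
    """
    for i in range(1, len(activities)):
        previous, current = activities[i - 1], activities[i]
        if previous > 0 and current * min_fold <= previous:
            return i
    return None
```

For the series 100, 92, 85, 12, 9 the function returns index 3 (the construct with 12% activity), whereas for the gradual series 100, 80, 60, 45 it returns None and the boundary is left to the curator.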

Authenticity of the data
We are aware of the risk that we might introduce errors when reconstructing the sequence data and annotating the region transcribed into mRNA. Therefore, we always ask the authors for their sequence data, and when we receive them we verify that they match the description in "Materials and Methods" of the respective publication. Any discrepancies are either resolved by mutual discussion or highlighted in Remarks. When we assemble the records (incl. sequences) ourselves, we still review the resulting records with the original authors so that any potential misinterpretations can be uncovered.

Keeping track of publications being processed
The manual extraction of the primary experimental data and the assembly of primary sequences is very time-consuming and is often interrupted for various reasons. To organize the partially processed data effectively, we use Bugzilla to keep track of such open issues. Bugzilla is also used to track pending software and database schema bugs and improvement proposals. It is not publicly accessible.

Cited literature
Belsham G. (1992) Dual initiation sites of protein synthesis on foot-and-mouth disease virus RNA are selected following internal entry and scanning of ribosomes in vivo. EMBO J. 11: 1106-1110
Han B., Zhang J.-T. (2002) Regulation of Gene Expression by Internal Ribosome Entry Sites or Cryptic Promoters: the eIF4G Story. Mol. Cell. Biol. 22: 7372-7384
Hunt S. L., Kaminski A., Jackson R. J. (1993) The influence of viral coding sequences on the efficiency of internal initiation of translation of cardiovirus RNAs. Virology 197: 801-807
Kozak M. (2001a) New ways of initiating translation in eukaryotes? Mol. Cell. Biol. 21: 1899-1907
Kozak M. (2001b) New ways of initiating translation in eukaryotes? Authors reply. Mol. Cell. Biol. 21: 8241-8246
Kozak M. (2003) Alternative ways to think about mRNA sequences and proteins that appear to promote internal initiation of translation. Gene 318: 1-23
Kozak M. (2005) A second look at cellular mRNA sequences said to function as internal ribosome entry sites. Nucleic Acids Res. 33: 6593-6602
Nishiyama T., Yamamoto H., Shibuya N., Hatakeyama Y., Hachimori Y., Uchiumi T., Nakashima N. (2003) Structural elements in the internal ribosome entry site of Plautia stali intestine virus responsible for binding with ribosomes. Nucleic Acids Res. 31: 2434-2442
Ohlmann T., Jackson R. J. (1999) The properties of chimeric picornavirus IRESes show that discrimination between internal translation initiation sites is influenced by the identity of the IRES and not just the context of the AUG codon. RNA 5: 764-778
Schneider R. et al. (2001) New ways of initiating translation in eukaryotes? Letter to the editor. Mol. Cell. Biol. 21: 8238-8241
Shibuya N., Nishiyama T., Kanamori Y., Saito H., Nakashima N. (2003) Conditional Rather than Absolute Requirements of the Capsid Coding Sequence for Initiation of Methionine-Independent Translation in Plautia stali Intestine Virus. J. Virol. 77: 12002-12010
van Eden M. E., Byrd M. P., Sherrill K. W., Lloyd R. E. (2004) Demonstrating internal ribosome entry sites in eukaryotic mRNAs using stringent RNA test procedures. RNA 10: 720-730

Last change to the database: 2015-04-16 16:45:23 GMT+1