FAQ - iRefWeb - Wodak Lab

How is the MI (MINT-Inspired) score calculated?

The MINT-inspired score (MI score) is a measure of confidence in molecular interactions annotated from literature. To assign MI scores, we closely follow an earlier approach developed by the MINT database team (Ceol et al., 2010) and adopted their confidence score formula.

The idea underlying the score is to collect pieces of evidence from each publication that supports an interaction record. Unlike the MINT team, we do not curate the actual publications and hence are not able to list various figures, figure panels, tables, etc., as separate pieces of evidence. Instead, we rely on the annotations of the interaction types and experimental detection methods provided by the original database curators using the PSI-MI controlled vocabulary. As an illustration, if an interaction was detected using two independent methods, such as "two hybrid pooling approach (MI:0398)" and "anti tag coimmunoprecipitation (MI:0007)", they are treated as two separate pieces of evidence. On the other hand, if the listed methods are "two hybrid pooling approach (MI:0398)" and "two hybrid (MI:0018)", they are treated as only one piece of evidence because the former term is a descendant of the latter in the PSI-MI ontology hierarchy.
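The collapsing of related detection methods can be sketched in a few lines of Python. This is only an illustration: the `PARENT` map below is a hypothetical single-parent fragment holding just the one relationship mentioned above, and the function names are ours, not part of iRefWeb. A real implementation would load the full PSI-MI ontology, where a term may have more than one parent.

```python
# Toy fragment of the PSI-MI hierarchy: child term -> parent term.
# Assumption: only the relationship stated in the text is included.
PARENT = {"MI:0398": "MI:0018"}


def ancestors(term, parent=PARENT):
    """Return the set of all ancestors of a term in the toy hierarchy."""
    found = set()
    while term in parent:
        term = parent[term]
        found.add(term)
    return found


def count_evidence(methods, parent=PARENT):
    """Count listed methods after collapsing ancestor/descendant pairs."""
    unique = set(methods)
    # A term that is an ancestor of another listed term adds no new evidence.
    redundant = {
        m for m in unique for other in unique if m in ancestors(other, parent)
    }
    return len(unique - redundant)


print(count_evidence(["MI:0398", "MI:0007"]))  # two independent methods
print(count_evidence(["MI:0398", "MI:0018"]))  # descendant and its ancestor
```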

In accordance with the MINT approach, we assign the full weight of 1 to pieces of evidence that correspond to the interaction type "direct interaction (MI:0407)" or its descendants. Otherwise we reduce the evidence weight by half. Furthermore, we apply another reduction by half to evidence coming from high-throughput publications. Following the MINT approach, we define a publication as "high-throughput" if it supports 50 or more different interaction records in iRefWeb. Hence, for example, if the source-database curators have annotated a particular interaction as a "physical association (MI:0915)" supported by a high-throughput experiment, this piece of evidence would contribute only 0.25 (non-direct, high-throughput) towards the overall weighted evidence for that interaction record.
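The weighting rule can be written as a small function. This is a minimal sketch: the function and parameter names are hypothetical, and the caller is assumed to have already decided whether the interaction type is "direct interaction (MI:0407)" or one of its descendants.

```python
HIGH_THROUGHPUT_THRESHOLD = 50  # interaction records per publication, as stated above


def evidence_weight(is_direct, records_supported_by_publication):
    """Weight of one piece of evidence under the rules described above."""
    # Full weight for direct interactions, half otherwise.
    weight = 1.0 if is_direct else 0.5
    # Halve again when the publication counts as high-throughput.
    if records_supported_by_publication >= HIGH_THROUGHPUT_THRESHOLD:
        weight *= 0.5
    return weight


# Non-direct evidence from a publication supporting 50 records.
print(evidence_weight(False, 50))
```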

Once the weighted evidence is determined for each interaction record, the score assignment proceeds as follows. For every pair of interacting proteins A and B, we first identify all interactions that contain this pair, including multi-protein complexes. Next, we identify various homologs of A and B in this and other organisms (i.e. both paralogs and orthologs); we then find all interactions containing pairs A' and B', where A' is a homolog of A and B' is a homolog of B. The information on homology is taken from Inparanoid.

To apply the MI score formula, three types of information are collected:

  1. The total number of unique PubMed publications that support the interactions containing A and B.
  2. The cumulative sum of weighted evidence from all interactions containing A and B.
  3. The cumulative sum of weighted evidence from all interologs, i.e. interactions containing homologous pairs A' and B'.

These three values are substituted directly into the MI formula to arrive at a score for the pair of proteins A and B.
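As a rough sketch of how the three inputs might be gathered (the MI formula itself is not reproduced here), consider the Python below. The `Record` type, its fields, and the `homologs` mapping are hypothetical stand-ins for the iRefWeb data and the Inparanoid homology tables. We also assume that a record already containing both A and B is counted once, as direct evidence, and not again as an interolog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    participants: frozenset   # protein identifiers in the interaction or complex
    pmids: frozenset          # PubMed IDs supporting the record
    weighted_evidence: float  # weighted evidence already computed for the record


def mi_inputs(a, b, records, homologs):
    """Return (publication count, evidence for A-B, interolog evidence)."""
    pmids = set()
    direct = 0.0
    interolog = 0.0
    homs_a = homologs.get(a, set())
    homs_b = homologs.get(b, set())
    for rec in records:
        if a in rec.participants and b in rec.participants:
            # Record contains the pair itself (possibly inside a complex).
            pmids |= rec.pmids
            direct += rec.weighted_evidence
        elif any(
            x != y
            for x in rec.participants & homs_a
            for y in rec.participants & homs_b
        ):
            # Record contains a homologous pair A', B' (two distinct proteins).
            interolog += rec.weighted_evidence
    return len(pmids), direct, interolog
```

The returned tuple corresponds to items 1 to 3 in the list above.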

For multi-subunit complexes, we determine the MI scores for all possible pairs (i.e. using the so-called matrix expansion of a complex) and then take their median as the overall MI score for the complex.
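The matrix expansion and median step could look like this in Python. The `pair_score` argument is a hypothetical callable that returns the MI score for a pair of proteins; how that score is obtained is outside this sketch.

```python
from itertools import combinations
from statistics import median


def complex_score(members, pair_score):
    """Median of the pairwise MI scores over all pairs in a complex."""
    # Matrix expansion: every unordered pair of distinct members.
    pairs = combinations(sorted(set(members)), 2)
    return median(pair_score(a, b) for a, b in pairs)


# Made-up pairwise scores for a three-protein complex.
scores = {("A", "B"): 0.2, ("A", "C"): 0.6, ("B", "C"): 0.9}
print(complex_score(["A", "B", "C"], lambda a, b: scores[(a, b)]))
```

With fewer than two distinct members there are no pairs, and `statistics.median` raises `StatisticsError`.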

What does a MI (MINT-Inspired) score of 'n/c' mean?

MI (MINT-Inspired) scores lie in the range [0,1]. A score is calculated only for interactions in which all proteins are from the same species, that are supported by valid PubMed IDs, that are not predicted or taken from OPHID, that are not self-interactions, and that come from one of the following organisms:

  • A. thaliana
  • B. taurus
  • C. elegans
  • D. melanogaster
  • H. sapiens
  • M. musculus
  • O. sativa
  • R. norvegicus
  • S. cerevisiae
  • S. pombe

Any interaction that does not meet the above requirements has no score calculated, and this is displayed as 'n/c'.
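The eligibility rules can be sketched as a simple check. The function names and input layout here are hypothetical. We also assume that a "valid" PubMed ID means a non-empty string of digits and that a self-interaction is a record with only one distinct protein; the actual iRefWeb tests may differ.

```python
SCORED_ORGANISMS = {
    "A. thaliana", "B. taurus", "C. elegans", "D. melanogaster", "H. sapiens",
    "M. musculus", "O. sativa", "R. norvegicus", "S. cerevisiae", "S. pombe",
}


def is_scorable(participants, pmids, predicted=False, from_ophid=False):
    """participants: list of (protein_id, organism) tuples for one interaction."""
    organisms = {org for _, org in participants}
    proteins = {pid for pid, _ in participants}
    return (
        len(organisms) == 1                   # all proteins from the same species
        and organisms <= SCORED_ORGANISMS     # and that species is in the list
        and len(proteins) > 1                 # assumption: not a self-interaction
        and bool(pmids)
        and all(str(p).isdigit() for p in pmids)  # assumption: "valid" = digits only
        and not predicted
        and not from_ophid
    )


def displayed_score(score, scorable):
    """Return the score itself, or 'n/c' when no score is calculated."""
    return score if scorable else "n/c"
```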

See "How is the MI (MINT-Inspired) score calculated?" above to find out how the score is calculated.

What information is represented in each of the columns of the MITAB file?

Go here for a detailed description of each column.

What are the legacy identifiers for interactions?

In general, we have tried to preserve identifiers across releases. However, due to a change in iRefIndex 9, interaction IDs had to be updated. Thus, the same interaction in iRefWeb 3.9 and iRefWeb 4.0 (or higher) will have different IDs.

However, entering an old ID in a URL like this
http://wodaklab.org/iRefWeb/interaction/show/OLD_ID
should bring you to
http://wodaklab.org/iRefWeb/interaction/show/NEW_ID
Note that not all interactions in iRefIndex 8 still exist in iRefIndex 9. In particular, due to changes in how taxonomy was handled for yeast and to database changes, a large number of interactions have changed, so many old identifiers no longer exist.

If you need them, mapping files can be found here.

How can I create links to iRefWeb?

You can use the MITAB file (either downloaded from the search page, or from here) as the basis for generating links to iRefWeb using columns 49, 50, and 51.

column 48   column 49   column 50   column 51   column 52
...         icrogida    icrogidb    icrigid     ...
...         1466236     4803728     101         ...

Using the examples above, you can then generate your links as follows:

Page                                      Link Structure
Protein (Interactor) Page                 http://wodaklab.org/iRefWeb/interactor/show/1466236
                                          http://wodaklab.org/iRefWeb/interactor/show/4803728
Search Results Page for a Protein         http://wodaklab.org/iRefWeb/search/index?search.q=act_id:1466236
                                          http://wodaklab.org/iRefWeb/search/index?search.q=act_id:4803728
Interaction Page                          http://wodaklab.org/iRefWeb/interaction/show/101
Search Results Page for the Interaction   http://wodaklab.org/iRefWeb/search/index?search.q=int_id:101

If you have an Entrez Gene ID (for example 10277), you can also try linking this way:
 http://wodaklab.org/iRefWeb/interactor/showForGene/10277
(Note that not all proteins in iRefWeb can be mapped to Entrez Gene IDs.)
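If you are generating many links, a short script can do this for each MITAB row. The sketch below assumes a tab-separated line in which columns 49, 50 and 51 (counting from 1) hold icrogida, icrogidb and icrigid as described above; the function name and the dictionary keys are our own.

```python
BASE = "http://wodaklab.org/iRefWeb"


def irefweb_links(mitab_line):
    """Build iRefWeb URLs from one tab-separated MITAB line."""
    cols = mitab_line.rstrip("\n").split("\t")
    # Columns 49, 50 and 51 in 1-based numbering are indexes 48, 49 and 50.
    icrogida, icrogidb, icrigid = cols[48], cols[49], cols[50]
    return {
        "interactor_a": f"{BASE}/interactor/show/{icrogida}",
        "interactor_b": f"{BASE}/interactor/show/{icrogidb}",
        "search_a": f"{BASE}/search/index?search.q=act_id:{icrogida}",
        "search_b": f"{BASE}/search/index?search.q=act_id:{icrogidb}",
        "interaction": f"{BASE}/interaction/show/{icrigid}",
        "search_interaction": f"{BASE}/search/index?search.q=int_id:{icrigid}",
    }
```

Header or comment lines in the file would need to be skipped before calling this function.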

How do you account for the differences between databases?

There are a variety of reasons why the same paper might be annotated differently by several source databases. We have examined these variations in the paper "Literature curation of protein interactions: measuring agreement across major public databases".

What is a RIGID (redundant interaction group identifier)?

Each distinct protein binary interaction or complex can be assigned a distinct rigid that is calculated using only the primary sequence and taxon identifiers of the participant proteins. If two interaction records share the same rigid, they are said to belong to the same group of redundant interactions; this means that their protein participants all have the same primary sequence and come from the same organism. However, the experiments used to support the interaction may be different in each record. The rigid is an alphanumeric string.
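To illustrate the idea of a grouping key that depends only on sequences and taxon identifiers, here is a toy Python sketch. It is not the iRefIndex algorithm: the choice of SHA-1, the hexadecimal output, the separators and the function name are all assumptions made for the example. What it does show is that the key ignores participant order and any experimental details.

```python
import hashlib


def toy_group_key(participants):
    """participants: iterable of (primary_sequence, taxon_id) pairs."""
    # Sort so that the key does not depend on the order of participants.
    parts = sorted(f"{seq.upper()}|{taxon}" for seq, taxon in participants)
    return hashlib.sha1("\n".join(parts).encode("ascii")).hexdigest()


# Same proteins listed in a different order: same key.
record_1 = [("MKTAYIAK", 9606), ("MSDNELQR", 9606)]
record_2 = [("MSDNELQR", 9606), ("MKTAYIAK", 9606)]
print(toy_group_key(record_1) == toy_group_key(record_2))
```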

What are NP, LPR, and HPR?

NP, LPR and HPR values can help you focus your search on interactions based on their relationship to PubMed publications, for example if you want interactions supported by multiple publications, or interactions that come from (or do not come from) high-throughput experiments.

NP
  Number of Publications: the number of distinct publications (PubMed identifiers) that support this interaction.
LPR
  Lowest PubMed Identifier (PMID) Reuse: a publication may be used to support more than one interaction. The LPR metric is the lowest number of unique interactions that are supported by one of the interaction's PMIDs.
HPR
  Highest PubMed Identifier (PMID) Reuse: a publication may be used to support more than one interaction. The HPR metric is the highest number of unique interactions that are supported by one of the interaction's PMIDs.
LPR = 1
  Indicates that at least one of the PMIDs supporting the interaction has never been used to support any other interaction, and that the interaction is not likely to rely solely on high-throughput data.
LPR < 20
  Likely describes an interaction that is supported by a low-throughput study.
LPR >= 20
  Likely describes an interaction derived solely from middle-throughput or high-throughput experiments.
HPR = 1
  Indicates that none of the PMIDs supporting the interaction has ever been used to support any other interaction, and that the interaction has not been detected as part of a high-throughput study.
HPR >= 20
  Indicates that the interaction has been detected as part of a middle-throughput or high-throughput study.
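These rules of thumb can be collected into a small helper when post-processing search results. The function name and the wording of the returned notes are ours; the thresholds are the ones listed above.

```python
def describe_reuse(lpr, hpr):
    """Return the interpretations above that apply to given LPR and HPR values."""
    notes = []
    if lpr == 1:
        notes.append("at least one PMID supports no other interaction")
    if lpr < 20:
        notes.append("likely supported by a low-throughput study")
    else:
        notes.append("likely derived solely from middle- or high-throughput experiments")
    if hpr == 1:
        notes.append("no supporting PMID is shared with any other interaction")
    if hpr >= 20:
        notes.append("detected in a middle- or high-throughput study")
    return notes


for note in describe_reuse(lpr=1, hpr=250):
    print(note)
```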

Below is a simplified example showing how these numbers are derived for a given interaction. A checkmark (✓) indicates that an interaction was reported in that PubMed publication; an x indicates that it was not.

Interaction     Pubmed 1   Pubmed 2   Pubmed 3   Pubmed 4   NP   LPR   HPR
interaction A   ✓          ✓          x          x          2    1     3
interaction B   x          ✓          ✓          x          2    3     3
interaction C   x          ✓          ✓          ✓          3    1     3
interaction D   x          x          ✓          x          1    3     3

In the table above, for Interaction C, its NP, LPR and HPR values are determined as shown below:

Total interactions found in each PubMed publication that includes Interaction C:
  P2_Ints = |{ int A, int B, int C }| = 3
  P3_Ints = |{ int B, int C, int D }| = 3
  P4_Ints = |{ int C }| = 1
Calculations for Interaction C:
  NP_intC  = |{ Pubmed 2, Pubmed 3, Pubmed 4 }| = 3
  LPR_intC = min( P2_Ints, P3_Ints, P4_Ints ) = min( 3, 3, 1 ) = 1
  HPR_intC = max( P2_Ints, P3_Ints, P4_Ints ) = max( 3, 3, 1 ) = 3
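The same calculation can be carried out for every interaction at once. In the Python sketch below, the function name and the input layout (a mapping from publication to the set of interactions it supports) are our own choices. The contents of Pubmed 2, 3 and 4 come from the worked example; that Pubmed 1 contains only interaction A is inferred from the NP, LPR and HPR values in the table.

```python
def reuse_metrics(pubmed_to_interactions):
    """Return {interaction: (NP, LPR, HPR)} for every interaction seen."""
    metrics = {}
    interactions = set().union(*pubmed_to_interactions.values())
    for inter in interactions:
        # Size of each publication's interaction set, for publications
        # that support this interaction.
        counts = [
            len(ints)
            for ints in pubmed_to_interactions.values()
            if inter in ints
        ]
        metrics[inter] = (len(counts), min(counts), max(counts))
    return metrics


EXAMPLE = {
    "Pubmed 1": {"A"},
    "Pubmed 2": {"A", "B", "C"},
    "Pubmed 3": {"B", "C", "D"},
    "Pubmed 4": {"C"},
}

for name, (np_, lpr, hpr) in sorted(reuse_metrics(EXAMPLE).items()):
    print(name, np_, lpr, hpr)
```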
iRefWeb 13.0
In collaboration with Ian Donaldson — using iRefIndex Version 13.0