CellML Model Repository Overview
The History Behind the CellML Model Repository
The CellML model repository originated as a distribution site for models encoded in CellML, both to provide examples of the use of CellML and to test the features of the language as it developed. It then progressed into a repository of previously published biological models that had been encoded into CellML, a state in which it remained for several years. These models had all been encoded into CellML from the literature, although at the time they could not be tested for consistency or completeness. Curating a model used to mean simply checking that the mathematics from the publication was accurately represented in the CellML. As tools have evolved, we are now in a position to begin validating the models in the repository with respect to their function and output as well.
Model Naming and the Structure of the CellML Model Repository
When a CellML model is added to the model repository, it is automatically assigned a name which is based on the author names and the date of the published paper it was taken from.
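As a rough illustration of this kind of automatic naming, the sketch below builds a name from author surnames and a publication year. The exact format used by the repository is not specified here; the lowercase, underscore-joined layout and the function name `model_name` are assumptions made for this example.

```python
def model_name(author_surnames, year):
    """Build a repository-style model name (hypothetical format).

    Assumed layout: lowercase surnames joined by underscores, followed
    by the publication year.
    """
    cleaned = []
    for surname in author_surnames:
        # Drop spaces and punctuation, keeping only letters and digits.
        token = "".join(ch for ch in surname.lower() if ch.isalnum())
        if token:
            cleaned.append(token)
    return "_".join(cleaned + [str(year)])


print(model_name(["Hodgkin", "Huxley"], 1952))  # hodgkin_huxley_1952
```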
Model Naming: Versions
If a CellML model is subsequently modified, the updated version is added to the repository and automatically allocated a new version number. Clicking on a model name in the main repository list takes you, by default, to the most recent version of that CellML model.
Model Naming: Variants
Where a single publication contains more than one model, these are listed under the same piece of documentation, but are referred to as variants - for example, a single published paper may contain two models: one specific to an endocardial cell, whilst the other has parameters and equations which are specific for an epicardial cell.
Note: In the future we intend to allow the model name to be specified by the model author, which will hopefully allow a more meaningful, human-readable name to be assigned to a CellML model.
Repository Structure
The models in the repository are listed in alphabetical order by the first author's surname. It is also possible to find a model using keyword searches, or to filter the models by their type or curation status.
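The listing behaviour described above can be sketched as a sort plus optional filters. This is only an illustration: the `ModelEntry` record, its field names, the category strings, and the sample entries are all hypothetical and do not reflect how the repository stores its data.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    first_author_surname: str
    year: int
    model_type: str       # illustrative category label
    curation_level: int   # 0 to 3


def list_models(entries, model_type=None, curation_level=None):
    """Return entries filtered by type and/or curation level, sorted by surname."""
    selected = [
        e for e in entries
        if (model_type is None or e.model_type == model_type)
        and (curation_level is None or e.curation_level == curation_level)
    ]
    # Alphabetical by first author's surname; year breaks ties.
    return sorted(selected, key=lambda e: (e.first_author_surname.lower(), e.year))


# Placeholder entries, not real repository records.
entries = [
    ModelEntry("Smith", 2001, "signalling", 1),
    ModelEntry("Adams", 1999, "electrophysiology", 2),
    ModelEntry("Baker", 2005, "electrophysiology", 0),
]

for entry in list_models(entries, model_type="electrophysiology"):
    print(entry.first_author_surname, entry.year)
```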
Storing CellML 1.1 Models
Currently (23rd July 2008) the CellML Model Repository has a flat structure and is not designed to handle CellML 1.1 models. Many of these limitations and problems will be resolved with the implementation of PMR2. Until then, we are choosing not to focus on the translation of the CellML 1.0 models into CellML 1.1 format.
However, for reasons of publication and model sharing, a small number of CellML 1.1 models have been made available in the CellML model repository. This has raised a few issues with regard to assigning a curation status to such a model. These issues arise because the CellML 1.1 models are not stored in the actual CellML model repository; rather, they are uploaded into their own folder with all dependencies (http://www.cellml.org/models_1.1/).
Potential Problems:
A model that is not in the actual repository is not subject to the same curation workflow, so a model might be changed without the curation status being updated.
Also, the model may not have a permanent path - we need to make sure that we keep the paths of these models valid for as long as feasible, since links to some of them have been given as references in publications.
Temporary Solutions:
The models can be assigned a curation status on the condition that they are not modified after they have been uploaded into the CellML 1.1 folder. This shouldn't be an issue with three of the models currently in the folder (Cooling et al., Pandit-Hinch-Nieder, and Faville et al.), as they are the original code which was used to generate the figures for the respective publications, so we are confident they are capable of recreating the published results (and therefore shouldn't require further curation).
The fourth model (Guyton) is being modified by Jonna, but this model has not been loaded into the CellML Model Repository in its CellML 1.1 format, so it's OK for Jonna to edit the code as she likes.
All paths of CellML 1.1 models will be kept for posterity once they have been created – so if the models are referenced in a paper this link will be retained.
Currently access rights will not be restricted to curation staff only, because the number of CellML 1.1 models in this folder is low. However, if this situation changes, this policy will be revisited.
CellML Model Curation: the Theory
The basic measure of curation in a CellML model is described by the curation level of the model document. We have defined four levels of curation:
- Level 0: not curated;
- Level 1: the CellML model is consistent with the mathematics in the original published paper;
- Level 2: the CellML model has been checked for (i) typographical errors, (ii) consistency of units, (iii) that all parameters and initial conditions are defined, (iv) that the model is not over-constrained, in the sense that it contains equations or initial values which are either redundant or inconsistent, and (v) that running the model in an appropriate simulation environment reproduces the results published in the original paper;
- Level 3: the model is checked for the extent to which it satisfies physical constraints such as conservation of mass, momentum, charge, etc. This level of curation needs to be conducted by specialised domain experts.
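The four levels above can be sketched as an ordered enumeration with a function that derives a level from the individual checks. The class and field names are hypothetical, and treating the levels as strictly cumulative (each level requiring the one below it) is a simplifying assumption of this sketch rather than a rule stated by the repository.

```python
from dataclasses import dataclass
from enum import IntEnum


class CurationLevel(IntEnum):
    NOT_CURATED = 0
    MATCHES_PUBLICATION = 1
    CHECKED_AND_REPRODUCES = 2
    PHYSICALLY_CONSISTENT = 3


@dataclass
class CurationChecks:
    matches_published_maths: bool = False         # level 1
    no_typographical_errors: bool = False         # level 2 (i)
    units_consistent: bool = False                # level 2 (ii)
    all_values_defined: bool = False              # level 2 (iii)
    not_over_constrained: bool = False            # level 2 (iv)
    reproduces_published_results: bool = False    # level 2 (v)
    satisfies_physical_constraints: bool = False  # level 3


def curation_level(c: CurationChecks) -> CurationLevel:
    """Highest level reached, assuming each level requires the one below it."""
    if not c.matches_published_maths:
        return CurationLevel.NOT_CURATED
    level_2_checks = [
        c.no_typographical_errors,
        c.units_consistent,
        c.all_values_defined,
        c.not_over_constrained,
        c.reproduces_published_results,
    ]
    if not all(level_2_checks):
        return CurationLevel.MATCHES_PUBLICATION
    if not c.satisfies_physical_constraints:
        return CurationLevel.CHECKED_AND_REPRODUCES
    return CurationLevel.PHYSICALLY_CONSISTENT


checks = CurationChecks(matches_published_maths=True, units_consistent=True)
print(curation_level(checks).name)  # MATCHES_PUBLICATION
```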
CellML Model Curation: the Practice
Of the ~300 models that are currently in the CellML repository (April 2008), approximately half have been curated to some degree.
Our ultimate aim is to complete the curation of all the models in the repository, ideally to the level that they replicate the results in the published paper (level 2 curation status). However, we acknowledge that for some models this will not be possible. Missing parameters and equations are just one limitation; at this point it should also be emphasised that the process of curation is not just about "fixing the CellML model" so that it runs in currently available tools. Occasionally it is possible for a model to be expressed in valid CellML, but not yet able to be solved by CellML tools. An example is the seminal Saucerman et al. 2003 model, which contains ODEs as well as a set of non-linear algebraic equations which need to be solved simultaneously. The developers of the CellML editing and simulation environment PCEnv are currently working on addressing these requirements.
The following steps describe the process of curating a CellML model:
Step 1: the model is run through PCEnv and COR. COR in particular is a useful validation tool. It renders the MathML in a human readable format making it much easier to identify any typographical errors in the model equations. COR also provides a comprehensive error messaging system which identifies typographical errors, missing equations and parameters, and any redundancy in the model such as duplicated variables or connections. Once these errors are fixed, and assuming the model is now complete, we compare the CellML model equations with those in the published paper, and if they match, the CellML model is awarded a single star - or level 1 curation status.
Step 2: Assuming the model is able to run in PCEnv and COR, we then go on to compare the CellML model simulation output from COR and PCEnv with the published results. This is often a case of comparing the graphical outputs of the model with the figures in the published paper, and is currently a qualitative process. If the simulation results from the CellML model and the original model match, the CellML model is awarded a second star - or level 2 curation status.
Step 3: if, at the end of this process, the CellML model is still missing parameters or equations, or we are unable to match the simulation results with the published paper, we seek help from the original model author. Where possible, we try to obtain the original model code, and this often plays an invaluable role in fixing the CellML model.
Step 4: Sometimes we have been able to engage the original model author further, such that they take over the responsibility of curating the CellML model themselves. Such models include those published by Mike Cooling and Franc Sachse. In these instances the CellML model is awarded a third star - or level 3 curation status. While this is laudable, ideally we would like to take the curation process one step further, such that level 3 curation should be performed by a domain expert who is not the author of the original publication (i.e., peer review). This expert would then check the CellML model meets the appropriate constraints and expectations for a particular type of model.
A point to note is that levels 1 and 2 of the CellML model curation status may be mutually exclusive - in our experience, it is rare for a paper describing a model to contain no typographical errors or omissions. In this situation, Version 1 of a CellML model usually satisfies curation level 1 in that it reflects the model as it is described in the publication - errors included, while subsequent versions of the CellML model break the requirements for meeting level 1 curation in order to meet the standards of level 2.
Taking this idea further, this means that a model with 2 yellow stars doesn't necessarily meet the requirements of level 1 curation but it does meet the requirements of level 2. Hopefully this conflict will be resolved when we replace the current star system with a more meaningful set of curation annotations.
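One way to picture the conflict is to record each criterion as an independent annotation instead of a single cumulative level. The sketch below is purely illustrative: the `CurationAnnotations` class and its fields are hypothetical and are not the annotation scheme planned for the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurationAnnotations:
    faithful_to_publication: bool       # level 1 criterion
    reproduces_published_results: bool  # level 2 criterion

    def stars(self) -> int:
        """Star count as a single ordinal display would show it (assumed mapping)."""
        if self.reproduces_published_results:
            return 2
        if self.faithful_to_publication:
            return 1
        return 0


# Version 1 mirrors the paper, including its errors.
version_1 = CurationAnnotations(faithful_to_publication=True,
                                reproduces_published_results=False)
# A later version corrects those errors so that the model reproduces the results.
later_version = CurationAnnotations(faithful_to_publication=False,
                                    reproduces_published_results=True)

print(version_1.stars(), later_version.stars())  # 1 2
```

With separate annotations, the later version's two stars no longer imply that it still matches the published text.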
Ultimately, we would like to encourage the scientific modeling community - including model authors, journals and publishing houses - to publish their models in CellML code in the CellML model repository concurrent with the publication of the printed article. This will eliminate the need for code-to-text-to-code translations and thus avoid many of the errors which are introduced during the translation process.
CellML Model Simulation: the Theory and Practice
As part of the process of model curation, it is important to know what tools were used to simulate (run) the model and how well the model runs in a specific simulation environment. In this case, the theory and the practice are essentially the same thing: we carry out a series of simulation steps, which then translate into a confidence level recorded as part of a simulator's metadata for each model. The four confidence levels are defined as:
- Level 0: not curated (no stars);
- Level 1: the model loads and runs in the specified simulation environment (1 star);
- Level 2: the model produces results that are qualitatively similar to those previously published for the model (2 stars);
- Level 3: the model has been quantitatively and rigorously verified as producing identical results to the original published model (3 stars).
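Because the confidence level is recorded per simulator, the same model may carry different levels in different tools. The sketch below shows one possible representation; the function names, the dictionary layout, and the example values for PCEnv and COR are hypothetical and do not describe any real model's status.

```python
from enum import IntEnum


class ConfidenceLevel(IntEnum):
    NOT_CURATED = 0
    RUNS = 1
    QUALITATIVE_MATCH = 2
    QUANTITATIVE_MATCH = 3


def confidence_level(runs, qualitative_match=False, quantitative_match=False):
    """Map the outcome of the simulation steps to a confidence level."""
    if not runs:
        return ConfidenceLevel.NOT_CURATED
    if quantitative_match:
        return ConfidenceLevel.QUANTITATIVE_MATCH
    if qualitative_match:
        return ConfidenceLevel.QUALITATIVE_MATCH
    return ConfidenceLevel.RUNS


def stars(level):
    """Render a level as a row of stars (level 0 gives an empty string)."""
    return "*" * int(level)


# Illustrative per-simulator metadata for a single model.
simulator_metadata = {
    "PCEnv": confidence_level(runs=True, qualitative_match=True),
    "COR": confidence_level(runs=True),
}

for simulator, level in simulator_metadata.items():
    print(f"{simulator}: level {int(level)} {stars(level)}")
```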