Translating a model into CellML

Currently there are two main tools available for writing CellML models: Cellular Open Resource (COR) (http://cor.physiol.ox.ac.uk/) and Physiome CellML Environment (PCEnv) (http://www.cellml.org/tools/pcenv/). Each has its own tutorials and instructions for building a model. Alternatively, models can be written by hand in a text editor.

Outlined below is the process of translating a model from a published paper into CellML. Although best-practice actions are highlighted, it should be emphasised that CellML is a flexible language, and there are often several equally valid ways of expressing the same model.

Units

CellML requires every variable and number that appears in a model to have defined units. These are based on the International System of Units (SI). Additional non-SI units can be used by expressing them in terms of units that are already defined:

e.g. millilitre: prefix “milli” + unit “litre”

or by defining them as a “base unit”:

e.g. pH: base_units = “yes”
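
The two approaches above can be sketched as CellML units definitions. The fragment below is a minimal illustration rather than a complete model: the model name and the helper function summarise_units are hypothetical, and the CellML 1.1 namespace is assumed. The Python wrapper only parses the fragment and reports how each unit is defined.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.1#"  # assumed: CellML 1.1 namespace

# Hypothetical model fragment: only the units definitions are shown.
UNITS_XML = f"""
<model name="units_example" xmlns="{CELLML_NS}">
  <!-- millilitre: prefix "milli" applied to the built-in unit "litre" -->
  <units name="millilitre">
    <unit prefix="milli" units="litre"/>
  </units>
  <!-- pH cannot be derived from existing units, so it is a new base unit -->
  <units name="pH" base_units="yes"/>
</model>
"""


def summarise_units(xml_text):
    """Return {units name: how it is defined} for each <units> element."""
    root = ET.fromstring(xml_text)
    summary = {}
    for units in root.findall(f"{{{CELLML_NS}}}units"):
        if units.get("base_units") == "yes":
            summary[units.get("name")] = "base unit"
        else:
            # Join each referenced unit with its prefix, if any.
            parts = [
                u.get("prefix", "") + u.get("units")
                for u in units.findall(f"{{{CELLML_NS}}}unit")
            ]
            summary[units.get("name")] = " * ".join(parts)
    return summary


print(summarise_units(UNITS_XML))
```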

Global variables and the “environment” component

Any model that contains ODEs requires a definition of time. Usually this global variable is defined just once, in a component called “environment”. Any other component that requires time then obtains it from the environment component through a connection.
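
As a rough sketch of this convention, the fragment below declares time once in “environment” with a public interface of “out”, and a second component receives it with a public interface of “in”. The component name membrane, the variable V and its initial value are hypothetical, and the connection that maps the two time variables is left out for brevity. The helper time_interfaces is only an illustration of how the declarations can be inspected.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.1#"  # assumed: CellML 1.1 namespace

# Hypothetical two-component fragment; the connection between them is omitted.
MODEL_XML = f"""
<model name="time_example" xmlns="{CELLML_NS}">
  <units name="millivolt">
    <unit prefix="milli" units="volt"/>
  </units>
  <component name="environment">
    <!-- time is defined once here and exported -->
    <variable name="time" units="second" public_interface="out"/>
  </component>
  <component name="membrane">
    <!-- time is received from another component, not redefined -->
    <variable name="time" units="second" public_interface="in"/>
    <variable name="V" units="millivolt" initial_value="-75.0"/>
  </component>
</model>
"""


def time_interfaces(xml_text, name="time"):
    """Return {component name: public_interface} for the named variable."""
    root = ET.fromstring(xml_text)
    result = {}
    for comp in root.findall(f"{{{CELLML_NS}}}component"):
        for var in comp.findall(f"{{{CELLML_NS}}}variable"):
            if var.get("name") == name:
                result[comp.get("name")] = var.get("public_interface", "none")
    return result


print(time_interfaces(MODEL_XML))
```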

Equations

Mathematical equations are expressed using MathML 2.0, an XML-based language that is embedded within the CellML framework. It has been suggested that there should be only one equation per component, but in practice this is often difficult: because some models are very large, one equation per component would generate hundreds of connections and unnecessary complexity.
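
To give a feel for the notation, the sketch below encodes a hypothetical equation, dx/dt = -0.5 x, in content MathML, with the number carrying a cellml:units attribute. The variable names and the unit per_second are assumptions made for illustration. The small renderer to_text handles only the handful of operators used here; it is not a general MathML reader.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
CELLML_NS = "http://www.cellml.org/cellml/1.1#"  # assumed: CellML 1.1 namespace

# Hypothetical equation: dx/dt = -(0.5 * x); "per_second" is a made-up unit name.
MATH_XML = f"""
<math xmlns="{MATHML_NS}" xmlns:cellml="{CELLML_NS}">
  <apply><eq/>
    <apply><diff/>
      <bvar><ci>time</ci></bvar>
      <ci>x</ci>
    </apply>
    <apply><minus/>
      <apply><times/>
        <cn cellml:units="per_second">0.5</cn>
        <ci>x</ci>
      </apply>
    </apply>
  </apply>
</math>
"""

SYMBOLS = {"plus": "+", "minus": "-", "times": "*", "divide": "/"}


def local_name(el):
    """Strip the '{namespace}' part from an element tag."""
    return el.tag.rsplit("}", 1)[-1]


def to_text(el):
    """Render a small subset of content MathML as an infix string."""
    name = local_name(el)
    if name in ("ci", "cn"):
        return el.text.strip()
    if name != "apply":
        raise ValueError(f"unsupported element: {name}")
    op, *args = list(el)
    op_name = local_name(op)
    if op_name == "diff":
        bvar, target = args  # <bvar> holds the variable of differentiation
        return f"d({to_text(target)})/d({to_text(bvar[0])})"
    rendered = [to_text(a) for a in args]
    if op_name == "eq":
        return " = ".join(rendered)
    if op_name == "minus" and len(rendered) == 1:
        return f"-{rendered[0]}"  # unary minus
    return "(" + f" {SYMBOLS[op_name]} ".join(rendered) + ")"


def number_units(math_root):
    """Return {number text: cellml:units} for every <cn> element."""
    return {
        cn.text.strip(): cn.get(f"{{{CELLML_NS}}}units")
        for cn in math_root.iter(f"{{{MATHML_NS}}}cn")
    }


math_root = ET.fromstring(MATH_XML)
print(to_text(math_root[0]))
print(number_units(math_root))
```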

Different types of models: electrophysiological, signal transduction and metabolic pathways, mechanical, etc.

The flexibility of the CellML language means it can be used to describe many different physiological – and even mechanical – processes. Indeed there are examples of single models which describe more than one process – perhaps linking a metabolic pathway with an ATP-dependent ionic current.

Originally CellML contained a reaction element which was used to describe individual reaction steps in a pathway. This included a description of the reaction kinetics, the reactants, products and any enzyme catalysts or inhibitors. However, in practice we found that the use of the reaction element often required the model equations to be rewritten, such that they no longer reflected the original published model. The reaction element will be deprecated in CellML 1.1.1 and will be removed from the CellML 1.2 specification.

Furthermore, in the future we intend to use an ontology to label model variables as reactants, enzymes, products, etc., thereby rendering the reaction element somewhat redundant.

Connections

The mapping of shared variables between components occurs via connections. Currently the CellML 1.1 specification allows just one connection between any two components; all the variable mappings for that pair of components are listed inside it. However, for modelling convenience, the possibility of allowing multiple connections between two components is being considered. Redundancy would still be avoided because each variable mapping would appear only once in any single model.
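
The sketch below shows what a connection looks like and how the one-connection-per-pair rule might be checked. The component and variable names are hypothetical, the component definitions themselves are omitted, and treating the pair of components as unordered is my reading of the rule rather than a quotation from the specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CELLML_NS = "http://www.cellml.org/cellml/1.1#"  # assumed: CellML 1.1 namespace
Q = f"{{{CELLML_NS}}}"  # qualified-name prefix for ElementTree lookups

# Hypothetical fragment: component definitions are omitted for brevity.
CONNECTIONS_XML = f"""
<model name="connection_example" xmlns="{CELLML_NS}">
  <connection>
    <map_components component_1="environment" component_2="membrane"/>
    <map_variables variable_1="time" variable_2="time"/>
  </connection>
  <connection>
    <!-- several variables shared by one pair go in the same connection -->
    <map_components component_1="membrane" component_2="sodium_channel"/>
    <map_variables variable_1="V" variable_2="V"/>
    <map_variables variable_1="i_Na" variable_2="i_Na"/>
  </connection>
</model>
"""


def connected_pairs(xml_text):
    """Count how many connections reference each unordered component pair."""
    root = ET.fromstring(xml_text)
    pairs = Counter()
    for conn in root.findall(Q + "connection"):
        mc = conn.find(Q + "map_components")
        pairs[frozenset((mc.get("component_1"), mc.get("component_2")))] += 1
    return pairs


def duplicated_pairs(xml_text):
    """Return the component pairs that appear in more than one connection."""
    counts = connected_pairs(xml_text)
    return sorted(sorted(pair) for pair, n in counts.items() if n > 1)


print(duplicated_pairs(CONNECTIONS_XML))
```

An empty list means every pair of components is joined by at most one connection.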

Grouping components and model structure: flat vs. hierarchical

There are two predefined types of grouping in CellML: encapsulation and containment. Encapsulation is a logical type of grouping, or effectively a modelling convenience. Containment is used to describe the physical, or geometric, organisation of a model, such as biological structure. This type of grouping specifies that components are physically nested within their parent component, for example an ion channel may be physically embedded within a membrane.

In practice, all the reaction pathway models in the CellML repository, including the cell cycle, signal transduction and metabolic pathway models, are completely flat; that is, they do not include encapsulated components. While encapsulation is perfectly acceptable, we have not found a great need for it in this type of model.

By contrast, the electrophysiological models often have activation and inactivation gates encapsulated within ion channels. It was natural to include this type of encapsulation because gate properties are specific to individual channels, so they can be hidden from the rest of the model.
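
A minimal sketch of such an encapsulation group is given below. The channel and gate component names are hypothetical and the component definitions are omitted. The helper hierarchy simply reads the nesting back out; for a flat model with no group elements it returns an empty dictionary.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.1#"  # assumed: CellML 1.1 namespace
Q = f"{{{CELLML_NS}}}"  # qualified-name prefix for ElementTree lookups

# Hypothetical fragment: two gates encapsulated within one ion channel.
GROUP_XML = f"""
<model name="grouping_example" xmlns="{CELLML_NS}">
  <group>
    <relationship_ref relationship="encapsulation"/>
    <component_ref component="sodium_channel">
      <component_ref component="sodium_channel_m_gate"/>
      <component_ref component="sodium_channel_h_gate"/>
    </component_ref>
  </group>
</model>
"""


def hierarchy(xml_text, relationship="encapsulation"):
    """Return {parent component: [child components]} for one grouping type."""
    root = ET.fromstring(xml_text)
    tree = {}

    def walk(ref):
        children = ref.findall(Q + "component_ref")
        if children:
            tree[ref.get("component")] = [c.get("component") for c in children]
        for child in children:
            walk(child)

    for group in root.findall(Q + "group"):
        kinds = {r.get("relationship") for r in group.findall(Q + "relationship_ref")}
        if relationship in kinds:
            for ref in group.findall(Q + "component_ref"):
                walk(ref)
    return tree


print(hierarchy(GROUP_XML))
```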

CellML 1.0 and CellML 1.1

?????

Model Validation

Validation of the CellML models currently in the repository is an ongoing process. A new model is run in both PCEnv and COR, and ideally it is checked by a second person before it is loaded into the repository. The model author is often contacted, and the original source code is obtained where available. This helps to resolve potential typographical errors in the published paper, or an incomplete data set. One ultimate aim is to have the model available in CellML when the work is published, eliminating human error in the translation process.

Metadata

Metadata provide a context for the model document and can include: the name of the model author, the date the model was created, key words related to its content (which later make it easier to search for models in the repository), and information about the published paper the model was taken from. Although metadata are optional, we strongly recommend including all of the above. Adding metadata to a model has been made simple through a metadata editor, which can be viewed when loading a model into the repository.