CellML primer
This primer briefly describes the makeup of the CellML language. It assumes a basic knowledge of XML. For more information about XML, please see the W3Schools XML primer.
For a more detailed description of the CellML language, please see the CellML FAQ and the CellML specifications. If you have any questions or comments regarding translating models into CellML, or this document in general, please do not hesitate to contact the CellML community through the general discussion mailing list.
What can CellML describe?
CellML is primarily designed to describe mathematical models of cellular biological function, but is not domain specific; it can be used to describe models from essentially any field, or biological models at smaller or larger spatial scales. From a mathematical point of view, CellML 1.1 is primarily used to describe models using real numbers, ordinary differential equations (ODEs), differential algebraic equations (DAEs), and simple linear algebra.
These capabilities are particularly suited to lumped parameter models and descriptions of reaction kinetics and interactions between biomacromolecules, small molecules, etc. As such, CellML is well suited to describe models of a wide range of physiological processes, including metabolism, electrophysiology, signal transduction, cell division, immunology and muscle contraction. A very useful document on modelling mass-action kinetics, as well as taking advantage of the modular capabilities of CellML, is the chapter A Primer on Modular Mass-Action Modelling with CellML by Mike Cooling.
For more information regarding the capabilities and competencies of CellML, the reader is referred to the publication "CellML and associated tools and techniques" by Garny et al.
CellML elements
Because CellML is a declarative language, as opposed to procedural languages such as C or MATLAB, the order in which elements are defined does not affect how the model is processed. Below we present a brief introduction to the elements that make up the CellML language.
Variables and components
A component in a CellML model is a functional unit that may correspond to a physical compartment, event, or species, or it may be just a convenient modelling abstraction. A component contains variables and mathematical relationships that manipulate those variables. The following CellML fragment defines the environment component and the variable time.
<component name="environment">
  <variable name="time" public_interface="out" units="second"/>
</component>
Components are given a name attribute, while variables are given a name and units, and may also carry an interface (public_interface or private_interface) and an initial_value. The interface and units attributes will be discussed in subsequent sections.
Math elements and equations
Mathematical equations are expressed using MathML 2.0, an XML-based language which is embedded within the CellML framework. MathML is a verbose representation of mathematics and can look complex, but most CellML users use a tool to convert equations written in a conventional format into MathML. All mathematical expressions defined using MathML must be placed inside a <mathml:math> element, and any variables used in an equation must be named within a <mathml:ci> element. Similarly, all numbers used in an equation must be placed within a <mathml:cn> element and must have units associated with them. These features are illustrated in the equation below.
Often it is useful to be able to identify a particular mathematical equation with a reference ID. For example, this ID could be the original equation number taken from the published paper to designate the same equation in the code. Alternatively, an equation ID could also be helpful during the process of model validation, allowing the quick identification of a possible error in the code. IDs are assigned to math elements rather than to the equations themselves (as shown in the fragment of CellML code below); it can therefore be useful to allocate one math element per equation to allow equations to be identified by their ID.
<math id="1" xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq />
    <ci> C </ci>
    <apply><plus />
      <ci> A </ci>
      <ci> B </ci>
      <cn cellml:units="second"> 20.0 </cn>
    </apply>
  </apply>
</math>
The above example shows the equation C = A + B + 20 [seconds] as represented in MathML.
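Since CellML is mainly used for ODE models, it may help to see how a derivative is written. The following is a minimal sketch of the equation dx/dt = -k x in MathML. The variable names x and k and the equation ID are hypothetical. The fragment assumes that the enclosing component declares x, k and a local time variable with suitable units.

```xml
<math id="decay_equation" xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq />
    <!-- Left-hand side: derivative of x with respect to time -->
    <apply><diff />
      <bvar><ci> time </ci></bvar>
      <ci> x </ci>
    </apply>
    <!-- Right-hand side: (-k) * x -->
    <apply><times />
      <apply><minus /><ci> k </ci></apply>
      <ci> x </ci>
    </apply>
  </apply>
</math>
```

No <cn> element appears here because the equation contains no literal numbers; any number added to it would need a cellml:units attribute, as in the previous example.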
Grouping
CellML components can be organised into groups. There are two predefined types of grouping in CellML: encapsulation and containment.
Containment is used to describe the physical or geometric organisation of a model, such as biological structure. This type of grouping specifies that components are physically nested within their parent component. For example, an ion channel may be physically embedded within a membrane.
Encapsulation allows the modeller to hide a complex network of components from the rest of the model and provides a single component as an interface to the hidden network. Encapsulation effectively divides the network into layers, where connections between the layers must only be made through the interface components. For example, electrophysiological models often have activation and inactivation gates encapsulated within ion channels. It is useful to use encapsulation in this instance because gate properties are specific to individual channels.
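As an illustration of the gate example, an encapsulation group might be sketched as follows. The component names are hypothetical. They stand for a sodium channel component that hides its activation (m) and inactivation (h) gates from the rest of the model.

```xml
<group>
  <!-- The relationship attribute selects the type of grouping -->
  <relationship_ref relationship="encapsulation" />
  <!-- The outer component_ref is the parent; nested ones are its hidden children -->
  <component_ref component="sodium_channel">
    <component_ref component="sodium_channel_m_gate" />
    <component_ref component="sodium_channel_h_gate" />
  </component_ref>
</group>
```

A containment group has the same shape, with relationship="containment" on the relationship_ref element.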
Connections and interfaces
Connections provide the mechanism for mapping variables declared within one component to variables in another component, allowing information to be exchanged between the various components in the network. The mapping of variables involves the transfer of a variable's value from one component to another, a process which may involve a conversion to ensure the units match.
The complete set of variable mappings between any two components constitutes a connection. Only one connection may be created between any given pair of components in a model. Each connection references the two components involved in the connection, and then maps variables from each of the components together. The interface attributes of each pair of variables must be compatible: an "out" variable in one component's interface must map to an "in" variable in the other component's interface. The direction of each mapping is determined by the value of the public_interface and private_interface attributes on the two variables: the value is always passed from the variable with an interface value of "out" to the variable with an interface value of "in". The value of a variable declared with an interface value of "out" may be passed out to any number of variables in other components declared with interface values of "in". The component to which a variable belongs is found by tracing the variable back from "in" to "out" interfaces, following the model's connections.
<connection>
  <map_components component_1="membrane" component_2="sodium_current"/>
  <map_variables variable_1="V" variable_2="V"/>
  <map_variables variable_1="i_Na" variable_2="i_Na"/>
</connection>
Above we show an example CellML connection. The components 'membrane' and 'sodium_current' are connected, and the variables V and i_Na are mapped between them. Note that the connection element itself does not imply direction; this information is defined by the variable interfaces.
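To make the direction concrete, here is one way the two components in this connection could declare their variables. This is only a sketch. It assumes that the components are siblings, so the public interface applies. The units (volt, ampere) and the initial value are placeholders chosen for illustration.

```xml
<component name="membrane">
  <!-- V is owned here, so it is passed out to other components -->
  <variable name="V" public_interface="out" units="volt" initial_value="-0.075" />
  <!-- i_Na is calculated elsewhere and received here -->
  <variable name="i_Na" public_interface="in" units="ampere" />
</component>

<component name="sodium_current">
  <!-- The mirror image: V comes in, i_Na goes out -->
  <variable name="V" public_interface="in" units="volt" />
  <variable name="i_Na" public_interface="out" units="ampere" />
</component>
```

With these declarations, V flows from membrane to sodium_current and i_Na flows the other way, although both mappings sit inside the same connection element.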
Units
CellML requires that all variables and numbers in a model are associated with a defined unit, and any units that are not predefined must be declared using units elements. The predefined units are mostly drawn from the International System of Units (SI), although some non-SI units that are particularly common in biological systems are also provided. Additional units can be defined as combinations and variations of these units.
<units name="s">
  <unit units="second" />
</units>
<units name="nM">
  <unit prefix="nano" units="mole" />
  <unit units="litre" exponent="-1" />
</units>
<units name="flux">
  <unit units="nM" />
  <unit units="s" exponent="-1" />
</units>
Note that although this method of defining a unit may appear verbose, it is precise and avoids ambiguity. A model authoring tool would not normally expose the modeller to the raw CellML code, so much of this complexity and verbosity would be hidden.
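As a further sketch, a scaled unit can be derived with the multiplier attribute, and a variable can then refer to any defined unit by name. The names minute and J_uptake and the initial value are hypothetical. The variable would sit inside a component in a complete model.

```xml
<!-- Hypothetical unit: one minute expressed as 60 seconds -->
<units name="minute">
  <unit multiplier="60" units="second" />
</units>

<!-- Hypothetical variable measured in the flux units defined above -->
<variable name="J_uptake" units="flux" initial_value="0.5" />
```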
The reaction element
CellML contains a reaction element which is used to describe individual reaction steps in a pathway. This includes a description of the reaction kinetics, the reactants, products and any enzyme catalysts or inhibitors. However, in practice we find that implementing the reaction element in a CellML model often requires the equations to be rewritten, such that they no longer reflect those in the original publication. At best, this creates extra work and some confusion; at worst, it breaks the model. Consequently, use of the reaction element is currently discouraged and it has been proposed that it be removed from the next version of the CellML specification.
CellML 1.1 and imports
The primary difference between CellML 1.0 and 1.1 is the addition of the ability to import components from separate files. This feature promotes reusability of models and components and allows CellML models to be incorporated into hierarchical frameworks. For example, a complex model of a cardiac myocyte may be constructed by importing many individual models, each describing a particular process: metabolism, electrophysiology, the contractile apparatus, adrenergic signalling, etc. The use of imports eliminates the requirement for vast, monolithic models constructed by assimilating multiple models, and allows duplication of imported modules and multi-tiered hierarchies.
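A minimal sketch of an import in CellML 1.1 is shown below. The model name, file name and component names are hypothetical. The xlink:href attribute points to the file being imported, and each child element gives the imported item a local name.

```xml
<model name="cardiac_myocyte"
       xmlns="http://www.cellml.org/cellml/1.1#"
       xmlns:xlink="http://www.w3.org/1999/xlink">
  <import xlink:href="sodium_current.cellml">
    <!-- component_ref names the component in the imported file;
         name is how it is known in this model -->
    <component name="i_Na" component_ref="sodium_current" />
    <!-- Units can be imported in the same way -->
    <units name="mV" units_ref="millivolt" />
  </import>
</model>
```

Once imported, i_Na can be used in connection and group elements like any locally defined component.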
Metadata
Metadata provides a context for the document and can include: the name of the model author, the date the model was created, keywords related to its content (which later facilitate the process of searching for models in the repository), and information about the published paper the model was taken from (the citation). Although metadata is not required for a CellML document to be valid, we strongly recommend including all of the above.
Metadata can also be added to models to describe entities and components within the model, as described in the CellML Metadata Specification. This kind of metadata is used for annotation to provide context and information about the processes and entities being described by the model. For example, a variable called "a" may in fact represent a protein kinase enzyme; this information can be represented in the CellML file by annotating the variable with metadata. Alternatively, metadata can be used to add information to the model that may be useful for simulation, such as optimal integration parameters.
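The protein kinase example might be sketched roughly as follows, using RDF that points at the variable through a cmeta:id. The component name, the identifier a_var and the choice of dc:description as the annotating property are assumptions made for illustration. The CellML Metadata Specification should be consulted for the recommended vocabulary.

```xml
<component name="signalling"
           xmlns:cmeta="http://www.cellml.org/metadata/1.0#">
  <!-- cmeta:id gives the variable an identifier that metadata can refer to -->
  <variable cmeta:id="a_var" name="a" units="dimensionless" />
</component>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- rdf:about="#a_var" attaches the description to the variable above -->
  <rdf:Description rdf:about="#a_var">
    <dc:description>Protein kinase enzyme</dc:description>
  </rdf:Description>
</rdf:RDF>
```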