Proposal: Best current practice for the top-level mathematics operator
Status of this document
This document is a proposed Best Current Practice document. If it attains wider support, it will be moved out of Members and into specifications, and will become a 'Best Current Practice' document.
Abstract
The CellML specifications version 1.0 and version 1.1 allow CellML models to contain any MathML which is allowed by the content MathML specification, and recommend that documents use only a limited subset of it, for interoperability purposes. However, they do not provide a semantic meaning for the expressions, nor do they address the fact that tools need to interpret any expressions found in the model in order to determine the actual procedure by which the model is validated.
This document attempts to provide guidance for the authors of CellML models and processing tools, with the intention of allowing the widest variety of CellML model use.
The document will firstly provide a general semantic interpretation of expressions found in MathML models. This provides a firm interpretation for CellML models, and provides an underlying basis on which the remainder of the document can rest.
Because of the wide range of uses to which CellML could potentially be put, there will necessarily be an unlimited number of forms which the top-level apply operator can take, but many classes of CellML processing tool will need to recognise classes of mathematics and interpret them. This document addresses this by providing a number of 'patterns' in expressions which CellML processing tools can recognise, and provides some guidance on exactly what these patterns mean, and how they can be converted into steps for the evaluation of the model.
It will never be possible to define all possible patterns. Instead, authors should attempt to define their model using the patterns identified as being the most portable and, only where that is not possible, move on to less portable constructs. Where a model author finds that they cannot express their model using the available patterns, the author may propose a new pattern, and a corresponding interpretation of that pattern. All new patterns should be consistent with the CellML specification (and hence should be valid MathML), and should also be consistent with the semantic interpretation described in this document.
A semantic interpretation for top-level expressions in CellML
Note: within this document, an expression is defined as a content MathML element and all descendants of that element. A top-level expression is an expression which has, as its parent element, a MathML math element (whether that MathML math element has as its parent a component or a reaction element).
A top-level expression shall be interpreted as a declaration that the value of the top-level expression is unconditionally equal to the boolean value true within the model.
Authors of models should be mindful that their model could be used by tools in any way consistent with this semantic interpretation. They should not assume that tools will use a particular pattern defined below.
Rules for developing and interpreting patterns
- A pattern should be expressed in an XML-like syntax.
- Whitespace should be used in the pattern to maximise readability. Whitespace does not affect the meaning of the pattern.
- The following elements have special meaning:
- <expression/> means any valid content MathML expression.
- <boolean/> means any valid content MathML expression which will always evaluate to an element of the set {true, false}, where true and false are the boolean truth values.
- <pattern match="some pattern reference" /> means a match of another pattern defined in this document.
- <variablereference/> means a reference, using the MathML <ci> element, and following the rules of the MathML and CellML specifications, to a variable in a CellML component.
- Any element which does not have a special meaning shall be interpreted as being the corresponding MathML element.
- id attributes appearing in the pattern are used to reference parts of a pattern match in any text describing the pattern, and are not required to be present in any documents using the pattern.
Note that despite the phrase 'any valid content MathML expression', authors should still take note of the interoperability recommendations in CellML, and in particular, the recommendation that they choose content MathML elements from the CellML subset.
Pattern 1: A variable equals an expression
Model authors who use the following pattern will find that their model is interpreted by the largest number of tools (at least among the tools publicly available at the time this was written). All examples in the CellML specification match this pattern.
Model authors should note that some of the tools which support pattern 1 do so by supporting a more general pattern, such as pattern 2 or pattern 3. Model authors should therefore not make any assumptions about the specific steps taken to evaluate a model.
Pattern 1 matches may be divided into two classes:
Class A: Variable computations
<apply>
  <eq/>
  <variablereference id="assigninto"/>
  <expression id="assignfrom"/>
</apply>
Class B: Rate computations
<apply>
  <eq/>
  <apply>
    <diff/>
    <variablereference id="ratevar"/>
    <bvar id="bound"><variablereference/></bvar>
  </apply>
  <expression id="ratefrom"/>
</apply>
The order of the ratevar and bound elements may be interchanged without changing the interpretation of the pattern.
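As an illustration only, the following Python sketch shows one way a tool might test a top-level expression against the two classes of pattern 1. The function names classify_pattern1 and local are hypothetical and are not taken from any CellML tool or specification. The sketch assumes that a variablereference is a bare ci element and that the bvar element contains a single ci child; it ignores other legal MathML content such as a degree element.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def local(element):
    """Return the element name without its namespace prefix."""
    return element.tag.split("}")[-1]


def classify_pattern1(apply_element):
    """Return ("A", assigninto), ("B", ratevar) or None (hypothetical helper)."""
    children = list(apply_element)
    if local(apply_element) != "apply" or len(children) != 3:
        return None
    operator, lhs, _rhs = children
    if local(operator) != "eq":
        return None
    # Class A: the first argument of eq is a bare variable reference.
    if local(lhs) == "ci":
        return ("A", (lhs.text or "").strip())
    # Class B: the first argument is a diff application holding ratevar and
    # bound, which may appear in either order.
    if local(lhs) == "apply":
        parts = list(lhs)
        if len(parts) == 3 and local(parts[0]) == "diff":
            if sorted(local(p) for p in parts[1:]) == ["bvar", "ci"]:
                ratevar = next(p for p in parts[1:] if local(p) == "ci")
                bound = next(p for p in parts[1:] if local(p) == "bvar")
                if [local(c) for c in bound] == ["ci"]:
                    return ("B", (ratevar.text or "").strip())
    return None


class_a = ET.fromstring(
    f'<apply xmlns="{MATHML_NS}"><eq/><ci>x</ci><cn>1</cn></apply>'
)
class_b = ET.fromstring(
    f'<apply xmlns="{MATHML_NS}"><eq/>'
    "<apply><diff/><bvar><ci>t</ci></bvar><ci>V</ci></apply>"
    "<ci>r</ci></apply>"
)
print(classify_pattern1(class_a))
print(classify_pattern1(class_b))
```

An expression with the variable on the right-hand side of the eq operator is not matched by this sketch, which is the restriction that pattern 2 relaxes.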
The authors of CellML tools are encouraged to use a more general pattern, such as pattern 3, which also matches this pattern, as implementing these more general patterns will also allow CellML models to be used more flexibly.
Tools supporting only Pattern 1 will generally produce procedural steps which compute assigninto by evaluating the expression assignfrom. This does limit the flexibility of the model within that tool, as it may be that assigninto is known, and one of the variables used to compute assignfrom is unknown.
Even for tools that support only pattern 1, the order of top-level expressions within the CellML model is not significant. Tools are still expected to determine the correct order in which expressions are calculated. For such tools, this can be done efficiently by representing all class A equations as nodes in a directed acyclic graph (DAG), with arcs running from expressions which have variable v as assigninto, to expressions which involve variable v in expression assignfrom. The ordering of calculations is then determined by a standard topological ordering algorithm.
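The ordering step described above can be sketched in Python using the standard library graphlib module (Python 3.9 or later). This is a minimal sketch rather than a description of any existing tool. The function name evaluation_order and the variable names in the example are hypothetical. Each class A equation is assumed to have already been reduced to its assigninto variable and the set of variables appearing in assignfrom. Variables with no class A equation (constants, state variables, the bound variable) are treated as already known.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(equations):
    """Order class A equations so each variable is computed before it is used.

    equations maps each assigninto variable name to the set of variable
    names referenced in the corresponding assignfrom expression.
    """
    # Keep only dependencies that are themselves computed by an equation.
    graph = {
        target: {name for name in used if name in equations}
        for target, used in equations.items()
    }
    # static_order() raises CycleError if the equations depend on each other
    # circularly, in which case no pattern 1 evaluation order exists.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical model: i_total depends on two currents defined elsewhere.
equations = {
    "i_total": {"i_Na", "i_K"},
    "i_Na": {"g_Na", "V"},
    "i_K": {"g_K", "V"},
}
order = evaluation_order(equations)
print(order)  # i_Na and i_K appear before i_total

try:
    evaluation_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("circular dependency: not solvable by assignment alone")
```

A circular dependency between class A equations is one case where a tool restricted to pattern 1 cannot proceed and a more general pattern, with an equation solver, would be needed.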
Pattern 2: Bi-directional equalities with a variable on one side
A match to class A or class B of pattern 1 is also a match to the corresponding class in pattern 2.
In addition, a match to the following form is a class A match to pattern 2:
<apply>
  <eq/>
  <expression id="assignfrom"/>
  <variablereference id="assigninto"/>
</apply>
A match to the following is a class B match to pattern 2:
<apply>
  <eq/>
  <expression id="ratefrom"/>
  <apply>
    <diff/>
    <variablereference id="ratevar"/>
    <bvar id="bound"><variablereference/></bvar>
  </apply>
</apply>
The order of the bound and ratevar elements may be interchanged without changing the meaning of the expression.
In summary, the difference between pattern 1 and pattern 2 is that pattern 2 makes the ordering of arguments to the eq operator unimportant.
If an expression matching the pattern:
<apply>
  <eq/>
  <variablereference id="a-var"/>
  <variablereference id="b-var"/>
</apply>
is given to a pattern 2 based tool, the tool has two possible interpretations: a-var is assignfrom and b-var is assigninto, or vice versa. While this flexibility places an additional burden on the tool (as the more efficient DAG topological ordering approach is no longer possible), it also means that the mathematics can remain useful after changes to the model. This flexibility is further expanded by a match to pattern 3.
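One possible way for a pattern 2 based tool to choose between the two interpretations is to orient each equality according to which side is already known. The Python sketch below assumes this strategy. The function name orient_equalities is hypothetical, and the approach is an illustration, not a procedure required by this document. Equalities where both sides are unknown, or both already known, are reported as unresolved.

```python
def orient_equalities(equalities, known):
    """Pick an assignment direction for each variable-to-variable equality.

    equalities: list of (a_var, b_var) name pairs, one per eq application.
    known: names of variables whose values are already available.
    Returns (assigninto, assignfrom) pairs in a valid evaluation order.
    """
    known = set(known)
    pending = list(equalities)
    oriented = []
    progress = True
    # Repeat until a full pass over the pending equalities resolves nothing.
    while pending and progress:
        progress = False
        for pair in list(pending):
            a_var, b_var = pair
            if a_var in known and b_var not in known:
                assigninto, assignfrom = b_var, a_var
            elif b_var in known and a_var not in known:
                assigninto, assignfrom = a_var, b_var
            else:
                continue  # cannot decide yet (or the equality is redundant)
            oriented.append((assigninto, assignfrom))
            known.add(assigninto)
            pending.remove(pair)
            progress = True
    if pending:
        raise ValueError(f"could not orient equalities: {pending}")
    return oriented


# The same two equalities are evaluated in opposite directions depending
# on which variable is supplied.
same_model = [("a", "b"), ("b", "c")]
print(orient_equalities(same_model, known={"c"}))
print(orient_equalities(same_model, known={"a"}))
```

The repeated passes are what replace the single topological sort available to tools that support only pattern 1.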
Pattern 3: General equalities
The following pattern rule is used for pattern 3:
<apply>
  <eq/>
  <expression id="expr1"/>
  <expression id="expr2"/>
</apply>
Pattern 3 is a more general case of patterns 1 and 2. It allows arbitrary expressions on either side of the equals sign. Some guidelines are provided:
- Tools are encouraged to compute variables by the most efficient means possible. Where a tool needs to compute a variable, and that variable can be resolved as assigninto from a pattern 2 match (or a rate, which can be resolved to ratevar), the tool should compute the variable by evaluating the expression on the other side of the equation, and assigning into the variable.
- Tools may perform symbolic algebra on expressions. Tools which do this must ensure that the manipulations are valid for all real numbers, and are not invalid for certain special values, such as zero, unless the tool has other information which indicates that value will never occur. However, model authors should not rely on any non-real values (such as infinity, -infinity, and not a number) to work in a particular way in a tool supporting pattern 3.
- Tools may also perform a linear or non-linear solve (either univariate over a single top-level expression, or multivariate over multiple top-level expressions) as required to compute values. Model authors should be aware that many tools may not support multivariate solves, and even if they do, they may be prohibitively expensive. They should therefore try to define their models so that multivariate solves of systems of equations are not required. This is particularly true for non-linear systems of equations, but as many tools do not have linear solvers, authors are advised to convert their systems into 'triangular' forms so that tools can evaluate expressions as a series of equations with a single unknown.
- Model authors should note that not all tools support the case where there is more than one expression involving the same derivative, and even in tools that do support this, the tool will need to choose one expression to use to solve for the derivative. The choice made by the tool might not be the best one from a computational speed or numerical stability point of view, so authors are advised to define a single variable to contain the rate, and assign this into the rate variable.
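The advice on 'triangular' forms can be illustrated with a small Python sketch. It assumes a tool that has only a univariate solver, implemented here as plain bisection on the residual expr1 - expr2. The function name solve_univariate, the bracketing intervals, and the two example equations are all hypothetical. A real tool may use a different root-finding method.

```python
def solve_univariate(residual, lower, upper, tolerance=1e-10, max_iterations=200):
    """Find a root of residual (expr1 - expr2) on [lower, upper] by bisection."""
    f_lower = residual(lower)
    f_upper = residual(upper)
    if f_lower == 0.0:
        return lower
    if f_upper == 0.0:
        return upper
    if f_lower * f_upper > 0:
        raise ValueError("residual does not change sign on the bracket")
    for _ in range(max_iterations):
        middle = 0.5 * (lower + upper)
        f_middle = residual(middle)
        if f_middle == 0.0 or (upper - lower) < tolerance:
            return middle
        # Keep the half of the interval on which the sign change occurs.
        if f_lower * f_middle < 0:
            upper = middle
        else:
            lower, f_lower = middle, f_middle
    return 0.5 * (lower + upper)


# Triangular system: the first equation has x as its only unknown, and once
# x is known the second equation has y as its only unknown.
#   x + 2 = 5
#   x * y**3 + y = 26
x = solve_univariate(lambda x: (x + 2) - 5, -100.0, 100.0)
y = solve_univariate(lambda y: (x * y**3 + y) - 26, 0.0, 10.0)
print(x, y)  # approximately 3 and 2
```

Had both equations involved x and y together, the same model would have required a multivariate non-linear solve, which is the case the guideline advises authors to avoid.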
Pattern 4: Rates with reset rules
The following pattern is not supported, at the time of writing, by any tools. However, this pattern allows 'reset conditions', and other similar modelling constructs, which cannot be reliably expressed using any of the earlier patterns. It should be noted that tools which support this would normally also support one of the earlier patterns.
The following pattern rule applies:
<apply><or/>
  <apply><eq/>
    <apply><diff/>
      <variablereference id="ratevar"/>
      <bvar id="bound"><variablereference/></bvar>
    </apply>
    <expression id="rateexpression"/>
  </apply>
  <apply id="resetrule"><and/>
    <apply id="resetaction"><eq/>
      <variablereference id="same-as-ratevar"/>
      <expression id="resetvalue"/>
    </apply>
    <boolean id="condition"/>
  </apply>
</apply>
The apply element resetrule may be repeated an arbitrary number of times within the same application of the or operator (provided at least one resetrule apply is present).
The apply element resetaction may be repeated an arbitrary number of times within the same application of the and operator (provided at least one resetaction apply is present).
The variable referred to by ratevar must refer to the same variable as same-as-ratevar.
The ratevar and bound elements may be interchanged without changing the interpretation of the pattern match.
This form is used to define that normally, the rate of change of ratevar (with respect to the bound variable) is the value of evaluating rateexpression. However, when condition is true, the variable ratevar (not its rate) is clamped to resetvalue.
The following guidelines are provided:
- Model authors are advised to choose a condition which is true for a finite period of time. Choosing a condition which is only true at a certain point in the model is likely to result in the condition being skipped over. The author may need to also provide simulation parameters required for the model to reproduce the correct results.
- Model authors should not use this pattern solely to express the initial values of state variables. Instead, they should use the initial_value attribute defined in the CellML specification.
- Tool developers may be able to achieve better results for models using the pattern by setting the default step size or maximum step size to a smaller value. They might also be able to improve results by reducing the step size or maximum step size once a condition has evaluated to true.
- In some cases, it might be possible for tool developers to identify special values of the bound variable over which the model is running, at which the condition is likely to change, and ensure that a step ends exactly on that point. However, model authors should note the first guideline above, and should not rely on tools to implement this behaviour.
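The first guideline can be illustrated with a deliberately simple Python sketch. It assumes a hypothetical tool that integrates with a fixed-step forward Euler method and tests condition once per step. No existing tool is being described, and the function name simulate and all parameter values are invented for the example. With a step of 0.3, the bound variable never takes the value 1.0 exactly, so a condition that is true only at that point is skipped, while a condition that holds over an interval is detected.

```python
def simulate(rate, condition, reset_value, x0, step, n_steps):
    """Fixed-step forward Euler with a pattern 4 style reset rule.

    At each step, if condition(x, t) is true the variable is clamped to
    reset_value(x, t); otherwise it advances using rate(x, t).
    Returns the final value and the number of resets applied.
    """
    x = x0
    resets = 0
    for i in range(n_steps):
        t = i * step  # value of the bound variable at this step
        if condition(x, t):
            x = reset_value(x, t)
            resets += 1
        else:
            x += step * rate(x, t)
    return x, resets


def constant_rate(x, t):
    return 1.0


def reset_to_zero(x, t):
    return 0.0


# Condition true only at a single point: no step lands exactly on t = 1.0.
x_point, resets_point = simulate(
    constant_rate, lambda x, t: t == 1.0, reset_to_zero, 0.0, 0.3, 10
)
# Condition true over a finite period: the step at t = 1.2 falls inside it.
x_interval, resets_interval = simulate(
    constant_rate, lambda x, t: 1.0 <= t < 1.4, reset_to_zero, 0.0, 0.3, 10
)
print(resets_point, resets_interval)
```

A smaller step size widens the set of conditions that are caught, as the third guideline suggests, but it cannot guarantee that a condition true at a single point will ever be observed.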