Thoughts on the typing system

Adding a good typing system could substantially increase the usefulness of CellML 1.2 over CellML 1.1, but there are lots of design decisions to be made. This document is a way to collect some thoughts on how this should be done.

Should CellML have a typing system?

Points for:

Can express repetitive systems of equations using matrix notation instead of using a large number of individual equations. This allows for a greater mathematical efficiency.
Can potentially define 'open world sets' where we don't enumerate all the set members, but rather, specify that certain dependent variables are members of the set. This makes it much easier to define concepts such as the relationship between fluxes and the rate of change of concentration, by defining a sum over a set, and then throughout the model, adding variables to the set.
Can use structured types of dynamic size to represent the creation and destruction of entities in the model (for example, the construction of new cellular compartments or the division of cells over time).
Can have objects of functional types, which can be used to avoid repeating mathematics more than once (especially when we are representing papers which define functions and use them in several places).
Can share functions, rather than variables (which may implicitly be dependent variables which are functions of independent variables) across interfaces.
Can potentially have types which are functions of types, in order to create dependent types (and we could then define functions which take a type and values of that type) to create an effect similar to, say, templates in C++. Using this could make it easy to define re-usable components which work properly with units checking, even when we don't know the exact units and only the relationships between them.
Our current units checking system could be considered as a special case of typing (for real numbers), where the mathematical operators can be thought of as a functions for which the output type depends on the input type.
We could define the specification so dependent variables are special short-hand for functions which change with respect to the independent variables. Making this choice explicit would resolve some the issues we currently have with the interpretation of systems of PDEs, and would also solve the problem of how to represent delayed variables.

Points against:

Processing structured types could require a lot more complexity in processing software (mitigating factor: secondary specifications can limit the acceptable types for particular problem types where, say, structured types aren't needed).
Open-world set membership statements are deliberately ambiguous - there is more than one acceptable interpretation of a model using such notation, but we are assuming that if we don't have information to say that a set contains a given member, it doesn't contain that member (this probably isn't of practical significance, and could be considered on par with other approximations tools make, such as in numerical solvers).
The existing units system won't play nicely with a typing system, because units are really a specific facet of the type of real numbers. This means that we couldn't, for example, import units from a CellML 1.1 model into a CellML 1.2 model, because they would be incompatible (although cross-version imports are a grey area under the current draft which needs to be resolved anyway).
Adding a level of generality on top of the existing units system will most likely end up imposing at least a small burden of verboseness. This is especially true if we move away from our real number specific units system to something based on content MathML, in which things like operator applications to derive units are far more verbose than using attributes to specify how one unit is derived from another.
We could have situations where you want to solve some indices into a variable (e.g. a vector), and clamp other parts (mitigating factor: perhaps we could ban this either in the specification or in some of the secondary specifications).

Options for implementing a typing system in CellML

Built-in dictionary of types

Have a built in set of structured types like vector_of_reals.
If you want something that is not in that particular set, then you have to wait for the next version of CellML.
The interaction of units and types is solved because every member of a structured set is a real number (or has some close relationship to a real number) and so only one type attribute is needed.
If we ever extend beyond built-in dictionaries of types, we can keep the 'built-in' types, and create some additional types which are not built in.

Custom XML based system for deriving types

A simple hierarchical system very similar to how units work, in which structured types can be derived from other types.
We would probably want to have at least reals as the base type, and derivations for structures (composition of one or more potentially different types indexed by name), arrays (fixed number of repetitions of the same type, indexed by number), vectors (variable number of repetitions of the same type, indexed by number), and sets (variable number of objects of the same type, order isn't significant).
Complex numbers could be structures with two real members, or they could be arrays, or we could build them in.
Matrices could be represented as vectors of vectors. The only problem with this is it doesn't imply that every row has the same number of columns, and it might be more cumbersome to represent. It also means that vectors of vectors are specifically identified as matrices when using operators which work on matrices. For these reasons, providing allowing for matrix-of-type derivations might be useful. Also, MathML provides matrix constructors, so we wouldn't necessarily want these to create objects of type vector-of-vector.
Representing lambda functions would be very useful, and we could provide a type for functions from one type to another (or alternatively, rather than storing lambdas in variables, we could do what SBML did, and create new types of elements for defining functions).
References to variables would be very useful - the ability to take a variable reference and put it on a set, and then define operations like summation over the membership of that set. Another way of viewing this might be that we need a constructor which labels a particular value with a tag, so that it is not considered equal to any other value which is otherwise equal except for a different tag.
There are many subsets of the real numbers which could be useful, such as the integers, positive, non-positive, non-negative, or negative integers, primes, and so on. However, it is not clear that this information is best represented as a type; it could be viewed more as a constraint on which real values are legitimate (of course, the same argument could be made for making real numbers a special case of complex numbers limited by constraint, and the fact that this is less palatable does potentially indicate a design bias towards domains which primarily work with real numbers. However, units fit most naturally on real numbers, so this approach probably makes at least some sense).
Unions (where something can be of one or more of several types) and any types can potentially make processing more difficult, and I am not aware of a use case where they are actually required (aside from use cases where parametric types would be better).
The interplay of types and units is quite complex - a structure could have real numbers of different types. The desired semantics are really that real numbers with different units are different types, so for example, real_metre and real_second are different types. Operators like times automatically change the units, so for example is x has type structure { real_metres_per_second speed; real_second period }, then times(getMember(x, speed) * getMember(x, period)) would have type real_metre.
We would need to assume that units don't change across vectors and matrices. This is probably a good assumption for many of the common uses of matrices in modelling, but clearly there are cases where this might be a problem (especially if a system involving different types of variables is written using a matrix equation).
We would need to define new operators, for example, for indexing into a structure, vector, or array and pulling out values, or for composing values into a structure (MathML already has matrix and vector constructors).
We would need to make the units attribute on constants and variables either obsolete, or a short-hand for being a real number of some particular type, and add new mutually exclusive attributes for the type.
This system would not allow for features like parameterised types, and so the reusability of components defined using this system might be questionable.
Imports would need to be extended to allow for types to be imported. This could either replace or supplement the imports of units.

Lambda calculus inspired typing system

The current units system could be replaced with a type aware lambda-calculus inspired system.
Types and units (which are treated as a specific type of real number) would, as far as CellML is concerned, be first class mathematical objects, alongside real numbers, vectors of reals, and the like.
There would no longer be any units element, only components, and units would be a particular type of real type, and would be stored in variables inside components, which could be connected and imported, as required.
Types would in turn have types; for simplicity we could just have a single type t0 for all immediately usable types, another called t1 for single parameterised types (that is, a function which returns a type of type t0), t2 for functions which return a type of type t1, or in other words functions which return a function which returns a type of type t0, or in other words a doubly parameterised type, and so on. These types would of course need to be built in to bootstrap the system (along with types like real).
We would have to define new MathML operators for deriving units, for example one similar to the pi operator in lambda calculus which makes the type of a function taking a parameter of one type and returning a type, as well as for constructing composition / structure types, vector / matrix / array types. We would also need things like fetching a type from a value, and constructing structures (in addition to the constructor for the structure type).
Imports would be simplified, because the only thing that could be imported would be components.
This could potentially be very verbose - MathML's apply syntax is longer than our current units definition work.
Exactly what can be done with units would probably end up being quite restricted in secondary specifications to make implementation more feasible. However, I expect that some of the more advanced secondary specifications could still be significantly better than CellML 1.1 in terms of model representation ability.