Stabilisation of XPCOM::IObject

One way to deal with future incompatibilities between products, at least at the XPCORBA level, is to stabilise the XPCOM::IObject interface. This will not solve all potential incompatibilities, but it should resolve some of them.

I have looked into the top-level object interfaces used by various languages, to see how they have evolved over time and what they support. The aim of this analysis is to reach a conclusion about what we do and do not need on the top-level object underlying all projects which use CORBA, including the CellML API and CellML Context (for information on how these fit into the tool roadmap, see http://www.cellml.org/tools/Roadmap/). This top-level interface is part of XPCORBA, but everything in the CellML API ultimately inherits from it. Changing this interface, or making significant changes to its semantics, creates significant problems if several incompatible versions are supposed to co-exist in the same browser.

Therefore, as at least one aspect of the approach to resolving these problems, we should do our best to make the interface as stable as possible. This appears to be achievable, the best example being Java. JDK 1.0.2, the oldest version for which I can find documentation, was released in May 1996. The java.lang.Object class definition has not changed between then and the release of J2SE 1.5.x, the latest stable version, and probably for longer than that. The reason for the stability is probably to avoid breaking JNI applications; some of the other interfaces I looked at are apparently less stable.

To achieve a level of stability similar to java.lang.Object, we need to make sure we do not leave any functionality off the top-level object that we cannot support any other way. Currently, XPCOM::IObject looks like this:

module XPCOM
{
  typedef string utf8string;
  typedef wstring utf8wstring;

  interface IObject
  {
    oneway void add_ref();
    oneway void release_ref();
    IObject query_interface(in utf8string id);
    long compare(in IObject compareTo);
  };
#pragma flat_name CORBA_IObject
#pragma ID IObject "DCE:00000000-0000-0000-c000-000000000046:1"
};

The lack of documentation is one clear problem here, and it has created some semantic problems which need to be addressed.

add_ref and release_ref perform standard reference counting, so they are documented to some extent by other documents about memory management.
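As an illustration of the reference counting contract, here is a minimal Python sketch. The class name RefCounted, the destroyed flag, and the choice to start the count at one (so that the creator holds the first owning reference) are assumptions made for the example, not part of XPCOM::IObject.

```python
class RefCounted:
    """Hypothetical stand-in for an object exposing add_ref/release_ref."""

    def __init__(self):
        # Assumption: the creator receives one owning reference.
        self._refcount = 1
        self.destroyed = False

    def add_ref(self):
        # Another piece of code now owns a reference to this object.
        if self.destroyed:
            raise RuntimeError("add_ref on a destroyed object")
        self._refcount += 1

    def release_ref(self):
        # One owner has finished with the object.
        if self.destroyed:
            raise RuntimeError("release_ref on a destroyed object")
        self._refcount -= 1
        if self._refcount == 0:
            # No owners remain, so the object may free its resources.
            self.destroyed = True


obj = RefCounted()               # count is 1, held by the creator
obj.add_ref()                    # a second owner keeps the object alive
obj.release_ref()                # the creator has finished with it
still_alive = not obj.destroyed  # the second owner still holds a reference
obj.release_ref()                # the last owner has finished
```

Every add_ref is paired with exactly one release_ref, and the object only becomes eligible for destruction once the last owner has released it.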

query_interface has well established semantics from the CellML API, but these semantics need to be documented.
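A minimal Python sketch of the intended behaviour follows, assuming the semantics are simply "return an object of the same implementation if the named interface is supported, otherwise null". The class Implementation and the interface name "Example::IWidget" are hypothetical, and None stands in for a null object reference.

```python
class Implementation:
    """Hypothetical object implementing a fixed set of interfaces."""

    def __init__(self, interfaces):
        # Every object is assumed to support the top-level interface.
        self._interfaces = set(interfaces) | {"XPCOM::IObject"}

    def query_interface(self, interface_id):
        # Return the same implementation if it supports the interface,
        # otherwise None (standing in for a null object reference).
        if interface_id in self._interfaces:
            return self
        return None


widget = Implementation(["Example::IWidget"])
as_widget = widget.query_interface("Example::IWidget")
as_unknown = widget.query_interface("Example::IUnknownThing")
```

Callers test the result against null before using it, rather than assuming that the interface is supported.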

compare is more problematic. It is supposed to return 0 on equality, a negative value when the object is ordered before compareTo, and a positive value when it is ordered after compareTo. A number of parts of the code assume that (a.compare(b) < 0) defines a transitive and anti-symmetric ordering, and that a.compare(a) == 0. This seems to be fine where all objects are on the same system, but more work is needed when the objects reside on different systems. One solution, which has already been employed for mozCmgui, is to ask the ORB to marshal the object reference into an IOR string, and to compare these strings lexicographically. This should produce transitivity in most cases (but not in some unusual cases where an object is being proxied through a third party).
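The string-based comparison can be sketched in Python as follows. The function name compare_ior is hypothetical, and the strings used are made-up placeholders rather than real IORs; obtaining the IOR string from the ORB is outside the scope of the sketch.

```python
import functools


def compare_ior(ior_a: str, ior_b: str) -> int:
    """Return a negative, zero, or positive value, like compare()."""
    # Python compares str values lexicographically by code point.
    return (ior_a > ior_b) - (ior_a < ior_b)


# Made-up placeholder strings, not real IORs.
a, b, c = "IOR:0001", "IOR:0002", "IOR:0003"

# Because string ordering is a total order, the comparison can be used
# directly for sorting.
ordered = sorted([c, a, b], key=functools.cmp_to_key(compare_ior))
```

Lexicographic ordering of strings is reflexive, anti-symmetric, and transitive, so the properties the calling code relies on hold as long as each object always marshals to the same string.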

There are a number of additional methods in some other languages which we don't have (from looking at Java, Python, and Ruby):

  • Java, Ruby, Python: Converting an object to a string (Ruby also has a conversion to an array, and Python has conversions to a sequence, a mapping, and a number). This could be useful in some cases.
  • Java, Ruby, Python: Getting a hash of the object. This doesn't have to be unique, but collisions should be rare. This is probably a good thing to have on our interface, because it means we can put arbitrary objects into a hashtable. However, providing a unique ID should be sufficient, as the ID can be hashed.
  • Python, Ruby: The ability to set arbitrary attributes on objects. This could be useful, because it means we can annotate arbitrary objects with things specific to the annotator but unanticipated by the original developer. This can be done externally anyway in O(ln n) time (where n is the number of objects), but supporting it here is the only way to do it in time independent of n (O(ln k), where k is the number of keys per object).
  • Java, Ruby: The ability to clone arbitrary objects (Ruby also has a shallow only copy). It is not clear that arbitrary objects should be able to be cloned, so I don't think we want this.
  • Java: Synchronisation primitives at the object level. However, they are not widely used by Java programmers, as it is not clear exactly how they should be used without clarification for specific interfaces. Therefore, I wouldn't recommend copying this. Instead, we can add them to specific interfaces that need them.
  • Java, Ruby: Ways to check the top-level class (as opposed to whether or not the object implements an interface, which query_interface can do). This is not very useful, because part of the idea behind this infrastructure is to separate implementation from interfaces. However, if you need to know the implementation for display purposes, the Context provides this anyway.
  • Python: Retrieving documentation strings. If required, this probably would be better stored with the class annotation service in the CellML Context.
  • Ruby: Freezing (preventing changes) and tainting (marking as having come from an untrusted source) objects to support security and prevent changes for algorithmic/programming practice reasons. This would probably be better decided on an interface-by-interface basis (for example, the CellML Context provides facilities for tagging a model as frozen, to support a model change versioning scheme, although it cannot actually enforce this).
  • Ruby: Getting a unique identifier for an object. This is nearly impossible to do properly in a distributed environment (at least without going through a single synchronised ID allocator, or some sort of hierarchical range allocation system with a single root, which we definitely don't want to force on everyone). However, we can make a good approximation (warning: this creates potential security problems).
  • Ruby: Some additional types of comparison operators, such as === and =~, which do not have well defined meanings except on string objects, so I don't think we should copy them.
  • Ruby, Python (and Java on different interfaces which are always supported): Reflection, e.g. listing methods. We could implement this on a different interface generated from the IDL if we really needed it for some application, but most applications will never need it.
  • Ruby: Programmatic response to methods which are not specifically implemented. CORBA DSI can implement much of this, but part of the motivation of defining the PCM was to be simple and avoid this complexity.
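The point above about annotating objects without support on the object itself can be sketched as a side table kept by the annotator. This is a minimal Python sketch under stated assumptions: the class AnnotationTable is hypothetical, and objects are assumed to be identified by some totally ordered key (for example, a string obtained from the ORB).

```python
import bisect


class AnnotationTable:
    """Hypothetical side table mapping object keys to annotation dicts."""

    def __init__(self):
        self._keys = []    # sorted comparison keys, one per annotated object
        self._notes = []   # annotation dicts, parallel to _keys

    def _find(self, key):
        # Binary search: O(log n) comparisons for n annotated objects.
        i = bisect.bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        return i, found

    def set(self, key, name, value):
        i, found = self._find(key)
        if not found:
            # list.insert shifts elements, so insertion here is linear;
            # a balanced tree would make it logarithmic as well.
            self._keys.insert(i, key)
            self._notes.insert(i, {})
        self._notes[i][name] = value

    def get(self, key, name, default=None):
        i, found = self._find(key)
        return self._notes[i].get(name, default) if found else default


table = AnnotationTable()
table.set("IOR:0002", "colour", "red")      # placeholder key, not a real IOR
table.set("IOR:0001", "colour", "blue")
colour = table.get("IOR:0002", "colour")
missing = table.get("IOR:0009", "colour")
```

The lookup cost grows with the number of annotated objects, which is the trade-off the bullet describes; storing the attributes on the object itself would remove that dependence.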

A potential standardised interface follows:

/**
 * The XPCOM module represents core infrastructure.
 */
module XPCOM
{
  /**
   * This is used to represent a string of 8-bit bytes making up a UTF8 string.
   */
  typedef string utf8string;

  /**
   * This is used to represent a wider string. On the Mozilla side, this will be UTF16,
   * although it may be UCS4 on other platforms, depending on the compiler and platform.
   */
  typedef wstring utf16string;

  /**
   * The object is the top-level object, from which anything which needs to get passed
   * between modules should inherit.
   */
  interface IObject
  {
    /**
     * Called to indicate that some code is keeping an owning reference to the object.
     * That code must call release_ref() later when it has finished with the object, or
     * a memory leak may result. The object should be preserved in memory. Care should be
     * taken to ensure that a cycle of objects waiting for each other to call release_ref
     * is not created.
     */
    oneway void add_ref();

    /**
     * Removes a reference to an object which was created by some other means (e.g. by return
     * from a function, out parameters, or add_ref). The object may destroy itself any time
     * after no references remain.
     */
    oneway void release_ref();

    /**
     * Returns an IObject of the same implementation, which supports a specific interface.
     * An implementation should support query_interface for all interfaces that it supports
     * directly or indirectly (through inheritance).
     * @param id The name of the interface, with each part of the scope separated by double
     *           colons. For example, "XPCOM::IObject".
     * @return A supporting IObject, or null if the interface is not supported.
     * @note This particular operation needs special treatment by bridges, because although the
     *       return type is XPCOM::IObject, the bridge is expected to look up the id and produce
     *       a bridge suitable for casting to the desired type. If the bridge cannot do this,
     *       because, for example, it doesn't know the interface, it should return null (since
     *       the bridges + implementation cannot support the desired interface).
     */
    IObject query_interface(in utf8string id);

    /**
     * Fetches the ID of the object. IDs should be generated in a way which makes the
     * probability of a collision negligible. The recommended method is to use a random
     * number generator seeded with a sufficient amount of data, and to format the output
     * as %08X-%04X-%04X-%04X-%04X%08X (where %0nX represents a hex string of n digits,
     * padded to the left with zeros if needed).
     * The id must never change once set.
     */
    readonly attribute string id;
  };
#pragma flat_name CORBA_IObject
#pragma ID IObject "DCE:00000000-0000-0000-c000-000000000046:1"
};

Note that the id attribute allows both compare and hash to be implemented as library functions which rely on id, without placing any additional burden on the base object.
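A minimal Python sketch of ID generation in the documented layout, together with compare and hash built on top of the ID, might look like this. The function names generate_id, compare_ids, and hash_id are hypothetical. The sketch uses Python's secrets module (which draws on the operating system's entropy source) in place of an explicitly seeded generator, and CRC-32 as an arbitrary choice of hash function.

```python
import secrets
import zlib


def generate_id() -> str:
    """Return a random 128-bit ID as %08X-%04X-%04X-%04X-%04X%08X."""
    field_bits = (32, 16, 16, 16, 16, 32)
    fields = tuple(secrets.randbits(n) for n in field_bits)
    return "%08X-%04X-%04X-%04X-%04X%08X" % fields


def compare_ids(id_a: str, id_b: str) -> int:
    """Library-level compare: negative, zero, or positive."""
    return (id_a > id_b) - (id_a < id_b)


def hash_id(object_id: str) -> int:
    """Library-level hash: equal IDs always give equal hashes."""
    return zlib.crc32(object_id.encode("ascii"))


first = generate_id()
second = generate_id()
order = compare_ids(first, second)
bucket = hash_id(first) % 16    # e.g. choosing one of 16 hashtable buckets
```

Because the ID never changes once set, both functions give stable answers for the lifetime of the object, whichever side of a bridge they are evaluated on.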