Wednesday, 21 August 2013

The Multiformat Specification Challenges


Providing a specification in multiple formats, in particular JSON, XML and RDF, is increasingly seen as useful: different purposes are best served by different formats, especially where data are to be processed in various contexts and with various tools.
 
However, these formats differ in their expressive power as well as in their precision. Attempts to provide bindings for the same information model in different formats therefore result in each binding capturing some aspects of the information model, while other aspects cannot be described in a machine-readable way in that particular binding format.

In order to use the most appropriate format for each purpose, it is highly desirable to be able to convert data from one format into another. However, while there are a variety of generic algorithms to translate between XML, JSON and RDF, these are of little help, as they are unaware of the intended information model: JSON data transformed into XML and back to JSON will generally not be equivalent to the original data and will in almost all cases not conform to the original information model.
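The loss can be illustrated with a minimal Python sketch. The two functions below are a deliberately naive, model-unaware mapping written for this illustration (they are not taken from any particular library): keys become elements, list items repeat the element, and everything else becomes text.

```python
import xml.etree.ElementTree as ET

def json_to_xml(value, tag="root"):
    """Naive mapping: dict keys become child elements, list items repeat the tag."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, list):
                for item in child:
                    elem.append(json_to_xml(item, key))
            else:
                elem.append(json_to_xml(child, key))
    else:
        # Numbers, booleans and strings all end up as plain text.
        elem.text = str(value)
    return elem

def xml_to_json(elem):
    """Naive inverse: leaf text becomes a string, repeated tags become a list."""
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = xml_to_json(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

original = {"score": 5, "tags": ["xml"]}
round_tripped = xml_to_json(json_to_xml(original))
print(round_tripped)  # {'score': '5', 'tags': 'xml'}
```

The integer has become a string and the one-element list has collapsed into a scalar, because nothing in the XML says what the information model intended.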

This leads to the following Challenge 1 for an information model M:
Define bindings JM, XM and RM in JSON, XML and RDF respectively, and transformations JX: JM → XM, XR: XM → RM and RJ: RM → JM, such that for each instance J of JM, X of XM and R of RM the composition of the three transformations, starting with the one applicable to that instance, is equivalent (for the specific binding framework) to the original instance. For example, for each JSON instance J of the JSON binding JM of the information model M, RJ(XR(JX(J))) should be equivalent to the original instance J.
Note that meeting Challenge 1 implies the existence of inverse transformations with the same properties, as the composition of two of these transformations will be an inverse (up to equivalence) of the third.
Note also that meeting Challenge 1 would allow for strong data conformance testing: for example, given a JSON data instance J, the instances J, JX(J) and XR(JX(J)) can be validated against a JSON schema for J, an XML schema and Schematron rules for JX(J), and an ontology for XR(JX(J)).
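As a rough sketch of what Challenge 1 asks for, the following Python code implements JX, XR and RJ for a tiny, purely hypothetical information model (an item with an id, an integer score and an unordered set of tags). The namespace, the element names and the representation of RDF as a set of Python tuples are all assumptions made for this illustration; a real RDF binding would use a proper RDF library and typed literals.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/model#"  # hypothetical namespace for the sketch

def jx(j):
    """JX: JSON binding -> XML binding."""
    root = ET.Element("item", {"id": j["id"]})
    ET.SubElement(root, "score").text = str(j["score"])
    for tag in j["tags"]:
        ET.SubElement(root, "tag").text = tag
    return root

def xr(x):
    """XR: XML binding -> RDF binding, modelled here as a set of triples."""
    subject = EX + x.get("id")
    # The model says score is an integer, so the literal is restored as int.
    triples = {(subject, EX + "score", int(x.findtext("score")))}
    for tag in x.findall("tag"):
        triples.add((subject, EX + "tag", tag.text))
    return triples

def rj(r):
    """RJ: RDF binding -> JSON binding."""
    subject = next(iter(r))[0]
    score = next(o for s, p, o in r if p == EX + "score")
    tags = sorted(o for s, p, o in r if p == EX + "tag")
    return {"id": subject[len(EX):], "score": score, "tags": tags}

def equivalent_json(a, b):
    """Equivalence for the JSON binding: tag order is not significant."""
    return (a["id"] == b["id"] and a["score"] == b["score"]
            and set(a["tags"]) == set(b["tags"]))

J = {"id": "a1", "score": 5, "tags": ["xml", "json"]}
round_trip = rj(xr(jx(J)))
print(equivalent_json(J, round_trip))  # True
```

The round trip does not return a byte-identical JSON document (the tags come back sorted), which is why the challenge speaks of equivalence for the specific binding framework rather than equality.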

A generalization of Challenge 1 is the following Challenge 2:
Define a machine-readable format for information models and an algorithm which generates, for each information model M in this format, bindings JM, XM and RM in JSON, XML and RDF respectively and transformations JX, XR and RJ meeting Challenge 1.
Note that we do not require that every information model be representable in the format of Challenge 2. We expect, however, that for many domains it will be possible to design information models in a way that can be served equally well by JSON, XML and RDF, requiring only a few modifications compared to information models designed with only one particular binding in mind.
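To give a flavour of Challenge 2, here is a very reduced Python sketch in which the transformations are not hand-written but driven by a machine-readable model description. The format of MODEL is invented for this illustration, it covers only the JSON and XML bindings, and it supports only integer and string fields; it is a starting point for thinking about the problem, not a solution to the challenge.

```python
import xml.etree.ElementTree as ET

# Hypothetical model description: field name -> (Python type, repeatable?)
MODEL = {
    "name": "item",
    "fields": {"score": (int, False), "tags": (str, True)},
}

def to_xml(model, data):
    """Generic JSON -> XML transformation, driven by the model description."""
    root = ET.Element(model["name"])
    for field, (_, many) in model["fields"].items():
        values = data[field] if many else [data[field]]
        for value in values:
            ET.SubElement(root, field).text = str(value)
    return root

def from_xml(model, root):
    """Generic XML -> JSON transformation: the model restores types and lists."""
    data = {}
    for field, (typ, many) in model["fields"].items():
        values = [typ(e.text or "") for e in root.findall(field)]
        data[field] = values if many else values[0]
    return data

original = {"score": 5, "tags": ["xml"]}
restored = from_xml(MODEL, to_xml(MODEL, original))
print(restored == original)  # True
```

Because the model records that score is an integer and that tags is repeatable, the information lost by a generic converter is recovered on the way back.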

Note: Mark Nottingham’s discussion at http://www.mnot.net/blog/2012/04/13/json_or_xml_just_decide, in favor of choosing between JSON and XML, argues against this approach, saying: “That leaves creating a new metamodel and mapping it to JSON and XML. This is time-consuming, tricky, and still isn’t likely to feel ‘native’ in either of the formats; i.e., you’ll get the worst of both worlds.”
I don’t think this holds for very many models. For the JSON-centered ADL Experience API, certainly, only two minor changes would have been required to map it easily and equivalently to natural XML and back. The “trick” required is to be aware of the specifics of JSON, XML and RDF and not to work against any of them. Solving Challenge 2 above could confine information models to those that can be mapped to natural XML, JSON and RDF, getting rid of the “tricky” part, and would automate the generation of the required bindings and transformations, thus getting rid of the time-consuming part.