
First officially announced towards the end of 1996, the Extensible Markup Language (XML) gained momentum during the course of 1997, especially because some of the submissions to the W3C are based on XML. What's more, at the beginning of 1998 the members of the W3C will decide on the finished Proposed Recommendation, and the W3C draft of XML-Style is in the making, too (for an introduction to XML see the article by Ingo Macherius [1]).
One of the major differences between XML and the Hypertext Markup Language (HTML) is that XML isn't limited to a finite set of elements. Although there are extensions to the HTML standard, they are browser-specific (IE: MARQUEE; Navigator: BLINK), and by no means all browsers interpret them (frames, for instance, did gain general acceptance). XML, on the other hand, being a subset of SGML, can handle any number of element types; reading a document type definition (DTD) makes the elements known to an XML-enabled browser.
You can write your own DTD, although it might not turn out to be as easy as writing ordinary HTML. And in such element types, as well as in their attributes, you can store so-called metadata (data about data). The following - admittedly rather short and rudimentary - listing shows what can be done just by naming the element types in a way everyone understands:
<address>
  <name>
    <firstname>Manfred</firstname>
    <surname>Ehrlich</surname>
  </name>
  <email>me@xyz_company.de</email>
  <wpage>http://www.xyz_company.de/people/me/</wpage>
</address>
What catches the eye (hopefully) is that the element types have descriptive names; only wpage isn't self-explanatory. The listing also shows that elements can be nested, albeit more rigidly than in HTML: containers are not allowed to overlap. Instead, the outer one (address) entirely contains the others. As an alternative to the listing above, you can encode something like an address with attributes, as the next listing shows:
<address id="Manfred_Ehrlich" firstname="Manfred" surname="Ehrlich" email="me@xyz_company.de" wpage="http://www.xyz_company.de/people/me/" />
The element in this listing is empty and ends with '/>'. In a case like this the element must include a slash ("/") right before the closing bracket (">"), because otherwise XML would expect a (compulsory) end tag. XML documents are displayed using a style sheet (which is missing here) and a back-end processor. Within style sheets authors can define, among other things, which parts of a document are to be displayed - and which aren't. An article regarding XML's style mechanism will be published in a forthcoming issue of iX - once a standard is agreed upon ...
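The nesting and empty-element rules described above can be tried out with any XML parser. The following is a minimal sketch in Python using the standard library module xml.etree.ElementTree; the shortened address strings and the helper is_well_formed are assumptions made for illustration and are not part of any XML specification.

```python
import xml.etree.ElementTree as ET

# Element-based encoding: the data lives in nested child elements.
NESTED = (
    "<address><name><firstname>Manfred</firstname>"
    "<surname>Ehrlich</surname></name>"
    "<email>me@xyz_company.de</email></address>"
)

# Attribute-based encoding: one empty element, closed with "/>".
EMPTY = '<address firstname="Manfred" surname="Ehrlich" email="me@xyz_company.de" />'

# Overlapping containers: <name> is closed before <firstname>.
OVERLAPPING = "<address><name><firstname>Manfred</name></firstname></address>"

# Start tag without the trailing slash and without an end tag.
UNCLOSED = '<address firstname="Manfred">'


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The same piece of data, read from each encoding.
nested_surname = ET.fromstring(NESTED).findtext("name/surname")
attribute_surname = ET.fromstring(EMPTY).get("surname")

for label, text in [("nested", NESTED), ("empty", EMPTY),
                    ("overlapping", OVERLAPPING), ("unclosed", UNCLOSED)]:
    print(label, is_well_formed(text))
```

Both encodings carry the same data; which one suits better depends on whether the values need further internal structure, which only child elements can provide.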
Metadata, which this example shows only in rudimentary form, are considered to be the application of XML. And, as the few lines of code quoted above might imply, having a common set of element types (like Dublin Core for electronic documents) isn't such a bad idea. In fact, it's a necessity. XML applications (sometimes also called "profiles") can serve this purpose by defining element types for specific realms. The most famous example of such an application (in this case of SGML) is HTML. Whether paragraph or table, browsers know all the elements defined by the HTML specification. But then again, regarding metadata they only know about the META element, which lets the Web author include at least his name and the document's expiry date. Not very much.
As far as metadata are concerned there are, as was to be expected, competing submissions to the W3 Consortium by Microsoft and Netscape. Apart from that, the W3C has turned a PICS (Platform for Internet Content Selection) working group into an RDF (Resource Description Framework) group. The Consortium plans to develop RDF out of Netscape's and Microsoft's submissions as well as further ones. Both of the prominent vendors are certainly going to implement it.
Netscape's submission, the Meta Content Framework (MCF), is based on Apple's HotSauce, a structure definition language which organises information as a directed labelled graph. Nodes, labels (also called property types) and arcs, each consisting of a source and a target node, are the basic elements. In addition, there is a set of predefined elements, meant to facilitate the description of relationships. In MCF syntax the address element shown above might look like this:
<xml-mcf>
  <mcf-ref XML-LINK="SIMPLE" ROLE="XML-MCF-BLOCK" href="http://www.whereever.org/necessities.mcf" />
  <person id="Manfred_Ehrlich">
    <surname>Ehrlich</surname>
    <firstname>Manfred</firstname>
    <description>Webmaster at XYZ-Company</description>
    <email>me@xyz_company.de</email>
    <wpage unit="/people/me" />
  </person>
  <!-- further definitions -->
</xml-mcf>
After Netscape had submitted MCF to the W3 Consortium at the beginning of June 1997, Microsoft didn't hesitate any longer and submitted XML-Data (www.microsoft.com/standards/xml/xmldata.htm) only a few weeks later. Its specification doesn't work with directed labelled graphs but, somewhat more simply, with trees. It is based loosely on Microsoft's own Web Collections in XML.
XML code following this specification looks rather similar to MCF code. Here, too, elements can be nested and data can be put into attributes. Of course, XML-Data and the Channel Definition Format (CDF) are a perfect fit; after all, both come from Redmond. Element types, though, are not declared in a DTD but in schemata, which are themselves written in XML. The following listing shows such a schema for the element "person":
<xml:schema>
  <!-- ... -->
  <elementType id="person">
    <relation href="#FIRSTNAME" />
    <relation href="#SURNAME" />
    <!-- et cetera -->
  </elementType>
  <!-- et cetera -->
</xml:schema>
Elements declared in this fashion are extensible: the Microsoft document on XML-Data describes how, using extends="#pieceOfArt", the element book inherits the relation "title" and adds the author.
The same concept of inheritance applies to entire schemata as well. Provided there is a more general schema for person, its elements can be used for another element like employee. All one has to do is use extends with a URL and an XPointer reference:
extends="http://www.mycompany.com/schema/schema?XML-XPTR=ID(person)"
The ID refers to the element in the document (located by the URL) that has exactly that ID, i.e. person.
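How such inheritance could be resolved is sketched below in Python. This is only an illustration under stated assumptions: the dictionary ELEMENT_TYPES is a hypothetical in-memory stand-in for parsed elementType declarations, and the function all_relations is made up for this example; neither is taken from the XML-Data submission. Only the book/pieceOfArt relationship with "title" and "author" comes from the text above.

```python
# Hypothetical in-memory form of element type declarations
# (an assumption for illustration, not XML-Data syntax).
ELEMENT_TYPES = {
    "pieceOfArt": {"extends": None, "relations": ["title"]},
    "book": {"extends": "pieceOfArt", "relations": ["author"]},
}


def all_relations(type_id, types=ELEMENT_TYPES):
    """Return inherited relations first, then the type's own relations."""
    chain = []
    seen = set()
    current = type_id
    # Walk up the extends chain, guarding against circular declarations.
    while current is not None:
        if current in seen:
            raise ValueError(f"circular extends chain at {current!r}")
        seen.add(current)
        chain.append(current)
        current = types[current]["extends"]

    relations = []
    # Start with the most general type so inherited relations come first.
    for name in reversed(chain):
        for relation in types[name]["relations"]:
            if relation not in relations:
                relations.append(relation)
    return relations


book_relations = all_relations("book")
print(book_relations)
```

With the two declarations above, book ends up with the inherited relation "title" followed by its own relation "author".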
The working draft of RDF, the Resource Description Framework (version of Oct. 2, 1997), at first looks as if the W3C were about to follow Netscape's draft for MCF. For the time being, the data model is supposed to be a directed labelled graph with nodes, property types (see "labels" in MCF) and three-part tuples, which consist of a node, a property type and either another node or an atomic value (a Unicode string).
If you translate a simple statement like "James Joyce is the author of the book Ulysses" into a more abstract notation, it might look like [author, "Ulysses", "James Joyce"] - where author is the property type and the two strings are nodes. RDF uses XML but doesn't need a DTD, only a well-formed document, i.e. the markup must consistently follow XML syntax.
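The three-part notation can be modelled directly as data. The following Python sketch assumes a hypothetical Triple type and two lookup helpers, none of which are defined by the RDF draft; the second statement (about "Finnegans Wake") is added here purely for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One labelled arc of the graph: [property type, node, value]."""
    property_type: str
    node: str
    value: str  # another node or an atomic value (a string)


# The statement from the text, plus one more added for illustration.
TRIPLES = [
    Triple("author", "Ulysses", "James Joyce"),
    Triple("author", "Finnegans Wake", "James Joyce"),
]


def values_of(triples, property_type, node):
    """All values that a node has for a given property type."""
    return [t.value for t in triples
            if t.property_type == property_type and t.node == node]


def nodes_with(triples, property_type, value):
    """All nodes that carry the given value for a property type."""
    return [t.node for t in triples
            if t.property_type == property_type and t.value == value]


ulysses_authors = values_of(TRIPLES, "author", "Ulysses")
joyce_books = nodes_with(TRIPLES, "author", "James Joyce")
print(ulysses_authors, joyce_books)
```

Because every statement has the same three-part shape, the graph can be queried in either direction, from a node to its values or from a value back to the nodes that carry it.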
The RDF specification has schemata like those in XML-Data. The actual code is embedded in
<RDF:serialization></RDF:serialization>
This embedded material can be single resources, so-called assertions (statements) or aggregates (ordered and unordered lists as well as alternatives). The listing further down shows two ordered lists (<RDF:seq>), which could make up the RDF part of an XML document.
Those who want to make use of different sources will encounter the phenomenon of double, if not triple, definitions of the same elements - think of person or employee and the possible conflicts arising from various definitions. XML resolves such clashes between several DTDs by choosing the first definition, but then again, this could be the wrong one.
The W3C is currently thinking about how to resolve clashing namespaces - thinking so rigorously that sometimes documents may not be available to the public. The Layman/Bray suggestion [5], for instance, was at certain times visible only to members of the W3C. At the time of publication the referenced link was available. RDF builds on such rules, as the following listing shows:
<?xml-namespace href="http://www.w3.org/schema/rdf-schema" as="RDF"?>
<RDF:serialization>
  <RDF:seq id="JoyceByDate">
    <RDF:li href="http://www.joycean.de/artist.html"/>
    <RDF:li href="http://www.joycean.de/ulysses.html"/>
    <RDF:li href="http://www.joycean.de/finnegan.html"/>
  </RDF:seq>
  <RDF:seq id="JoyceByAlph">
    <RDF:li href="http://www.joycean.de/finnegan.html"/>
    <RDF:li href="http://www.joycean.de/artist.html"/>
    <RDF:li href="http://www.joycean.de/ulysses.html"/>
  </RDF:seq>
</RDF:serialization>
In principle, the ideas put forward by Andrew Layman of Microsoft and Tim Bray, who does consulting work for Netscape, mean that the prologue of an XML document can contain several source references like
<?xml-namespace href="http://source/of/rdf-schema" as="q1"?>
- and in the document itself, references to the value of as are written like
<q1:qElement>...</q1:qElement><q2:qElement>...</q2:qElement><q3:qElement>...</q3:qElement>
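The effect of such prefixes can be sketched in a few lines of Python. The sketch follows the processing-instruction form shown in this article, which was still a proposal at the time; the second href, the regular expression and the helper names prefix_map and resolve are assumptions made for illustration.

```python
import re

# Prologue with two namespace declarations; the second href is hypothetical.
PROLOGUE = (
    '<?xml-namespace href="http://source/of/rdf-schema" as="q1"?>\n'
    '<?xml-namespace href="http://source/of/other-schema" as="q2"?>'
)

# Matches one declaration and captures the href and the prefix.
DECLARATION = re.compile(
    r'<\?xml-namespace\s+href="([^"]*)"\s+as="([^"]*)"\s*\?>'
)


def prefix_map(prologue):
    """Map each declared prefix to the source it stands for."""
    return {prefix: href for href, prefix in DECLARATION.findall(prologue)}


def resolve(qualified_name, prefixes):
    """Split 'prefix:name' and return (source, local name)."""
    prefix, separator, local_name = qualified_name.partition(":")
    if not separator:
        raise ValueError(f"{qualified_name!r} has no prefix")
    return prefixes[prefix], local_name


PREFIXES = prefix_map(PROLOGUE)
first = resolve("q1:qElement", PREFIXES)
second = resolve("q2:qElement", PREFIXES)
print(first, second)
```

Two elements with the same local name but different prefixes resolve to different sources, which is exactly what keeps competing definitions of qElement apart.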
It probably won't be long until a working draft on this topic shows in detail whether things will be as outlined here or different. You'll find the working drafts, older and recent, as well as submissions and notes (a kind of informal basis for future discussion within the W3C, without the W3C having to make any commitments) on the W3C server. This material includes papers on the Web Interface Definition Language (WIDL) by webMethods and the Open Software Description Format by Marimba's Arthur van Hoff together with Microsoft's Hadi Partovi and Tom Thai.
Two facts indicate that the XML train can gain even more momentum once the specification is agreed upon: with CDF, XML is already in use on the Web, and the Document Object Model, which is still work in progress (like many other W3C working drafts), will enable Web authors to access and change HTML as well as XML documents. As far as the syntax is concerned, the members of the W3C will decide in January whether - or maybe that - the Proposed Recommendation of XML will be it.
Literature
[1] Ingo Macherius; Web Languages; Experts' Revolution; XML: a professional alternative to HTML
[2] Angus Davis; Deploying Metadata Representations Of Web Content
[3] John Tigue; A Standardized XML API in Java (XAPI-J, xml.datachannel.com/xml/dev/XAPIJ1p0.html)
[4] [Microsoft column]; Dr. GUI does data-with XML; Microsoft Developer Network 20. 10. 1997
[5] Tim Bray, Andrew Layman; Options for Implementing Namespaces in XML
This text is taken from issue 01/1998 of the magazine iX.