
Anyone in charge of regularly updating link lists and similar pages, whether webmaster or freelance web author, has to check for errors every so often. An error corrected in one place is not necessarily corrected in the next. There is more than one way out of this dilemma, and this article describes one of them.
Two points matter here. First, the data should reside in a database table (as they do in this example). That way, whoever is responsible can keep the data consistent by updating them in exactly one application, and can then generate any number of HTML target documents from them. Second, a short outlook shows how parts of style sheets can be reused in different applications.
Web server statistics show that on-line readers find it useful to have a list of the iX articles available on-line. Less practical, on the other hand, is that all the data reside in a single file, so you have to load about 36 KB every time. We wanted to change this, particularly because the file by now consists of a table of about 300 lines, which takes a few seconds to display. Of course, you could divide the table into several HTML pages (the simplest solution). But since the data come from an RDBMS anyway, it seemed more sensible to add a few fields to the database table and generate the HTML via reports. After all, the objective was to create and update several Web pages automatically.
The Extensible Markup Language (XML) came into play because it allows structured markup of documents (see [1]). I can call an article article and a year year, instead of being restricted to the limited set of HTML elements. This means that element and attribute names can carry meta-information instead of merely helping with formatting (as <p> or <h5> do). The decisive argument for XML was this: once the programmes exist (this is the real work), future updates take only a few keystrokes, without any work on the final product (the HTML pages).
After all, it is not too complicated to shape information extracted from a database with this language and some style sheets (DSSSL and CSS, see [2]), in particular to create several HTML documents from the same stock of data. It is like any programming: more work at the start, but later on you benefit from having all the style sheets at hand.
As mentioned above, the starting point for the following programmes is the collection of about 200 iX articles available on-line. Apart from a document that really lists them all (should anyone insist), there ought to be others that display only the articles of a specific year, or only those on Perl, Linux and so on. And the programmes (style sheets) should be extensible to other small applications.
The same goes for operating systems: the programmes shown below run on a Solaris machine, but the software needed is available (or can be compiled) for other Unix platforms and for Windows.
As always, programming comes first. In the end, one or two command lines are enough to create 10 or 100 HTML pages. I had already worked with the Document Style Semantics and Specification Language (DSSSL), which is designed for formatting SGML documents, and had used James Clark's jade (see [2]), so the tools were obvious from the start. So-called DSSSL style sheets create the target documents, using jade's ability to transform data from one SGML language (DTD) into another (in this case HTML). More on this later.
Even those who want to concentrate on HTML in the medium term, if only to serve users of browsers that cannot handle XML, can profit from XML as an intermediate data format. Moreover, the major browsers are expected to be able to display XML data together with an XSL style sheet soon. On top of this, planned features such as XPointer will make it possible to point into documents. Once XPointer is implemented, authors will be able to include a quote from another document, or a reference into it, in their own pages (copyright issues notwithstanding).
The to-do list consists of
Of course, you could simply create lots of database reports, each of which produces one specific HTML document. But then you would have to rework each of them whenever requirements change. With XML and DSSSL, on the other hand, you can divide a finished programme (the so-called style sheet) into parts. Some of these parts, for example menus that represent the corporate identity, can then be shared by a variety of applications (see the end of this article). Once this is done, and provided there is a corresponding Makefile with suitable jade calls and copy commands, it should be enough to type in
make
make install
to build the HTML files and copy them to the directory (or directories) where users are supposed to find them.
Before a report writes the data to a file, you should define the structure of the resulting data. The database table is two-dimensional, whereas an XML file consists of properly nested elements. A formal description of this structure is difficult to read at first glance, but on a second look it soon turns out to be a relief to have one. Conveniently, such a description, called a document type definition (DTD), is required anyway if you want to work with jade. Listing 1 shows a slightly abbreviated DTD for the on-line articles.
It is easy to see that the structure is not a deep one, which will make it easier to write style sheets later on. The root element of the document, online, consists of at least one year element (the plus sign denotes this, as in regular expressions). Each year in turn contains one or more issue elements, which themselves contain the articles. Each article has data such as a title and an author, both of which are elements in their own right. Furthermore, the DTD defines which data belong to the elements year, issue and article (see the digression for a few hints on when to use elements and when to use attributes).
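The nesting described above (online, then year+, then issue+, then articles) can be illustrated with a small Python sketch using the standard library's xml.etree.ElementTree. It is not a DTD validator; jade or a validating parser does that job. The sketch only walks the tree and reports elements that break the hierarchy. Because listing 1 is not reproduced here, the rule table is an assumption reconstructed from the prose, and it treats the articles of an issue as "one or more".

```python
import xml.etree.ElementTree as ET

# Assumed content model, reconstructed from the description of listing 1:
# online -> year+, year -> issue+, issue -> article+ (article contents unchecked).
REQUIRED_CHILD = {
    "online": "year",
    "year": "issue",
    "issue": "article",
}


def check_nesting(elem: ET.Element) -> list[str]:
    """Return human-readable problems found below elem (empty list if none)."""
    expected = REQUIRED_CHILD.get(elem.tag)
    if expected is None:
        return []  # e.g. <article>: its children (title, author, ...) are not checked
    problems = []
    children = list(elem)
    if not children:
        problems.append(f"<{elem.tag}> needs at least one <{expected}>")
    for child in children:
        if child.tag != expected:
            problems.append(f"<{elem.tag}> may not contain <{child.tag}>")
        else:
            problems.extend(check_nesting(child))
    return problems


doc = ET.fromstring(
    '<online><year which="98"><issue>'
    '<article id="a9811042"><title>Sample title</title></article>'
    '</issue></year></online>'
)
print(check_nesting(doc))
```

A real DTD also constrains attributes and the order of children. Here, only the parent/child chain from the prose is checked.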
As mentioned before, the starting point is that information on the articles is available from a database table and that all updates are done there. A report extracts from this table the information needed for the XML/HTML documents. Listing 2 shows an example based on the Informix RDBMS. Those who work with other database systems will have to adapt it to their system's reporting facilities. Users of MySQL and other systems that don't offer report facilities can produce similar output with a Perl script using Perl's database interface (DBI).
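The text suggests a Perl script with DBI for systems without a report generator. The same idea can be sketched with Python's DB-API, using the built-in sqlite3 module as a stand-in for the real RDBMS: fetch the rows in year/issue/page order, so that a writer can emit properly nested XML. The table name articles and the columns title and author are hypothetical. The columns jahr, monat, seite and en_url follow the field names that appear in listing 2's print statements and if clauses.

```python
import sqlite3

# Hypothetical schema; the real table behind listing 2 is not shown in the article.
SCHEMA = """CREATE TABLE articles (
    jahr INTEGER, monat INTEGER, seite INTEGER,
    title TEXT, author TEXT, en_url TEXT)"""


def fetch_articles(conn: sqlite3.Connection) -> list[tuple]:
    """Return all article rows, sorted the way the XML nesting needs them."""
    cur = conn.execute(
        "SELECT jahr, monat, seite, title, author, en_url "
        "FROM articles ORDER BY jahr, monat, seite"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?)",
    [
        (98, 11, 120, "Sample title 1", "Author A", None),
        (98, 10, 42, "Sample title 2", "Author B", "http://example.org/en/2"),
    ],
)
for row in fetch_articles(conn):
    print(row)
```

The sort order matters: the grouping into year and issue elements only works if all rows of one group arrive together.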
For each field, the report prints its content surrounded by a start- and end-tag; the content of some fields is wrapped into attributes instead. As in HTML, attributes and their values are given in the start-tag. A series of print statements ensures the correct (well-formed) syntax of the resulting file. To complete it, a shell script wraps the output in the remaining ingredients of an XML instance:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE online SYSTEM "online.dtd">
<online>
<!-- [lots of data] -->
</online>
The first line tells the programmes that are to process this file that it is an XML document, that it is encoded in ISO Latin-1 (ISO-8859-1), and that there are markup declarations outside the document (it is not standalone). The second line states the document type and refers to the file that describes its structure (online.dtd). Finally, the start- and end-tag of the element online frame the rest of the data.
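What the shell script does can be sketched in a few lines of Python, assuming the report output is available as a string. The file is written in ISO-8859-1 so that its bytes match the encoding declared in the first line. The function names wrap_instance and write_instance are made up for this illustration.

```python
XML_DECL = '<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>\n'
DOCTYPE = '<!DOCTYPE online SYSTEM "online.dtd">\n'


def wrap_instance(report_output: str) -> str:
    """Frame the report output with the XML declaration, doctype and root element."""
    return f"{XML_DECL}{DOCTYPE}<online>\n{report_output}</online>\n"


def write_instance(path: str, report_output: str) -> None:
    # The file encoding must agree with encoding="ISO-8859-1" in the declaration;
    # characters outside Latin-1 would raise UnicodeEncodeError here.
    with open(path, "w", encoding="iso-8859-1") as f:
        f.write(wrap_instance(report_output))
```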
Within the report, a few details need attention. For instance, the ID attribute of the articles should be a string without any whitespace that always has the same form, "a"yymmppp, where "y" stands for year, "m" for month/issue and "p" for page. This makes references easier and consistent. To print the data this way in Informix, the format using "&&" ensures that leading zeros appear:
print 'id="a', jahr using "&&",
monat using "&&", seite using "&&&", '"'
This makes it easy to remember how to address an article via IDREF. The "a" at the beginning of the ID is only there because an XML ID must not begin with a digit.
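In Python, the same fixed-width ID can be produced with zero-padded format specifications, which play the role of Informix's using "&&" and "&&&". This is a sketch. Reducing the year modulo 100 is an assumption, made so that both 98 and 1998 yield the two-digit form.

```python
def article_id(year: int, issue: int, page: int) -> str:
    """Build an ID of the form a + yy + mm + ppp, e.g. a9811042.

    The leading "a" keeps the value a legal XML name (it may not start with a digit).
    """
    return f"a{year % 100:02d}{issue:02d}{page:03d}"


print(article_id(98, 11, 42))
print(article_id(1998, 3, 7))
```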
print '<page>', page using '<<<', '</page>'
The page number is additionally included in the data as an element of its own (for reference). Here the leading zeros are not needed; on the contrary, they can be a nuisance. With the format given above, only the digits actually present are printed.
A few if clauses take values, where they exist, and put them as attributes into an element. In Listing 2 this applies to the URL of the English version of an article. (en_url > " ") checks whether the field has a value greater than " " (space, ASCII 32). In other words, if the field holds any meaningful value, there really is a URL. Last but not least, clipped, used with text fields, removes trailing blanks from field values so that they do not end up in elements or attributes.
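The details above can be combined in one Python sketch that emits a single article element: a padded ID, an unpadded page number, an optional attribute only for non-blank fields, and clipped trailing blanks. The attribute name en_url and the child elements title and author are assumptions, since listing 2 is not reproduced here. In the sketch, rstrip() stands in for clipped. Escaping of &, < and > is added so that the output stays well-formed.

```python
from xml.sax.saxutils import escape, quoteattr


def article_xml(year, issue, page, title, author, en_url=None):
    """Emit one <article> element; attribute and element names are illustrative."""
    # Zero-padded ID, like using "&&" / "&&&" in the report.
    attrs = f'id="a{year % 100:02d}{issue:02d}{page:03d}"'
    # Only add the attribute if the field holds more than blanks (cf. en_url > " ").
    if en_url is not None and en_url.strip():
        attrs += f" en_url={quoteattr(en_url.rstrip())}"
    return (
        f"<article {attrs}>"
        f"<title>{escape(title.rstrip())}</title>"    # rstrip() ~ clipped
        f"<author>{escape(author.rstrip())}</author>"
        f"<page>{page}</page>"                        # no padding, like using '<<<'
        "</article>"
    )


print(article_xml(98, 11, 42, "Tips & tricks   ", "Author A  ", "   "))
```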
When all entries for a specific year have been printed, the end-tag </year> must follow. This is what XML's well-formedness requires: no start- or end-tag may be missing. The same is true for each issue. The before/after group statements in listing 2 take care of the necessary output. With all this in place,
sacego -q dat.arc > dat.xml
is the Informix-specific command that extracts and prints the report to an XML file, which looks like Listing 3. Now is the time for style sheets.
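The effect of the before/after group blocks can be sketched in Python with itertools.groupby. The sketch assumes rows that are already sorted by year and issue, as the report's sort order or a query's ORDER BY would deliver them. Each group opens its element when it starts and closes it when it ends, so no end-tag can go missing. The attribute name which on year follows the attribute tested in listing 4; using the same name on issue is an assumption.

```python
from itertools import groupby
from operator import itemgetter


def emit_groups(rows):
    """rows: (year, issue, article_markup) tuples, sorted by year, then issue."""
    lines = []
    for year, year_rows in groupby(rows, key=itemgetter(0)):
        lines.append(f'<year which="{year:02d}">')         # like "before group" of year
        for issue, issue_rows in groupby(year_rows, key=itemgetter(1)):
            lines.append(f'<issue which="{issue:02d}">')   # like "before group" of issue
            lines.extend(markup for _, _, markup in issue_rows)
            lines.append("</issue>")                       # like "after group" of issue
        lines.append("</year>")                            # like "after group" of year
    return "\n".join(lines)
```

If the rows are not sorted, groupby starts a new group every time the key changes. The result would still be well-formed, but one year could appear several times.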
The point of all this reporting is to use the compiled data on the on-line articles to create a basically unlimited number of HTML documents. For a start, you need a DSSSL style sheet like listing 4 (part of default.dsl, which produces the default page of the target directory). However, this listing lacks important parts that deal with the menus (see below) and a few functions. [2] covers working with DSSSL in more detail (in German, though). But at least one example should show how jade transforms XML into HTML.
All three definitions (see listing 4) describe what to do with the elements in question. Only articles of the current year are to be included in the resulting page, which is why
(if (equal? (attribute-string "WHICH" (current-node)) "98")
    (make sequence
      (make element gi: "H1"
            attributes: (cons (list "class" "year") '())
            (literal (string-append "19" (attribute-string "WHICH"))))
      (process-matching-children 'issue))
    (empty-sosofo))
checks whether the value of the corresponding attribute really is "98". If so, the style sheet creates a headline (make element ...) and processes the child elements issue, and through their rules the articles. If the attribute WHICH has a different value, the style sheet does nothing. More precisely, it creates an empty specification of a sequence of flow objects (SOSOFO): (empty-sosofo).
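For readers who don't speak Scheme, here is a rough Python analogue of that rule. It works on the XML file with xml.etree.ElementTree instead of jade. This is an illustration only: DSSSL constructs flow objects, whereas the sketch just collects HTML strings, and the issue rule is reduced to a placeholder heading. The exact spelling of the attribute name in the XML file (which) is an assumption.

```python
import xml.etree.ElementTree as ET


def render_year(year: ET.Element, wanted: str = "98") -> list[str]:
    """Mimic the DSSSL rule for <year>: produce output only for the wanted year."""
    which = year.get("which")
    if which != wanted:
        return []                          # corresponds to (empty-sosofo)
    out = [f'<H1 class="year">19{which}</H1>']
    for issue in year.findall("issue"):    # like (process-matching-children 'issue)
        out.append(f"<H3>Issue {issue.get('which', '?')}</H3>")  # placeholder rule
    return out


root = ET.fromstring(
    '<online><year which="97"><issue which="12"/></year>'
    '<year which="98"><issue which="10"/><issue which="11"/></year></online>'
)
html = [line for year in root.findall("year") for line in render_year(year)]
print("\n".join(html))
```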
For each issue of 1998, the style sheet default.dsl produces a header (H3) with the month's name, which a Scheme function converts into a string (see listing 6). This header is followed by a list of the issue's articles: a DT for subject and title and a DD for the subhead.
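Listing 6 is not reproduced here, but the idea can be sketched in Python: turn an issue number into a month name, then list the articles as a definition list. The English month names, the tuple layout (subject, title, subhead) and the function names are assumptions made for illustration.

```python
from html import escape

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def month_name(issue: str) -> str:
    """Map an issue number such as "03" to a month name."""
    return MONTHS[int(issue) - 1]


def render_issue(issue: str, articles) -> str:
    """articles: iterable of (subject, title, subhead) tuples."""
    lines = [f"<H3>{month_name(issue)}</H3>", "<DL>"]
    for subject, title, subhead in articles:
        lines.append(f"<DT>{escape(subject)}: {escape(title)}</DT>")
        lines.append(f"<DD>{escape(subhead)}</DD>")
    lines.append("</DL>")
    return "\n".join(lines)


print(render_issue("11", [("XML", "Reports & style sheets", "From RDBMS to HTML")]))
```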
So far the work has produced only a single HTML file, which of course is not enough. Once the DSSSL programme is up and running, including the menus on the left-hand side and at the top of screen shot 1 below, you can start to dissect it. An obvious candidate for a separate file is the definition of special flow objects such as element and document-type, which James Clark provides for jade (in listing 5 this is the file ../top1.dat). In this example, the Scheme functions needed by the style sheet are kept in another external file (functions.scm).
Screen shot 1: http://www.heise.de/ix/online/: the default page of the on-line articles shows two menus next to the articles: the overall iX menu on the left and an application-specific one at the top.
Screen shot 2: The data for the menu on the left and for the string on the right (up to the HR) are set in top2.dat in the parent directory; for the right-hand side, the corresponding files are menu.dat and xyz.l.dat.
Menus that may be used by more than one application can be kept in an external file. Finally, you should consider page-specific headers and the like. For these, XML, like SGML, offers external entities. The document type declaration (see listing 4) contains a list of entity declarations within its square brackets, and each entity becomes part of the (logical) document wherever a reference &entity; appears. Screen shot 2 above shows the different parts that can be outsourced this way. These contain
The rest is the content of files like default.dsl (listing 5 and listing 4, plus a few details). Here the element rules determine the output. In the example above (listing 4), this was the list of 1998 articles. Other style sheets produce the older articles, or everything available on topics such as the Web or Tcl. This application comprises about a dozen such files.
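The principle behind these dozen files can be sketched in Python: the shared parts are kept once, and only the selection differs from page to page. The fragment strings stand in for files such as top2.dat and menu.dat, and the page names and selection rules are hypothetical. In the article this work is done by separate DSSSL style sheets that include the shared parts.

```python
from html import escape

# Hypothetical selection rules: output file -> predicate over an article dict.
PAGES = {
    "default.html": lambda a: a["year"] == 98,
    "older.html": lambda a: a["year"] < 98,
    "tcl.html": lambda a: "tcl" in a["topics"],
}


def build_pages(articles, shared_top: str, app_menu: str, pages=PAGES) -> dict:
    """Return {file name: HTML text}; every page reuses the same shared fragments."""
    result = {}
    for name, keep in pages.items():
        items = [f"<LI>{escape(a['title'])}</LI>" for a in articles if keep(a)]
        result[name] = "\n".join([shared_top, app_menu, "<UL>", *items, "</UL>"])
    return result


articles = [
    {"title": "Sample XML article", "year": 98, "topics": {"xml"}},
    {"title": "Sample Tcl article", "year": 97, "topics": {"tcl"}},
]
pages = build_pages(articles, "<!-- top2.dat -->", "<!-- menu.dat -->")
print(sorted(pages))
```

Changing the shared menu then means editing one fragment, and every page picks up the change on the next build.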
Screen shot 3: http://www.heise.de/ix/raven/Web/xml/timeline/: the timeline looks like the list of on-line articles because it shares the menu on the left and the top lines; it also contains a similar application-specific menu.
Storing the files top1.dat and top2.dat in the parent directory makes it easier for all documents to refer to them in the same way (provided there is a common directory tree). Screen shot 3 shows the default page of an application that was developed in parallel with this one. The data for the Internet timeline are not extracted from a database table but reside in a single XML file. The specific style sheets were easily "ported" to the new application. Further reuse is planned and should be just as easy.
Literature
[1] Ingo Macherius; Web Languages; Experts' Revolution; XML: a professional alternative to HTML
[2] Henning Behme, Stefan Mintert; Web-Programmierung; Klammern gehört zum Handwerk; DSSSL: XML-Dokumente fürs Web formatieren (in German)
This text is taken from the 11/1998 issue of iX.