ACADEMIC COMPUTING and COMMUNICATIONS CENTER
The A3C Connection, July/August/September 2000

Processing XML: XSLT, A Matter of Style


If you want the benefits of XML, and there are many, you have to pay the price. That means finding a way to do something useful with your XML files. The processing options available today are more limited than XML itself, but they are improving rapidly. Fortunately, one of the big benefits is the ability to write your XML content now and change your mind repeatedly about how to process it, without changing the original data at all.

One option is to transform an XML document into some other format, be it non-XML, a different XML tag set, reorganized XML content, or HTML. If you were a programmer, you'd be looking into DOM (Document Object Model) or SAX (Simple API for XML), which break an XML document into parts in different ways, suitable for a program in Java, Perl, or C++.
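For the curious, here is a minimal sketch of how the two programming interfaces differ. It is written in Python only because Python ships with both a DOM and a SAX module; the article itself does not use Python, and the function names, the NameHandler class, and the inlined, abbreviated copy of the figure 1 data are all assumptions made for illustration.

```python
import xml.dom.minidom
import xml.sax

# Abbreviated copy of the figure 1 document (assumption: inlined for the demo).
SAMPLE = """<?xml version="1.0"?>
<department>
   <dept-name>Department of Magical Implications</dept-name>
   <person type="chair"><name>Albus Dumbledore</name></person>
   <person type="prof"><name>Severus Snape</name></person>
   <person type="prof"><name>Minerva McGonagall</name></person>
</department>"""


def names_via_dom(text):
    """DOM: parse the whole document into a tree, then query the tree."""
    doc = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("name")]


class NameHandler(xml.sax.ContentHandler):
    """SAX: react to start/text/end events as the parser streams the input."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # collects text only while inside a <name>

    def startElement(self, tag, attrs):
        if tag == "name":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._buffer = None


def names_via_sax(text):
    """Run the event-driven handler over the document and return the names."""
    handler = NameHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


if __name__ == "__main__":
    print(names_via_dom(SAMPLE))
    print(names_via_sax(SAMPLE))
```

Both functions return the names in document order. The DOM version holds the whole tree in memory and lets you ask questions of it; the SAX version never builds a tree and simply reacts as each tag goes by.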

But in this article, I'll give you a tiny taste of XSLT (XSL Transformations), which is the currently working part of the overall XSL (Extensible Stylesheet Language) standard. XSLT is a special XML tag set, but one that contains rules for processing other XML documents, rather than real content itself.

 
   
 
     
Style Sheets
 

Remember style sheets? The whole original premise of markup was the separation of documents into content (in this case, the XML) and styles, the rules for dealing with content. In simple terms, you'd send an XML file and a style sheet to a browser and have the document rendered. To change the rendering, just change the style sheet. And it actually can work that way. But when you look at the details of XML styles, things are much more flexible and useful than that simple picture.

 
     
HTML Style Sheets -- CSS
 

The current champ of style-sheet languages is CSS, Cascading Style Sheets. CSS predates XML and is used to specify how HTML is to be rendered. For example, you could write a CSS rule that says "when you see a <p> tag, set the text in 12 pt Helvetica" or "when you see <b>, set the text in red but don't boldface." Sure, you could do most of what CSS can do with judicious use of <font>, <table>, and other HTML tags. But if you want to change the rendering of dozens of HTML files in one fell swoop, CSS is the way to go. CSS also provides many fine-control positioning commands and, best of all, has some support in most current browsers.

Unfortunately for CSS, rendering isn't the only thing you might want to do with a style sheet. You might want to insert new content or rearrange the old content. This CSS can't handle; it's just for rendering.

 
     
XML Style Sheets -- XSLT
 

Suppose you have an XML file with information about your department, such as the names and addresses of the faculty and staff, a list of grad students for each prof, a list of courses and their descriptions, and so on. Now you want to generate a list of professors in HTML to use on your departmental home page. Sure, you can copy and re-edit (and re-alphabetize) the file, but do you really want to do that every time you add a new professor? Or when you decide to reformat the output to include a list of courses by each prof? Better to write a style sheet, and use the style sheet rules to do the extraction and rearrangement.

Enter XSLT.

 
     
An Example Is Worth 1000 Tags
 

So here's an example: the XML file in figure 1.

Figure 1: The XML Input File

<?xml version="1.0"?>
<department>
   <dept-name>
      Department of Magical Implications
   </dept-name>
   <address>
      10 Hogwarts Castle
   </address>
   <person type="chair">
      <name>Albus Dumbledore</name>
   </person>
   <person type="prof">
      <name>Severus Snape</name>
      <interests>potions</interests>
   </person>
   <person type="prof">
      <name>Minerva McGonagall</name>
      <house>Gryffindor</house>
   </person>
</department>

The file is short, but the idea is clear -- a <department> contains departmental info such as <dept-name> and <address> and also a list of <person>s in no particular order. Each <person> has a type, which is indicated with the attribute type: <person type="prof">. And then there are subelements, such as <name> or <house>. An easy way to store information, provided you can process it.

Now suppose you just want a list of the people in the department. Names only, alphabetized. You want it in HTML format, with the name of the department at the top, suitable for use with your departmental home page. And please indicate somehow the departmental chair in the list.

Now consider figure 2, a simple XSLT file. It may look vaguely like a programming language, and in some ways it is. But it is fundamentally an XML document like any other and is not too hard to read, at least for simple cases.

Figure 2: The XSLT Input File

<xsl:transform
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0"
      xmlns:saxon="http://icl.com/saxon"
      exclude-result-prefixes="saxon"
>
<xsl:template match="department">
      <HTML>
      <BODY>
      <xsl:apply-templates select="dept-name" />
      <UL>
      <xsl:apply-templates select="person" >
        <xsl:sort select="name" order="ascending" />
      </xsl:apply-templates>
      </UL>
      </BODY>
      </HTML>
</xsl:template>
<xsl:template match="dept-name">
      <h1>
        <xsl:value-of select="."/>
      </h1>
</xsl:template>
<xsl:template match="person">
      <li>
        <xsl:value-of select="name"/>
      </li>
</xsl:template>
<xsl:template match="person[(@type='chair')]">
      <li>
        <xsl:value-of select="name"/>
        (<b>Chair</b>)
      </li>
</xsl:template>
</xsl:transform>

An XSLT file has a bunch of <xsl:template> elements. Each such template is just a rule: "As you parse the XML file of interest, when you come across a tag of such-and-such type, do the following." So each template has to specify which tags it applies to (using the match attribute) and then specifies what actions to take in the body of the template element.

<xsl:template match="dept-name">
This is the simplest tag. This says, "When you come across an element of type <dept-name>, output <h1>, then the value of the <dept-name> element, then </h1>." That's it; simply put the name of the department into an <h1> element.

<xsl:template match="person">
This tag is just as simple. It says, "When you come across a <person> element, put the value of its <name> in an <li> element." Obviously, this is because we want a list of <person>s.

<xsl:template match="person[(@type='chair')]">
I put this in just to show that the match attribute can be a bit fancier. It is almost the same as the <xsl:template match="person"> element, but it only applies to <person> elements that have a type attribute equal to chair. In this way, the chair of the department can be treated differently.

Of course, <person type="chair"> is also a normal <person> element. So how does XSLT know which template to use? The general answer is that it picks the most specific match; a pattern with a condition in square brackets outranks a bare element name. That is almost always what you want.
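The "most specific wins" rule can be pictured with a toy model. This is only a sketch in Python, not how SAXON is implemented: the RULES table and the pick_rule function are hypothetical names. The priorities 0 and 0.5 mirror the defaults that XSLT 1.0 assigns to a plain element name and to a pattern with a predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: (pattern text, matching test, priority).
# 0.0 and 0.5 mirror the XSLT 1.0 default priorities for a bare element
# name and for a pattern that carries a [predicate].
RULES = [
    ("person", lambda e: e.tag == "person", 0.0),
    ("person[@type='chair']",
     lambda e: e.tag == "person" and e.get("type") == "chair", 0.5),
]


def pick_rule(element):
    """Return the pattern text of the highest-priority rule that matches."""
    candidates = [(priority, pattern)
                  for pattern, matches, priority in RULES
                  if matches(element)]
    if not candidates:
        return None  # no rule in this toy table applies
    return max(candidates)[1]


if __name__ == "__main__":
    chair = ET.fromstring('<person type="chair"><name>Albus Dumbledore</name></person>')
    prof = ET.fromstring('<person type="prof"><name>Severus Snape</name></person>')
    print(pick_rule(chair))
    print(pick_rule(prof))
```

Both rules match the chair, but the one with the predicate has the higher priority and is chosen; an ordinary professor matches only the plain person rule.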
 
     
-- Modifying the Processing Order
 

Now for the hard part, and the piece that makes XSLT so different from CSS. XSLT does not have to go through the input XML file in linear order. The actions one can take inside a template can affect the order of processing. This is crucial and makes XSLT extremely powerful. In fact, once a template matches an element, the processor does nothing further with that element's contents unless the template tells it to. The overall order of processing is determined by the actions inside the templates.

<xsl:template match="department">

Consider this element. True enough, when the processor finds the <department> element, it outputs some HTML. However, when it gets to the apply-templates element <xsl:apply-templates select="person">, things change. This says, "Stop outputting HTML temporarily. Go find all the <person> elements that are children of this <department>, wherever they fall among its other children; for each one you find, find the applicable template and apply that template now." And the <xsl:sort> part says, "Don't take the <person> elements in the order that you find them, but rather process them in order of <name>, alphabetically."
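If it helps to see the same select-then-sort logic spelled out imperatively, here is a rough Python equivalent of the department template. It is a sketch for comparison only, not what an XSLT processor actually does internally; render_department and render_person are hypothetical names, the sample data is an inlined, abbreviated copy of the input, and Python's plain string comparison stands in for the alphabetical ordering of <xsl:sort>.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abbreviated copy of the input document (assumption: inlined for the demo).
SAMPLE = """<department>
   <dept-name>Department of Magical Implications</dept-name>
   <address>10 Hogwarts Castle</address>
   <person type="chair"><name>Albus Dumbledore</name></person>
   <person type="prof"><name>Severus Snape</name></person>
   <person type="prof"><name>Minerva McGonagall</name></person>
</department>"""


def render_person(person):
    """Plays the role of the two person templates."""
    name = escape(person.findtext("name"))
    if person.get("type") == "chair":
        return f"<li>{name} (<b>Chair</b>)</li>"
    return f"<li>{name}</li>"


def render_department(dept):
    """Plays the role of the department template."""
    lines = ["<HTML>", "<BODY>"]
    lines.append(f"<h1>{escape(dept.findtext('dept-name').strip())}</h1>")
    lines.append("<UL>")
    # Like select="person": only <person> children of this <department>.
    people = dept.findall("person")
    # Like <xsl:sort select="name">: process them ordered by name.
    for person in sorted(people, key=lambda p: p.findtext("name")):
        lines.append(render_person(person))
    lines += ["</UL>", "</BODY>", "</HTML>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_department(ET.fromstring(SAMPLE)))
```

Notice that the <address> never appears in the result. Nothing in the department rule asks for it, so it is skipped, which is the same reason it is missing from the XSLT output.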

 
     
-- Processing the XML with the XSL Rules
 

So now we run the XML file in figure 1 through an XSLT processor (we have one on icarus; it's called SAXON) with the XSLT file in figure 2, and it produces the HTML file in figure 3.

Want to try it yourself? On icarus, copy the XML file (figure 1) into a file called sample.xml and the XSLT file (figure 2) into sample.xsl. (You can cut-and-paste from this newsletter on the Web.) Then enter the command:

    saxon sample.xml sample.xsl > sample.html

to produce the output HTML file in figure 3.

Figure 3: The HTML Output File

<HTML>
<BODY>
   <h1>
       Department of Magical Implications
   </h1>
   <UL>
     <li>Albus Dumbledore
       (<b>Chair</b>)
     </li>
     <li>Minerva McGonagall</li>
     <li>Severus Snape</li>
   </UL>
</BODY>
</HTML>

SAXON is actually a collection of XML tools, including a standard XSLT processor; a Java library that supports a processing model similar to XSLT's but allows full programming capability; and a version of the Ælfred XML parser from Microstar.

The saxon command on icarus is a script that executes a piece of the SAXON Java library; to see what it really does, on icarus, enter:

    more /usr/local/bin/saxon

Would you rather work on your personal computer? Try Instant SAXON, an XSLT interpreter for Windows.

The SAXON source and documentation, including explanations of the XSLT tags and lots of examples, are on the SAXON home page at: http://users.iclway.co.uk/mhkay/saxon/

In the End

This article is only a small taste of XSLT. There are several good tutorials in book form and on the Web; see Web Resources.

Bob Goldstein, bobg@uic.edu
ACCC Operating System Support and Development

 
 



2000-10-13  connect@uic.edu