Simple Outline XML: SOX

Introduction

SOX is an alternative syntax for XML. It is useful for reading and creating XML content in a text editor. It is then easily transformed into proper XML.

SOX was created because developers can spend a great deal of time with raw XML. For many of us, the popular XML editors have not reached a point where their tree views, tables and forms can completely substitute for the underlying markup language. This is not surprising when one considers that developers still use a text view, albeit enhanced, for editing other languages such as Java.

SOX uses indenting to represent the structure of an XML document, which eliminates the need for closing tags and a number of quoting devices. The result is surprisingly clear. For example, here is an XSLT script written in SOX form:

stylesheet> 
    xmlns=http://www.w3.org/1999/XSL/Transform
    version=1.0
    template> 
        match=node()
        copy> 
            apply-templates>
                select=node()

Here is a scrap of XHTML written in a slightly more compact style:

html>
    head>
        title> My Home Page
    body>
        h1> Contact Details
        p>  I can be contacted at
            a>  href=mailto:me@myplace.net 
                this address
            except when on vacation.

SOX can be used to write a subset of XML consisting of elements, attributes and text. Other parts of XML such as processing instructions, comments and entities are not covered by SOX at this stage.

An implementation of SOX as a SAX reader is provided.

Basic SOX

In the basic SOX grammar, each line represents an XML element, attribute, or text node as follows. The full SOX grammar adds quoted text and the single-line forms as detailed in following sections.

An XML element is introduced by a name followed by a '>'. The content of an element follows, in an indented block.

SOX:

element>
    ...
    ...

XML:

<element ... >
 ...
</element>

An indented block begins with a line indented relative to its predecessor and encloses each contiguous line of the same or greater indentation.

SOX:

A>
    B>
        C>
    D>

XML:

<A>
<B>
<C/>
</B>
<D/>
</A>

An XML attribute is introduced by a name followed by a '='. Text following the '=' up to the end of the line forms the attribute's value. Attributes must precede other children of the element.

SOX:

element>
    attribute=value
    ...

XML:

<element attribute="value" ... >
 ...
</element>

A line of text which does not contain an unquoted '>' or '=' forms an XML text node.

SOX:

element>
    ...
    text node
    ...

XML:

<element ...>
    ... text node ...
</element>
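
The line rules above can be sketched in a few lines of Python. This is only an illustration of the basic grammar, not the Java implementation distributed with SOX: the function name parse_basic_sox and the use of xml.etree.ElementTree are assumptions, quoting and the single-line forms are not handled, and each tab is counted as 8 spaces.

```python
import re
import xml.etree.ElementTree as ET

ELEMENT = re.compile(r"^([^\s=>]+)>$")        # name>
ATTRIBUTE = re.compile(r"^([^\s=>]+)=(.*)$")  # name=value


def parse_basic_sox(text):
    """Parse basic SOX (one element, attribute, or text node per line)."""
    root = None
    open_elements = []  # stack of (indent, element) for the open elements
    for raw in text.splitlines():
        body = raw.strip(" \t")
        if not body:
            continue  # lines consisting only of whitespace are ignored
        lead = raw[: len(raw) - len(raw.lstrip(" \t"))]
        indent = sum(8 if ch == "\t" else 1 for ch in lead)
        # An element's block ends at the first line that is not indented deeper.
        while open_elements and open_elements[-1][0] >= indent:
            open_elements.pop()
        parent = open_elements[-1][1] if open_elements else None
        element = ELEMENT.match(body)
        attribute = ATTRIBUTE.match(body)
        if element:
            if parent is None:
                if root is not None:
                    raise ValueError("more than one root element")
                node = root = ET.Element(element.group(1))
            else:
                node = ET.SubElement(parent, element.group(1))
            open_elements.append((indent, node))
        elif parent is None:
            raise ValueError("attribute or text outside an element")
        elif attribute:
            parent.set(attribute.group(1), attribute.group(2))
        else:
            # Unquoted text: collapse whitespace and append a single space.
            words = " ".join(body.split()) + " "
            if len(parent):  # text after a child element becomes its tail
                parent[-1].tail = (parent[-1].tail or "") + words
            else:
                parent.text = (parent.text or "") + words
    return root


outline = "A>\n    B>\n        C>\n    D>"
print(ET.tostring(parse_basic_sox(outline), encoding="unicode"))
# prints <A><B><C /></B><D /></A>
```

The stack of open elements mirrors the indented-block rule: a line closes every element whose indentation is the same as or greater than its own.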

Whitespace

Whitespace consists of spaces and tabs. Whitespace is treated as follows:

  1. Lines consisting only of whitespace are ignored.

  2. Indentation is represented by whitespace at the beginning of a line, counting tabs as equivalent to 8 spaces.

  3. In unquoted text, leading and trailing whitespace (other than the indent) is ignored and each internal span of whitespace is treated as a single space.

  4. A single space is unconditionally appended to the unquoted text forming an XML text node. (This can be prevented by quoting.)

  5. All other whitespace is ignored.
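
Rules 1 to 4 can be expressed as three small helpers. The following is a sketch under the assumption that "whitespace" means exactly the space and tab characters, as stated above; the function names are hypothetical and do not come from the SOX implementation.

```python
import re


def indent_width(line):
    """Rule 2: width of the leading whitespace, counting each tab as 8 spaces."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += 8
        else:
            break
    return width


def is_blank(line):
    """Rule 1: a line consisting only of spaces and tabs is ignored."""
    return line.strip(" \t") == ""


def normalise_unquoted(text):
    """Rules 3 and 4: trim, collapse internal whitespace, append one space."""
    words = [w for w in re.split(r"[ \t]+", text) if w]
    return " ".join(words) + " "


print(indent_width("\t  p>"))                                   # prints 10
print(repr(normalise_unquoted("    I can   be contacted\tat  ")))
# prints 'I can be contacted at '
```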

Quoted Text

A string of text can be quoted by enclosing it within a pair of " or ' characters. A quoted string can be used wherever unquoted text may appear.

A quoted string may be used for an attribute value, following the '='.

SOX:

template>
    match="html:p[class='note']"

XML:

<template match="html:p[class='note']"/>

A quoted string on a line by itself represents a text node. (No space is appended to the string.)

SOX:

pre>
    "controlled    sp"
    "acing"

XML:

<pre>controlled    spacing</pre>

A quoted string may appear within unquoted text (including at the beginning and end of the unquoted text). The string is inserted into the unquoted text (without its quotes).

SOX:

p>
    Whole ">" the parts.

XML:

<p>Whole &gt; the parts. </p>

Adjacent quoted strings are concatenated without any intervening space.

SOX:

p>
    "This" "and" "that"

XML:

<p>Thisandthat </p>

Within the string:

  1. Whitespace is preserved.

  2. The '=' and '>' characters are preserved.

  3. The ' character (for a string quoted with ") and the " character (for a string quoted with ') are preserved.

  4. No line breaks are allowed.
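
One way to read the quoting rules and the examples above is sketched below. The function text_value is hypothetical, and the sketch rests on three assumptions drawn from the examples rather than from a formal grammar: whitespace directly between two quoted strings is dropped, any other span of whitespace becomes a single space, and the trailing space is omitted only when the line is a single quoted string. XML escaping (for example '>' to '&gt;') is left to the serialiser.

```python
import re

# Alternatives: "quoted", 'quoted', whitespace run, unquoted run, stray quote.
TOKEN = re.compile(r'''"([^"]*)"|'([^']*)'|([ \t]+)|([^ \t"']+)|(["'])''')


def text_value(line):
    """Return the text node value for one line of quoted and unquoted text."""
    tokens = []  # (kind, value) pairs; kind is "quoted", "space" or "word"
    for m in TOKEN.finditer(line.strip(" \t")):
        if m.group(1) is not None:
            tokens.append(("quoted", m.group(1)))
        elif m.group(2) is not None:
            tokens.append(("quoted", m.group(2)))
        elif m.group(3) is not None:
            tokens.append(("space", " "))
        elif m.group(4) is not None:
            tokens.append(("word", m.group(4)))
        else:
            raise ValueError("unterminated quoted string")
    # A quoted string on a line by itself: no space is appended.
    if len(tokens) == 1 and tokens[0][0] == "quoted":
        return tokens[0][1]
    parts = []
    for i, (kind, value) in enumerate(tokens):
        # The line was stripped, so a space token always has both neighbours.
        if (kind == "space" and tokens[i - 1][0] == "quoted"
                and tokens[i + 1][0] == "quoted"):
            continue  # adjacent quoted strings are joined without a space
        parts.append(value)
    return "".join(parts) + " "


print(repr(text_value('Whole ">" the parts.')))   # prints 'Whole > the parts. '
print(repr(text_value('"This" "and" "that"')))    # prints 'Thisandthat '
print(repr(text_value('"controlled    sp"')))     # prints 'controlled    sp'
```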

Multiline Text

A multiline string of text is quoted with triple quote marks. Each multiline string represents an XML text node. For example:

SOX:

pre>
    """Text spanning several
    lines forming a single XML
        'so-called' text node"""

XML:

<pre>Text spanning several
lines forming a single XML
    'so-called' text node</pre>

  1. A multiline string is introduced by a (suitably indented) triple quote, ''' or """.

  2. All text following the triple quote up to a matching triple quote forms part of the string. This includes newlines, but indentation is treated specially, as follows.

  3. Indentation within the multiline string is adjusted to form the string value. Any indentation less than or equal to the current indentation level is removed. Indentation greater than the current level is reduced by the current indentation level.
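
The indentation adjustment in rule 3 can be sketched as follows. The function multiline_value is hypothetical; it assumes that "the current indentation level" is the indentation of the line carrying the opening triple quote, that each tab counts as 8 spaces, and that surplus indentation is emitted as spaces.

```python
def indent_width(line):
    """Width of the leading whitespace, counting each tab as 8 spaces."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += 8
        else:
            break
    return width


def multiline_value(block):
    """Return the string value of a block that starts at the opening triple quote line."""
    lines = block.split("\n")
    first = lines[0].lstrip(" \t")
    level = indent_width(lines[0])  # the current indentation level
    opener = first[:3]
    if opener not in ('"""', "'''"):
        raise ValueError("block does not start with a triple quote")
    body = [first[3:]]
    for line in lines[1:]:
        # Indentation up to the current level is removed; any surplus is kept.
        surplus = max(indent_width(line) - level, 0)
        body.append(" " * surplus + line.lstrip(" \t"))
    text = "\n".join(body)
    end = text.find(opener)
    if end < 0:
        raise ValueError("unterminated multiline string")
    return text[:end]


block = (
    '    """Text spanning several\n'
    "    lines forming a single XML\n"
    "        'so-called' text node\"\"\""
)
print(multiline_value(block))
```

Applied to the example above, this prints the three lines of the XML text node, with the last line keeping its 4 surplus spaces.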

Continuing on the Same Line

For clarity, an attribute or child text node of an element may appear on the same line as the element name, following the '>'. Additional children of the element may follow in an indented block as usual. (Children, including any on the same line as the element, must still appear in the correct order.) For example:

Basic SOX:

template>
    name=item
    html:p>
        ITEM:
        apply-templates>
            select=node()

Alternative:

template> name=item
    html:p> ITEM:
        apply-templates> select=node()


An element may also appear on the same line as its parent. In this case, the element is the only child of the parent and the following indented block, if any, belongs to the (innermost) child. For example, an XML schema fragment:

Basic SOX:

element>
    name=doc
    annotation>
        documentation>
            the document element
    complexType>
        sequence>
            element>
                name=body
                type=bodyType

Alternative:

element> name=doc
    annotation> documentation>
        the document element
    complexType> sequence> element> name=body
        type=bodyType
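
A single-line form can be taken apart by peeling element names off the front of the line and then classifying whatever remains. The sketch below assumes unquoted input only; the function split_line and its return shape are hypothetical, and the remaining text is returned without whitespace normalisation.

```python
import re

NAME_GT = re.compile(r"[ \t]*([^ \t=>\"']+)>")      # name>
ATTRIBUTE = re.compile(r"([^ \t=>\"']+)=(.*)$")     # name=value


def split_line(line):
    """Split one SOX line into a chain of element names and a trailing item.

    Returns (elements, item) where item is None, ("attribute", name, value)
    or ("text", text).
    """
    rest = line.strip(" \t")
    elements = []
    while True:
        m = NAME_GT.match(rest)
        if not m:
            break
        elements.append(m.group(1))  # each further name is the only child of the last
        rest = rest[m.end():]
    rest = rest.strip(" \t")
    if not rest:
        return elements, None
    m = ATTRIBUTE.match(rest)
    if m:
        return elements, ("attribute", m.group(1), m.group(2))
    return elements, ("text", rest)


print(split_line("complexType> sequence> element> name=body"))
# prints (['complexType', 'sequence', 'element'], ('attribute', 'name', 'body'))
print(split_line("html:p> ITEM:"))
# prints (['html:p'], ('text', 'ITEM:'))
```

The following indented block, if any, would then be attached to the last name in the chain, which is the innermost child.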


Implementation

A Java implementation of a SAX parser and a SAX serialiser is provided. The source is here: SOX-20020331.zip.

A convenient way to parse and generate SOX is to use styler. Styler can be used from the command line or as an Ant task to process SOX.

Notes

The foregoing definition of SOX is in the public domain and may be copied and used freely.

The acronym "SOX" also refers to a circa 1999 XML Schema proposal: http://www.w3.org/TR/NOTE-SOX/


Copyright 2001, 2002 Langdale Consultants
Contact Arnold deVos for further information.