The Exegenix Export DTD

Design goals

  • Represent known structures
  • Capture document hierarchy
  • Easy to transform in the post-processing (e.g., XSLT) phase
  • As similar as possible to commonly-used DTDs
  • Preserve formatting information
  • Extensible (easy to add new elements -- <block role="abstract">, etc.)
  • Human-readable markup

Based on DocBook

The Exegenix Export DTD uses a structure model, element and attribute names, and a table model chosen to conform to those commonly used in the industry, particularly those of the DocBook DTD (http://www.docbook.org). It augments them to provide:

  • Extra constructs, such as page headers and forms
  • Formatting properties as attributes, named to match CSS and XSL-FO
  • Ease of transformation

Transformability

The DTD has been designed to facilitate transformation of the XML output using tools such as XSLT. It does this chiefly by providing container elements that give scripts easy access to groups of related elements. For example:

  • <sectionbody> surrounds all contents of a <section> after the title.
  • <listitembody> surrounds all contents of a <listitem> after the <mark>.
  • <footnotereferencegroup> surrounds sequences of <footnotereference> elements.
  • <figure> surrounds sequences of <mediaobject> elements.

The similarity of our markup to de facto industry-standard markup also facilitates transformation. In addition, formatting information is repeated on each block rather than left to inheritance rules; this redundancy makes script development easier.
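
As a rough illustration of why such containers help, the sketch below uses Python's standard xml.etree.ElementTree module in place of XSLT. The sample fragment is hypothetical: only the element names come from the list above, and the exact nesting is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names are from the text above,
# but the precise content model is assumed for illustration.
SAMPLE = """
<section>
  <title><block>Scope</block></title>
  <sectionbody>
    <para><block>First paragraph.</block></para>
    <para><block>Second paragraph.</block></para>
  </sectionbody>
</section>
"""


def body_children(section):
    """Return the children of a section's <sectionbody>, or [] if it has none."""
    body = section.find("sectionbody")
    return list(body) if body is not None else []


section = ET.fromstring(SAMPLE)
tags = [child.tag for child in body_children(section)]
print(tags)  # ['para', 'para']
```

Because the title sits outside <sectionbody>, the script reaches the body content with a single lookup and does not need to skip over the title itself.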

Block structure

We have opted for a very flexible block structure, in which most block elements can contain any other type of block element, including their own type. In a prescriptive DTD design, a rigid content model prevents authors who use validating authoring tools from misusing elements. We feel that a descriptive DTD, which is intended to model an extremely wide variety of documents rather than to enforce particular authoring rules, requires a more flexible approach. The flexible approach also makes the DTD simpler to write and understand.

Hierarchy

The DTD can represent a hierarchy with any number of levels. The entire document is wrapped in a <document> element. All hierarchical divisions below it (including chapters) are represented by the <section> element, which can be nested to any depth. The different levels can be distinguished using the optional level attribute, or simply by counting the <section> ancestors of a particular <section>.

Sections have an optional title (<title>), and can store section-level headers and footers. The remaining contents of a section are contained in a <sectionbody> element.
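
The ancestor-counting approach can be sketched as follows in Python. The sample document is hypothetical, and placing nested sections inside <sectionbody> is an assumption based on the description above. The sketch reports the raw ancestor count, so a top-level section gets 0; whether the level attribute starts at 0 or 1 is not stated here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; nesting sections inside <sectionbody> is assumed.
SAMPLE = """
<document>
  <section>
    <title><block>Chapter 1</block></title>
    <sectionbody>
      <section>
        <title><block>1.1 Overview</block></title>
        <sectionbody/>
      </section>
    </sectionbody>
  </section>
</document>
"""


def section_depths(node, ancestors=0):
    """Yield (title text, number of <section> ancestors) for every <section>."""
    for child in node:
        if child.tag == "section":
            title = child.find("title")
            text = "".join(title.itertext()).strip() if title is not None else ""
            yield text, ancestors
            # Everything below this section has one more <section> ancestor.
            yield from section_depths(child, ancestors + 1)
        else:
            yield from section_depths(child, ancestors)


depths = list(section_depths(ET.fromstring(SAMPLE)))
print(depths)  # [('Chapter 1', 0), ('1.1 Overview', 1)]
```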

Paragraph structure

The <para> element represents a paragraph in the broadest sense: a sequence of thematically linked blocks. For example, a block of text that introduces a list, continues after the list ends, and later references a block quote could all be enclosed in the same <para> element. For this reason, ordinary text inside a <para> is wrapped in a <block> element, which avoids mixing block and inline content. Furthermore, contiguous blocks of text that differ in formatting (for example, lines that are shorter because they wrap around an image) can each be represented by a <fragment>.
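
A small Python sketch of what this buys a script: because text is always wrapped in <block>, a <para> has only element children and no loose text of its own. The fragment is hypothetical, and its exact nesting is an assumption rather than something taken from the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph: text that introduces a list and resumes after it.
SAMPLE = """
<para>
  <block>The kit contains:</block>
  <itemizedlist>
    <listitem><mark>*</mark><listitembody><block>a cable</block></listitembody></listitem>
  </itemizedlist>
  <block>Check the contents before starting.</block>
</para>
"""

para = ET.fromstring(SAMPLE)

# Direct children are all elements, in document order.
kinds = [child.tag for child in para]

# The paragraph's own text is found only in its <block> children.
own_text = [child.text for child in para if child.tag == "block"]

# No mixed content: the only character data directly under <para> is whitespace.
loose_text = (para.text or "").strip()

print(kinds)
print(own_text)
```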

Titles

Titles can contain any block content. In practice, we expect most titles to be marked up as one or more <block> elements, each representing contiguous lines of title text, with each physical line contained in a <line> element.

Lists

The DTD supports ordered (<orderedlist>), unordered (<itemizedlist>), and compound (<compoundlist>) lists. Compound lists generalize the HTML "definition list": each "term" item can have an arbitrary number of sibling "definition" items and an arbitrary number of sub-items.

For ordered and unordered lists, list marks (bullets, numbers) are represented by a distinct <mark> element whose content is the mark itself.
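
Keeping the mark in its own element means a script can read or replace it without parsing it out of the item text. A minimal Python sketch, assuming a hypothetical list in which each <listitem> holds one <mark> followed by one <listitembody>:

```python
import xml.etree.ElementTree as ET

# Hypothetical list; the one-mark-plus-one-body layout is an assumption.
SAMPLE = """
<orderedlist>
  <listitem>
    <mark>1.</mark>
    <listitembody><block>Unpack the unit.</block></listitembody>
  </listitem>
  <listitem>
    <mark>2.</mark>
    <listitembody><block>Connect the cable.</block></listitembody>
  </listitem>
</orderedlist>
"""


def list_items(list_element):
    """Return (mark, body text) pairs for the direct <listitem> children."""
    items = []
    for item in list_element.findall("listitem"):
        mark = item.findtext("mark", default="").strip()
        body = item.find("listitembody")
        # Collapse whitespace so indentation in the source does not leak through.
        text = " ".join("".join(body.itertext()).split()) if body is not None else ""
        items.append((mark, text))
    return items


items = list_items(ET.fromstring(SAMPLE))
print(items)  # [('1.', 'Unpack the unit.'), ('2.', 'Connect the cable.')]
```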

Other blocks

The DTD supports the following other block constructs: Block Quote; Literal Layout; Note; Equation; Side Bar.

Table model

To represent tables we use the CALS model with some augmentations: we have added some HTML attributes, such as cellspacing and cellpadding, as well as a richer set of separators. Because all table models (including HTML) share a table-row-cell structure, our tables are easy to transform to a client's preferred model when that model is not CALS.
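
The shared table-row-cell structure is what makes such a transformation short. The Python sketch below maps a plain CALS-style table (<tgroup>, <tbody>, <row>, <entry>) to HTML rows and cells. The sample is hypothetical: putting cellpadding on <table> is an assumption, and spans, header rows, nested tables, and the separator attributes are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical CALS-style table; the position of cellpadding is assumed.
SAMPLE = """
<table cellpadding="4">
  <tgroup cols="2">
    <tbody>
      <row><entry>Part</entry><entry>Quantity</entry></row>
      <row><entry>Cable</entry><entry>2</entry></row>
    </tbody>
  </tgroup>
</table>
"""


def cals_to_html(table):
    """Build a simple HTML <table> from the rows and entries of a CALS table."""
    html_table = ET.Element("table")
    # Carry over the HTML-style attributes when present.
    for attr in ("cellspacing", "cellpadding"):
        if attr in table.attrib:
            html_table.set(attr, table.attrib[attr])
    # iter() visits rows at any depth, so nested tables would need extra care.
    for row in table.iter("row"):
        tr = ET.SubElement(html_table, "tr")
        for entry in row.findall("entry"):
            td = ET.SubElement(tr, "td")
            td.text = "".join(entry.itertext()).strip()
    return html_table


html_table = cals_to_html(ET.fromstring(SAMPLE))
cells = [[td.text for td in tr] for tr in html_table.findall("tr")]
print(cells)  # [['Part', 'Quantity'], ['Cable', '2']]
```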

Inline Emphasis

Most forms of inline emphasis are represented by the <emphasis> element. Individual styles are distinguished by the values of the relevant formatting property attributes. For example, font-weight="bold" represents bold emphasis. Following common industry practice, subscripts and superscripts are represented by the specific elements <subscript> and <superscript>.
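
For instance, a post-processing script might pick an output tag by inspecting these attributes. The Python sketch below is illustrative only: the mapping to HTML tags and the fallback to "span" are assumptions, and the sample markup is hypothetical apart from the element and attribute names mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical block; element and attribute names are taken from the text above.
SAMPLE = (
    '<block>Press <emphasis font-weight="bold">Start</emphasis>, then '
    '<emphasis font-style="italic">wait</emphasis> for '
    "H<subscript>2</subscript>O.</block>"
)


def html_tag_for(element):
    """Choose an HTML tag for an inline element (an assumed, minimal mapping)."""
    if element.tag == "subscript":
        return "sub"
    if element.tag == "superscript":
        return "sup"
    if element.tag == "emphasis":
        if element.get("font-weight") == "bold":
            return "b"
        if element.get("font-style") == "italic":
            return "i"
    return "span"  # fallback for styles this sketch does not handle


inline_tags = [html_tag_for(child) for child in ET.fromstring(SAMPLE)]
print(inline_tags)  # ['b', 'i', 'sub']
```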

Formatting

Formatting properties are represented in attributes of the various objects themselves. These attributes were adopted from CSS and XSL-FO: for example, font-weight, font-style. We felt that storing this information in the same file as the document content and structure would make the XML output easier to work with than storing it in a separate file and linking it to elements in the main file by way of IDs or some other mechanism.

Pagination

Page breaks are represented by the <beginpage> element. This element records the ordinal page number in the source document. It can also contain page-specific headers and footers, and stores any footnotes that appear on the page.
