Doctype Descriptor Format

Document types must provide a /HARP-META/doctype.xml file describing how their elements behave.
Important: Document types conforming to the DITA standard do not need a doctype descriptor, as DITA documents are self-describing via the @class attribute.

Technically, the only configuration that must be in doctype.xml is:

  • The ID and language attributes
  • Profiling attributes
  • Links
  • Graphics
  • Divisions (chunk boundaries)
  • Titles, for populating empty ID-IDREF links and creating ToC navigation of chunked divisions.

Titania Delivery needs this information in order to properly handle the references in a document. Most of the other configuration that can go into doctype.xml is more presentational. However, describing presentational elements in this file allows TD to apply default styling, leaving little to no need for doctype-specific XSLT.
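
As a rough illustration of the required configuration listed above, a minimal descriptor might look like the following. This is a sketch only: the document type ID "my-article" and the element and attribute names being matched (xref, ulink, image, section, audience, and so on) are hypothetical and stand in for whatever your own document type uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical descriptor for an imaginary "my-article" document type. -->
<doctype id="my-article" id-attr="id" lang-attr="lang">
  <!-- Profiling attributes -->
  <profiling-atts>
    <profiling-att name="audience"/>
  </profiling-atts>
  <!-- Links -->
  <links>
    <idref match="xref" linkend="linkend"/>
    <external match="ulink" href="href"/>
  </links>
  <!-- Graphics -->
  <graphics>
    <graphic match="image" src="src"/>
  </graphics>
  <!-- Divisions (chunk boundaries) and their titles -->
  <specials>
    <special match="section" type="division"/>
    <special match="title" type="title"/>
  </specials>
</doctype>
```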

Structural Elements

<doctype>
This is the root element of the file. It has the following attributes.
@id
Required. An identifier for the document type, captured as a property on documents of this type.
@id-attr
Optional. The ID attribute from the document type. TD will look for @xml:id by default.
@lang-attr
Optional. The attribute in the DTD used to identify the language of an element. TD will look for @xml:lang by default.
<comment-element>
A single top-level tag describing the markup to use when embedding comments in XML. If no <comment-element> is present, then TD will use namespaced markup for comments.
@name
Required. The commenting element name.
@dispositionAttr
The attribute to use for disposition values.
@authorAttr
The attribute to use for the author's name.
@timestampAttr
The attribute to use for the timestamp.
@dataElement
The element to use for comment metadata, nested within a comment.
@dataKeyAttr
The attribute on the data element for the metadata key.
@dataValueAttr
The attribute to use for metadata values on the data element.
@wrapReplies
True or false. True by default. If true, replies will be wrapped in a data element with a key of "replies". (This is necessary in many content models where comment tags cannot be directly nested.)
<links>
Container for link element descriptions <idref>, <external>, and <fileref>.
<graphics>
Container for graphic element descriptions (<graphic>).
<specials>
Container for <special> descriptions. May carry a @type attribute containing the default special type for the nested descriptions.
<profiling-atts>
Container for <profiling-att> descriptions.
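
For illustration, a <comment-element> configuration might look like the sketch below. The names "remark", "status", and "meta" are hypothetical. According to the DTD at the end of this page, the defaults are dispositionAttr="disposition", authorAttr="author", timestampAttr="time", dataKeyAttr="name", dataValueAttr="value", and wrapReplies="true", so only @name and any non-default values need to be spelled out.

```xml
<doctype id="my-article">
  <!-- Hypothetical: comments are embedded as <remark> elements,
       with metadata carried on nested <meta name="..." value="..."/> tags. -->
  <comment-element name="remark"
                   dispositionAttr="status"
                   dataElement="meta"/>
</doctype>
```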

Descriptive Elements

Each of the elements in the structural group above can optionally start with a <title> element, a <documentation> element, or both (in that order), providing a human-readable overview of the elements contained within. These are not required by the system, but are useful when maintaining the file.
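
As a small sketch, a container with descriptive elements could look like this. Note that the DTD at the end of this page names the second descriptive element <documentation>, so that is the name used here. The matched element names are hypothetical.

```xml
<links>
  <title>Cross-references</title>
  <documentation>Internal and external links used by the article doctype.</documentation>
  <idref match="xref" linkend="linkend"/>
  <external match="ulink" href="href"/>
</links>
```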

Attributes Common to All Markup Description Tags

The following attributes appear on every markup description tag described below.

@match
The XPath pattern used to identify the elements being described. This will most often be a simple element name, but could also be an OR-ed list of tag names, or use a predicate to describe different elements with the same name differently based on attributes or context. For example:
<special match="title" type="title"/>
<special match="literallayout | programlisting | screen" type="preformatted"/>
<special match="emphasis[@role='bold']" type="bold"/>
<special match="emphasis[@role='italic']" type="italic"/>
<special match="emphasis[@role='underline']" type="underline"/>
<special match="emphasis[@role='strikethrough']" type="strikethrough"/>
<special match="emphasis" type="emphasis"/>
When there is ambiguity between two configurations - for example, <emphasis role="bold"> is matched by both emphasis[@role='bold'] and emphasis - the earlier configuration in the document will apply. When creating a configuration file, you should put your most specific rules first, and your most generic rules last.
@html-element
Optional. The default HTML element to be generated by the default HTML transform for elements that match the pattern in @match. The HTML equivalent is frequently inherent in the description (e.g. links should become <a>, graphics should become <img>, bold elements should become <b>, etc.)
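
A short sketch of @html-element in use, assuming you want to override the inherent HTML equivalent. The matched names (emphasis, sidebar) and the chosen HTML elements are illustrative only.

```xml
<specials>
  <!-- Emit <strong> for bold emphasis instead of the inherent <b>. -->
  <special match="emphasis[@role='bold']" type="bold" html-element="strong"/>
  <!-- Emit <aside> for a hypothetical titled sidebar block. -->
  <special match="sidebar" type="titled-block" html-element="aside"/>
</specials>
```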

Profiling Attribute Configuration

The <profiling-atts> element can contain any number of <profiling-att> elements defining the attributes used for profiling. The available attributes are:

@name
The name of the attribute.
@delim
The delimiter for multiple values. The default is a space.
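
For example, a profiling configuration might be sketched as follows. The attribute names "audience" and "platform" are hypothetical, and the delimiter attribute is written as delim, which is how the DTD at the end of this page declares it.

```xml
<profiling-atts>
  <title>Profiling attributes</title>
  <!-- Values separated by the default delimiter (a space). -->
  <profiling-att name="audience"/>
  <!-- Values separated by semicolons. -->
  <profiling-att name="platform" delim=";"/>
</profiling-atts>
```

Under this configuration, a value such as platform="linux;macos" would presumably be treated as the two values linux and macos.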

Link Descriptors

<idref>
Describes an ID-IDREF linking element.
@linkend
Required. The attribute that will contain the ID of the link target.
Note: In order for ID-IDREF links to work properly, the @id-attr attribute on the root <doctype> element must be set correctly.
<idref match="xref" linkend="linkend"/>

This will match <xref linkend="abc"/>.

<external>
Identifies external (generally web) links.
@href
The name of the attribute identifying the URI of the target.
<external match="ulink" href="href"/>

This will match <ulink href="http://whatever"/>.

<fileref>
Identifies links that refer to another file in the system.
@href
The attribute containing the URI of the target file.
<fileref match="fileref" href="href"/>

This will match <fileref href="note.xml"/>.

Graphic Descriptors

<graphic>
Names a graphic element.
@src
The attribute naming the URI of the target graphic.
Note: Currently, there is no configuration for sizing attributes, placement (inline vs. block), or other presentational aspects of graphic references. We should consider adding these. For now, these can be accounted for in html.xsl.
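
As with the link descriptors, a graphic descriptor can be sketched in one line. The element name "imagedata" and attribute name "fileref" are assumptions chosen for illustration.

```xml
<graphics>
  <!-- Would match, for example, <imagedata fileref="figure1.png"/>. -->
  <graphic match="imagedata" src="fileref"/>
</graphics>
```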

Specials

The <special> element is a sort of catch-all "everything else" configuration element. It carries a @type attribute describing the type or types of element it represents.

An element may have more than one type. For example, an inline code snippet would be described with type="preformatted inline".

The available type values are:

inline
Inline elements. The generic stylesheet styles everything as a block by default, so inline elements should be enumerated explicitly unless they only appear inside other elements configured with type="para".
titled-block
Blocks with titles. Empty IDREF links to titled blocks will be populated with the block's title.
title
The element used for titles of titled blocks and divisions.
division
A document division. When a document is processed, it is chunked at its division elements; in the root chunk, each division is replaced by a link to that division (essentially establishing a ToC).
para
Paragraphs. Any element that appears within a paragraph-type element and is not described in doctype.xml will be treated as inline instead of block.
preformatted
Preformatted text. This will be rendered as a block of text unless the element is also marked as inline.
ordered-list
unordered-list
list-item
Elements describing lists.
definition-list
dlentry
dt
dd
Elements describing definition lists.
bold
italic
underline
emphasis
strikethrough
superscript
subscript
monospace
Inline styles.
blockquote
A block quote.
indexterm
An index term. The contents will be hidden in output by the default stylesheet but boosted in the search record for the document.
no-search
The text will appear but not be included in the full-text index.
hidden
The text will be included (but not boosted) in the search index, but not displayed.
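
Putting several of these types together, a specials configuration might be sketched as below. All matched element names are hypothetical. The first container uses @type on <specials> as the default for its nested descriptions; the second sets @type on each description and lists the more specific pattern before the generic one.

```xml
<!-- Every <special> in this container defaults to type="inline". -->
<specials type="inline">
  <title>Inline elements</title>
  <special match="phrase | trademark | citetitle"/>
</specials>

<specials>
  <title>Structure</title>
  <special match="chapter | section" type="division"/>
  <special match="figure | table | example" type="titled-block"/>
  <special match="title" type="title"/>
  <special match="para" type="para"/>
  <!-- Specific rule first, generic rule last. -->
  <special match="emphasis[@role='bold']" type="bold"/>
  <special match="emphasis" type="emphasis"/>
  <special match="indexterm" type="indexterm"/>
</specials>
```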

Namespaces in doctype.xml

If you need to identify elements or attributes in doctype.xml that are in a namespace, you do so by declaring the namespace with a prefix on the root <doctype> element, then simply using that prefix when you identify content. For example, to describe DocBook 5, which has a default namespace, you would do something like:

<doctype xmlns:d="http://docbook.org/ns/docbook">
  <links>
    <idref match="d:xref" linkend="linkend"/>
  </links>
  <!-- etc. -->
</doctype>
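
The same approach should extend to namespaced attributes. The sketch below assumes that a prefixed name is accepted wherever an attribute is named (here, @href on <external>); the "xl" prefix and the d:link pattern are chosen for illustration and are not taken from a shipped configuration.

```xml
<doctype id="docbook5"
         xmlns:d="http://docbook.org/ns/docbook"
         xmlns:xl="http://www.w3.org/1999/xlink">
  <links>
    <idref match="d:xref" linkend="linkend"/>
    <!-- Assumption: the attribute is identified by its prefixed name. -->
    <external match="d:link[@xl:href]" href="xl:href"/>
  </links>
</doctype>
```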

Doctype Descriptor DTD

Here is the full DTD of doctype.xml.

<?xml version="1.0" encoding="UTF-8"?>
<!-- =============================================================== -->
<!-- Titania Delivery Document Type Descriptor                       -->
<!-- Describes a document type's elements for processing by Titania  -->
<!-- Delivery. Accessed via the public identifier                    -->
<!--    <!DOCTYPE doctype PUBLIC                                     -->
<!--      "-//Titania//DTD Document Type Descriptor 1.0//EN"         -->
<!--      "doctype.dtd">                                             -->
<!-- =============================================================== -->
<!ENTITY % heading "title?, documentation?">
<!ENTITY % common-atts "match CDATA #REQUIRED html-element NMTOKEN #IMPLIED">
<!ENTITY % types
    "inline |
    titled-block |
    title |
    division |
    para |
    preformatted |
    monospace |
    ordered-list |
    unordered-list |
    list-item |
    definition-list |
    dlentry |
    dt |
    dd |
    bold |
    italic |
    underline |
    emphasis |
    indexterm |
    no-search |
    hidden |
    blockquote |
    strikethrough |
    superscript |
    subscript"
>
<!ENTITY % type "type (%types;) #IMPLIED">

<!ELEMENT doctype
    ((%heading;),
    (links |
    graphics |
    specials |
    profiling-atts |
    comment-element)*)
>
<!ATTLIST doctype
    id CDATA #REQUIRED
    id-attr NMTOKEN #IMPLIED
    lang-attr NMTOKEN #IMPLIED
>

<!ELEMENT title (#PCDATA)*>
<!ELEMENT documentation (#PCDATA)*>

<!ELEMENT specials ((%heading;), special*)>
<!ATTLIST specials %type;>
<!ELEMENT special EMPTY>
<!ATTLIST special
    %common-atts;
    %type;
>

<!ELEMENT links ((%heading;), (idref | external | fileref)*)>
<!ELEMENT idref EMPTY>
<!ATTLIST idref
    %common-atts;
    linkend NMTOKEN #REQUIRED
>
<!ELEMENT external EMPTY>
<!ATTLIST external
    %common-atts;
    href NMTOKEN #REQUIRED
>
<!ELEMENT fileref EMPTY>
<!ATTLIST fileref
    %common-atts;
    href NMTOKEN #REQUIRED
>

<!ELEMENT graphics ((%heading;), graphic*)>
<!ELEMENT graphic EMPTY>
<!ATTLIST graphic
    %common-atts;
    src NMTOKEN #REQUIRED
>

<!ELEMENT profiling-atts ((%heading;), profiling-att*)>
<!ELEMENT profiling-att EMPTY>
<!ATTLIST  profiling-att
    name NMTOKEN #REQUIRED
    delim CDATA ' '
>

<!ELEMENT comment-element EMPTY>
<!ATTLIST comment-element
    name NMTOKEN #REQUIRED
    dispositionAttr NMTOKEN 'disposition'
    authorAttr NMTOKEN 'author'
    timestampAttr NMTOKEN 'time'
    dataElement NMTOKEN #IMPLIED
    dataKeyAttr NMTOKEN 'name'
    dataValueAttr NMTOKEN 'value'
    wrapReplies (true|false) 'true'
>