Doctype Descriptor Format
Document types are described by a /HARP-META/doctype.xml file describing how their elements behave.
Technically, the only configuration that must be in doctype.xml is:
- The ID and language attributes
- Profiling attributes
- Links
- Graphics
- Divisions (chunk boundaries)
- Titles, for populating empty ID-IDREF links and creating ToC navigation of chunked divisions.
Titania Delivery needs this information in order to properly handle the references in a document. Most of the other configuration that can go into doctype.xml is more presentational. However, describing presentational elements in this file allows TD to apply default styling, leaving little to no need for doctype-specific XSLT.
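As a rough illustration of the required configuration listed above, a minimal descriptor might look like the following sketch. The vocabulary is assumed to be DocBook-like; the identifier "example" and every element and attribute name in @match, @linkend, @src, and @name are illustrative assumptions, not values TD requires.

```xml
<!-- Hypothetical minimal descriptor; all matched names are illustrative. -->
<doctype id="example" id-attr="id" lang-attr="lang">
  <!-- Links: where ID-IDREF references live -->
  <links>
    <idref match="xref" linkend="linkend"/>
  </links>
  <!-- Graphics: which attribute points at the image file -->
  <graphics>
    <graphic match="imagedata" src="fileref"/>
  </graphics>
  <!-- Divisions (chunk boundaries) and titles -->
  <specials>
    <special match="chapter | section" type="division"/>
    <special match="title" type="title"/>
  </specials>
  <!-- Profiling attributes -->
  <profiling-atts>
    <profiling-att name="audience"/>
  </profiling-atts>
</doctype>
```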
Structural Elements
- <doctype>
- This is the root element of the file. It has the following attributes.
- @id
- Required. An identifier for the document type, captured as a property on documents of this type.
- @id-attr
- Optional. The ID attribute from the document type. TD will look for @xml:id by default.
- @lang-attr
- Optional. The attribute in the DTD used to identify the language of an element. TD will look for @xml:lang by default.
- <comment-element>
- A single top-level tag describing the markup to use when embedding comments in XML. If no <comment-element> is present, then TD will use namespaced markup for comments.
- @name
- Required. The commenting element name.
- @dispositionAttr
- The attribute to use for disposition values.
- @authorAttr
- The attribute to use for the author's name.
- @timestampAttr
- The attribute to use for the timestamp.
- @dataElement
- The element to use for comment metadata, nested within a comment.
- @dataKeyAttr
- The attribute on the data element for the metadata key.
- @dataValueAttr
- The attribute to use for metadata values on the data element.
- @wrapReplies
- True or false. True by default. If true, replies will be wrapped in a data element with a key of "replies". (This is necessary in many content models where comment tags cannot be directly nested.)
- <links>
- Container for link element descriptions <idref>, <external>, and <fileref>.
- <graphics>
- Container for graphic element descriptions (<graphic>).
- <specials>
- Container for <special> descriptions. May carry a @type attribute containing the default special type for the nested descriptions.
- <profiling-atts>
- Container for <profiling-att> descriptions.
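To make the <comment-element> attributes concrete, here is a sketch of one possible configuration. The element names review-comment and comment-data are hypothetical; the remaining attribute values simply spell out the defaults given in the DTD at the end of this page, so they could be omitted.

```xml
<doctype id="example">
  <!-- Hypothetical: store comments as <review-comment> elements, with
       metadata held in nested <comment-data name="..." value="..."/>. -->
  <comment-element name="review-comment"
                   dispositionAttr="disposition"
                   authorAttr="author"
                   timestampAttr="time"
                   dataElement="comment-data"
                   dataKeyAttr="name"
                   dataValueAttr="value"
                   wrapReplies="true"/>
</doctype>
```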
Descriptive Elements
Each of the elements in the structural group above can optionally start with a <title> and/or <documentation> element providing a human-readable overview of the elements contained within. These are not required by the system, but are useful when maintaining the file.
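For instance, a <links> container might be annotated as in the sketch below. The DTD at the end of this page names the second heading element <documentation>; the title and description text here are made up for illustration.

```xml
<links>
  <title>Links</title>
  <documentation>Cross-references within and between documents.</documentation>
  <idref match="xref" linkend="linkend"/>
</links>
```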
Attributes Common to All Markup Description Tags
The following attributes appear on every markup description tag described below.
- @match
- The XPath pattern used to identify the elements being described. This will most often be a simple element name, but could also be an OR-ed list of tag names, or use a predicate to describe different elements with the same name differently based on attributes or context. For example:
<special match="title" type="title"/>
<special match="literallayout | programlisting | screen" type="preformatted"/>
<special match="emphasis[@role='bold']" type="bold"/>
<special match="emphasis[@role='italic']" type="italic"/>
<special match="emphasis[@role='underline']" type="underline"/>
<special match="emphasis[@role='strikethrough']" type="strikethrough"/>
<special match="emphasis" type="emphasis"/>
When there is ambiguity between two configurations (for example, <emphasis role="bold"> is matched by both emphasis[@role='bold'] and emphasis), the earlier configuration in the document will apply. When creating a configuration file, you should put your most specific rules first, and your most generic rules last.
- @html-element
- Optional. The default HTML element to be generated by the default HTML transform for elements that match the pattern in @match. The HTML equivalent is frequently inherent in the description (e.g. links should become <a>, graphics should become <img>, bold elements should become <b>, etc.).
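As a sketch of how @match and @html-element might be combined, the fragment below overrides the HTML element for a few descriptions. The chosen HTML element names (strong, em, h4) and the note/title pattern are illustrative assumptions; the more specific patterns are listed first so that they win over the generic ones.

```xml
<specials>
  <!-- Specific rules first -->
  <special match="emphasis[@role='bold']" type="bold" html-element="strong"/>
  <special match="note/title" type="title" html-element="h4"/>
  <!-- Generic fallbacks last -->
  <special match="emphasis" type="emphasis" html-element="em"/>
  <special match="title" type="title"/>
</specials>
```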
Profiling Attribute Configuration
The <profiling-atts> element can contain any number of <profiling-att> elements defining the attributes used for profiling. The available attributes are:
- @name
- The name of the attribute.
- @delim
- The delimiter for multiple values. The default is a space.
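A profiling configuration could look like the following sketch. The attribute names audience and os are assumed for illustration, and the delimiter attribute is spelled delim as in the DTD at the end of this page.

```xml
<profiling-atts>
  <!-- Space-delimited by default, e.g. audience="admin developer" -->
  <profiling-att name="audience"/>
  <!-- Semicolon-delimited, e.g. os="linux;macos" -->
  <profiling-att name="os" delim=";"/>
</profiling-atts>
```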
Link Descriptors
- <idref>
- Describes an ID-IDREF linking element.
- @linkend
- Required. The attribute that will contain the ID of the link target.
Note: In order for ID-IDREF links to work properly, the @id-attr attribute on the root <doctype> element must be set correctly.
<idref match="xref" linkend="linkend"/>
This will match <xref linkend="abc"/>.
- <external>
- Identifies external (generally web) links.
- @href
- The name of the attribute identifying the URI of the target.
<external match="ulink" href="href"/>
This will match <ulink href="http://whatever"/>.
- <fileref>
- Identifies links that refer to another file in the system.
- @href
- The attribute containing the URI of the target file.
<fileref match="fileref" href="href"/>
This will match <fileref href="note.xml"/>.
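Putting the three link descriptors together, a complete <links> container might read as in this sketch. It reuses the element names from the examples above; the doctype identifier "example" and the id-attr value "id" are assumptions about the document type being described.

```xml
<doctype id="example" id-attr="id">
  <links>
    <!-- <xref linkend="abc"/> resolves to the element with id="abc" -->
    <idref match="xref" linkend="linkend"/>
    <!-- <ulink href="http://whatever"/> points outside the system -->
    <external match="ulink" href="href"/>
    <!-- <fileref href="note.xml"/> points at another file in the system -->
    <fileref match="fileref" href="href"/>
  </links>
</doctype>
```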
Graphic Descriptors
- <graphic>
- Names a graphic element.
- @src
- The attribute naming the URI of the target graphic.
Note: Currently, there is no configuration for sizing attributes, placement (inline vs. block), or other presentational aspects of graphic references. We should consider adding these. For now, these can be accounted for in html.xsl.
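A graphics configuration for a DocBook-like vocabulary could be sketched as follows. The element names imagedata and inlinegraphic and the fileref attribute are assumptions about the document type, not names TD requires.

```xml
<graphics>
  <!-- Would match <imagedata fileref="diagram.png"/> -->
  <graphic match="imagedata" src="fileref"/>
  <!-- Would match <inlinegraphic fileref="icon.png"/> -->
  <graphic match="inlinegraphic" src="fileref"/>
</graphics>
```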
Specials
The <special> element is a sort of catch-all "everything else" configuration element. It carries a @type attribute describing the type or types of element it represents.
An element may have more than one type. For example, an inline code snippet would be described with type="preformatted inline".
The available type values are:
- inline
- Inline elements. The generic stylesheet styles everything as a block by default, so inline elements should be enumerated explicitly unless they only appear inside other elements configured with type="para".
- titled-block
- Blocks with titles. Empty IDREF links to titled blocks will be populated with the block's title.
- title
- The element used for titles of titled blocks and divisions.
- division
- A document division. When a document is processed, it is chunked at the division elements, and each division is replaced in the root chunk by a link to that division (essentially establishing a ToC).
- para
- Paragraphs. Any element appearing within a paragraph style element, and which is not described in doctype.xml, will be treated as inline instead of block.
- preformatted
- Preformatted text. This will be rendered as a block of text unless the element is also marked as inline.
- ordered-list
- unordered-list
- list-item
- Elements describing lists.
- definition-list
- dlentry
- dt
- dd
- Elements describing definition lists.
- bold
- italic
- underline
- emphasis
- strikethrough
- superscript
- subscript
- monospace
- Inline styles.
- blockquote
- A block quote.
- indexterm
- An index term. The contents will be hidden in output by the default stylesheet but boosted in the search record for the document.
- no-search
- The text will appear but not be included in the full-text index.
- hidden
- The text will be included (but not boosted) in the search index, but not displayed.
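The type values above might be applied to a DocBook-like vocabulary as in this sketch. All matched element names are illustrative assumptions. The first container uses @type on <specials> as the default for its children, and the code entry combines two types as described in the text above (note that the type enumeration in the DTD at the end of this page lists single values only).

```xml
<!-- Every description in this container defaults to type="division" -->
<specials type="division">
  <special match="chapter | appendix | section"/>
</specials>
<specials>
  <special match="title" type="title"/>
  <special match="figure | table | example" type="titled-block"/>
  <special match="para | simpara" type="para"/>
  <special match="itemizedlist" type="unordered-list"/>
  <special match="orderedlist" type="ordered-list"/>
  <special match="listitem" type="list-item"/>
  <!-- Inline code snippet: two types, space-separated -->
  <special match="code" type="preformatted inline"/>
  <special match="indexterm" type="indexterm"/>
  <special match="remark" type="hidden"/>
</specials>
```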
Namespaces in doctype.xml
If you need to identify elements or attributes in doctype.xml that are in a namespace, you do so by declaring the namespace with a prefix on the root <doctype> element, then simply using that prefix when you identify content. For example, to describe DocBook 5, which has a default namespace, you would do something like:
<doctype xmlns:d="http://docbook.org/ns/docbook">
  <links>
    <idref match="d:xref" linkend="linkend"/>
  </links>
  <!-- etc. -->
</doctype>
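The same approach should extend to namespaced attributes. The sketch below assumes that a prefixed name is also accepted in attribute-naming values such as @href, which the text above implies but does not show; the identifier "docbook5" and the matched patterns are illustrative.

```xml
<doctype id="docbook5"
         xmlns:d="http://docbook.org/ns/docbook"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <links>
    <idref match="d:xref" linkend="linkend"/>
    <!-- Assumption: the xlink prefix is resolved in the href value too -->
    <external match="d:link[@xlink:href]" href="xlink:href"/>
  </links>
  <specials>
    <special match="d:emphasis[@role='bold']" type="bold"/>
    <special match="d:emphasis" type="emphasis"/>
  </specials>
</doctype>
```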
Doctype Descriptor DTD
Here is the full DTD of doctype.xml.
<?xml version="1.0" encoding="UTF-8"?>
<!-- =============================================================== -->
<!-- Titania Delivery Document Type Descriptor -->
<!-- Describes a document type's elements for processing by Titania -->
<!-- Delivery. Accessed via the public identifier -->
<!-- <!DOCTYPE doctype PUBLIC -->
<!-- "-//Titania//DTD Document Type Descriptor 1.0//EN" -->
<!-- "doctype.dtd"> -->
<!-- =============================================================== -->
<!ENTITY % heading "title?, documentation?">
<!ENTITY % common-atts
  "match CDATA #REQUIRED
   html-element NMTOKEN #IMPLIED">
<!ENTITY % types
  "inline | titled-block | title | division | para | preformatted |
   monospace | ordered-list | unordered-list | list-item |
   definition-list | dlentry | dt | dd | bold | italic | underline |
   emphasis | indexterm | no-search | hidden | blockquote |
   strikethrough | superscript | subscript" >
<!ENTITY % type "type (%types;) #IMPLIED">
<!ELEMENT doctype ((%heading;), (links | graphics | specials | profiling-atts | comment-element)*) >
<!ATTLIST doctype
  id CDATA #REQUIRED
  id-attr NMTOKEN #IMPLIED
  lang-attr NMTOKEN #IMPLIED >
<!ELEMENT title (#PCDATA)*>
<!ELEMENT documentation (#PCDATA)*>
<!ELEMENT specials ((%heading;), special*)>
<!ATTLIST specials %type;>
<!ELEMENT special EMPTY>
<!ATTLIST special
  %common-atts;
  %type; >
<!ELEMENT links ((%heading;), (idref | external | fileref)*)>
<!ELEMENT idref EMPTY>
<!ATTLIST idref
  %common-atts;
  linkend NMTOKEN #REQUIRED >
<!ELEMENT external EMPTY>
<!ATTLIST external
  %common-atts;
  href NMTOKEN #REQUIRED >
<!ELEMENT fileref EMPTY>
<!ATTLIST fileref
  %common-atts;
  href NMTOKEN #REQUIRED >
<!ELEMENT graphics ((%heading;), graphic*)>
<!ELEMENT graphic EMPTY>
<!ATTLIST graphic
  %common-atts;
  src NMTOKEN #REQUIRED >
<!ELEMENT profiling-atts ((%heading;), profiling-att*)>
<!ELEMENT profiling-att EMPTY>
<!ATTLIST profiling-att
  name NMTOKEN #REQUIRED
  delim CDATA ' ' >
<!ELEMENT comment-element EMPTY>
<!ATTLIST comment-element
  name NMTOKEN #REQUIRED
  dispositionAttr NMTOKEN 'disposition'
  authorAttr NMTOKEN 'author'
  timestampAttr NMTOKEN 'time'
  dataElement NMTOKEN #IMPLIED
  dataKeyAttr NMTOKEN 'name'
  dataValueAttr NMTOKEN 'value'
  wrapReplies (true|false) 'true' >
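For validation in an XML editor, a descriptor could reference this DTD through the public identifier quoted in its header comment, as in the sketch below. Whether TD itself needs the DOCTYPE declaration is not stated here; the system identifier assumes doctype.dtd sits next to the file, and the descriptor content is illustrative. Note that the content model requires the optional <title> and <documentation> to come before any containers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doctype PUBLIC
  "-//Titania//DTD Document Type Descriptor 1.0//EN"
  "doctype.dtd">
<doctype id="example">
  <title>Example document type</title>
  <documentation>Illustrative descriptor; all names are hypothetical.</documentation>
  <links>
    <idref match="xref" linkend="linkend"/>
  </links>
</doctype>
```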