Extracting Metadata from XML Files

You can customize metadata gathering by using XSLT to transform the XML content into a simpler XML structure that lists the metadata names and values for that content. Place the XSLT at /HARP-META/metadata.xsl in either a document type or a project. If placed within a document type, it applies to all content using that document type in any project. If placed within a project, it applies to all XML content in that project.
Note: A project-level XSLT must therefore apply cleanly to any XML content that might appear in the project, whatever its structure or doctype. In practice, it should contain template rules that match only the expected root element names and/or DITA classes.
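
The note above can be sketched as a stylesheet that reacts only to the root elements it expects. This is a minimal illustration, not a required pattern: the rootElement metadata name and the non-DITA <article> root are hypothetical, and it assumes both that @class attributes are present on the parsed content (in DITA they are normally defaulted by the DTD or schema) and that an empty root element is acceptable for content the XSLT does not handle.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Always emit the root tag, even for unrecognized content -->
    <xsl:template match="/">
        <root>
            <xsl:apply-templates select="*"/>
        </root>
    </xsl:template>

    <!-- Handle only expected roots: any DITA topic (matched by class)
         or a hypothetical non-DITA <article> document -->
    <xsl:template match="/*[contains(@class, ' topic/topic ')] | /article">
        <metadata name="rootElement">
            <xsl:value-of select="local-name()"/>
        </metadata>
    </xsl:template>

    <!-- Any other root element produces no metadata; the explicit low
         priority keeps this rule from competing with the one above -->
    <xsl:template match="/*" priority="-1"/>

</xsl:stylesheet>
```

Matching on the DITA class rather than the element name means that specialized topic types (concept, task, and so on) are handled by the same rule.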

The output of this XSLT must follow this structure:

  • The root element can have any name.
  • Directly within the root element there must be a list of <metadata> elements. The @name attribute specifies the metadata name, and the textual contents of the element are its value.
  • If a given metadata name should have more than one value, then generate multiple <metadata> elements with the same @name attribute.
  • By default, metadata assigned to DITA maps is also applied to the contextualized copies of the topics referenced by that map. To prevent this, specify cascades="false" on the <metadata> element.

Here is an example structure that represents two metadata fields:

  • mdName = First Value, Second Value
  • mdName2 = Second Metadata Entry (does not cascade to topics)

<root>
  <metadata name="mdName">First Value</metadata>
  <metadata name="mdName">Second Value</metadata>
  <metadata name="mdName2" cascades="false">Second Metadata Entry</metadata>
</root>

The output of all applicable metadata transforms is combined and applied to the content during processing.
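
As an illustration of the cascades attribute described above, the following sketch records a map's title as metadata that stays on the map and is not copied to its topics. The mapTitle metadata name is hypothetical, and the sketch assumes a plain <map> root with a <title> child; a bookmap or other specialized map would need its own rule.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Create the root tag -->
    <xsl:template match="/">
        <root>
            <xsl:apply-templates select="*"/>
        </root>
    </xsl:template>

    <!-- For a DITA map with a title, emit map-only metadata -->
    <xsl:template match="/map">
        <xsl:if test="title">
            <metadata name="mapTitle" cascades="false">
                <xsl:value-of select="normalize-space(title[1])"/>
            </metadata>
        </xsl:if>
    </xsl:template>

    <!-- Ignore every other root element -->
    <xsl:template match="/*" priority="-1"/>

</xsl:stylesheet>
```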

Capturing Keywords as Metadata

To capture all of the DITA keyword elements in a topic and store them in a multi-valued metadata field named keywords, the XSLT might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    
    <!-- Create the root tag -->
    <xsl:template match="/">
        <root>
            <xsl:apply-templates/>
        </root>
    </xsl:template>
    
    <!-- Capture keywords -->
    <xsl:template match="keyword">
        <metadata name="keywords">
            <xsl:value-of select="normalize-space(.)"/>
        </metadata>
    </xsl:template>
    
    <!-- Suppress text -->
    <xsl:template match="text()"/>
    
</xsl:stylesheet>
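
To make the behavior of the stylesheet above concrete, here is a hypothetical input topic (its id, title, and keyword values are invented for illustration):

```xml
<topic id="sample">
    <title>Sample Topic</title>
    <prolog>
        <metadata>
            <keywords>
                <keyword>alpha</keyword>
                <keyword>  beta   gamma </keyword>
            </keywords>
        </metadata>
    </prolog>
    <body>
        <p>Some text with an <keyword>inline term</keyword>.</p>
    </body>
</topic>
```

Applying the stylesheet to this topic should produce a structure along these lines (indentation added for readability; the actual serialization depends on the processor):

```xml
<root>
    <metadata name="keywords">alpha</metadata>
    <metadata name="keywords">beta gamma</metadata>
    <metadata name="keywords">inline term</metadata>
</root>
```

Because the template matches keyword wherever it occurs, keywords used inline in the body are captured along with those in the prolog, and normalize-space() collapses the extra whitespace in each value.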