Extracting Metadata from XML Files
You can customize metadata gathering by using an XSLT to transform the XML content into a simpler XML structure that lists the metadata names and values for that content. The XSLT should be placed at /HARP-META/metadata.xsl in either a document type or a project.
If placed within a document type, it applies to all content that uses that document type, in any project. If placed within a project, it applies to all XML content in that project. Note: This means that a project-level XSLT must run successfully against any XML content that might appear in the project, whatever its structure or doctype. In practice, its template rules should match only the expected root element names and/or DITA classes.
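As a minimal sketch of that advice, the following project-level XSLT emits metadata only for DITA topics (including specializations such as concept or task) and leaves the root element empty, that is, no metadata, for any other XML. It assumes the DITA `@class` attribute values are available when the transform runs (they are normally supplied as defaults by the DITA DTD or schema); the `topicType` metadata name is purely hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- Always emit the required root element, even for content this XSLT does not recognize -->
  <xsl:template match="/">
    <root>
      <!-- Descend only into DITA topics; any other root element yields an empty <root/> -->
      <xsl:apply-templates select="*[contains(@class, ' topic/topic ')]"/>
    </root>
  </xsl:template>

  <!-- Hypothetical field: record the topic's root element name, e.g. "concept" or "task" -->
  <xsl:template match="*[contains(@class, ' topic/topic ')]">
    <metadata name="topicType">
      <xsl:value-of select="local-name()"/>
    </metadata>
  </xsl:template>
</xsl:stylesheet>
```

Because nothing outside the matched topic is processed, the XSLT needs no text-suppression rule.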
The output of this XSLT must follow this structure:
- The root element can have any name.
- Directly within the root element there must be a list of <metadata> elements. The @name attribute specifies the metadata name, and the textual content of the element is its value.
- If a given metadata name should have more than one value, generate multiple <metadata> elements with the same @name attribute.
- By default, metadata assigned to DITA maps is also applied to the contextualized copies of the topics referenced by that map. If you do not want this, specify cascades="false" on the <metadata> element.
Here is an example structure that represents two metadata fields:
- mdName = First Value, Second Value
- mdName2 = Second Metadata Entry (does not cascade to topics)
<root>
  <metadata name="mdName">First Value</metadata>
  <metadata name="mdName">Second Value</metadata>
  <metadata name="mdName2" cascades="false">Second Metadata Entry</metadata>
</root>
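For a DITA map, a stylesheet that produces output of this shape, with one cascading and one non-cascading field, might look like the sketch below. It is illustrative only: the `language` and `mapTitle` metadata names are hypothetical, and it assumes the DITA `@class` attribute values are available to the transform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/">
    <root>
      <!-- Only DITA maps (including specializations such as bookmap) produce metadata -->
      <xsl:apply-templates select="*[contains(@class, ' map/map ')]"/>
    </root>
  </xsl:template>

  <xsl:template match="*[contains(@class, ' map/map ')]">
    <!-- Default behavior: this value also cascades to the topics the map references -->
    <xsl:if test="@xml:lang">
      <metadata name="language">
        <xsl:value-of select="@xml:lang"/>
      </metadata>
    </xsl:if>
    <!-- cascades="false": this value stays on the map only -->
    <xsl:variable name="title" select="normalize-space((*[contains(@class, ' topic/title ')])[1])"/>
    <xsl:if test="$title != ''">
      <metadata name="mapTitle" cascades="false">
        <xsl:value-of select="$title"/>
      </metadata>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```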
The outputs of all applicable metadata transforms (for example, a document-type XSLT and a project XSLT that both apply to the same file) are combined and applied to the content during processing.
Capturing Keywords as Metadata
If we want to capture all of the DITA <keyword> elements in a topic and store them in a multi-valued keywords metadata field, our XSLT might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- Create the root tag -->
  <xsl:template match="/">
    <root>
      <xsl:apply-templates/>
    </root>
  </xsl:template>
  <!-- Capture keywords: one <metadata> element per <keyword> -->
  <xsl:template match="keyword">
    <metadata name="keywords">
      <xsl:value-of select="normalize-space(.)"/>
    </metadata>
  </xsl:template>
  <!-- Suppress text; other elements fall through to the built-in rule, which processes their children -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
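The stylesheet above emits one <metadata> element per <keyword>, so a keyword that occurs several times in a topic yields repeated values. If duplicates are unwanted, a variant along these lines (a sketch, still XSLT 2.0) could collect distinct values instead. It also follows the earlier note by looking only inside DITA topics, which assumes `@class` attribute values are available to the transform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/">
    <root>
      <!-- Normalized text of every keyword inside a DITA topic, with duplicates and empty values removed.
           distinct-values() does not guarantee any particular output order. -->
      <xsl:for-each select="distinct-values(/*[contains(@class, ' topic/topic ')]//keyword/normalize-space(.))[. != '']">
        <metadata name="keywords">
          <xsl:value-of select="."/>
        </metadata>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Since the keywords are gathered with a single XPath expression from the document root, no text-suppression or pass-through templates are needed.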