
priint:cloud XML Data Structure

This document describes the flexible content data input format to be used with the priint cloud rendering.

Data coming from content systems should be in this format. If the content system cannot provide this target format, the input can, in some cases, be transformed within the rendering service itself.

Note

Transformations must be defined in XSLT.

JSON input can be converted into a standard XML representation, which can then be transformed using XSLT.

Otherwise, a transformation must be provided to bring the data into this format. Transformations can be done either in Enterprise Service Bus or Integration Hub tools like Talend, or as XSL transformations within the priint cloud rendering itself. The latter is an additional service of the priint cloud rendering.

Requirements

The format must be easily processable within comet for InDesign or PdfRenderer.

Comet supports XML offline projects. XML offline means that the content or product data for a rendering are provided in a single local XML file. This implies three restrictions:

  • All content data come in a single file; no further content data are requested from external sources after the rendering has started
  • The file is a local file
  • The file format is XML

The first restriction is the most important. It essentially means that all content data must be known in advance.

The second restriction may be dropped in the future to support HTTP URIs. The third restriction (XML) stems from the fact that comet supports two query languages on local files: a) comet xmlquery and b) comet xpath. Both work on XML documents.[1]

Of the two query languages, xmlquery and the comet version of xpath, we strongly favor xmlquery, because the way xpath is embedded in comet does not yield satisfactory performance.

Xmlquery works well with simple XML documents but becomes unwieldy with deeply nested structures. Therefore, we impose two further restrictions:

  • XML data should not be deeply structured.
  • The structure should allow for optimal utilization of xmlquery indexes.

Main Elements

The format is very simple, generic, and extensible.

The root element <data/> is defined to allow for a few metadata attributes for the whole document, like created, revision, and expires, which may affect the caching behavior of the priint cloud rendering. Otherwise, the root <data/> element inherits from the main <item/> element.

The main <item/> element contains product data content as well as relations.

The <display/> element and the related display attribute define rules for data aggregation that are run before the rendering.

<item/> is an embeddable structure (items can be children of items) and is a potentially inheritable structure (items can be based on other items utilizing the idref attribute).

<data/>

  • Parents
    • This is only allowed as root element
  • Children
    • can have 0..n item children. Items must be locally unique, using a compound key built from type and id
    • can have 0..n display children. Display elements must be locally unique, using their id as key
    • can have any additional “custom” children to allow for project specific extensions
  • Attributes
    • All attributes from item plus the following
    • @version
      • Fixed version string for the document type. Initially during development “1.0-snapshot” – later in first release “1.0”.
    • @revision
      • Optional revision number for this document
    • @created
      • Optional date when this document was created (technically an xsd:dateTime value)
    • @expires
      • Optional expiry date for this document (technically an xsd:dateTime value)
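The local-uniqueness rules for <data/> children can be checked mechanically. The following Python sketch uses the standard-library ElementTree parser; the function names are hypothetical. Trimming trailing whitespace from identifiers follows the rule under Identifiers, and skipping children without @id is an assumption, since @id is optional.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_keys(parent):
    """Return keys that occur more than once among the direct children of parent.

    Items are keyed by (type, id), displays by id only. Trailing whitespace
    is trimmed from identifiers; children without @id are ignored.
    """
    keys = []
    for child in parent:
        ident = child.get("id")
        if ident is None:
            continue
        if child.tag == "item":
            keys.append(("item", child.get("type"), ident.rstrip()))
        elif child.tag == "display":
            keys.append(("display", None, ident.rstrip()))
    return [key for key, count in Counter(keys).items() if count > 1]


def check_local_uniqueness(root):
    """Apply duplicate_keys to every element of the document (hypothetical helper)."""
    problems = []
    for parent in root.iter():
        for key in duplicate_keys(parent):
            problems.append((parent.tag, key))
    return problems


doc = ET.fromstring(
    '<data version="1.0">'
    '<item type="product" id="p1"/>'
    '<item type="image" id="p1"/>'      # same id, different type: allowed
    '<item type="product" id="p1 "/>'   # trailing blank is trimmed: duplicate
    '<display id="t"/><display id="t"/>'
    "</data>"
)
print(check_local_uniqueness(doc))
```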

<item/>

  • Parents
    • child element of either <data/> or another <item/>
  • Children
    • /item
      • can have 0..n item children
    • /display
      • Optional hints for how to display the content of this item and its sub-items
    • /*(any)
      • can have any additional “custom” children to allow for project specific extensions
  • Attributes
    • @type
      • Optional “class” type of the element. Typically, the entity type used in the content system.
    • @id
      • Optional identifier of the element. If used it must be “locally unique” for all siblings of the same type. Example: a product code in the content system.
    • @idref
      • Optional reference to an identifier of another item element on top level. The reference is established by the xpath expression: /data/item[@id=$idref and @type=$type]
      • This is a shortcut for another element. In implementations the shortcut may be replaced by a copy of the destination element while applying all attributes (except idref) and children of the current item to the copied item.
    • @key
      • Optional grouping key.
    • @label
      • Optional localized name of the element.
    • @content
      • Optional content of the element.
      • Typically, plain text or markup content. In cases where the contentType is binary, such as image/png, the content must contain the base64-encoded blob or be empty. If it is empty, @uri must be set and is used to retrieve the content.
    • @contentType
      • Optional mime type of the content. Default is plain text.
    • @uri
      • Optional reference to an external resource as used in media assets.
    • @lang
      • Optional language code, same as in xml:lang
    • @base
      • Optional base URI for resolving a relative @uri
    • @display
      • ID reference to a <display/> element defined on the root level. Can be overridden by a <display/> element child of the current item.
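The @idref shortcut can be expanded into a copy of the referenced item. This Python sketch follows the stated lookup /data/item[@id=$idref and @type=$type] and applies the referencing item's attributes (except idref) and children to the copy. The function name is hypothetical, and treating a missing @type on both sides as a match is an assumption not covered by the description.

```python
import copy
import xml.etree.ElementTree as ET


def resolve_idref(root, item):
    """Return a resolved copy of item if it has @idref, otherwise item itself."""
    idref = item.get("idref")
    if idref is None:
        return item
    item_type = item.get("type")
    # Equivalent of /data/item[@id=$idref and @type=$type]
    # (assumption: a missing @type on both sides counts as a match).
    target = next(
        (c for c in root.findall("item")
         if c.get("id") == idref and c.get("type") == item_type),
        None,
    )
    if target is None:
        raise KeyError(f"unresolved idref {idref!r} of type {item_type!r}")
    resolved = copy.deepcopy(target)
    # Apply all attributes of the referencing item except idref ...
    for name, value in item.attrib.items():
        if name != "idref":
            resolved.set(name, value)
    # ... and append its children to the copied item.
    for child in item:
        resolved.append(copy.deepcopy(child))
    return resolved


# The "size" item below is hypothetical sample data.
root = ET.fromstring(
    '<data>'
    '<item type="product" id="i4">'
    '<item type="variantProduct" idref="i5" label="Variant"/>'
    '</item>'
    '<item type="variantProduct" id="i5"><item type="size" content="XL"/></item>'
    '</data>'
)
resolved = resolve_idref(root, root.find("item/item"))
print(resolved.get("id"), resolved.get("label"), resolved.find("item").get("content"))
```

Copying instead of modifying the target keeps the top-level item reusable for other references.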

<display/>

The display element does not represent content data, but provides hints on how to interpret or aggregate the content data in relation to printed outputs. Display is to be evaluated within the priint cloud rendering preparation phase.

We only describe a small subset of elements here, intended to support some simple table representations.

  • Parents
    • child element of either <data/> or <item/>. <display/> elements on root level (children of <data/>) make up a “repository” of displays that can be referred to on <item/> level using the @display attribute. <display/> elements as children of <item/> set the unique display of individual tables.
  • Attributes
    • @id
      • ID of this display element
    • @type
      • table is the only currently defined display type. It is meant to produce <table/> elements.
  • Children
    • /col
      • (for display type="table" only) Column definition. The only element specified in the examples in the appendix.

<col/>

  • Parents
    • Child of a <display type="table"/> element.
  • Attributes
    • @label
      • Literal name of the column or path to the name in the child items representing the column in the data.
    • @key
      • Name of the column. Used in mapping child items by key to this column.
    • @value
      • Path to the cell value in an item representing the cell in a row.
    • @valueType
      • Specification whether the value is a simple string or a numerical or date value – and how it is to be computed.
    • @formatPattern
      • String pattern applied to the identified value.
    • @style
      • Hint for a CSS class or InDesign format.
    • @align
      • Horizontal alignment of the column.
    • @width
      • Width of the column in CSS-like measures (px, pt, cm, %).

<table/>

An extended HTML5 table. Its content type is actually application/xhtml+xml, because it is embedded in XML. For table details, see the appendix.

The table may contain the following elements in a strict hierarchy.

  1. <table/>
  2. <caption/> | <thead/> | <tbody/> | <tfoot/> | <colgroup/> | <col/> | <figcaption/>
  3. <tr/>
  4. <th/> | <td/>

All elements set in bold allow mixed HTML content to be embedded.

Figure captions <figcaption/> are an extension that simplifies our code. In HTML, all tables would be wrapped in a <figure/> element, and the <figcaption/> would be moved from within the table to be the last element of the figure.

All elements allow a custom contentType attribute with one of the following values: text/html, text/indesign-tagged-text, text/plain. If one of these values is present, the textContent of the XML element (CDATA) is interpreted according to this setting. If contentType is not set, the innerXML of the element is interpreted as text/html.

If rendered as InDesign table, the class attributes will be interpreted as named table or cell formats.
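The contentType rule for table elements can be sketched as follows: with an explicit contentType, the element's text content is taken as is; otherwise the inner XML is serialized and treated as text/html. This Python sketch uses ElementTree; the function name is hypothetical, and returning a (media type, payload) pair is just one possible way to hand the result to a renderer.

```python
import xml.etree.ElementTree as ET


def cell_payload(element):
    """Return (media type, payload) for a table element such as <td/>."""
    content_type = element.get("contentType")
    if content_type:
        # Explicit contentType: interpret the text content (e.g. CDATA) as such.
        return content_type, "".join(element.itertext())
    # No contentType: the inner XML is interpreted as text/html.
    # ET.tostring includes each child's tail text, so mixed content survives.
    inner = (element.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )
    return "text/html", inner


html_cell = ET.fromstring("<td>Hello <i>World</i>!</td>")
text_cell = ET.fromstring('<td contentType="text/plain"><![CDATA[a < b]]></td>')
print(cell_payload(html_cell))
print(cell_payload(text_cell))
```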

Enumerations

Content Type

Media types to be supported for item values or table cell text contents.

  • HTML - text/html
    • Any HTML markup, possibly XHTML. In some cases, applications may choose to transform the content to XHTML before further processing, tolerating legacy and partially invalid HTML as most browsers do.
  • XHTML - application/xhtml+xml
    • HTML as XML markup. If markup does not conform to the XHTML specs an error will occur.
  • PLAIN_TEXT - text/plain
    • Plain text. In most cases, line breaks (NL) will be handled like <br/> in HTML rather than like <p/>.
  • TAGGED_TEXT - text/adobe-tagged-text
    • Adobe InDesign Tagged text. May include w2 extensions.
    • Adobe did not specify an official media type for this. We call it text/adobe-tagged-text since this term is already used in publishing server plug-ins such as HtmlToTaggedText or TableDataToHtml.
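The content type enumeration maps naturally onto an enum. In this Python sketch the class and helper names are hypothetical. The default is passed in by the caller, because items default to plain text while table elements without contentType are read as text/html. Media types outside the enumeration, such as image/png, yield None here, which is an assumption about how a caller would separate binary content.

```python
from enum import Enum


class ContentType(Enum):
    HTML = "text/html"
    XHTML = "application/xhtml+xml"
    PLAIN_TEXT = "text/plain"
    TAGGED_TEXT = "text/adobe-tagged-text"


def text_content_type(value, default):
    """Map a contentType attribute value to ContentType.

    Missing or empty values fall back to `default`; media types outside
    the enumeration (e.g. image/png) return None.
    """
    if not value:
        return default
    try:
        return ContentType(value.strip())
    except ValueError:
        return None


print(text_content_type(None, ContentType.PLAIN_TEXT))
print(text_content_type("text/html", ContentType.PLAIN_TEXT))
print(text_content_type("image/png", ContentType.PLAIN_TEXT))
```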

Alignments

In displays we may want to give hints on where the content should be placed within the space available to a rendering element. The following two enumerations are the safe set of options supported by both InDesign Tagged Text and the HTML5 specification. AUTO always means inherit from the parent or use the default.

Applying the values on different objects may lead to different effects.

  • Vertical
    • AUTO
    • BOTTOM
    • MIDDLE
    • TOP
  • Horizontal
    • AUTO
    • CENTER
    • JUSTIFY
    • LEFT
    • RIGHT
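The two enumerations and the AUTO rule can be sketched as follows. In this Python sketch the names are hypothetical, and the inheritance chain (for example cell, then column, then table) is an assumption, since the description only says that AUTO inherits from the parent or falls back to a default.

```python
from enum import Enum


class VerticalAlign(Enum):
    AUTO = "auto"
    BOTTOM = "bottom"
    MIDDLE = "middle"
    TOP = "top"


class HorizontalAlign(Enum):
    AUTO = "auto"
    CENTER = "center"
    JUSTIFY = "justify"
    LEFT = "left"
    RIGHT = "right"


def parse_align(enum_type, text):
    """Parse an attribute such as align="left"; a missing value means AUTO."""
    if not text:
        return enum_type.AUTO
    return enum_type[text.strip().upper()]


def resolve_align(chain, default):
    """Resolve AUTO by inheritance.

    `chain` lists the values from the innermost object outwards
    (e.g. cell, column, table); the first non-AUTO value wins,
    otherwise `default` is used.
    """
    for value in chain:
        if value.name != "AUTO":
            return value
    return default


cell = parse_align(HorizontalAlign, None)
column = parse_align(HorizontalAlign, "right")
print(resolve_align([cell, column], default=HorizontalAlign.LEFT))
```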

Length Attributes

Length attributes like “width”, “height”, “size” etc. support values in different units.

In old TableData objects, all length attributes are simple floats intended to be interpreted as millimeters. In web screen design, pixel is the most commonly used unit. 1px is 1/96 of an inch and 1in is 25.4mm. Absolute measures like px, pt, pc, mm, cm, and in can be converted into each other. We expect either mm or px here. Other measures, like ex, em, and percent, are relative to another setting or computation. These can only be converted if the rendering context is known, which is why we do not recommend using them.

If no unit is attached to a length string, the default is “mm”, since the main media target is print.

PROPORTIONAL, as sometimes used in HTML 4 or lower, is not supported. Please use PERCENT instead.

  • CENTIMETER “cm”
  • FONTSIZE “em”
  • INCH “in”
  • MILLIMETER “mm”
  • PERCENT “%”
  • PICA “pc”
  • PIXEL “px”
  • POINT “pt”
  • XHEIGHT “ex”
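Converting the absolute units to millimeters (the default unit) is simple arithmetic. This Python sketch assumes the CSS reference pixel of 1/96 inch and 72 points per inch; the function and constant names are hypothetical. Rejecting em, ex, and % reflects the statement that relative units can only be converted when the rendering context is known.

```python
import re

# Millimeters per absolute unit; px assumes the CSS reference pixel (1/96 in).
MM_PER_UNIT = {
    "mm": 1.0,
    "cm": 10.0,
    "in": 25.4,
    "pt": 25.4 / 72,   # 1pt = 1/72 in
    "pc": 25.4 / 6,    # 1pc = 12pt = 1/6 in
    "px": 25.4 / 96,   # 1px = 1/96 in
}
RELATIVE_UNITS = {"em", "ex", "%"}

_LENGTH = re.compile(
    r"\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(mm|cm|in|pt|pc|px|em|ex|%)?\s*"
)


def to_millimeters(text):
    """Convert a length string such as "12", "3.5cm" or "96px" to millimeters.

    A missing unit defaults to mm. Relative units raise ValueError,
    because they need the rendering context.
    """
    match = _LENGTH.fullmatch(text)
    if match is None:
        raise ValueError(f"not a length: {text!r}")
    number = float(match.group(1))
    unit = match.group(2) or "mm"
    if unit in RELATIVE_UNITS:
        raise ValueError(f"relative unit {unit!r} needs a rendering context")
    return number * MM_PER_UNIT[unit]


print(to_millimeters("12"), to_millimeters("3.5cm"), to_millimeters("96px"))
```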

Identifiers

Identifiers are arbitrary strings. They are generated by the content system or by the ESB or integration hub that finally creates the input data. We do not impose any restriction on the identifier strings, except that trailing whitespace is trimmed.

Identifiers must be locally unique, which means only one element of the same element type (item or display) with the same type attribute and identifier attribute is allowed in the set of direct children of an element.

idref attributes are always resolved against the top level of elements (the children of the root). An idref on an item is resolved using the XPath equivalent /data/item[@id=$idref and @type=$type]. An idref on a display is resolved using the XPath equivalent /data/display[@id=$idref].

Localization

Localization is often needed to execute rendering, because we need information on hyphenation rules, decimal formatting, sorting etc.

Language and regional (country) context are given via the @lang attribute. The @lang attribute must conform to IETF BCP 47. The value is inherited down the item hierarchy. To establish a neutral context in a deeper structure an empty string must be provided.

Languages and countries not covered by ISO standards may be used if a mapping is given. So lang="eng-EU" is allowed to specify the European Union as a region (or, e.g., "WW" for "world wide").

Note that these custom regions may not always be processed properly by sub-processes (e.g. when creating a dynamic URI for an image using the country as context).
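The inheritance rule for @lang can be sketched as a recursive walk. In this Python sketch (hypothetical function name), the document root starts from a neutral context, which is an assumption, and an explicit empty string resets the context for the whole subtree. It does not validate the tags against BCP 47.

```python
import xml.etree.ElementTree as ET


def effective_languages(root):
    """Map every element to its effective language ('' means neutral)."""
    result = {}

    def walk(element, inherited):
        lang = element.get("lang")
        # An absent @lang inherits; lang="" explicitly resets to neutral.
        current = inherited if lang is None else lang
        result[element] = current
        for child in element:
            walk(child, current)

    walk(root, "")
    return result


root = ET.fromstring(
    '<data lang="de-DE">'
    '<item id="a"><item id="b" lang="en-US"/>'
    '<item id="c" lang=""><item id="d"/></item></item>'
    "</data>"
)
langs = effective_languages(root)
for ident in "abcd":
    print(ident, repr(langs[root.find(f".//item[@id='{ident}']")]))
```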

Additional Context Attributes

Other typical context attributes like targetGroup or assortment are not defined in this format. The main role of these context properties is to filter the data in the content system that should be used for a specific document. For the priint cloud rendering, we expect the input data stream to be already filtered (and sorted), so that the rendering process does not need to do anything here. Filtering and sorting can get complicated if many conditions are involved. SQL is well suited for this, but the expression languages typically used for XML or JSON (XPath, JsonPath, XmlQuery) are not, W3C XQuery being an exception.

If we later need filtering on the comet side for specific use cases, we can always use custom nodes, either as key-value items or as custom elements:

<item key="assortment" content="SummerOf69"/>
<assortment>SummerOf69</assortment>

XmlQuery can evaluate this, but it requires custom XmlQuery or CScript code on the comet side.

Order of Items

Items in the result will be processed in the order provided in the input. That is, the input transformation creating the <data/> document is responsible for the correct order.

Using Reference Items

The recommended style for using the generic structure is the “database” style.

In database style, the depth of the content data should not exceed two levels of items. All items that have children must be defined on the top level and must be referenced via idref when used as children of other items. Elements without children (terminal elements) need not be defined on the top level.

If you inspect the item with id “i4” in the example below, you may observe that content entities of type “description” or “productImage” are embedded with their values directly on level two, whereas the bucket entity “variantProduct” is referenced using idref.

<data>
  <item type="product" id="i4" label="Item 0004">
    <item type="variantProduct" idref="i5"/>
    <item type="productImage" uri="https://werkii.example.de"/>
    <item type="description" id="i11" content="Hello &lt;i&gt;World&lt;/i&gt;" lang="en-US" contentType="text/html"/>
  </item>
  <item type="variantProduct" id="i5">
    ...
  </item>
</data>
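Whether a document follows the database style can be checked roughly as follows. This Python sketch (hypothetical function name) reports nested items that have item children of their own; such items should instead be defined on the top level and referenced via idref. It only looks at <item/> children and ignores <display/> and custom children, which is an assumption. The table examples in the appendix use three item layers, so they would be reported; the check flags deviations from a recommendation, not errors.

```python
import xml.etree.ElementTree as ET


def database_style_violations(root):
    """Return nested items (below the top level) that have item children."""
    return [
        nested
        for top in root.findall("item")
        for nested in top.findall("item")
        if nested.find("item") is not None
    ]


good = ET.fromstring(
    '<data>'
    '<item type="product" id="i4"><item type="variantProduct" idref="i5"/></item>'
    '<item type="variantProduct" id="i5"><item type="description" content="x"/></item>'
    '</data>'
)
bad = ET.fromstring(
    '<data>'
    '<item type="product" id="i4">'
    '<item type="variantProduct" id="i5"><item type="description" content="x"/></item>'
    '</item>'
    '</data>'
)
print(len(database_style_violations(good)), len(database_style_violations(bad)))
```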

Display

The <display/> element contains hints on how to aggregate or transform the content data in relation to printed outputs. Display is to be evaluated within the priint cloud rendering preparation phase.

Display elements of type table contain rules for aggregating the items in the input into HTML5 tables to be used by the renderer.

Other display types may be defined in the future. Examples of display elements and how to interpret them are given in the appendix.

The display element only makes sense if there are associated transformation implementations.

Display elements are typically referenced within an item using the “display” attribute as an idref into the display repository.

The display repository is a combined set of all display elements known from the priint cloud rendering configuration and from the input data itself (top level only).
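The lookup of a display for an item can be sketched as follows. This Python sketch (hypothetical names) merges configured displays with the top-level displays of the input, lets a <display/> child of the item take precedence over its @display attribute, and resolves @display against the merged repository. Letting input displays override configured ones with the same id is an assumption; the description does not specify the precedence.

```python
import xml.etree.ElementTree as ET


def build_repository(configured_displays, root):
    """Combine configured displays with top-level <display/> elements of the input."""
    repository = {d.get("id"): d for d in configured_displays}
    # Assumption: input displays override configured displays with the same id.
    repository.update({d.get("id"): d for d in root.findall("display")})
    return repository


def display_for(item, repository):
    """Return the display that applies to item, or None."""
    local = item.find("display")
    if local is not None:
        return local              # a child <display/> overrides @display
    ref = item.get("display")
    if ref is None:
        return None
    if ref not in repository:
        raise KeyError(f"unknown display {ref!r}")
    return repository[ref]


configured = [ET.fromstring('<display type="table" id="keyValue"/>')]
root = ET.fromstring(
    '<data>'
    '<display type="table" id="dynTable"/>'
    '<item id="a" display="keyValue"/>'
    '<item id="b" display="dynTable"><display type="table" id="local"/></item>'
    '<item id="c"/>'
    '</data>'
)
repo = build_repository(configured, root)
for item in root.findall("item"):
    found = display_for(item, repo)
    print(item.get("id"), None if found is None else found.get("id"))
```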

Configuration is meant in a very broad sense here. Transformations can, in theory, be configured by end users through wizards in a GUI, or be held in XML or JSON documents, but ultimately they materialize as code in a programming language or in some templating mechanism such as XSLT.

Appendix A – Examples

Examples of using the <display/> element and how it interacts with the content items to produce an output structure. The rules to select or identify items and to map items are specified using XPath 1.0 in this case.

Key-Value Table

A key-value table is a two-column table where the left column contains a key or key label and the right column contains the value for the key.

<data>
  <display type="table" id="keyValue">
    <col label="CHARACTERISTICS" content="@key" align="left" width="40%" />
    <col label="VALUE" content="@content" align="right" />
  </display>
  <item label="Product Description" display="keyValue">
    <item key="10NC" label="10NC" content="&lt;p&gt;9240 36217200&lt;/p&gt;" contentType="text/html" />
    <item key="Type Number" label="Type Number" content="9003" />
    <item key="ECE" label="ECE" content="yes" />
    <item key="SAE" label="SAE" content="yes" />
    <item key="INMETRO" label="INMETRO" content="aber sicherlich!" />
    <item key="Cap" label="Cap" content="P43t-38" />
    <item key="Comments" label="Comments" content="Carlamp, 190 sheet &quot;reglemented lamp&quot;" />
  </item>
</data>

We expect a two-layer item structure, with item layers for the table and its rows.

The key and content attributes of a row item provide the cell contents for columns 1 and 2.

item[@display="keyValue"]/item[@key and @content]

We will create 2 columns because of

count(display[@id="keyValue"]/col) = 2

We will create n rows as given by

count(item[@display="keyValue"]/item[@key and @content]) = n

The column header contains 1 row.

The content for the header cells comes from display/col/@label.

The widths of the columns come from display/col/@width.

The alignment of the cells comes from display/col/@align.

The content for the first cell of a row comes from item[@display="keyValue"]/item/@key.

The content for the second cell of a row comes from item[@display="keyValue"]/item/@content.

In both cases this is something like:

item[@display="keyValue"]/item/@*[name()=substring-after(display/col/@content, "@")]
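Put together, these rules can be sketched in Python with ElementTree, whose limited XPath support covers the [@attr='value'] predicates used here. The function name and the returned (header, rows) shape are assumptions; widths and alignments would be read from the same <col/> elements. The "@" prefix of col/@content is stripped to obtain the attribute name, mirroring substring-after.

```python
import xml.etree.ElementTree as ET


def key_value_table(root, display_id):
    """Build header and rows for a key-value display."""
    display = root.find(f"display[@id='{display_id}']")
    cols = display.findall("col")
    header = [col.get("label", "") for col in cols]
    # substring-after(col/@content, "@") gives the attribute name per column.
    attrs = [col.get("content", "").partition("@")[2] for col in cols]
    rows = []
    for table_item in root.findall(f"item[@display='{display_id}']"):
        for row_item in table_item.findall("item"):
            if row_item.get("key") is None or row_item.get("content") is None:
                continue  # only item[@key and @content] become rows
            rows.append([row_item.get(attr, "") for attr in attrs])
    return header, rows


root = ET.fromstring(
    '<data>'
    '<display type="table" id="keyValue">'
    '<col label="CHARACTERISTICS" content="@key" align="left" width="40%"/>'
    '<col label="VALUE" content="@content" align="right"/>'
    '</display>'
    '<item label="Product Description" display="keyValue">'
    '<item key="10NC" label="10NC" content="&lt;p&gt;9240 36217200&lt;/p&gt;" contentType="text/html"/>'
    '<item key="Type Number" label="Type Number" content="9003"/>'
    '<item key="ECE" label="ECE" content="yes"/>'
    '</item>'
    '</data>'
)
header, rows = key_value_table(root, "keyValue")
print(header)
for row in rows:
    print(row)
```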

Keyed-Item-Cell Table

A keyed-item-cell table represents a multi-column table. The columns must be explicitly defined. The column keys must match item keys.

Left column with a key and right column with the value for the key.

<data>
  <display type="table" id="itemPerCell">
    <col key="Par" label="PARAMETER" content="@label" width="30%" />
    <col key="SC" label="SC" width="10%" />
  </display>
  <item label="Photometry - General / ECE" display="itemPerCell">
    <item key="Stable_time" label="Stable time" content="2">
      <item key="Val" content="2" />
      <item key="SC" label="SC" />
      <item key="Com" label="Comment / Instruction" content="With test voltage (each filament)" />
    </item>
    <item key="Test_voltage_HB" label="Test voltage HB" content="-">
      <item key="Val" label="Test voltage HB" content="-" />
      <item key="SC" label="SC" />
      <item key="Tol" label="Tol." content="-" />
      <item key="Com" label="Comment / Instruction" />
    </item>
  </item>
</data>

We expect a three-layer item structure with nested item layers for table, rows, cells.

Cell items with a specific key match to columns with the same key.

item[@display="itemPerCell"]/item/item

We will create 2 columns because of

count(display[@id="itemPerCell"]/col) = 2

We will create 2 rows because of

count(item[@display="itemPerCell"]/item[@key]) = 2

The column header contains 1 row.

The content for the header cells comes from display/col/@label.

The widths of the columns come from display/col/@width.

The alignment of the cells comes from display/col/@align.

The value for the cell 1 in row 1 comes from

$key = display[@id="itemPerCell"]/col[1]/@key
item[@display="itemPerCell"]/item[1]/item[@key=$key]/@content

Content can be empty.
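The cell lookup above generalizes to all columns and rows. This Python sketch (hypothetical function name) returns one list of cell contents per row item with a @key, using an empty string where no cell item matches the column key or the cell has no @content. To make the result more telling, the sample display uses the columns Val and Com, an illustrative variation of the example above.

```python
import xml.etree.ElementTree as ET


def keyed_item_cell_rows(root, display_id):
    """One list of cell contents per row item, ordered like the <col/> keys."""
    display = root.find(f"display[@id='{display_id}']")
    keys = [col.get("key") for col in display.findall("col")]
    rows = []
    for table_item in root.findall(f"item[@display='{display_id}']"):
        for row_item in table_item.findall("item[@key]"):
            # item[@key=$key]/@content; the first matching cell wins
            cells = {}
            for cell in row_item.findall("item"):
                cells.setdefault(cell.get("key"), cell.get("content", ""))
            rows.append([cells.get(key, "") for key in keys])
    return rows


root = ET.fromstring(
    '<data>'
    '<display type="table" id="itemPerCell">'
    '<col key="Val" label="VALUE"/><col key="Com" label="COMMENT"/>'
    '</display>'
    '<item label="Photometry - General / ECE" display="itemPerCell">'
    '<item key="Stable_time" label="Stable time" content="2">'
    '<item key="Val" content="2"/><item key="SC" label="SC"/>'
    '<item key="Com" label="Comment / Instruction" content="With test voltage (each filament)"/>'
    '</item>'
    '<item key="Test_voltage_HB" label="Test voltage HB" content="-">'
    '<item key="Val" label="Test voltage HB" content="-"/>'
    '<item key="Com" label="Comment / Instruction"/>'
    '</item>'
    '</item>'
    '</data>'
)
for row in keyed_item_cell_rows(root, "itemPerCell"):
    print(row)
```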

Extended Keyed-Item-Cell Table

This defines a table where the number of columns is not known in advance but computed from the available data. It is an extended version of the "itemPerCell" table.

A specific column with “wildcard” key (*) can be used.

The number and keys of the columns are computed from the distinct values of item/item/item/@key, excluding the column keys already defined explicitly.

The col/@content may contain "@key" or "@label" as a hint to get the column header label from the (first non-empty) data item attribute with that name.

These "any" columns are ordered by one of: a) document order, b) alphabetical order of the key name, c) alphabetical order of the computed column label.

All "any" columns will either have no explicit width (HTML) or the same width (PDF), computed as an equal share of the total width given in the width attribute of the "any" column definition.

<data>
  <display type="table" id="dynTable">
    <col key="Par" label="PARAMETER" width="30%" />
    <col key="*" content="@label" width="60%" />
    <col key="SC" label="SC" width="10%" />
  </display>
  <item label="Photometry - General / ECE" display="dynTable">
    <item key="Stable_time" label="Stable time" content="2">
      <item key="Par" content="Hello" />
      <item key="Val" content="2" />
      <item key="SC" label="SC" />
      <item key="Com" label="Comment / Instruction" content="With test voltage (each filament)" />
    </item>
    <item key="Test_voltage_HB" label="Test voltage HB" content="-">
      <item key="Par" content="World" />
      <item key="Val" label="Test voltage HB" content="-" />
      <item key="SC" label="SC" />
      <item key="Tol" label="Tol." content="-" />
      <item key="Com" label="Comment / Instruction" content="Eins-zwei-drei"/>
    </item>
  </item>
</data>

Keys

Par | Com | Tol | Val | SC

Resulting table

PARAMETER | Comment / Instruction | Tol. | Val | SC
Hello | With test voltage (each filament) | | 2 |
World | Eins-zwei-drei | - | - |
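The column computation for the extended table can be sketched in Python as well. The function names and the order parameter are assumptions: "document", "key", and "label" correspond to options a), b), and c), and for label ordering the label of a key is taken from the first non-empty @label of a matching cell, falling back to the key. Splitting a percentage width equally among the "any" columns follows the PDF case described above, and the sketch assumes a single table item per display. With order="key", the sample reproduces the key order Par, Com, Tol, Val, SC and the cell contents of the resulting table above.

```python
import xml.etree.ElementTree as ET


def first_label(table_item, key):
    """First non-empty @label of a cell item with this key, else the key itself."""
    for cell in table_item.findall("item/item"):
        if cell.get("key") == key and cell.get("label"):
            return cell.get("label")
    return key


def expand_columns(root, display_id, order="key"):
    """Return [(key, width)] with the '*' column replaced by the dynamic keys."""
    display = root.find(f"display[@id='{display_id}']")
    table_item = root.find(f"item[@display='{display_id}']")  # single table assumed
    cols = display.findall("col")
    explicit = {c.get("key") for c in cols if c.get("key") != "*"}
    dynamic = []  # distinct cell keys in document order
    for cell in table_item.findall("item/item"):
        key = cell.get("key")
        if key is not None and key not in explicit and key not in dynamic:
            dynamic.append(key)
    if order == "key":
        dynamic.sort()
    elif order == "label":
        dynamic.sort(key=lambda k: first_label(table_item, k))
    columns = []
    for col in cols:
        if col.get("key") != "*":
            columns.append((col.get("key"), col.get("width")))
            continue
        width, share = col.get("width"), None
        if width and width.endswith("%") and dynamic:
            share = f"{float(width[:-1]) / len(dynamic):g}%"
        columns.extend((key, share) for key in dynamic)
    return columns


def table_rows(root, display_id, columns):
    """Cell contents per row; '' where a row has no matching cell content."""
    table_item = root.find(f"item[@display='{display_id}']")
    rows = []
    for row_item in table_item.findall("item"):
        cells = {}
        for cell in row_item.findall("item"):
            cells.setdefault(cell.get("key"), cell.get("content", ""))
        rows.append([cells.get(key, "") for key, _ in columns])
    return rows


root = ET.fromstring(
    '<data><display type="table" id="dynTable">'
    '<col key="Par" label="PARAMETER" width="30%"/>'
    '<col key="*" content="@label" width="60%"/>'
    '<col key="SC" label="SC" width="10%"/></display>'
    '<item label="Photometry - General / ECE" display="dynTable">'
    '<item key="Stable_time" label="Stable time" content="2">'
    '<item key="Par" content="Hello"/><item key="Val" content="2"/>'
    '<item key="SC" label="SC"/>'
    '<item key="Com" label="Comment / Instruction" content="With test voltage (each filament)"/>'
    '</item>'
    '<item key="Test_voltage_HB" label="Test voltage HB" content="-">'
    '<item key="Par" content="World"/><item key="Val" label="Test voltage HB" content="-"/>'
    '<item key="SC" label="SC"/><item key="Tol" label="Tol." content="-"/>'
    '<item key="Com" label="Comment / Instruction" content="Eins-zwei-drei"/>'
    '</item></item></data>'
)
columns = expand_columns(root, "dynTable")
print(columns)
for row in table_rows(root, "dynTable", columns):
    print(row)
```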