[Accessibility conventions are described at the bottom of the page]
*** This is a free preview excerpt of a commercial publication. ***
1. The context of XSLT and XPath
1.0 Overview
This chapter reviews the roles of the following Recommendations in the XML family and gives an overview of the contexts in
which XSLT and XPath are used.
Extensible Markup Language (XML)
[[1] - hierarchically describes an instance of information
[[2] - using embedded markup according to rules specified in the Recommendation
[2] - information is identified with a vocabulary of labels (a set of element types each with a name, a structure and optionally
some attributes) described by the user
][1] - optionally specifies a mechanism for the formal definition of a vocabulary
[[2] - controls the instantiation of new information
[2] - validates that existing information uses the expected set of labels
]]
XML Path Language (XPath)
[[1] - the document model and addressing basis for XSLT and XQuery
]
Extensible Stylesheet Language Family (XSLT/XSL/XSL-FO)
[[1] - XSL Transformations (XSLT)
[[2] - specifies the transformation of structured information into a hierarchy that uses the same or a different document model,
primarily for the kinds of transformations used with XSL
][1] - XSL (Formatting Semantics, a.k.a. XSL-FO)
[[2] - specifies the vocabulary and semantics of the formatting of information for paginated presentation
[2] - colloquially referred to at times as XSL Formatting Objects
]]
Namespaces
[[1] - disambiguates vocabularies when mixing information from different sources
[1] - identifies the dictionary for the labels used to mark up information
]
Stylesheet Association
[[1] - names resources as candidates to be utilized as a stylesheet for processing an XML document
[[2] - does not modify the structural markup of the data
[2] - used to specify the rendering of an instance of information
]]
1.1 The XML family of Recommendations
1.1.1 Extensible Markup Language (XML)
[[1] - [http://www.w3.org/TR/REC-xml]
[1] - [http://www.w3.org/TR/xml11]
]
A Recommendation fulfilling two objectives for information representation:
[[1] - expressing information in a hierarchical arrangement using XML-defined markup
[1] - restricting and/or validating the use of XML markup according to user-specified constraints
]
Document description and data description
[[1] - the roots of XML are from the ISO specification for Standard Generalized Markup Language (SGML) used for document description
[1] - any hierarchical arrangement can be expressed using XML
[1] - any non-hierarchical arrangement can be expressed hierarchically using XML
[1] - XML is now commonly used to describe many kinds of data because markup and Unicode text are platform independent
]
XML defines basic constraints on physical and logical hierarchies of syntax
[[1] - the concept of well-formedness with a syntax for markup languages
[[2] - the vocabulary and hierarchy of constructs in an instance of information is implicit according to the specified rules governing
syntactic structures
][1] - a language for specifying how a system can constrain the allowed logical hierarchy of information structures
[1] - the semantics of the user's vocabulary are not formally defined using XML constructs
[[2] - can be described in XML comments using natural language
[2] - are defined by the applications acting on the information
]]
Physical hierarchy (the content organization):
[[1] - single collection of information ("XML instance") from multiple physical resources ("XML entities")
[[2] - an XML file is not required to be comprised of more than one physical entity
[2] - physical modularization typically used to manage a large information set in smaller fragments
[2] - often used, inappropriately, to share XML fragments; an entity is parsed in the context of the referencing instance
][1] - resource is nested syntactically using XML external parsed general entity construct
[[2] - each physical resource has a well-formed logical hierarchy
][1] - unparsed data entities in a declared notation are outside of the parsed hierarchy
]
[Figure 1.1: Physical hierarchy of external general entities
Connected triangles are shown, each representing an XML fragment. The leftmost labeled a.xml, connected through &b; to b.xml and through &c; to c.xml. b.xml itself is connected through &d; to d.xml and through &e; to e.xml.
Out of each of c.xml and e.xml are lines connected to a non-XML fragment labeled x.gif.
In the top left corner is a legend of file locations for the fragments:
Files:
adir/a.xml
adir/c.xml
adir/x.gif
bdir/b.xml
bdir/e.xml
bdir/ddir/d.xml
Below the legend is a fragment of the internal declaration subset from a.xml:
<!ENTITY b SYSTEM "../bdir/b.xml">
<!ENTITY c SYSTEM "c.xml">
<!ENTITY d SYSTEM "../bdir/ddir/d.xml">
<!ENTITY e SYSTEM "../bdir/e.xml">
<!NOTATION gif-file SYSTEM "gif-uri">
<!ENTITY x SYSTEM "x.gif" NDATA gif-file>
]
Logical hierarchy (the information):
[[1] - single collection of information ("XML instance") comprised of multiple nested containers (XML elements, attributes, text,
etc.) where each container is labeled with a name
[1] - each piece is expressed using an XML construct at a user-defined granularity
[1] - the nested breakdown of the information is hierarchical
]
[Example 1-1: A well-formed XML purchase order instance
<?xml version="1.0"?>
<purchase>
  <customer db="cust123"/>
  <product db="prod345">
    <amount>23.45</amount>
  </product>
</purchase>
]
The implicit document model exists by the mere presence of logical hierarchy
[[1] - the markup of the XML constructs demarcates the locations of the information in the hierarchy
[1] - data model is comprised of family-tree-like relationships of parent, child, sibling, etc.
]
[Figure 1.2: Logical hierarchy of information
The top shows "(XML instance)" in italics, with an arrow down to the document element named purchase. Two solid arrows point to the element children, customer on the left and product on the right. product itself has a solid arrow down to amount. Each of customer and product has a dotted arrow on the right to its attached db attribute.
]
A logical hierarchy need not come from XML syntax
[[1] - through "data projection", the logical tree of any information that can be organized as if it came from XML syntax is indistinguishable
from a tree that actually does come from XML syntax
[1] - the data model doesn't retain whatever syntax was used (XML or otherwise) to create the logical tree
]
XML allows user constraints on the logical hierarchy (the vocabulary)
[[1] - defines the concept of validity with a syntax for a meta-markup language
[1] - Document Type Definition (DTD) describes the document model as a structural schema
[[2] - the vocabulary defines the logical hierarchy of the information constructs explicitly according to user-specified constraints
][1] - other structural and content schema languages exist for XML
[[2] - validation constraints extend to values found within text and attribute content
[2] - different approaches to describing models provide different benefits
][1] - constrains during generation and confirms during processing
[1] - does not convey semantics of information being marked up
]
[Example 1-2: A valid XML purchase order instance
<?xml version="1.0"?>
<!DOCTYPE purchase [
<!ELEMENT purchase ( customer, product+ )>
<!ELEMENT customer EMPTY>
<!ATTLIST customer db CDATA #REQUIRED>
<!ELEMENT product ( amount )>
<!ATTLIST product db CDATA #REQUIRED>
<!ELEMENT amount ( #PCDATA )>
<!ATTLIST amount currency ( GBP | CAD | USD ) "USD"> ]>
<purchase>
  <customer db="cust123"/>
  <product db="prod345">
    <amount>23.45</amount>
  </product>
</purchase>
]
The DTD can supplement the data model with additional information:
[Figure 1.3: Logical hierarchy of supplemented information
The top shows "(XML instance)" in italics, with an arrow down to the document element named purchase. Two solid arrows point to the element children, customer on the left and product on the right. product itself has a solid arrow down to amount. Each of customer and product has a dotted arrow on the right to its attached db attribute. amount has a dotted arrow on the right to the attached attribute currency.
]
[[1] - note how the shape of the tree is different in the presence of defaulted attribute declarations
[[2] - the currency attribute is included in the tree when the DTD is present
[2] - without the DTD the logical tree for the sample instance does not include the currency attribute
[2] - the markup used is identical in both the example instances
]]
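The effect of the defaulted attribute declaration can be observed with a non-validating parser that reads the internal declaration subset. The following is a minimal sketch in Python, assuming the standard library xml.parsers.expat module, which by default reports attributes derived from attribute declarations as well as those specified in the instance; attributes_seen() is a hypothetical helper written for this illustration.

```python
from xml.parsers import expat

DECLARATION = '<?xml version="1.0"?>\n'

# Internal declaration subset taken from Example 1-2.
DTD = """<!DOCTYPE purchase [
<!ELEMENT purchase ( customer, product+ )>
<!ELEMENT customer EMPTY>
<!ATTLIST customer db CDATA #REQUIRED>
<!ELEMENT product ( amount )>
<!ATTLIST product db CDATA #REQUIRED>
<!ELEMENT amount ( #PCDATA )>
<!ATTLIST amount currency ( GBP | CAD | USD ) "USD"> ]>
"""

# The element markup is identical in both example instances.
BODY = """<purchase>
  <customer db="cust123"/>
  <product db="prod345">
    <amount>23.45</amount>
  </product>
</purchase>"""


def attributes_seen(document):
    """Map each element name to the attributes the parser reports for it."""
    seen = {}
    parser = expat.ParserCreate()

    def start_element(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(document, True)
    return seen


without_dtd = attributes_seen(DECLARATION + BODY)
with_dtd = attributes_seen(DECLARATION + DTD + BODY)

print("without DTD:", without_dtd["amount"])
print("with DTD:   ", with_dtd["amount"])
```

The parser does not validate the instance against the content models; it only applies the attribute declarations that change the information reported to the application.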
The equivalent set of document constraints on the logical hierarchy expressed using W3C Schema could be in purc.xsd:
[Example 1-3: Equivalent constraints in W3C Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="purchase">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="customer">
          <xsd:complexType>
            <xsd:attribute name="db" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="product" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="amount">
                <xsd:complexType mixed="true">
                  <xsd:attribute name="currency" default="USD">
                    <xsd:simpleType>
                      <xsd:restriction base="xsd:string">
                        <xsd:enumeration value="GBP"/>
                        <xsd:enumeration value="CAD"/>
                        <xsd:enumeration value="USD"/>
                      </xsd:restriction>
                    </xsd:simpleType>
                  </xsd:attribute>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
            <xsd:attribute name="db" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
]
The hint that a particular W3C Schema applies to a document is given via reserved attributes
[[1] - a processor is not obliged to use the hints
]
[Example 1-4: A document referencing a schema
<?xml version="1.0"?>
<purchase xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="purc.xsd">
  <customer db="cust123"/>
  <product db="prod345">
    <amount>23.45</amount>
  </product>
</purchase>
]
DTD declarations affecting the information set of the instance are significant to transform processing that is focused on
the implicit logical model of the instance:
[[1] - some attribute declarations in DTD are significant
[[2] - attribute list declarations impact transform processing by modifying the information set of the instance
[[3] - supply of defaulted attribute values for attributes not specified in start tags and empty tags of elements
[3] - declaration of ID-typed attributes (for ID/IDREF processing) that confer element identification uniqueness in an instance
[3] - declaration of attribute types affecting the attribute value normalization during XML processing
[3] - attribute information does not affect the well-formed nature of an XML instance
]][1] - no DTD content model declarations are significant
[[2] - what the logical model could contain does not affect what the actual logical model does contain
]]
[P2.0]W3C Schema declarations inform the construction of the data model for the XML instance from the Post-Schema-Validation
Infoset (PSVI)
[[1] - only when a schema-aware processor is being used and when validation is engaged for the source files
[[2] - schema type assignment, default attribute and element value provision, white space normalization of element content
[2] - the user-supplied lexical form of elements and attributes with atomic schema types may be lost
][1] - when not validated, input information items are treated per the XML information set
[[2] - considered as having unknown data types
][1] - DTD default attribute value declarations override W3C Schema defaults
]
[P1.0]Content models do not imply any special handling of element content white space
[[1] - a content model is defined as either element content (a content model without #PCDATA) or mixed content (a content model with #PCDATA)
[1] - the term "element content white space" is defined in [http://www.w3.org/TR/xml-infoset]
[[2] - sometimes colloquially termed elsewhere as "ignorable white space"
][1] - all white space is significant to most XSLT 1 processors
[1] - some recognition of white space can be influenced by the XSLT stylesheet
]
[P2.0]White space text node disposition is at user request
[[1] - strip all, preserve all, strip ignorable
]
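The presence of white-space-only text nodes can be observed in any tree built from indented markup. The following is a minimal sketch in Python, assuming the standard library xml.dom.minidom module rather than an XSLT processor; the is_white_space_text() filter is a hypothetical stand-in for a user request to strip such nodes.

```python
from xml.dom import minidom

# Indented markup: the line breaks and spaces between tags are text.
SOURCE = '<purchase>\n  <customer db="cust123"/>\n</purchase>'

purchase = minidom.parseString(SOURCE).documentElement


def is_white_space_text(node):
    """True for a text node that contains nothing but white space."""
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ""


# Without a request to strip, every text node is kept in the tree.
all_children = list(purchase.childNodes)

# Stripping on request leaves only the element child.
kept_children = [node for node in all_children if not is_white_space_text(node)]

print(len(all_children), "children before stripping")
print(len(kept_children), "children after stripping")
```

The parser keeps the white space because nothing in a well-formed instance says whether it is part of the data.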
XML Recommendation describes behavior
[[1] - required of an XML processor
[1] - how it must process an XML stream and identify constituent data
[1] - the information it must provide to an application
[1] - note that programming interfaces that have been standardized are separate initiatives and are not defined by the XML Recommendation
[[2] - tree-oriented paradigm using DOM (Document Object Model)
[2] - stream-oriented paradigm using SAX (Simple API for XML)
]]
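The two programming paradigms can be compared by reading the same instance both ways. The following is a minimal sketch in Python, assuming the standard library xml.dom.minidom and xml.sax modules as representative DOM and SAX implementations; the NameCollector class is a hypothetical handler written for this illustration.

```python
import xml.sax
from xml.dom import minidom

SOURCE = b"""<purchase>
  <customer db="cust123"/>
  <product db="prod345">
    <amount>23.45</amount>
  </product>
</purchase>"""

# Tree-oriented: the whole instance is built in memory, then queried.
document = minidom.parseString(SOURCE)
dom_names = [element.tagName for element in document.getElementsByTagName("*")]


# Stream-oriented: the application is called back as each construct is read.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


handler = NameCollector()
xml.sax.parseString(SOURCE, handler)
sax_names = handler.names

print(dom_names)
print(sax_names)
```

Both interfaces report the same elements in document order; they differ in whether the application holds a complete tree or reacts to events as the stream is read.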
An XML document is only a labeled hierarchy of information
[[1] - XML only unambiguously identifies constituent parts of a stream of hierarchical information
[1] - no inherent meanings or semantics of any kind associated with element types
]
No rendition or transformation concepts or constructs
[[1] - information representation only, not information presentation or processing
[1] - no defined controls for implying rendering semantics
[1] - the xml:space attribute signals whether white space in content is significant to the data definition
]
1.1.2 XML information links
Links to useful information
[[1] - [http://www.xml.com/axml/axml.html] - annotated version of XML 1.0
[1] - [http://xml.coverpages.org/xml.html] - Robin Cover's famous resource collection
[1] - [http://xml.coverpages.org/xll.html] - Extensible Linking Language
[1] - [http://xml.silmaril.ie/] - Peter Flynn FAQ
[1] - [http://www.xmlbooks.com/] - a summary of available printed books
[1] - [http://www.CraneSoftwrights.com/links/trn-20110211.htm] - training material
[1] - [http://www.CraneSoftwrights.com/resources] - free resources
[1] - [http://XMLGuild.info] - consulting and training expertise
[1] - [http://wiki.eclipse.org/PsychoPathXPathProcessor] - standalone XPath 2.0 processor
[1] - [http://xml.coverpages.org/elementsAndAttrs.html] - a summary of opinions on using elements versus attributes
[1] - [http://google-styleguide.googlecode.com/svn/trunk/xmlstyle.html] - a corporate perspective on XML document style
]
Related initiatives and specifications
[[1] - [http://www.w3.org/TR/2004/REC-xml-infoset-20040204] - XML Information Set
[1] - [http://www.w3.org/TR/xmlschema-0/] - W3C XML Schema
[1] - [http://www.relax-ng.org] - ISO/IEC 19757-2 RELAX NG (based on RELAX and TREX)
[1] - [http://www.schematron.com] - ISO/IEC 19757-3 Schematron
[1] - [http://www.nvdl.org] - ISO/IEC 19757-4 Namespace-based Validation Dispatching Language (NVDL)
[1] - [http://www.w3.org/TR/DOM-Level-2/] - Document Object Model Level 2
[1] - [http://www.saxproject.org] - Simple API for XML
]
1.1.3 XML Path Language (XPath)
Representing structured information
[[1] - [P1.0][http://www.w3.org/TR/xpath]
[1] - [P2.0][http://www.w3.org/TR/xpath20]
[1] - a data model for representing the information found in an XML document as an abstract node tree
[[2] - the original markup syntax is not preserved
[2] - the user constraints on the document model (e.g. DTD content models) are not germane
[2] - any logical or physical modularization (the use of entities) is not preserved
][1] - a mechanism for addressing information found in the document node tree
[[2] - the address specifies how to traverse the data model of the instance
][1] - a core upon which extended functionality specific to each of XPointer, XSLT and XQuery is added
[[2] - an expression of Boolean, numeric, string and node values as different data types
[2] - a set of functions working on the values
][1] - [P2.0]annotated with W3C Schema data type information when available
[1] - [P2.0]data model defined for use with XSLT and XQuery:
[[2] - [http://www.w3.org/TR/xpath-datamodel/]
]]
Addressing and finding structured information
[[1] - common semantics and syntax for addressing a logical hierarchy
[[2] - document order, a.k.a. parse order, a.k.a. depth first order
][1] - no representation of the physical hierarchy of an XML document
[1] - a compact non-XML syntax
[[2] - for use in languages needing to address information found in an XML document
[2] - id('start')//question[@answer='y']
[[3] - address all question elements whose answer attribute is "y" that are descendants of the element in the current document whose unique identifier is "start"
[3] - the result is an address of element nodes
][2] - [P2.0]for $each in id('start')//question[@answer='y']
return if ($each/@weight) then $each/@weight * 100.
else 100.
[[3] - for all question elements whose answer attribute is "y" that are descendants of the element in the current document whose unique identifier is "start", return a sequence of numbers where, if that element has a weight attribute return the weight multiplied by 100, otherwise
just return 100
[3] - the result is a sequence of numbers suitable for processing, such as an argument to the avg() function
]]]
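The two expressions above can be approximated outside of an XPath processor. The following is a minimal sketch in Python, assuming the standard library xml.etree.ElementTree module, which supports only a subset of XPath 1.0 and has no id() function. The sample instance is invented for illustration, and an attribute named id is assumed to stand in for an ID-typed attribute because no DTD is present.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the names follow the expressions in the text,
# the structure and values are invented for this illustration.
SOURCE = """<survey>
  <section id="start">
    <question answer="y" weight="0.5"/>
    <group>
      <question answer="y"/>
      <question answer="n" weight="2"/>
    </group>
  </section>
  <section id="other">
    <question answer="y" weight="3"/>
  </section>
</survey>"""

root = ET.fromstring(SOURCE)

# Stand-in for id('start'): assumes the attribute named "id" is the identifier.
start = root.find(".//*[@id='start']")

# Counterpart of //question[@answer='y'] evaluated from the start element.
selected = start.findall(".//question[@answer='y']")

# Counterpart of the XPath 2.0 "for ... return if ... then ... else" expression.
scores = [
    float(question.get("weight")) * 100 if question.get("weight") is not None else 100.0
    for question in selected
]

# Counterpart of passing the resulting sequence to avg().
average = sum(scores) / len(scores)

print(len(selected), scores, average)
```

The first step only addresses nodes; the second step works with the addressed information before returning a result, which is the distinction drawn between addressing and querying.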
[P1.0]XPath 1.0 is an addressing language and is not a query language
[[1] - only based on XML 1.0 and Namespaces in XML 1.0
[[2] - expressed in terms of the XML Information Set
[2] - [http://www.w3.org/TR/xml-infoset]
][1] - only addresses information that needs to be found in an XML document
[1] - other aspects of querying involve working with the information that is addressed before returning a result to the requestor
[[2] - instructions in XSLT perform query functionality
][1] - XPath is used only to address components of an XML instance, and in and of itself does not provide any traditional query capabilities
(though hopefully would be considered as the addressing scheme by those defining such capabilities)
]
[P2.0]XPath 2.0 is very much a query language
[[1] - based on W3C Schema XSD 1.0 perspective of an XML document
[1] - supports conditional expressions, actions on the result set, etc.
[1] - very powerful and expressive language for manipulating all types of information before returning the result of manipulation
for action
]
1.1.4 Styling structured information
Styling is transforming and formatting information
[[1] - the application of two processes to information to create a rendered result
[1] - the ordering of information for creation isn't necessarily (or shouldn't be constrained to) the ordering of information for
presentation or other downstream processes
[[2] - it is a common (though misdirected) first step for people working with these technologies to focus on presentation
[2] - the ordering should be based on business rules and inherent information properties, not on artificial presentation requirements
[2] - downstream arrangements can be derived from constraints imposed upstream in the process
[2] - information created richly upstream can be manipulated into less-richly distinguished information downstream, but not easily
the other way around
[2] - exception when the business rules are presentation or appearance oriented (e.g. book publishing)
][1] - the need to present information in more than one arrangement requires transformation
[1] - the need to present information in more than one appearance requires formatting
]
W3C XSL Working Group
[[1] - chartered to define a style specification language that covers at least the formatting functionality of both CSS and DSSSL
[1] - not intended to replace CSS, but to provide functionality beyond that defined by CSS
[[2] - e.g. add element reordering and pagination semantics
]]
Two W3C Recommendations
[[1] - designed to work together to fulfill these two objectives
[1] - XSL Transformations (XSLT) - versions 1.0 and 2.0
[[2] - transforming information obtained from a source into a particular reorganization of that information to be used as a result
][1] - Extensible Stylesheet Language (XSL/XSL-FO) - versions 1.0 and 1.1
[[2] - specifying and interpreting formatting semantics for the rendering of paginated information
[2] - the acronym XSL-FO is unofficial but in wide use, including at the W3C, for just the formatting objects, properties and property
values
[2] - XSL normatively includes XSLT by reference in chapter 2
[[3] - XSLT has specific features designed to be used with XSL-FO
]]]
XSLT and XSL-FO are endorsed by members of WSSSL
[[1] - an association of researchers and developers passionate about markup technologies
]
1.1.5 Extensible Stylesheet Language (XSL/XSL-FO)
[[1] - [http://www.w3.org/TR/2001/REC-xsl-20011015/]
[1] - [http://www.w3.org/TR/xsl11] ([http://www.w3.org/TR/xsl])
]
Paginated flow and formatting semantics vocabulary
[[1] - capturing agreed-upon formatting semantics for rendering information in a paginated form on different types of media
[1] - XSLT is normatively referenced as an integral component of XSL as a language to transform an instance of an arbitrary vocabulary
into the XSL-FO XML vocabulary
[1] - XSL-FO can be regarded simply as a "pagination markup language"
[1] - flow semantics from the DSSSL heritage
[[2] - e.g. headers, footers, page numbers, page number citations, columns, etc.
][1] - formatting semantics from the CSS heritage
[[2] - e.g. visual properties (font, color, etc.) and aural properties (speak, volume, etc.)
]]
Target of transformation
[[1] - the stylesheet writer transforms a source document into a hierarchy that uses only the formatting vocabulary in the result
tree
[1] - stylesheet is responsible for constructing the result tree that expresses the desired rendering of the information found in
the source tree
[[2] - the XML document gets transformed into its appearance
][1] - stylesheet cannot use any user constructs as they would not be recognized by an XSL rendering processor
[[2] - for example, the rendering engine doesn't know what an invoice number or customer number is that may be represented in the
source XML
[2] - the rendering engine does know what a block of text is and what properties of the block can be manipulated for appearance's
sake
[2] - the stylesheet transforms the invoice number and customer number into two blocks of text with specified spacing, font metrics,
and area geometry
]]
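The invoice illustration can be sketched as a transformation from user vocabulary into formatting vocabulary. The following is a minimal sketch in Python, assuming the standard library xml.etree.ElementTree module in place of an XSLT processor. The source element names, the sample values and the block() helper are hypothetical; only the fragment of blocks is built, not a complete XSL-FO document with its page layout definitions.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"  # the XSL-FO namespace
ET.register_namespace("fo", FO)

# Hypothetical source instance using the user's own vocabulary.
SOURCE = "<invoice><number>INV-001</number><customer>C-42</customer></invoice>"


def block(text, **properties):
    """Build an fo:block; keyword names use "_" where the property uses "-"."""
    attributes = {name.replace("_", "-"): value for name, value in properties.items()}
    element = ET.Element(f"{{{FO}}}block", attributes)
    element.text = text
    return element


source = ET.fromstring(SOURCE)

# The result uses only formatting vocabulary: the rendering engine sees
# two blocks of text with properties, not an invoice or a customer.
blocks = [
    block("Invoice: " + source.findtext("number"), font_size="14pt", font_weight="bold"),
    block("Customer: " + source.findtext("customer"), font_size="10pt", space_before="6pt"),
]

for item in blocks:
    print(ET.tostring(item, encoding="unicode"))
```

Nothing in the result identifies which block held the invoice number; the meaning of the source labels is consumed by the transformation and only the appearance remains.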
Device-independent formatting constructs
[[1] - the XSL-FO vocabulary describes two media interpretations for objects and properties:
[[2] - visual media
[2] - aural media
[2] - a further distinction is also made at times for interactive media
][1] - the results of applying a single stylesheet can be rendered on different types of rendering devices, e.g.: print, display,
audio, etc.
[1] - may still be appropriate to have separate stylesheets for dissimilar media
[[2] - device independence allows the information to be rendered on different media, but a given rendering may not be conducive to
consumption
]]
1.1.6 Extensible Stylesheet Language Transformations (XSLT)
Addressing, querying and publishing structured information
[[1] - [T1.0][http://www.w3.org/TR/xslt]
[[2] - addressing structured information
][1] - [T2.0][http://www.w3.org/TR/xslt20]
[[2] - querying structured information
][1] - a framework for complex and intelligent querying of structured content
[[2] - with a powerful syntax for modular and extensible stylesheet writing
][1] - works on XML documents
[1] - works on any source of information projected as if it were an XML document
[[2] - such projection is defined by the vendor, not by the specification
[2] - the specification sees all information as if it had been in an XML document
[2] - e.g. database tables, rows and columns
[2] - e.g. unstructured documents
[2] - e.g. proprietary binary formats
[2] - any information can be fit (or shoehorned) into an XML document by using data projection
][1] - numerous features for publishing information for human consumption
[[2] - e.g. formatting numbers, dates and times
[2] - e.g. polymorphism of stylesheet constructs for specialization of behaviors
[2] - e.g. elaborate grouping criteria
[2] - e.g. multiple result trees
]]
Shares the same data model as XQuery
[[1] - built on XPath 2.0 with additional functions not available in XQuery expressions
]
Shares the same basic processing model as XQuery
[[1] - some XSLT and XQuery implementations share the same core engine
[[2] - e.g. Saxon 9 [http://saxon.sf.net] treats XSLT and XQuery merely as different syntax skins over the same implementation engine
]]
Shares the same serialization specification as XQuery
[[1] - used to frame query results as structured or non-structured output of transformation
]
Syntactically, XSLT is an XML vocabulary
[[1] - an XSLT stylesheet is a well-formed XML document
[1] - all use of XPath 2.0 is in attributes of XSLT and other XML elements
]
Transformation specifications are termed "XSLT stylesheets"
[[1] - describing how new results are constructed from old inputs
[1] - termed generically as "a transform" in this training material
]
Transformation using construction by example
[[1] - a vocabulary for specifying templates of the result that are filled-in with information from the source
[[2] - the stylesheet includes examples of each of the components of the result
[2] - the stylesheet writer declares how the XSLT processor builds the result from the supplied examples
][1] - the primary memory management and manipulation (node traversal and node creation) is handled by the XSLT processor using declarative
constructs, in contrast to a transformation programming language or interface (e.g. the DOM - Document Object Model) where
the programmer is responsible for handling low-level manipulation using imperative constructs
[1] - includes constructs to reposition over structures and information found in the source
[1] - the information being transformed can be traversed in different ways any number of times required to construct the desired
result
[1] - straightforward problems are solved in straightforward ways without needing to know programming
[[2] - useful, commonly-required facilities are implemented by the processor and can be triggered by the stylesheet
[2] - the language is Turing complete, thus arbitrarily complex algorithms can be implemented (though not necessarily in a pretty
fashion)
][1] - includes constructs to manage stylesheets by sharing components in different fragments
[1] - [T2.0]XSLT 2.0 has many more programming features and function calls than XSLT 1.0
]
Many language features for modularization and leveraging stylesheets
[[1] - supports forms of polymorphism for stylesheet constructs
[1] - supports extensive re-use of stylesheet fragments for generalized transformations or specific transformations
[1] - overriding template rules
[[2] - allows one to create "onion skins" of modifications to stylesheet libraries
][1] - testing the presence of extensions before using them
[[2] - allows one to run one stylesheet with multiple XSLT processors
]]
Illustration of templates triggered in source-tree order constructing a result:
[Figure 1.4: Construction of result tree by triggered stylesheet templates
The figure is split in three vertical panes: a source node tree on the left, a stylesheet of tree fragments in the middle,
and three incrementally-building result trees on the right.
Arrows connect the source tree nodes to the tree fragments in the stylesheet, and other arrows connect the tree fragments
with their use in the result tree.
]
Of note:
[[1] - the source tree contains nodes of six different types, labeled "1" through "6"
[[2] - a number of nodes are found multiple times in the source tree
][1] - the stylesheet contains fragmented examples of the result tree
[[2] - each example template is associated with a node in the source tree
][1] - the nodes in the source tree trigger the building of the result from the example templates
[[2] - some examples are used multiple times in the result
][1] - in this example, the source tree is visited strictly in parse order to generate the result tree
[[2] - the stylesheet can visit the source tree in whatever order is required to trigger the assembly of the result tree in result
parse order
[2] - result parse order is indicated by the letters "A" through "Z"
]]
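The triggering of templates by source nodes can be imitated in a few lines. The following is a minimal sketch in Python, assuming the standard library xml.etree.ElementTree module; it emulates the idea of template rules and is not an XSLT processor. The TEMPLATES table, the process() and apply_templates() functions and the HTML-like result vocabulary are all hypothetical choices made for this illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = """<purchase>
  <customer db="cust123"/>
  <product db="prod345">
    <amount>23.45</amount>
  </product>
</purchase>"""


def purchase_template(node):
    # Example of the result: a list whose items come from the children.
    result = ET.Element("ul")
    result.extend(apply_templates(node))
    return [result]


def customer_template(node):
    item = ET.Element("li")
    item.text = "customer " + node.get("db")
    return [item]


def product_template(node):
    item = ET.Element("li")
    item.text = node.get("db") + ": " + node.findtext("amount")
    return [item]


# Each example template is associated with a kind of source node.
TEMPLATES = {
    "purchase": purchase_template,
    "customer": customer_template,
    "product": product_template,
}


def process(node):
    """Trigger the template for a node, or descend when none is associated."""
    rule = TEMPLATES.get(node.tag)
    return rule(node) if rule is not None else apply_templates(node)


def apply_templates(parent):
    """Visit the children in source-tree order and collect what they build."""
    built = []
    for child in parent:
        built.extend(process(child))
    return built


result_tree = process(ET.fromstring(SOURCE))[0]
serialized = ET.tostring(result_tree, encoding="unicode")
print(serialized)
```

The visiting order is fixed here for brevity; a template could equally select and process source nodes in some other order to assemble the result.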
1.1.7 XSLT properties
Expression syntax is iconic
[[1] - using XML markup allows one to manifest the output
[[2] - XML is a first-class data type and output expression syntax
[2] - the syntax itself is abstracted into a tree of nodes
[2] - syntax not related to the information in the document is not preserved
][1] - using other languages one must describe the creation of the output
[[2] - XML is created using function calls, not built into the language syntax
]]
Abstract structure result of nodes, not markup
[[1] - external result markup (if needed) is determined from the result node tree
[1] - the result of transformation is a tree of nodes built from instantiated templates as an internal hierarchy that may be serialized
externally as markup
[1] - the processor may, but is not obliged to, externalize the result tree in XML or some other type of syntax if requested by
the transform writer
[[2] - the transform writer has little or no control over the syntactic constructs chosen by the processor for serialization
[2] - the transform writer can request certain behaviors that the processor can ignore
[2] - final result is guaranteed to comply with lexical requirements of the output method
[[3] - when not coerced by certain transform controls
][2] - source tree markup syntax preservation cannot be implemented with a transform
[[3] - because the source tree syntax is translated into source tree nodes and forgotten
]][1] - the processing model allows the processor to immediately serialize the result tree as markup while it is being built by the
transform, and not maintain the complete result in memory
[1] - the transform may request the processor emit the result tree using built-in available lexical conventions (XML, HTML or text-only
conventions)
[1] - [T2.0]multiple result trees may be constructed and serialized
]
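The loss of source syntax can be demonstrated with any processor that builds a node tree and serializes it again. The following is a minimal sketch in Python, assuming the standard library xml.etree.ElementTree module rather than an XSLT processor; the sample markup is invented to exercise several syntactic choices.

```python
import xml.etree.ElementTree as ET

# Invented markup using single-quoted attributes, extra white space in a tag,
# an explicit end tag for an empty element, and a character reference.
SOURCE = "<purchase><customer  db='cust123'></customer><note>&#65;&amp;B</note></purchase>"

root = ET.fromstring(SOURCE)

# The serializer chooses its own syntax for the same tree of nodes.
serialized = ET.tostring(root, encoding="unicode")

# Reading the serialized result back yields the same information.
reparsed = ET.fromstring(serialized)

print(SOURCE)
print(serialized)
```

The information in the tree survives the round trip, while the quoting, spacing, empty-element form and character reference chosen by the original author do not.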
Not intended for syntactic general purpose XML transformations
[[1] - designed for downstream-processing and subsequent transformations or interpretation
[[2] - does not include certain features appropriate for syntax-level general purpose transformations
[[3] - unsuitable for original markup syntax preservation requirements
][2] - [T2.0]XSLT 2.0 has more syntax serialization features than XSLT 1.0
[2] - includes facilities for working with the XSL vocabulary easily
][1] - still powerful enough for most downstream-processing transformation needs
[[2] - where the syntax choices when using XML are not important
[2] - absolutely general purpose when the output is going to be input to an XML processor
]]
Document model and vocabulary independent
[[1] - a transform is independent of any Document Type Definition (DTD) or schema that may have been used to constrain the instance
being processed
[1] - a processor can process well-formed XML documents without a model
[[2] - behavior is specified against the presence of markup in an instance as the implicit model, not against the allowed markup
prescribed by any explicit model
][1] - one transform can process instances of different document models
[1] - multiple instances of different models can be used in a single transformation
[1] - different transforms can process a given single instance to produce different results
]
Source files and transforms
[[1] - one or more source files and one or more transform fragments
[[2] - starting with a single source file and the top-most transform fragment
][1] - [T1.0]all stylesheets and source files must be well-formed XML
[1] - [T2.0]stylesheets must be XML, source files may be simple text or well-formed XML
[[2] - zero or more source files and one or more stylesheet fragments
[2] - starting with the top-most stylesheet fragment and optionally a source file
][1] - the processor is allowed to deliver well-formed XML from any data source
[1] - Recommendation does not support SGML instances as input
[[2] - see [http://www.w3.org/TR/NOTE-sgml-xml-971215] for a comparison of SGML and XML
[2] - see [http://tidy.sourceforge.net/] for interpretation and conversion of instances of the HTML vocabulary into XHTML markup conventions
[2] - see [http://www.ccil.org/~cowan/XML/tagsoup] for interpretation and conversion of streams of arbitrary HTML constructs
[2] - see [http://www.jclark.com/sp/sx.htm] in the SP package [http://www.jclark.com/sp] for conversion of SGML instances to XML instances without document type declarations
[2] - see [http://www.CraneSoftwrights.com/resources/n2x] for conversion of SGML instances to XML instances with document type declarations
]]
Validation unnecessary (but convenient)
[[1] - an XSLT processor need not implement a validating XML processor
[1] - must implement at least a non-validating XML processor to ensure well-formedness
[1] - validation is convenient when debugging transform development
[[2] - if the source document does not validate to the model expected by the transform writer, then a correctly functioning transform
may exhibit incorrect behavior
[2] - time spent debugging the working transform is wasted if the source is incorrect
][1] - can selectively validate input documents and result documents using W3C Schema
]
Multiple source files possible
[[1] - [T1.0]one mandatory primary source file
[1] - [T2.0]one optional primary source file
[[2] - in the absence of a source file, a named template must be specified as the starting point for processing
][1] - transform may access arbitrary other source files
[[2] - including itself as a source file
[2] - names of resources hardwired within the transform
[2] - names of resources found within source files
][1] - multiple accesses to the same resource refer to a single abstract representation
[[2] - one is not built for each access to a named resource
][1] - [T2.0]simple text files can be input into the process
]
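The point that multiple accesses to the same resource refer to a single abstract representation can be sketched outside of XSLT. The following Python fragment is only an analogy, assuming a hypothetical in-memory table `RESOURCES` in place of real files and a hypothetical `load` function in place of a processor's resource access: the tree is built once, and every later access returns the same object.

```python
import functools
import xml.etree.ElementTree as ET

# Hypothetical in-memory "resources" standing in for files named by URI.
RESOURCES = {"codes.xml": "<codes><code id='a'>Alpha</code></codes>"}

parse_count = 0  # counts how many times a tree is actually built


@functools.lru_cache(maxsize=None)
def load(uri):
    """Build the tree for a named resource once; later calls reuse it."""
    global parse_count
    parse_count += 1
    return ET.fromstring(RESOURCES[uri])


first = load("codes.xml")
second = load("codes.xml")

# Both accesses see one and the same tree; it was built only once.
print(first is second, parse_count)
```

Because both names refer to the same tree, node identity comparisons between the two accesses behave as they would within a single document.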
Extensible language design supplements processing
[[1] - a processor may support extensions specified in the transform but is not obliged to do so
[[2] - extended functions
[2] - extended serialization conventions
[2] - extended sorting schemes
[2] - extended instructions
][1] - access to non-standardized extensions is specified in standardized ways
[1] - user-defined functions can be declared and used within the transform
]
Single-pass construction of the result node-tree
[[1] - unlike the Document Object Model (DOM)
[[2] - reified node-tree manipulation (read/write) interface with syntax serialization
][1] - unlike the Simple API for XML (SAX)
[[2] - single-pass input event-handling interface with single-pass result markup syntax
][1] - transform must construct the result tree in result-tree parse order in one pass
[[2] - no revisiting of the result tree after construction
[2] - no revisiting an element's start tag after beginning that element's content
[2] - recall the result tree building shown in [Figure 1.4]
][1] - the source trees can be traversed in any order (not necessarily in parse order)
[[2] - information in the source trees can be ignored or selectively processed
][1] - the result tree is emitted as if constructed chronologically in parse order
[[2] - this is not an implementation constraint, but an implementation must act as if the tree were created in parse order
[[3] - an important distinction for parallelism where partial trees may be constructed in parallel
]]]
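The contrast drawn above with DOM and SAX can be seen in a few lines of Python using only the standard library. This is a minimal sketch, not part of any Recommendation; the sample document `SOURCE` and the class name `ItemCollector` are illustrative assumptions. The DOM tree can be revisited and changed after it is built (here a start tag gains an attribute after the fact), while the SAX handler receives each event exactly once, in parse order.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document used only for illustration.
SOURCE = "<order><item>pen</item><item>ink</item></order>"

# DOM: the whole tree is reified in memory and can be revisited and changed.
dom = xml.dom.minidom.parseString(SOURCE)
items = dom.getElementsByTagName("item")
items[0].setAttribute("first", "yes")  # go back and alter a start tag
dom_result = dom.documentElement.toxml()


class ItemCollector(xml.sax.ContentHandler):
    """SAX: events arrive once, in parse order; nothing can be revisited."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start:" + name)

    def endElement(self, name):
        self.events.append("end:" + name)


collector = ItemCollector()
xml.sax.parseString(SOURCE.encode("utf-8"), collector)

print(dom_result)
print(collector.events)
```

An XSLT transform sits between the two: it may wander freely over its source trees, but it builds the result tree without ever going back to what it has already constructed.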
1.1.8 Historical development of the XSL and XQuery Recommendations
Recommendation release history:
[[1] - first concept description floated in August 1997 with no official status within the World Wide Web Consortium (W3C)
[[2] - [http://www.w3.org/TR/NOTE-XSL.html]
][1] - the XSL Working Group officially chartered in early 1998
[[2] - [http://www.w3.org/Style/XSL/]
][1] - agreed upon requirements for XSL by the Working Group:
[[2] - [http://www.w3.org/TR/WD-XSLReq]
][1] - the XSL 1.0 Recommendation (XSL-FO) published October 15, 2001
[[2] - [http://www.w3.org/TR/2001/REC-xsl-20011015/]
][1] - the XSL 1.1 Recommendation (XSL-FO) published December 5, 2006
[[2] - [http://www.w3.org/TR/2006/REC-xsl11-20061205/]
][1] - the XSLT/XPath 1.0 Recommendations published November 16, 1999
[[2] - [http://www.w3.org/TR/1999/REC-xslt-19991116]
[[3] - [http://www.w3.org/1999/11/REC-xslt-19991116-errata] - errata
][2] - [http://www.w3.org/TR/1999/REC-xpath-19991116]
[[3] - [http://www.w3.org/1999/11/REC-xpath-19991116-errata] - errata
]][1] - XSLT 1.1 (work abandoned)
[[2] - [http://www.w3.org/TR/2000/WD-xslt11req-20000825] - requirements
[2] - [http://www.w3.org/TR/2001/WD-xslt11-20010824]
[2] - no incompatible changes to XSLT 1.0 in XSLT 1.1, only additional functionality
[2] - too many interactions with plans for XSLT 2.0, so functionality to be folded into XSLT 2.0 release
][1] - XSLT 2.0/XPath 2.0/XQuery 1.0 originally published January 23, 2007, followed by editorial editions:
[[2] - [http://www.w3.org/TR/2007/REC-xslt20-20070123/]
[2] - [http://www.w3.org/TR/2010/REC-xpath20-20101214/]
[2] - [http://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/]
[2] - [http://www.w3.org/TR/2010/REC-xpath-functions-20101214/]
[2] - [http://www.w3.org/TR/2010/REC-xslt-xquery-serialization-20101214/]
[2] - [http://www.w3.org/TR/2010/REC-xquery-20101214/]
[2] - [http://www.w3.org/TR/2010/REC-xquery-semantics-20101214/]
[2] - [http://www.w3.org/TR/2010/REC-xqueryx-20101214/]
]]
1.1.9 XSL information links
Links to useful information
[[1] - [http://xml.coverpages.org/xsl.html] - Robin Cover
[1] - [http://www.mulberrytech.com/xsl/xsl-list/] - mail list
[1] - [http://www.dpawson.co.uk] - an XSL/XSLT FAQ
[1] - [http://www.zvon.org/HTMLonly/XSLTutorial/Books/Book1/index.html] - numerous example XSLT scripts and fragments
[1] - [http://www.openmath.org/cocoon/openmath/] - OpenMath project work by David Carlisle
[1] - [http://www.CraneSoftwrights.com/links/trn-20110211.htm] - comprehensive XSLT/XPath and XSL-FO training material
[1] - [http://XMLGuild.info] - consulting and training expertise
[1] - [http://www.CraneSoftwrights.com/resources] - free XSLT and XSL-FO resources
[1] - [http://incrementaldevelopment.com/xsltrick/] - "Stupid XSLT Tricks"
[1] - [http://xml.coverpages.org/xslSoftware.html] - list of tools
[1] - [http://www.exslt.org/] - community effort for XSLT extensions
[1] - [http://exslfo.sf.net] - community effort for XSL-FO extensions
[1] - [http://foa.sourceforge.net/] - open source FO GUI authoring tool
[1] - [http://www.xslfast.com/] - commercial FO GUI authoring tool
[1] - [http://www.inventivedesigners.com/] - commercial FO GUI authoring tool
[1] - [http://www.abisource.com/] - word processing with "Save As..." for XSL-FO
[1] - [http://www.AntennaHouse.com/XSLsample/XSLsample.htm] - paginating XHTML
[1] - ISBN 1-56609-159-4 - "The Non-Designer's Design Book", Robin Williams, Peachpit Press, Inc., 1994
[1] - ISBN 0-8230-2121-1/0-8230-2122-X - "Graphic design for the electronic age; The manual for traditional and desktop publishing", Jan V. White, Xerox Press,
1988 (out of print but worthwhile to search for as a used book)
]
1.1.10 Namespaces
[[1] - [http://www.w3.org/TR/REC-xml-names]
]
An important role in information representation:
[[1] - vocabulary distinction in a single XML document
[[2] - mixing information from different document models
[2] - labels in the hierarchy are globally unique and identifiable
[2] - a metaphor is that each namespace is a dictionary with words
[[3] - each dictionary may have a different definition for the same word as found in other dictionaries
[3] - the namespace identifies which dictionary of words is in use
]][1] - possible use for resource discovery being considered
[[2] - generalized associated information regarding information in an instance
[2] - possible access to document model, transforms, validation algorithms, access libraries, etc.
]]
Vocabulary distinction
[[1] - specifies a simple method for qualifying element and attribute names used in XML documents
[1] - allows the same element type name to be used from different vocabularies in a given document
[[2] - consider two vocabularies each defining the element type named "<set>", each with very different semantics
[[3] - following the metaphor, the one word has two different definitions and interpretations, one from each dictionary
[3] - in SVG (Scalable Vector Graphics) the element <set> refers to setting a value within the scope of contained markup
[3] - in MathML (Mathematical Markup Language) <set> refers to a collection of constructs treated as a set
][2] - any document needing to mix elements from the two vocabularies may need to use the same name
[[3] - without namespaces an application cannot distinguish which construct is being used
][2] - a namespace prefix differentiates the element type name suffix in an instance
[[3] - <svg:set>
[3] - <math:set>
][2] - composite name lexically parses as an XML name
[[3] - the use of the colon is defined by the namespaces recommendation
]][1] - also used to uniquely distinguish identification labels in some Recommendations
[[2] - e.g.: customized sort scheme label
]]
URI value association
[[1] - associates element type name prefixes with Uniform Resource Identifier (URI) references whether or not any kind of resource
exists at the URI
[[2] - following the metaphor, the URI uniquely identifies the dictionary of the words
[[3] - supplemental documentation defines the meaning of each of the words
][2] - URI domain ownership under auspices of established organization
[2] - URI conflicts avoided if rules followed
][1] - examples:
[[2] - xmlns:svg="http://www.w3.org/2000/svg-20000629"
[2] - xmlns:math="http://www.w3.org/1998/Math/MathML"
[2] - xmlns:ex1="urn:isbn:978-1-894049:example"
[2] - xmlns:ex2="urn:X-Crane:namespaces:documents:example2"
[2] - xmlns:ex3="ftp://ftp.CraneSoftwrights.com/ns/example3"
[2] - xmlns:ex4="mailto:gkholman@CraneSoftwrights.com"
][1] - explicitly does not require dereferencing any kind of information from the URI
[[2] - note that the Resource Description Framework (RDF) recommendation does have a convention of looking to the URI for information,
though this is outside the scope of the Namespaces recommendation
][1] - according to the recommendation, the URI is only used to disambiguate otherwise identical unqualified members of different
vocabularies
]
The choice of the prefix is arbitrary and can be any lexically valid name
[[1] - the prefix is never a mandatory aspect of any Recommendation
[1] - the prefix is discarded by the XML namespace-aware processor along the lines of:
[[2] - <{http://www.w3.org/2000/svg-20000629}set>
[2] - <{http://www.w3.org/1998/Math/MathML}set>
[2] - the above use of "{" and "}" is a common convention but is not standard
[2] - note that the "/" characters of the URI would be unacceptable under the lexical rules of names; thus the URI could never be used directly
in the XML tags
][1] - the prefix is a syntactic shortcut that avoids the need to specify long distinguishing strings
]
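That the prefix is immaterial can be checked with any namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which happens to report names in the same brace convention shown above; the two wrapper documents `DOC_A` and `DOC_B` and the function name `child_names` are hypothetical. The two instances differ only in the prefixes chosen, yet the parser reports identical names, and the two `set` elements stay distinct.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg-20000629"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Two hypothetical instances that differ only in the prefixes chosen.
DOC_A = '<doc xmlns:svg="%s" xmlns:math="%s"><svg:set/><math:set/></doc>' % (
    SVG_NS, MATHML_NS)
DOC_B = '<doc xmlns:x="%s" xmlns:y="%s"><x:set/><y:set/></doc>' % (
    SVG_NS, MATHML_NS)


def child_names(xml_text):
    """Return the names of the document element's children as parsed."""
    return [child.tag for child in ET.fromstring(xml_text)]


names_a = child_names(DOC_A)
names_b = child_names(DOC_B)

print(names_a)
print(names_a == names_b)        # the prefixes make no difference
print(names_a[0] != names_a[1])  # the two "set" elements remain distinct
```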
Different views of the name of <svg:set>:
[[1] - "set" is the local name
[1] - "svg:set" is the qualified name
[[2] - a name subject to namespace interpretation (prefixed or un-prefixed)
[2] - the lexical space for the W3C Schema QName data type
][1] - "{http://www.w3.org/2000/svg-20000629}set" is the expanded name
[[2] - combination of namespace URI (also called "namespace name") and the local part
[2] - the value space for the W3C Schema QName data type
[2] - the use of "{" and "}" is not standard, but is used by some tools such as Saxon
][1] - "http://www.w3.org/2000/svg-20000629#set" is a URI value convention
]
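The relationship between these views can be written down as a small function. This is a sketch only, assuming a hypothetical helper `name_views` and a plain dictionary `IN_SCOPE` of prefix-to-URI bindings; it treats an un-prefixed name as an element name, so the default namespace (the empty-string key) applies when one is declared.

```python
def name_views(qname, in_scope):
    """Derive the local and expanded views of a qualified element name.

    in_scope maps prefixes to namespace URIs; the key "" holds the
    default namespace, if one is declared.
    """
    prefix, sep, local = qname.rpartition(":")
    if sep:
        uri = in_scope[prefix]      # KeyError if the prefix is undeclared
    else:
        uri = in_scope.get("", "")  # un-prefixed: default namespace, if any
    expanded = "{%s}%s" % (uri, local) if uri else local
    return {"local": local, "qualified": qname, "expanded": expanded}


# Hypothetical in-scope bindings for illustration.
IN_SCOPE = {"svg": "http://www.w3.org/2000/svg-20000629"}

views = name_views("svg:set", IN_SCOPE)
print(views["local"])
print(views["qualified"])
print(views["expanded"])
```

The braces in the expanded form are the same non-standard convention noted above, used here only as a convenient way to write the URI and local part together.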
An example of using namespaces in a Universal Business Language (UBL) invoice:
[[1] - for space reasons the lengthy namespace URI strings have been abbreviated
[1] - note that namespaces are important because there are two elements with the same local name "Location", one in each of two different namespaces
]
[Example 1-5:
<Invoice xmlns="urn:oasis:...:xsd:Invoice-2"
         xmlns:cbc="urn:oasis:...:xsd:CommonBasicComponents-2"
         xmlns:cac="urn:oasis:...:xsd:CommonAggregateComponents-2"
         xmlns:ext="urn:oasis:...:xsd:CommonExtensionComponents-2"
         xmlns:demo="urn:x-Demo:Demo">
  <ext:UBLExtensions>
    <ext:UBLExtension>
      <cbc:ID>Demo1</cbc:ID>
      <cbc:Name>Demonstration</cbc:Name>
      <ext:ExtensionAgencyID>CSL</ext:ExtensionAgencyID>
      <ext:ExtensionAgencyName>Crane Softwrights Ltd.
      </ext:ExtensionAgencyName>
      <ext:ExtensionVersionID>0.1</ext:ExtensionVersionID>
      <ext:ExtensionAgencyURI>http://www.CraneSoftwrights.com/
        links/res-dev.htm</ext:ExtensionAgencyURI>
      <ext:ExtensionURI>urn:x-Demo:Demo:0.1</ext:ExtensionURI>
      <ext:ExtensionReasonCode listURI="urn:x-Demo:Demo:ReasonCodes">1
      </ext:ExtensionReasonCode>
      <ext:ExtensionReason>Illustration</ext:ExtensionReason>
      <ext:ExtensionContent>
        <demo:Demo>
          <demo:Thing>This is a test</demo:Thing>
          <cbc:ID>DemoTest</cbc:ID>
          <demo:Total currencyID="GBP">100.00</demo:Total>
        </demo:Demo>
      </ext:ExtensionContent>
    </ext:UBLExtension>
  </ext:UBLExtensions>

  <cbc:ID>A00095678</cbc:ID>
  <cbc:IssueDate>2005-06-21</cbc:IssueDate>
  <cbc:Note>sample</cbc:Note>
  <cac:AccountingSupplierParty>
    <cac:Party>
      <cac:PartyName>
        ...
]
Namespaces in XSLT and XSL-FO
[[1] - both files are in well-formed XML syntax
[[2] - require all namespaces used to be declared; there are no defaults
][1] - recommendations utilize namespaces to distinguish the desired result tree vocabularies from the transformation instruction
vocabularies
[1] - http://www.w3.org/1999/XSL/Transform
[[2] - XSL transformation instruction vocabulary
[2] - the use of any archaic URI values for the vocabulary will not be recognized by an XSLT processor
][1] - http://www.w3.org/1999/XSL/Format
[[2] - XSL formatting result vocabulary
[2] - the year represents when the W3C allocated the URI to the working group, not the version of XSL the URI represents
]]
Extension identification
[[1] - processors are allowed to recognize other namespaces in order to implement extensions not defined by the Recommendations:
[[2] - functions
[2] - XSLT instructions
[2] - XSLT system properties
[2] - collations
[2] - serialization methods
][1] - e.g.: http://www.jclark.com/xt
[[2] - extensions available when using XT
][1] - e.g.: http://saxon.sf.net/
[[2] - extensions available when using Saxon
]]
Naming of top-level constructs in XSLT
[[1] - libraries of transform fragments can isolate their constructs by using unique namespace URI strings
[1] - building upon an existing library is done without risking the integrity of the existing stylesheets when one is disciplined
about the naming of constructs
[1] - in the following example, two different variables are declared because of the unique namespace URI strings (the prefixes are
immaterial)
[[2] - the first is in namespace "urn:X-a" and the second is in namespace "urn:X-b"
][1] - [Example 1-6: Two variables in different namespaces
<xsl:variable name="a:thing" select="'abc'" xmlns:a="urn:X-a"/>
<xsl:variable name="a:thing" select="'def'" xmlns:a="urn:X-b"/>
]
[1] - [T2.0]stylesheet-defined function names must be namespace qualified
[1] - the default namespace is never used for naming top-level constructs
]
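Why the two declarations in Example 1-6 do not collide can be traced mechanically. The Python sketch below is not an XSLT processor; it only resolves the `name` attribute of each `xsl:variable` against the namespace declarations in scope at that element. The wrapping `xsl:stylesheet` element and the function name `variable_names` are assumptions added for illustration.

```python
import io
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# The two declarations of Example 1-6 inside a hypothetical wrapper.
STYLESHEET = (
    '<xsl:stylesheet version="1.0" xmlns:xsl="%s">'
    '<xsl:variable name="a:thing" select="\'abc\'" xmlns:a="urn:X-a"/>'
    '<xsl:variable name="a:thing" select="\'def\'" xmlns:a="urn:X-b"/>'
    "</xsl:stylesheet>" % XSL_NS
)


def variable_names(stylesheet_text):
    """Return the expanded names of the xsl:variable declarations found."""
    in_scope = []  # stack of (prefix, uri) declarations currently in effect
    found = []
    source = io.BytesIO(stylesheet_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, ("start-ns", "end-ns", "start")):
        if event == "start-ns":
            in_scope.append(payload)  # payload is a (prefix, uri) pair
        elif event == "end-ns":
            in_scope.pop()
        elif payload.tag == "{%s}variable" % XSL_NS:
            prefix, sep, local = payload.get("name").rpartition(":")
            if sep:
                # innermost declaration of the prefix wins
                uri = next(u for p, u in reversed(in_scope) if p == prefix)
                found.append("{%s}%s" % (uri, local))
            else:
                # the default namespace is never used for these names
                found.append(local)
    return found


names = variable_names(STYLESHEET)
print(names)
```

The same prefix `a` resolves to a different URI at each declaration, so the two variables have different expanded names even though their lexical names are identical.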
1.1.11 Stylesheet association
[[1] - [http://www.w3.org/TR/xml-stylesheet]
]
Relating documents to stylesheets
[[1] - associating one or more stylesheets with a given XML document
[1] - same pseudo-attributes and semantics as in the HTML 4.0 recommendation elements:
[[2] - <LINK REL="stylesheet">
[2] - <LINK REL="alternate stylesheet">
]]
Ancillary markup
[[1] - not part of the structural markup of an instance, thus it is marked up using a processing instruction rather than first-class
(declared or declarable in a document model) markup
]
Typical examples of use:
[Example 1-7: Associating an XSL stylesheet with an unregistered MIME type
<?xml-stylesheet type="text/xsl" href="../xs/xslstyle-docbook.xsl"?>
]
[Example 1-8: Associating a CSS stylesheet
<?xml-stylesheet type="text/css" href="normal.css"?>
]
Less typical examples provided for by the design:
[Example 1-9: Alternative stylesheet association
<?xml-stylesheet alternate="yes" title="small"
                 href="small.xsl" type="application/xslt+xml"?>
]
[[1] - provide the processor with an alternate stylesheet if some external stimulus triggers it by name
]
[Example 1-10: Associating an internal stylesheet
<?xml-stylesheet href="#style1" type="application/xslt+xml"?>
]
[[1] - instruct the processor to find the stylesheet embedded in the source document at the named location
]
Important note about type= values for associating XSLT:
[[1] - type="text/xsl" is not a registered MIME type
[[2] - the only type recognized by IE for the use of XSLT
][1] - type="application/xslt+xml" has been proposed in IETF RFC 3023
[1] - type="text/xml" is reported to be supported by some processors
]
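Because the association is carried in a processing instruction, an application has to pick the pseudo-attributes out of the instruction's data itself. The following Python sketch shows one way this might be done with the standard library; the sample document, the pattern `PSEUDO_ATTR` and the function name `stylesheet_candidates` are assumptions, and the sketch deliberately ignores character references and predefined entities inside the values.

```python
import re
import xml.dom.minidom

# Hypothetical instance carrying two of the associations shown above.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="../xs/xslstyle-docbook.xsl"?>'
    '<?xml-stylesheet alternate="yes" title="small" '
    'href="small.xsl" type="application/xslt+xml"?>'
    "<doc/>"
)

# name="value" or name='value' pairs inside the instruction's data.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*("[^"]*"|'[^']*')""")


def stylesheet_candidates(xml_text):
    """List the pseudo-attributes of each xml-stylesheet instruction."""
    doc = xml.dom.minidom.parseString(xml_text)
    candidates = []
    for node in doc.childNodes:  # instructions in the prolog
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pairs = PSEUDO_ATTR.findall(node.data)
            # strip the surrounding quote characters from each value
            candidates.append({name: value[1:-1] for name, value in pairs})
    return candidates


candidates = stylesheet_candidates(DOC)
for candidate in candidates:
    print(candidate.get("type"), candidate.get("href"))
```

An application could then choose among the candidates by `type`, `title` or `alternate`, which is the selection left open by the association mechanism.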
See [XSLStyle™ - Section D.1.1] for an embedded XSLT documentation methodology
1.2 Transformation data flows
1.2.1 Transformation from XML to XML
The basic behavior is to transform a hierarchical input into a hierarchical result tree:
[[1] - that result tree may be emitted as an XML instance
]
[Figure 1.5: Transformation from XML to XML
Three blocks are shown, each labeled "Transform Process". To the left of the blocks are triangles, each representing XML structures,
two being source files and two being transform files labeled "XSLT/XQ", each connected to one of the blocks. The transform
file of the first process is also used as the transform file of the second process. The source structure of the second process
is also used as the source structure of the third process. This shows that one transform can be used with multiple source
structures and one source structure can be used with multiple transforms.
Each Transform Process block has embedded a dotted triangle representing the result tree. A solid line leads from result tree
to a triangle on the right of each block representing the XML serialization of the result tree.
The first source structure is shown to have been derived from an XML file.
The second source structure is shown to have been projected from an arbitrary data source that is not an XML file. Three
possibilities are shown: data bases, flat files and feeds.
]
Of note:
[[1] - a given transform can be applied to more than one XML structure
[1] - a given XML structure can have more than one transform applied
[1] - a given XML structure can be derived from an XML file or projected from some other data source identified by the transform
[1] - the abstract result tree constructed within the transform process is serialized as the emitted XML under the
control of the process
]
[[1] - the dotted triangle in the process represents the abstract node tree of the result
]
Diagram legend
[[1] - processes represented by rectangles
[1] - hierarchical structures represented by triangles
[[2] - a tree structure with the single root at the left point and the tree expanding and getting larger towards the leaves at the
right edge
[2] - XML files are drawn with a solid line, node structures are drawn with a dotted line
][1] - unstructured files represented by parallelograms
]
1.2.2 Transformation from XML to non-XML
A processor may choose to recognize the transform's request to serialize a non-XML representation of the result tree:
[[1] - triggered through using an output serialization method supported by the processor
]
Shared serialization specification between XSLT 2.0 and XQuery 1.0
[[1] - [http://www.w3.org/TR/xslt-xquery-serialization/]
]
At least two non-XML tree serialization methods common to all specifications:
[Figure 1.6: Transformation from XML to Aware Non-XML
Two separate figures are shown, each being a Transform Process with two triangle inputs (source nodes and transform), the
dotted triangle result tree, and one output. The figure on the left shows the output as a triangle labeled "HTML". The figure
on the right shows the output as a parallelogram labeled "Text".
]
[[1] - html
[[2] - HTML markup and structural conventions
[[3] - some older HTML user agents (e.g. browsers) will not correctly recognize elements in the HTML vocabulary when the instance
is marked up using XML conventions (e.g. <br/> must be <br>), thus necessitating the interpretation of HTML semantics when the result tree is emitted
[3] - using this will not validate the result tree output as being HTML
[[4] - if the result is declared HTML but the desired output isn't HTML, the HTML semantics could interfere with the markup generated
]][2] - HTML built-in character entities (e.g.: accented letters, non-breaking space, etc.)
][1] - text
[[2] - simple text content with all element start and end tags removed and ignored
[2] - none of the characters are escaped on output
[2] - example of use: creating operating system batch and script files from structured XML documents
]]
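The effect of choosing a serialization method can be observed with any tree serializer that offers comparable choices. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in; it is not an XSLT processor, and its `html` and `text` methods only approximate the output methods described above. The sample tree is hypothetical. One abstract tree is emitted three ways.

```python
import xml.etree.ElementTree as ET

# A hypothetical result tree, built here by parsing for brevity.
result_tree = ET.fromstring("<p>Total: 1 &lt; 2<br/>done</p>")

# The same abstract tree serialized with three lexical conventions.
as_xml = ET.tostring(result_tree, encoding="unicode", method="xml")
as_html = ET.tostring(result_tree, encoding="unicode", method="html")
as_text = ET.tostring(result_tree, encoding="unicode", method="text")

print(as_xml)   # empty element minimized using XML conventions
print(as_html)  # empty element written using HTML conventions
print(as_text)  # tags dropped, characters not escaped
```

The tree itself is unchanged between the three calls; only the markup conventions applied at emission time differ.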
[T1.0]No standardized support for XHTML lexical conventions
[[1] - a processor could offer a custom extension, but many (possibly all?) do not
]
[T2.0]Standardized support for XHTML lexical conventions
[[1] - xhtml
[[2] - browser compatibility guidelines for empty tags for elements defined to be empty
[2] - no markup minimization for empty elements for elements not defined to be empty
]]
[Figure 1.7: Transformation from XML to XHTML
A flow is shown with an XSLT process with two triangle inputs (source and stylesheet), the dotted triangle result tree, and
one output being XHTML.
]
[T1.0]Only standardized support for a single result tree
[[1] - most XSLT processors offer a custom extension, but there is no obligation to do so and it is not standardized
]
[T2.0]Standardized support for multiple result trees
[[1] - each result tree can have the same or different serialization
[1] - multiple result trees are not accessible to a single XSL-FO process
]
[Figure 1.8: Transformation from XML to multiple result trees
A flow is shown with an XSLT process with two triangle inputs (source and stylesheet), and two dotted triangle result trees,
one serialized output being XML and the other serialized output being simple text.
]
1.2.3 Transforming and rendering XML information using XSLT and XSL-FO
When the XSLT result tree is specified to utilize the XSL-FO formatting vocabulary:
[[1] - the normative behavior is to interpret the result tree according to the formatting semantics defined in XSL for the XSL-FO
formatting vocabulary
[1] - an inboard XSLT processor can effect the transformation to an XSL-FO result tree
[1] - the XSL-FO result tree need not be serialized in XML markup to be conforming to the recommendation (though useful for diagnostics
to evaluate results of transformation)
]
[Figure 1.9: Transformation from XML to XSL Formatting Semantics
A large block represents an XSL-FO process. Two triangle inputs from the left are the source file and the stylesheet file,
the stylesheet file indicates it contains only XSLT and XSL-FO vocabularies.
The first block inside the XSL-FO process is an XSLT process taking the two inputs and producing a dotted triangle XSL-FO
result tree output. A solid line leads from this result tree out the bottom of the large box to an XML serialization of the
XSL-FO tree. Three arrows also lead from the result tree to three process boxes, one each for aural, print and display interpretation
of the XSL formatting and flow object semantics in each domain. Each such process box has an arrow leading out of the large
box to a depiction of a speaker, a piece of paper, and the electronic display.
]
Of note:
[[1] - the stylesheet contains only the XSLT transformation vocabulary, the XSL formatting vocabulary, and extension transformation
or foreign object vocabularies
[1] - the source XML contains the user's vocabularies
[1] - the result of transformation contains exclusively the XSL formatting vocabulary and any extension formatting vocabularies
[[2] - does not contain any constructs of the source XML or XSLT vocabularies
][1] - the rendering processes implement for each medium the common formatting semantics described by the XSL recommendation
[[2] - for example, space specified before blocks of text can be rendered visually as a vertical gap between left-to-right line-oriented
paragraphs or aurally as timed silence before vocalized content
]]
1.2.4 XML to binary or other formats
Some non-XML requirements are neither text nor HTML
[[1] - need to produce composition codes for legacy system
[1] - binary files with complex encoding
[1] - custom files with complex or repetitive sequences
]
One can capture the semantics of the required output format in a custom XML vocabulary
[[1] - e.g.: "CVML" for "Custom Vocabulary Markup Language"
[1] - designed specifically to represent meaningful concepts for output
]
[Figure 1.10: Transformation from XML to an arbitrary format
An XSLT process on the left shows a triangle XML input and a triangle stylesheet input, where the stylesheet is a combination
of XSLT and CVML. The dotted triangle in the XSLT process leads to a solid triangle output indicating the CVML serialization
of the result tree. This leads by a solid line to a CVML Interpreter written either using SAX or DOM to interpret the semantics
of the custom vocabulary. The output of this interpretation is a parallelogram labeled "Non-XML" as the result file.
]
A single translation program (drawn as "CVML Interpreter"):
[[1] - can interpret all XML instances using the custom vocabulary markup language (e.g. CVML) to produce the output according to
the programmed semantics
[1] - is independent of the XSLT stylesheets used to produce the instances of the custom vocabulary
[1] - allows any number of stylesheets to be written without impacting the translation to the final output
[1] - removes the need for stylesheet writers to know syntactic output details
[[2] - output is described abstractly by semantics of the vocabulary
[2] - output is serialized following specific syntactic requirements
]]
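A translation program of this kind can be quite small. The following Python sketch uses SAX to interpret an invented vocabulary; the element names `cvml`, `byte`, `text` and `repeat`, their attributes, and the names `CvmlInterpreter` and `interpret` are all hypothetical and stand in for whatever a real custom vocabulary would define. The stylesheet writer describes the output abstractly, and only the interpreter knows the byte-level syntax.

```python
import xml.sax


class CvmlInterpreter(xml.sax.ContentHandler):
    """Turn instances of a hypothetical CVML vocabulary into bytes."""

    def __init__(self):
        super().__init__()
        self.output = bytearray()
        self._in_text = False

    def startElement(self, name, attrs):
        if name == "byte":        # one literal byte value
            self.output.append(int(attrs["value"]))
        elif name == "repeat":    # a repetitive sequence of one byte value
            one = bytes([int(attrs["value"])])
            self.output.extend(one * int(attrs["count"]))
        elif name == "text":      # character data to be encoded
            self._in_text = True

    def endElement(self, name):
        if name == "text":
            self._in_text = False

    def characters(self, content):
        if self._in_text:
            self.output.extend(content.encode("ascii"))


def interpret(cvml_text):
    """Interpret one CVML instance and return the non-XML result."""
    handler = CvmlInterpreter()
    xml.sax.parseString(cvml_text.encode("utf-8"), handler)
    return bytes(handler.output)


# A hypothetical instance such as a stylesheet might produce.
CVML = (
    '<cvml><byte value="2"/><text>HI</text>'
    '<repeat count="3" value="0"/><byte value="3"/></cvml>'
)
result = interpret(CVML)
print(result)
```

Any number of stylesheets can target this vocabulary without the interpreter changing, which is the independence described in the list above.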
The XSLT recommendation is extensible, providing for vendor-specific or application-specific output methods:
[[1] - xmlns:prefix="processor-recognized-URI"
[1] - prefix:serialization-method-name
[[2] - vendors can choose to support additional built-in tree serialization methods
[2] - output can be textual, binary, dynamic process (e.g.: database load), auditory, or any desired activity or result
]]
The ability to specify vendor-specific or implementation-specific output methods allows custom semantics to be interpreted
within the modified XSLT processor, thus not requiring the intermediate file:
[Figure 1.11: Built-in Transformation from XML to Arbitrary Non-XML
A customized XSLT process is shown in the center as a large box with a triangle XML input and a triangle stylesheet input on the left,
where the stylesheet is a combination of XSLT and CVML. The dotted triangle in the XSLT process leads directly to a CVML Interpreter
(still inside the box) written either using SAX or DOM to interpret the semantics of the custom vocabulary. The output of
this interpretation leads outside the process box to a parallelogram labeled "Non-XML" as the result file.
]
1.2.5 XSLT as an application front-end
A legacy application can utilize an XSLT processor to accommodate arbitrary XML vocabularies
[[1] - making an application XML-aware involves using an XML processor to accommodate a vocabulary expressing application data semantics
[[2] - event driven using SAX processing and programming
[2] - tree driven using DOM processing and programming
[2] - without XSLT, each different XML vocabulary would need to be accommodated by different application integration logic
][1] - an application can engage an XSLT processor and directly access the result tree
[[2] - single process programmed to interpret a single markup language
[2] - each different XML vocabulary is accommodated by only writing a different XSLT stylesheet
[2] - each stylesheet produces the same application-oriented markup language
][1] - no reification of the result tree is required
]
[Figure 1.12: XSLT as an application front-end
Three separate figures are shown. In the top left figure a collection of parallelograms for non-XML files are input to an
application process box with an embedded application semantics box.
Top right shows a collection of triangle XML files input into a larger application box within which is both a CVML Interpreter
written using either SAX or DOM and an arrow to the application semantics box that is also embedded.
Below both is a large diagram showing an XSLT process inside an application box whose dotted result tree leads to a CVML Interpreter
and application semantics box. Two pairs of inputs are shown, one being the source from user 1 and the corresponding stylesheet,
and the other being the source from user 2 and the corresponding stylesheet.
]
1.2.6 Three-tiered architectures
To support a legacy of user agents that do not support XML:
[[1] - web servers can detect the level of support of user agents
[1] - where XML and XSLT or XQuery are not supported in a user agent:
[[2] - the host can take on the burden of transformation
][1] - where XML and XSLT or XQuery are supported in a user agent
[[2] - the burden of transformation can be distributed to the agent
[2] - the XML information can be massaged before being sent to the agent
][1] - allows information to be maintained in XML yet still be available to all users
]
[Figure 1.13: Server-side Transformation Architecture
The center shows a large box representing a web host server. To the left are a number of triangles representing source and
stylesheet files. Transform processes in the host server pass information to the right to user agents.
The first shows a stylesheet and source file through the Transform Process on the server passing HTML to an HTML user agent.
The second shows the same stylesheet and source file being passed through the server untouched to a Transform Process being
run on an XML user agent.
Third is just the XML file being sent to an XML user agent where a local transform is being used.
Fourth is a different transform being used on the XML on the server to send XML to an XML user agent where the same local
transform is being used.
Finally the same transform is being used on the XML on the server to send XML to an XML user agent but a second transform
is being sent by the host to be used with the result of the first transformation.
]
Always performing server-side transformation:
[[1] - good business sense in some cases
[[2] - even if technically it is possible to send semantically-rich information
][1] - never send unprocessed semantically-rich XML
[[2] - or only send it to those who are entitled to it
[[3] - for security reasons
[3] - for payment reasons
]][1] - translation into a presentation-orientation
[[2] - using a markup language inherently supported by the user agent (e.g. HTML)
[2] - using a custom, semantic-less markup language with an associated transformation
][1] - "semantic firewall"
[[2] - to protect the investment in rich markup from being seen where not desired
[2] - no consensus in the community that semantic firewalls are a "good thing"
]]
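The translation from semantically-rich markup into a presentation orientation can be sketched as follows. This is only an
illustration under stated assumptions, not the method used in this material: Python's standard-library ElementTree stands in
for an XSLT process, and the `product`, `name`, `price` and `cost` element names, the `sku` attribute and the
`to_presentation` function are all hypothetical. Only HTML leaves the server; the semantic labels and the sensitive `cost`
value never do.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantically-rich source document kept on the server.
SOURCE = """<product sku="A-100">
  <name>Widget</name>
  <price currency="CAD">12.50</price>
  <cost currency="CAD">4.10</cost>
</product>"""


def to_presentation(xml_text):
    """Translate rich markup into presentation-only HTML for the user agent."""
    product = ET.fromstring(xml_text)
    div = ET.Element("div")
    heading = ET.SubElement(div, "h1")
    heading.text = product.findtext("name")
    para = ET.SubElement(div, "p")
    price = product.find("price")
    para.text = f"{price.text} {price.get('currency')}"
    # The <cost> element and the sku attribute are deliberately never copied.
    return ET.tostring(div, encoding="unicode")


html = to_presentation(SOURCE)
print(html)
```

A user agent receiving this result can render the name and price but cannot recover which value was a price, which product it
belonged to, or what it cost the seller.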
1.2.7 XSLT and XQuery on the wire
[> 2.][< 1.2.6][^][^^][^^^]
XSLT and XQuery have a role in a large or small network cloud:
[[1] - simple transformation services can be made available to users on the network, unburdening the user's own infrastructure
]
[Figure 1.14: XSLT and XQuery on the wire
The center shows a network cloud with four entities around the edges.
The top-left entity illustrates Publish/Subscribe showing an XML document going into the cloud to a Transform Process that
distributes the XML to all who have subscribed to receive it.
The bottom right entity illustrates Aggregation showing a number of sources of XML going to a Transform Process in the cloud
and a single stream of XML out of the cloud to the entity.
The bottom left entity is sending and receiving documents through a Transform Process in the cloud to and from the top right
entity.
]
Publish/Subscribe
[[1] - a network service can accept subscription requests from across the network
[1] - the XML document from the publisher is routed to all subscription destinations
[1] - a subscriber can request a transformation process so as to receive the published information in the desired structure
]
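The publish/subscribe routing described above might look like the following minimal sketch. It assumes an in-memory service
rather than a real network, uses Python's standard-library ElementTree in place of an XSLT or XQuery process, and every name
in it (`Broker`, `identity`, `titles_only`, the `news` vocabulary) is hypothetical.

```python
import xml.etree.ElementTree as ET


def identity(doc):
    """Deliver the published document unchanged."""
    return doc


def titles_only(doc):
    """Hypothetical subscriber-requested transform: keep only the titles."""
    out = ET.Element("titles")
    for title in doc.iter("title"):
        ET.SubElement(out, "title").text = title.text
    return out


class Broker:
    """Routes each published document to every subscription destination."""

    def __init__(self):
        self.subscriptions = []  # list of (deliver, transform) pairs

    def subscribe(self, deliver, transform=identity):
        self.subscriptions.append((deliver, transform))

    def publish(self, xml_text):
        for deliver, transform in self.subscriptions:
            # Parse a fresh copy per subscriber so transforms cannot interfere.
            doc = ET.fromstring(xml_text)
            deliver(ET.tostring(transform(doc), encoding="unicode"))


inbox_a, inbox_b = [], []
broker = Broker()
broker.subscribe(inbox_a.append)                         # wants the document as published
broker.subscribe(inbox_b.append, transform=titles_only)  # wants a different structure
broker.publish("<news><item><title>Hello</title><body>Text</body></item></news>")
```

The publisher sends one document; each subscriber receives it in the structure it asked for, with the transformation work done
by the service rather than by either party.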
Aggregation
[[1] - a network service can accept XML documents from across the network
[1] - a user of the service can receive the aggregate of all of the information
[1] - the information can be transformed into a homogeneous collection for ease of processing and analysis
]
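As a rough sketch of aggregation, the following assumes two incoming documents that use different vocabularies and maps both
onto one common structure before collecting them. The vocabularies (`reading`, `measurement`), their attributes and the
`normalize` and `aggregate` functions are hypothetical, and ElementTree again stands in for an XSLT or XQuery process.

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming documents using two different vocabularies.
INCOMING = [
    '<reading sensor="s1"><temp>21.5</temp></reading>',
    '<measurement id="s2" celsius="19.0"/>',
]


def normalize(doc):
    """Map either vocabulary onto one common <reading> structure."""
    if doc.tag == "reading":
        sensor, value = doc.get("sensor"), doc.findtext("temp")
    elif doc.tag == "measurement":
        sensor, value = doc.get("id"), doc.get("celsius")
    else:
        raise ValueError(f"unknown vocabulary: {doc.tag}")
    return ET.Element("reading", {"sensor": sensor, "celsius": value})


def aggregate(xml_texts):
    """Collect all incoming documents into one homogeneous collection."""
    collection = ET.Element("readings")
    for text in xml_texts:
        collection.append(normalize(ET.fromstring(text)))
    return collection


readings = aggregate(INCOMING)
print(ET.tostring(readings, encoding="unicode"))
```

Because every child of the result has the same shape, a user of the service can analyze the collection without knowing which
source vocabulary each item arrived in.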
Transformation
[[1] - a user of the network can utilize wire-speed transformation of outgoing and incoming documents to a peer
]
This is an accessible version of Crane's commercial training material.
The content has been specifically designed to assist screen reader software
in viewing the entire textual content. Figures are replaced with text
narratives.
Navigation hints are in square brackets:
[Tx.x] and [Fx.x] are textual representations of the applicability icons;
[digit] indicates list depth for nested lists;
[link [URL]] indicates the URL of a hyperlink if different than link;
[EXAMPLE] indicates an example listing of code;
[FIGURE] indicates the presence of a figure replaced by its description;
[>] jumps forward;
[<] jumps backward;
[^] jumps to start of the section;
[^^] jumps to the start of the chapter;
[^^^] jumps to the table of contents.
Suggestions for improvement are welcome:
[info@CraneSoftwrights.com]
Book sales: [http://www.CraneSoftwrights.com/links/trn-acc.htm]
Information: [http://www.CraneSoftwrights.com/links/info-acc.htm]
This content is protected by copyright and, as there are no means to protect
this accessible version from plagiarism, please do not make any
commercial edition available to others.
+//ISBN 978-1-894049::CSL::Courses::PTUX//DOCUMENT Practical Transformation Using XSLT and XPath 2011-02-11 21:00UTC//EN
Practical Transformation Using XSLT and XPath
Fourteenth Edition - 2011-02-11
ISBN 978-1-894049-24-5
Copyright © Crane Softwrights Ltd.