Selecting XML Data: XPath
- XPath is a syntax for specifying a collection of elements or other information contained within an XML document.
- Conceptually, XPath assumes that the XML document has been parsed into an internal tree representation, and XPath expressions are applied to this tree.
- The XPath tree model is similar to the DOM tree.
- The root of the XPath parse tree is known as the document root.
- The nodes in an XPath tree are of different types, such as element, text, and comment nodes.
- Nodes are also used to represent attribute name value pairs.
<?xml version="1.0" encoding="UTF-8"?>
<message>Hello World!</message>
- Example – in the document above, the root element is of type message. Thus, an XPath element node of type message will be the child of the document root node in the XPath parse tree.
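The tree described above can be inspected with a short Python sketch. It uses the standard library's xml.dom.minidom, so it shows a DOM tree rather than a true XPath tree; the DOM document node stands in for the XPath document root here, which is an assumption made for illustration.

```python
from xml.dom import minidom, Node

# The example document from the text, parsed into a DOM tree.
doc = minidom.parseString(
    b'<?xml version="1.0" encoding="UTF-8"?>\n<message>Hello World!</message>'
)

root_element = doc.documentElement   # the message element
text_node = root_element.firstChild  # the character data inside it

# The document node plays the role of the document root;
# the message element is its child, and the text is a child of message.
print(doc.nodeType == Node.DOCUMENT_NODE)
print(root_element.tagName, root_element.parentNode is doc)
print(text_node.nodeType == Node.TEXT_NODE, text_node.data)
```

The point of the sketch is only the shape of the tree: a root node above the root element, with the text held in its own node.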
Location Paths
- In the markup <xsl:template match="/"> the value of the match attribute (/) is an XPath expression that represents the XPath document root.
- In general, a location path can consist of multiple location steps separated by slash (/) characters.
- Each location step consists of at least two parts: an axis name followed by two colons (::) and a node test.
- In the markup <xsl:value-of select="child::message" /> child is the axis name and message is the node test. The XPath parse tree node relative to which the location path is evaluated is the context node.
- Most axes select a list of element, text, and/or comment nodes relative to the context node. (Exception – the attribute axis, which selects the attribute name-value pairs of the context node.)
- The node test portion of a location step is generally one of two types.
- Name test –
- specify a qualified name that represents an element type name – nodes having the specified name are included in the node list.
- Using an asterisk (*) as the name test – all element nodes on that axis are included in the node list.
- Type test – Standard type tests –
- text() – select nodes representing character data
- comment() – select nodes representing comments
- node() – select nodes of any type.
- Example – expression descendant::text() evaluates to a list of all of the text nodes that are descendants of the context node, while descendant::node() represents all element, text, and comment nodes that are descendants of the context node.
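The name tests and type tests above can be imitated by hand in Python. This is a minimal sketch, not an XPath engine: it walks a DOM tree built with xml.dom.minidom and filters by node type, and the sample document and the helper descendants() are hypothetical names introduced only for illustration.

```python
from xml.dom import minidom, Node

def descendants(node):
    """Yield every descendant of node in document order (attributes excluded)."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)

doc = minidom.parseString(
    "<doc><para>One <strong>two</strong></para><!-- note --><note>three</note></doc>"
)
ctx = doc.documentElement  # context node: the doc element

all_nodes = list(descendants(ctx))  # like descendant::node()
texts = [n.data for n in all_nodes if n.nodeType == Node.TEXT_NODE]        # descendant::text()
comments = [n.data for n in all_nodes if n.nodeType == Node.COMMENT_NODE]  # descendant::comment()

# Name tests on the child axis: child::para and child::*
child_elements = [n for n in ctx.childNodes if n.nodeType == Node.ELEMENT_NODE]
para_children = [n.tagName for n in child_elements if n.tagName == "para"]
star_children = [n.tagName for n in child_elements]

print(texts, comments, para_children, star_children)
```

Note how child::* keeps only the element children (para and note) and drops the comment, while descendant::node() keeps nodes of every type.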
- A location step can contain one or more predicates enclosed in square brackets.
- An XPath predicate is a boolean function which is applied to every node in a node list.
- If the predicate returns true for a node, then that node is copied to a new list.
- If the predicate returns false for a node, it is not copied to the filtered list.
- When multiple predicates are used the successive predicates are applied to the filtered nodes from the previous predicates.
- The filtered list produced by the final predicate is the value of the location step.
- Example – the expression
child::chapter[attribute::display="visible"][position()=last()]
is evaluated as follows:
- (1) Generate a list of all of the element nodes that are children of the context node and whose element type is chapter.
- (2) Filter the node list to include only those elements which have an attribute named display with the value visible.
- (3) Filter the node list again to include only the node in the last position of the filtered list, that is, the last visible chapter in document order.
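The three steps can be traced by hand in Python. This is a sketch of the filtering semantics only; it does not call an XPath engine, and the sample book document is a hypothetical input chosen so that the last chapter in the document is not the last visible one.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with four chapter children.
book = ET.fromstring(
    "<book>"
    '<chapter display="visible">One</chapter>'
    '<chapter display="hidden">Two</chapter>'
    '<chapter display="visible">Three</chapter>'
    '<chapter display="hidden">Four</chapter>'
    "</book>"
)

# (1) child::chapter : element children of the context node named chapter
step1 = [c for c in book if c.tag == "chapter"]

# (2) [attribute::display="visible"] : keep nodes whose display attribute is "visible"
step2 = [c for c in step1 if c.get("display") == "visible"]

# (3) [position()=last()] : position and last() refer to the list from step (2)
step3 = [c for pos, c in enumerate(step2, start=1) if pos == len(step2)]

print([c.text for c in step3])
```

Because the second predicate sees only the nodes that survived the first, the result is chapter Three, not chapter Four.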
- Note
- String literals in XPath expressions can be enclosed in either single or double quotes.
- The last() function returns the number of nodes in the list that the current predicate is filtering.
- The value of the position() function is numeric – [position()=1] selects the first node and [position()=last()] selects the last node in the ordering.
- = is used for equality testing in XPath predicates.
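- To use a relational operator such as <= in an XPath expression that appears inside an XML document (for example, in an XSLT attribute value), XML syntax rules require escaping the < symbol with the &lt; entity reference, as in &lt;=.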
- Predicates can also be combined using the Boolean operators and and or.
- Example – position() != 1 or descendant::para
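Two of the notes above can be checked with a short Python sketch: the need to escape < inside an XML attribute, and the meaning of a predicate that combines two conditions with or. The fragment strings and the sample doc element are hypothetical inputs, and the or predicate is evaluated by hand rather than by an XPath engine.

```python
import xml.etree.ElementTree as ET

# Escaped form: the XML parser hands the XPath text back with a real "<".
escaped = '<value-of select="child::item[position() &lt;= 2]"/>'
select = ET.fromstring(escaped).get("select")

# Unescaped form: a raw "<" in an attribute value is not well-formed XML.
try:
    ET.fromstring('<value-of select="child::item[position() <= 2]"/>')
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# Hand evaluation of child::s[position() != 1 or descendant::para]
doc = ET.fromstring(
    '<doc><s n="1"/><s n="2"><note><para/></note></s><s n="3"/></doc>'
)
children = [c for c in doc if c.tag == "s"]
kept = [
    c.get("n")
    for pos, c in enumerate(children, start=1)
    if pos != 1 or c.find(".//para") is not None
]

print(select, raw_ok, kept)
```

The first s element is dropped because it is in position 1 and has no para descendant; every other s element passes the first condition regardless of its content.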
Location Paths with Multiple Steps
- As noted earlier, a location path can consist of several location steps separated by / characters – Example child::para/child::strong
- When multiple location steps are involved – the first step is evaluated to produce a node list. The next step is then evaluated once for each node in that list, with that node as the context node. The final list is the union of the results of these evaluations.
- In the example, the final list contains every node that is a strong element, has a para element as its immediate parent, and whose para parent is a child of the context node.
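The step-by-step evaluation of child::para/child::strong can be sketched in Python. The section document is a hypothetical input. The final line uses the limited, abbreviated XPath subset that xml.etree.ElementTree supports, in which para/strong means the same as child::para/child::strong.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node.
ctx = ET.fromstring(
    "<section>"
    "<para>A <strong>one</strong> and <strong>two</strong></para>"
    "<strong>not inside a para</strong>"
    "<para><emph><strong>too deep</strong></emph></para>"
    "<para><strong>three</strong></para>"
    "</section>"
)

# Step 1: child::para, evaluated with ctx as the context node.
step1 = [n for n in ctx if n.tag == "para"]

# Step 2: child::strong, evaluated once per node from step 1;
# the results are collected into a single list.
final = []
for para in step1:
    final.extend(n for n in para if n.tag == "strong")

# The same selection written as one abbreviated location path.
same = ctx.findall("para/strong")

print([n.text for n in final], same == final)
```

The strong element directly under section and the one nested inside emph are both excluded, since neither is an immediate child of a para child of the context node.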
Absolute and Relative Location Paths
- Two types of location path
- relative location path – defines a set of nodes relative to the context node,
- absolute location path – begins with a slash (/) – defines a set of nodes relative to the document root.
Combining Node Lists
- The pipe symbol (|) can be used to represent a node set produced by taking the union of the node sets returned by multiple location paths.
- Example – the XPath expression child::strong|descendant::emph represents a node set containing all of the strong children of the context node as well as all of its emph descendants.
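The union child::strong|descendant::emph can be imitated in Python. xml.etree.ElementTree does not support the | operator, so this sketch evaluates the two paths separately and merges them by hand; the para document is a hypothetical input, and sorting into document order is a choice made here for readable output.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node.
ctx = ET.fromstring(
    "<para>"
    "<strong>s1</strong>"
    "<note><emph>e1</emph><strong>nested</strong></note>"
    "<emph>e2</emph>"
    "<strong>s2</strong>"
    "</para>"
)

strong_children = ctx.findall("strong")    # child::strong
emph_descendants = ctx.findall(".//emph")  # descendant::emph

# Merge without duplicates, then sort into document order.
doc_order = {id(n): i for i, n in enumerate(ctx.iter())}
unique = {id(n): n for n in strong_children + emph_descendants}
union = sorted(unique.values(), key=lambda n: doc_order[id(n)])

print([n.text for n in union])
```

The nested strong element is left out because it is a descendant but not a child of the context node, while both emph elements are included at any depth.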
Function Calls as XPath Expressions
- XPath expressions are often location paths that evaluate to node lists.
- Function calls can be used as XPath expressions themselves or within predicates in XPath expressions.
- Three popular XPath functions
- string() function – returns a string representing the concatenation of all text contained in the first node of the list, including that node’s descendants. This text is concatenated in the order in which it appears in the document corresponding to the XPath tree.
- normalize-space() function – takes a string as its argument and returns a normalized version of the string –
- leading and trailing white space is removed,
- all remaining XML white space characters (tabs, newlines, and carriage returns) are converted to space characters, and
- consecutive space characters are collapsed to a single space.
- id() function – takes a string of white-space-separated identifiers as its argument – returns a list of the nodes whose ID-type attributes (typically named id and declared as type ID in the DTD) have these identifiers as their values.
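The behaviour of these three functions can be approximated in plain Python. These are hand-written equivalents, not calls into an XPath engine. The helper names string_value, normalize_space, and id_lookup are hypothetical, and id_lookup assumes that the ID attribute is simply named id, since xml.etree.ElementTree does not read DTD attribute types.

```python
import re
import xml.etree.ElementTree as ET

def string_value(node):
    """Like string(): all text in the node and its descendants, in document order."""
    return "".join(node.itertext())

def normalize_space(s):
    """Like normalize-space(): collapse runs of XML white space, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def id_lookup(root, ids):
    """Like id(), assuming the ID attribute is named 'id' (no DTD is consulted)."""
    wanted = set(ids.split())
    return [n for n in root.iter() if n.get("id") in wanted]

doc = ET.fromstring(
    '<doc><p id="a">Hello <b>big</b>\n   world </p><p id="b">x</p><p id="c">y</p></doc>'
)

raw = string_value(doc.find("p"))
clean = normalize_space(raw)
found = [n.get("id") for n in id_lookup(doc, "c a")]

print(repr(raw), repr(clean), found)
```

The lookup returns the matching nodes in document order, so the order of the identifiers in the argument string does not affect the result.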