Transforming XML Documents

  • A SAX parser can be viewed as a mechanism for transforming an XML text document into a stream of events corresponding to the markup and character data contained in the original document.
  • Similarly, a DOM parser transforms an XML document into a DOM tree.
  • In fact, JAXP provides standardized APIs for transforming from any of these three representations – XML document, SAX event stream, or DOM tree – to either of the others.
  • Furthermore, JAXP allows a Java program to use the Extensible Stylesheet Language (XSL) to extract data from one XML document, process that data, and produce another XML document containing the processed data.
  • XSL can be used, for example, to extract information from an XML document and embed it within an XHTML document so that the information can be viewed using a web browser.
  • In this section, we will learn how to perform JAXP transformations between XML representations (text, DOM, and SAX events) and will introduce the JAXP API for XSL.
  • In later sections we will cover two key components of XSL itself: XPath and XSLT.
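
As a preview of the JAXP API for XSL, the sketch below applies a small stylesheet to an RSS-like document and produces an XHTML list of item titles. The stylesheet, the class name XslSketch, and the sample document are all made up for illustration; XSLT itself is covered in a later section. The only JAXP call that differs from a plain representation change is passing a Source for the stylesheet to newTransformer().

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class XslSketch {
    // Hypothetical stylesheet: emits one XHTML list item per item title.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<ul><xsl:for-each select='//item/title'>"
      + "<li><xsl:value-of select='.'/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given XML text and returns the result as text. */
    static String applyStylesheet(String xml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Passing a Source here yields a Transformer that executes the stylesheet
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String rss = "<rss><channel>"
                   + "<item><title>First</title></item>"
                   + "<item><title>Second</title></item>"
                   + "</channel></rss>";
        System.out.println(applyStylesheet(rss));
    }
}
```

Calling newTransformer() with no argument, by contrast, gives an identity transformer that copies its input unchanged, which is what the representation changes in the rest of this section rely on.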

Transforming between XML Representations

  • Parsing – converts an XML document into a DOM tree. This tree can then be manipulated using Java DOM API methods.
  • XML transformation – the “reverse” of the parsing operation: it produces a textual representation of an internal DOM tree.
  • Example program – reads a text XML document into a DOM Document object, modifies this object, and writes the modified document back out as text.
  • TransformerFactory – JAXP factory class – used to create an instance of Transformer.
  • The program then calls the transform() method of the Transformer instance, which performs the actual conversion from the DOM Document object to a text XML document.
  • The transform() method takes two arguments:
    • An object of a class implementing the javax.xml.transform.Source interface (the input)
    • An object of a class implementing the javax.xml.transform.Result interface (the output)
  • JAXP supplies several classes implementing the Source interface:
    • javax.xml.transform.dom.DOMSource (DOM representation of an XML document),
    • javax.xml.transform.sax.SAXSource (SAX representation of an XML document), and
    • javax.xml.transform.stream.StreamSource (text representation of an XML document).
  • The Result interface is similarly implemented by JAXP classes DOMResult, SAXResult, and StreamResult, each located in the same package as its Source counterpart.
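
Because Source and Result implementations can be paired freely, one identity Transformer can convert in either direction. The following is a minimal sketch, assuming the XML text is held in a String; the class and method names (RepresentationSketch, textToDom, domToText) are hypothetical and chosen only for this example.

```java
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import java.io.StringReader;
import java.io.StringWriter;

class RepresentationSketch {
    /** Text to DOM: StreamSource in, DOMResult out. */
    static Document textToDom(String xml) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // An empty DOMResult asks the transformer to create the Document itself
        DOMResult result = new DOMResult();
        identity.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    /** DOM to text: DOMSource in, StreamResult out. */
    static String domToText(Document document) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> declaration so only the markup is returned
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Document document = textToDom("<a><b>x</b></a>");
        System.out.println(document.getDocumentElement().getNodeName());
        System.out.println(domToText(document));
    }
}
```

Only the Source and Result objects change between the two methods; the Transformer is created the same way in both.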

// JAXP classes
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.parsers.*;

// DOM classes
import org.w3c.dom.*;

// JDK classes
import java.io.*;

/** Input an RSS document, remove the first "item" element, and
    output the resulting RSS document to System.out */
class DOMtoText
{
    public static void main(String[] args)
    {
        try
        {
            // Input an RSS document into a DOM Document object
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = docBuilderFactory.newDocumentBuilder();
            Document document = parser.parse(new File(args[0]));

            // Use the DOM API to remove the first item element (if there is one)
            NodeList items = document.getElementsByTagName("item");
            if (items.getLength() > 0)
            {
                Node firstItem = items.item(0);
                firstItem.getParentNode().removeChild(firstItem);
            }

            // Use JAXP methods to output the modified Document object
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            transformer.transform(new DOMSource(document), new StreamResult(System.out));
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

FIGURE 7.11 Program converting a DOM Document object to an XML text representation.
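
The program in Figure 7.11 writes text, but the same Document can instead be delivered as a stream of SAX events by swapping the StreamResult for a SAXResult. The sketch below is one way this might look, assuming we only want to record element names as they are reported; the class name DomToSaxSketch and the helper elementNames are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DomToSaxSketch {
    /** Parses xml into a DOM tree, replays it as SAX events, and
        returns the element names in the order they were reported. */
    static List<String> elementNames(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Prefer the qualified name; fall back to the local name
                names.add(qName.isEmpty() ? localName : qName);
            }
        };

        // DOMSource in, SAXResult out: the handler receives the events
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(document), new SAXResult(handler));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            elementNames("<rss><channel><item/></channel></rss>"));
    }
}
```

Any org.xml.sax.ContentHandler can be wrapped in a SAXResult, so an existing SAX-based component can consume a DOM tree without the tree first being written out as text.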