Event-oriented Parsing: SAX

XML Processing

  • There are several standard approaches to processing XML documents in Java, including:
    • approaches based on the Document Object Model (DOM)
    • event-oriented parsing with SAX (Simple API for XML)
    • Java-based XML packages.

Figure 7.1 is an XML document that represents an RSS (Rich Site Summary) feed, a document designed to provide summary information about a Web site.
It is used as the input file to explain the parsing methods below.

```xml
<!DOCTYPE rss
    SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>www.example.com</title>
    <link>http://www.example.com/</link>
    <description>
      www.example.com is not a site that changes often…
    </description>
    <language>en-us</language>
    <item>
      <title>Announcing a Sibling Site!</title>
      <link>http://www.example.org/</link>
      <description>
        Were you aware that example.com is not the only site in the example family?
      </description>
    </item>
    <item>
      <title>We're Up!</title>
      <link>http://www.example.net/</link>
      <description>
        Our new RSS feed is up. Visit us today!
      </description>
    </item>
  </channel>
</rss>
```


Event-oriented Parsing: SAX

  • The DOM approach to XML processing is to first read and parse an entire XML document into a tree representation and then process this tree.
  • In effect, all communication between the parser and the application is by way of the document tree.
  • An alternative to the DOM approach is to have the parser interact with an application as it reads an XML document.
  • This is the approach taken by SAX (Simple API for XML).
  • In the SAX view of XML processing, as an XML parser is reading an XML document, certain events occur.
  • For example,
    • reading an element start tag is an event, as is reading its end tag or reading text contained within an element.
  • SAX allows an application to register event listeners with the XML parser.
  • A SAX parser calls these listeners as events occur and passes them information about the events.
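To make the event sequence concrete, the sketch below records every event the parser reports while reading a tiny fragment of an RSS item. The class name EventTrace, its trace() helper and the sample input are illustrative assumptions; the parser itself is the standard JAXP SAXParserFactory. Whitespace-only text is skipped to keep the trace short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical listener: records each SAX event as a short string, in the order it occurs.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("text \"" + text + "\""); // ignore whitespace-only text
    }

    // Parses an XML string and returns the recorded events.
    static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : trace("<item><title>Hello</title></item>")) {
            System.out.println(e);
        }
        // startDocument, start item, start title, text "Hello", end title, end item, endDocument
    }
}
```

Note that the application never sees a tree: it only sees this flat stream of calls, so any structure it needs must be tracked in the handler itself.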

SAX Parser – Steps

  1. Obtain the parser. Use the JAXP factory approach: SAXParserFactory.newInstance() creates a factory, and its newSAXParser() method returns the parser. By default the factory produces a nonvalidating, non-namespace-aware parser.
  2. Create the SAX XMLReader object by calling the parser's getXMLReader() method.
  3. Process the document:
    1. Create an instance of a Java class that defines the event-handling methods and register it with the XMLReader via its setContentHandler() method.
    2. Call the parse() method of the XMLReader, passing the URL (system ID) of the document as the argument.
  4. Output the result.
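The four steps can be sketched as a small program that prints the title of every item in an RSS feed. The class names RssTitles and ItemTitleHandler are hypothetical. When run without arguments it parses an inline, shortened copy of Figure 7.1 with the DOCTYPE omitted, because a parser may try to fetch the external DTD even when it is not validating, and that Netscape URL may not be reachable; when given a URL as its first argument it parses that document instead, as in step 3.2.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class RssTitles {
    static List<String> itemTitles(InputSource source) throws Exception {
        // Step 1: obtain a (nonvalidating) parser through the JAXP factory.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        // Step 2: get the underlying SAX XMLReader.
        XMLReader reader = parser.getXMLReader();
        // Step 3.1: register the event handler.
        ItemTitleHandler handler = new ItemTitleHandler();
        reader.setContentHandler(handler);
        // Step 3.2: parse; the handler's methods are called as events occur.
        reader.parse(source);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"0.91\"><channel><title>www.example.com</title>"
            + "<item><title>Announcing a Sibling Site!</title></item>"
            + "<item><title>We're Up!</title></item></channel></rss>";
        // A document URL (system ID) may be given instead of the inline sample.
        InputSource source = args.length > 0
            ? new InputSource(args[0])
            : new InputSource(new StringReader(rss));
        // Step 4: output the result.
        for (String title : itemTitles(source)) {
            System.out.println(title);
        }
    }
}

// Collects the text of every <title> that appears inside an <item>.
class ItemTitleHandler extends DefaultHandler {
    final List<String> titles = new ArrayList<>();
    private boolean inItem = false;
    private StringBuilder text = null; // non-null only while inside an item's <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        if (qName.equals("item")) inItem = true;
        else if (inItem && qName.equals("title")) text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title") && text != null) {
            titles.add(text.toString().trim());
            text = null;
        } else if (qName.equals("item")) {
            inItem = false;
        }
    }
}
```

The handler uses qName rather than localName because the default factory is not namespace-aware. The channel's own <title> is skipped because it is not inside an <item>.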

Extra materials

DefaultHandler

  • DefaultHandler is the default base class for SAX2 event handlers. It provides a default set of empty event handler methods for all of the core SAX2 events. A Java program using the SAX2 API typically creates a subclass of DefaultHandler that overrides some or all of the default event handler methods.
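Because DefaultHandler already supplies empty versions of every callback, a subclass only needs to override the events it cares about. The hypothetical ElementCounter below overrides startElement() alone to count how often each element name occurs; all other events fall through to the inherited no-op methods.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Only startElement is overridden; every other SAX event uses DefaultHandler's empty body.
class ElementCounter extends DefaultHandler {
    final Map<String, Integer> counts = new TreeMap<>(); // sorted by element name

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        counts.merge(qName, 1, Integer::sum);
    }

    static Map<String, Integer> count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel><item><title>A</title></item>"
            + "<item><title>B</title></item></channel>";
        System.out.println(count(xml)); // {channel=1, item=2, title=2}
    }
}
```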

Important methods:

  • void characters(char[] ch, int start, int length)
    • Receive notification of character data inside an element.
  • void endDocument()
    • Receive notification of the end of the document.
  • void endElement(String uri, String localName, String qName)
    • Receive notification of the end of an element.
  • void startDocument()
    • Receive notification of the beginning of the document.
  • void startElement(String uri, String localName, String qName, Attributes attr)
    • Receive notification of the start of an element.
  • void error(SAXParseException e)
    • Receive notification of a recoverable parser error.
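DefaultHandler's own error() does nothing, so recoverable errors are silently ignored unless a subclass overrides it. The sketch below (class name ErrorCollector and the tiny inline DTD are illustrative assumptions) turns on validation with setValidating(true), so that validity errors are reported through error(), and collects them. Since error() does not throw, parsing continues to the end of the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Records recoverable errors instead of ignoring them.
class ErrorCollector extends DefaultHandler {
    final List<String> errors = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validity errors are reported via error()
        ErrorCollector handler = new ErrorCollector();
        // SAXParser.parse(..., DefaultHandler) also registers the handler as the ErrorHandler.
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        // The internal DTD declares <item> as EMPTY, yet the document gives it a child.
        String xml = "<!DOCTYPE item [<!ELEMENT item EMPTY>]><item><title/></item>";
        validate(xml).forEach(System.out::println);
    }
}
```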

  • Arguments passed to methods
    • ch – The characters.
    • start – The start position in the character array.
    • length – The number of characters to use from the character array.
    • uri – The Namespace URI of the element, or the empty string if the element has no namespace or if namespace processing is not being performed.
    • localName – The local name of the element (without prefix), or the empty string if namespace processing is not being performed.
    • qName – The qualified name (with prefix, if any) of the element that is being started or ended.
    • attr – The attributes attached to the element. If there are no attributes, it is an empty Attributes object. Methods provided by this object:
      • getLength() – returns the number of attributes in the list
      • getQName() – given an integer index, returns the qualified name of the attribute
      • getURI() – given an integer index, returns the namespace name (URI) of the attribute
      • getLocalName() – given an integer index, returns the local name of the attribute
      • getValue() – given an integer index or a qualified name, returns the value of the attribute
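Two practical consequences of these arguments are shown in the sketch below. First, only ch[start] through ch[start + length - 1] belong to a characters() event, and a parser may split one run of text across several calls, so text is safest accumulated in a buffer until endElement(). Second, uri and localName are filled in only when the factory is namespace-aware. The class name ElementText and the namespace URI http://www.example.com/ns are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// For each element, records "qName {uri} localName = text".
class ElementText extends DefaultHandler {
    final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        text.setLength(0); // start collecting this element's text afresh
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text; append only this chunk.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lines.add(qName + " {" + uri + "} " + localName + " = " + text.toString().trim());
        text.setLength(0);
    }

    static List<String> collect(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        ElementText handler = new ElementText();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed xmlns:ex=\"http://www.example.com/ns\">"
            + "<ex:note>Tom &amp; Jerry</ex:note></feed>";
        System.out.println(collect(xml, true));  // uri and localName populated
        System.out.println(collect(xml, false)); // uri and localName empty
    }
}
```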

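For example, suppose a SAX parser processes the following start tag:

  • <a id="anc34" href="link details">
    • The index of id is 0 and the index of href is 1. (The JDK's built-in parser typically reports attributes in document order, but the SAX specification leaves the order unspecified, so portable code should look attributes up by name.)
    • attr.getValue(0) and attr.getValue("id") both return "anc34".
    • attr.getValue(1) and attr.getValue("href") both return "link details".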
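A minimal sketch of reading these attributes inside startElement(), both by index (using getLength(), getQName() and getValue()) and by qualified name. The class name AttributeDump is hypothetical, and the tag is written as an empty element so that the input is a complete document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Reads the attributes of <a> both by index and by name.
class AttributeDump extends DefaultHandler {
    final List<String> byIndex = new ArrayList<>();
    String id;
    String href;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        if (!qName.equals("a")) return;
        for (int i = 0; i < attr.getLength(); i++) {
            byIndex.add(i + ": " + attr.getQName(i) + " = " + attr.getValue(i));
        }
        id = attr.getValue("id");     // lookup by qualified name; null if absent
        href = attr.getValue("href");
    }

    static AttributeDump parse(String xml) throws Exception {
        AttributeDump handler = new AttributeDump();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        AttributeDump d = parse("<a id=\"anc34\" href=\"link details\"/>");
        d.byIndex.forEach(System.out::println); // index order depends on the parser
        System.out.println("id = " + d.id + ", href = " + d.href);
    }
}
```

The name-based lookups give the same answer with any conforming parser; the index-based loop is useful when the attribute names are not known in advance.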