XML Processing
- There are several standard approaches to processing XML documents in Java.
- Some are based on the Document Object Model (DOM).
- Others rely on Java-based XML packages, such as SAX, which is described below.
Figure 7.1 is an XML document that represents an RSS (Rich Site Summary) feed, a document designed to provide summary information about a website.
It is used as the input file to explain the parsing methods.
<!DOCTYPE rss
SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel>
<title>www.example.com</title>
<link>http://www.example.com/</link>
<description>
www.example.com is not a site that changes often…
</description>
<language>en-us</language>
<item>
<title>Announcing a Sibling Site!</title>
<link>http://www.example.org/</link>
<description>
Were you aware that example.com is not the only site in the example family?
</description>
</item>
<item>
<title>We’re Up!</title>
<link>http://www.example.net/</link>
<description>
Our new RSS feed is up. Visit us today!
</description>
</item>
</channel>
</rss>
Event-oriented Parsing: SAX
- The DOM approach to XML processing is to first read and parse an entire XML document into a tree representation and then process this tree.
- In effect, all communication between the parser and the application is by way of the document tree.
- An alternative to the DOM approach is to have the parser interact with an application as it reads an XML document.
- This is the approach taken by SAX (Simple API for XML).
- In the SAX view of XML processing, as an XML parser is reading an XML document, certain events occur.
- For example, reading an element's start tag is an event, as is reading its end tag or reading text contained within an element.
- SAX allows an application to register event listeners with the XML parser.
- A SAX parser calls these listeners as events occur and passes them information about the events.
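The event model described above can be made visible with a small trace. The following is a minimal sketch, not a definitive implementation: the class name SaxEventTrace and the "start"/"end"/"text" labels are made up for illustration, and the input is a tiny hand-written fragment rather than the full feed from Figure 7.1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: records each SAX event as a line of text.
final class SaxEventTrace {

    /** Parses the XML text and returns the events in the order the parser reported them. */
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Register a listener; the parser calls these methods while it reads the input.
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attr) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks; whitespace-only chunks are skipped here.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text " + text);
                }
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace("<item><title>Hello</title></item>").forEach(System.out::println);
    }
}
```

Note that no tree is built at any point: the application only sees the events as they occur, so it has to keep any state it needs itself.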
SAX Parser – Steps
- Obtain the parser. Use the JAXP factory approach to obtain the parser. The SAXParserFactory produces a nonvalidating parser by default.
- Create the SAX XMLReader object
- Process the document
- Create an instance of a Java class that defines the event-handling methods and pass this instance to the parser via the setContentHandler() method
- Call the parse() method of the parser and pass the URL of the document as an argument
- Output the result.
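The steps above can be sketched in Java roughly as follows. This is an illustrative sketch under stated assumptions: the class names RssTitleLister and TitleHandler are hypothetical, and the task chosen (collecting the text of every title element) is only one possible way to "process the document". The document URL is taken from the command line.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: lists the <title> texts of an RSS document.
final class RssTitleLister {

    /** Event handler that collects the text content of every <title> element. */
    static final class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <title> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attr) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length); // text may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString().trim());
                current = null;
            }
        }
    }

    static List<String> listTitles(String documentUrl) throws Exception {
        // Step 1: obtain the parser through the JAXP factory (nonvalidating by default).
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        // Step 2: create the SAX XMLReader object.
        XMLReader reader = parser.getXMLReader();

        // Step 3: register the handler and parse the document at the given URL.
        TitleHandler handler = new TitleHandler();
        reader.setContentHandler(handler);
        reader.parse(documentUrl);

        // Step 4: hand the result back to the caller for output.
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        for (String title : listTitles(args[0])) {
            System.out.println(title);
        }
    }
}
```

The handler keeps its own state (the current buffer) because SAX does not retain anything once an event has been delivered.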
Extra materials
DefaultHandler
- DefaultHandler is the default base class for SAX2 event handlers. It provides a default set of empty event handler methods for all of the core SAX2 events. A Java program using the SAX2 API typically creates a subclass of DefaultHandler that overrides some or all of the default event handler methods.
Important methods:
- void characters(char[] ch, int start, int length)
- Receive notification of character data inside an element.
- void endDocument()
- Receive notification of the end of the document.
- void endElement(String uri, String localName, String qName)
- Receive notification of the end of an element.
- void startDocument()
- Receive notification of the beginning of the document.
- void startElement(String uri, String localName, String qName, Attributes attr)
- Receive notification of the start of an element.
- void error(SAXParseException e)
- Receive notification of a recoverable parser error.
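As an illustration of the methods listed above, here is a minimal sketch of a DefaultHandler subclass that overrides startDocument(), startElement(), endDocument() and error(). The class name CountingHandler and its fields are hypothetical. The sketch assumes two things not stated above: validation is switched on with setValidating(true) so that validity problems are reported as recoverable errors, and the handler is also registered with setErrorHandler(), since error() is only called on a registered error handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements and collects recoverable errors.
final class CountingHandler extends DefaultHandler {
    int elements;
    boolean finished;
    final List<String> errors = new ArrayList<>();

    @Override
    public void startDocument() {
        // Reset the state at the beginning of each document.
        elements = 0;
        finished = false;
        errors.clear();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        elements++;
    }

    @Override
    public void endDocument() {
        finished = true;
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable error: record it and let the parser continue.
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    static CountingHandler parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // assumption: validate against the document's DTD
        XMLReader reader = factory.newSAXParser().getXMLReader();

        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // The DTD declares <a> as EMPTY, so the nested <b/> is a validity error.
        CountingHandler h = parse("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>");
        System.out.println("elements: " + h.elements);
        h.errors.forEach(System.out::println);
    }
}
```

Methods that are not overridden, such as characters() here, keep the empty default behaviour inherited from DefaultHandler.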
- Arguments passed to methods
- ch – The characters.
- start – The start position in the character array.
- length – The number of characters to use from the character array.
- uri – The Namespace URI of the element.
- localName – The local name of the element (without prefix).
- qName – The qualified name (with prefix) of the element that is being started or ended.
- attr – The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object. Methods provided by this object –
- getLength() – returns the number of attribute specifications
- getQName() – given an integer index, returns the qualified name
- getURI() – given an integer index, returns the namespace name (URI)
- getLocalName() – given an integer index, returns the local name of the attribute
- getValue() – given an integer index or a qualified name, returns the value of the attribute
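Example: suppose a SAX parser processes the following start tag:
- <a id="anc34" href="link details">
- If the parser reports the attributes in document order, the index of id is 0 and the index of href is 1 (SAX itself leaves the attribute order unspecified).
- attr.getValue("id") returns anc34, and under that ordering so does attr.getValue(0).
- attr.getValue("href") returns link details, and under that ordering so does attr.getValue(1).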
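The attribute access shown in this example can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name AttributeDump is hypothetical, and the start tag is written as an empty element so that the input is a complete document. The loop uses getLength(), getQName() and getValue() by index, and does not rely on any particular attribute order.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: collects the attributes seen on start tags.
final class AttributeDump {

    /** Returns a map from attribute qualified name to value for the elements in the XML text. */
    static Map<String, String> attributesOf(String xml) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attr) {
                // Walk the attribute list by index.
                for (int i = 0; i < attr.getLength(); i++) {
                    result.put(attr.getQName(i), attr.getValue(i));
                }
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return result;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> attrs = attributesOf("<a id=\"anc34\" href=\"link details\"/>");
        attrs.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Looking attributes up by name, as in attr.getValue("id"), is usually the safer choice when the position of an attribute is not guaranteed.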