XML Vocabulary, Declarations and Namespace

Figure 7.1 is an XML document that represents an RSS (Rich Site Summary) feed, a document designed to provide summary information about a Web site.

<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>www.example.com</title>
    <link>http://www.example.com/</link>
    <description>
      www.example.com is not a site that changes often…
    </description>
    <language>en-us</language>
    <item>
      <title>Announcing a Sibling Site!</title>
      <link>http://www.example.org/</link>
      <description>
        Were you aware that example.com is not the only site in the example family?
      </description>
    </item>
    <item>
      <title>We’re Up!</title>
      <link>http://www.example.net/</link>
      <description>
        Our new RSS feed is up. Visit us today!
      </description>
    </item>
  </channel>
</rss>

  • An application program processing this file needs to be aware that RSS feed documents have element types such as rss, title, link, and so on, and needs to understand what each of these elements represents.
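To make this concrete, here is a minimal sketch of such an application, written in Python with the standard-library xml.etree.ElementTree module. The function name list_items and the abbreviated copy of the feed (DOCTYPE and descriptions omitted) are assumptions made for illustration; the point is only that the program must already know that an rss element contains a channel, which in turn contains item elements with a title and a link.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Figure 7.1 feed (DOCTYPE and descriptions omitted).
FEED = """<rss version="0.91">
  <channel>
    <title>www.example.com</title>
    <link>http://www.example.com/</link>
    <item>
      <title>Announcing a Sibling Site!</title>
      <link>http://www.example.org/</link>
    </item>
    <item>
      <title>We're Up!</title>
      <link>http://www.example.net/</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_text):
    """Return (title, link) pairs for every item in an RSS feed.

    Hypothetical helper: it hard-codes knowledge of the RSS vocabulary
    (rss > channel > item > title/link) that the parser itself does not have.
    """
    root = ET.fromstring(feed_text)
    if root.tag != "rss":
        raise ValueError("not an RSS document")
    channel = root.find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in list_items(FEED):
    print(f"{title} -> {link}")
```

The XML parser only guarantees that the document is well-formed; everything about what title or link means comes from the application's knowledge of the vocabulary.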

Note

  • The DOCTYPE declaration in Figure 7.1 uses the SYSTEM keyword rather than PUBLIC.
  • The PUBLIC form of DOCTYPE is used for widely used DTDs (such as XHTML), while the SYSTEM form is generally used for DTDs that are intended for use within, say, a single corporation.

XML Vocabulary

  • An XML vocabulary is created by specifying a complete description of the elements and attributes for a specific type of XML document.
  • An XML vocabulary can be specified in a variety of ways.
  • Simple XML vocabularies intended to be used by a small group of developers may be specified informally using natural language.
  • Vocabularies intended to be made publicly available are normally specified more formally and may include an XML DTD, the attributes for each element type and their data types, entity definitions, and so on.

XML Versions and the XML Declaration

  • XML was developed under the auspices of the World Wide Web Consortium (W3C) as a language for representing documents to be communicated over the World Wide Web.
  • The group that developed XML was formed in 1996 and published a first working draft of XML in the same year.
  • XML 1.0 was officially adopted as a W3C recommendation in early 1998.
  • Despite the relatively short development cycle during a time of rapid change for the Web in general, XML 1.0 has been widely adopted without the need for substantial modification. Although the W3C released an XML 1.1 recommendation in 2004, the W3C at the same time encouraged those who do not need 1.1’s new features to continue to use XML 1.0.
  • The XML recommendations suggest that every XML document begin with a special tag known as an XML declaration.
  • The declaration specifies the version of XML used to write the document and, optionally, some additional meta-information about the document, such as the character encoding.
  • In the absence of an encoding declaration, the default character encoding for an XML document is UTF-16 if the document begins with the byte order mark (the two-byte value 0xFEFF), or UTF-8 if the document begins with any other character.
  • An encoding other than one of these default encodings may be specified by including an encoding declaration within the XML declaration.
  • Example – If a minimal XML 1.0 declaration were added to the “Hello World!” XML document, and the document were encoded using the ISO-8859-1 character set, the result would be:

<?xml version="1.0" encoding="ISO-8859-1"?>
<text>
  Hello World!
</text>

  • Unlike attributes in element tags, the version and encoding declarations must appear in the order shown.
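The effect of the encoding declaration and of the ordering rule can be checked with a short Python sketch using the standard-library xml.etree.ElementTree parser. The helper name parses and the extra word "café" (added so that the document contains a non-ASCII character) are assumptions for illustration, and the behaviour shown is that of this particular parser.

```python
import xml.etree.ElementTree as ET


def parses(data):
    """Return True if the bytes parse as well-formed XML, else False."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# The document body, encoded as ISO-8859-1; "é" becomes the single byte 0xE9.
body = "<text>Hello World! caf\u00e9</text>".encode("iso-8859-1")

# Declaration in the required order: version first, then encoding.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
# No declaration: the parser falls back to UTF-8, and 0xE9 is not valid UTF-8 here.
undeclared = body
# Same pseudo-attributes, but in the wrong order.
swapped = b'<?xml encoding="ISO-8859-1" version="1.0"?>' + body

print(ET.fromstring(declared).text)  # the text is decoded using ISO-8859-1
print(parses(undeclared))            # False
print(parses(swapped))               # False
```

An all-ASCII document such as the original “Hello World!” example would parse either way, because ASCII characters have the same byte values in ISO-8859-1 and UTF-8; the declaration matters as soon as the content goes beyond ASCII.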

XML Namespaces

  • XML allows document authors to create custom elements.
  • This extensibility can result in naming collisions, in which elements that represent different kinds of data share the same name.
    • For example, one XML document may use the element address to mark up data about a residential address. Another document may use the element address to mark up data about an email address.
  • Using both of these elements in the same document could create a naming collision, making it difficult to determine which kind of data each element contains.
  • XML namespaces provide a means for document authors to unambiguously refer to different elements with the same name (i.e., prevent collisions).
  • An XML namespace is a collection of element and attribute names associated with a particular XML vocabulary through an absolute URI known as the namespace name.
  • An example namespace name is http://www.w3.org/1999/xhtml, which is specified as the namespace name for XHTML 1.0 by its recommendation [W3C-XHTML-1.0].
  • Recall that RSS is one example of an XML vocabulary.
  • Some key element type names specified for RSS are given in Table 7.1.
  • The W3C’s XML Namespaces recommendation [W3C-XML-NAMESPACE-1.1] provides a mechanism for associating each element and attribute name within a document with a specific XML vocabulary.
  • Example – XHTML 1.0 requires that every XHTML document be associated with this namespace name by including an xmlns attribute specification in the root element of the document:

<html xmlns="http://www.w3.org/1999/xhtml">

  • When specified on a root element as shown, the xmlns attribute specifies a default namespace for the entire document. So, in an XHTML document, the xmlns specification indicates that all element type names within the document – including html – belong by default to the XML namespace having namespace name http://www.w3.org/1999/xhtml.
  • In documents containing embedded elements, such as XHTML elements within an RSS document, the document must associate a namespace prefix with the namespace containing the embedded element types.
  • Example –

<rss version="0.91" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <item>
    <title>Announcing a Sibling Site!</title>
    <link>http://www.example.org/</link>
    <description>Please visit our
      <xhtml:a href="homepage.com">homepage</xhtml:a>
    </description>
  </item>
</rss>

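  • In this example, the unprefixed elements <item>, <title>, <link>, and <description> are RSS elements, while the prefixed element <xhtml:a> belongs to the XHTML namespace. (Strictly speaking, because the example declares no default namespace, the unprefixed RSS elements are in no namespace at all.)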
  • An xmlns attribute specification of this form is called a namespace declaration.
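The two kinds of namespace declaration can be observed from a program's point of view with a short Python sketch based on the standard-library xml.etree.ElementTree module. The helper name tag_names and the tiny XHTML page are assumptions for illustration. ElementTree reports a namespaced element as {namespace-name}local-name, so the prefix itself disappears and only the namespace name remains.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# RSS item with an embedded XHTML anchor, using a namespace prefix.
FEED = """<rss version="0.91" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <item>
    <title>Announcing a Sibling Site!</title>
    <link>http://www.example.org/</link>
    <description>Please visit our
      <xhtml:a href="homepage.com">homepage</xhtml:a>
    </description>
  </item>
</rss>"""

# Hypothetical XHTML page using a default namespace on the root element.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><a href="x">y</a></body></html>'
)


def tag_names(xml_text):
    """Return the expanded tag name of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# Only the prefixed element carries the XHTML namespace name.
print(tag_names(FEED))
# With a default namespace, every element (including html) carries it.
print(tag_names(PAGE))

# Searching requires a prefix-to-namespace mapping; the prefix chosen here
# need not match the one used in the document.
anchor = ET.fromstring(FEED).find(".//x:a", {"x": XHTML_NS})
print(anchor.get("href"))
```

Because elements are matched by namespace name rather than by prefix, an address element from one vocabulary and an address element from another remain distinguishable even though their local names are identical.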