Defining XHTML’s Abstract Syntax: XML
- Element Type Declarations
- Attribute List Declarations
- Entity Declarations
- DTD Files
Defining XHTML’s Abstract Syntax: XML
- HTML is based on SGML, the Standard Generalized Markup Language.
- XHTML is derived from HTML so that it conforms to the XML standard.
- XHTML (EXtensible HyperText Markup Language) is a stricter version of HTML: it does not let the author get away with lapses in coding and structure.
- XML (eXtensible Markup Language) is a markup language much like HTML. It was designed to store and transport data and to be self-descriptive.
XML
- The abstract syntax of a version of XHTML is defined by a set of text files known collectively as an XML Document Type Definition (DTD).
- Basic elements of a DTD:
  - Element type declaration
  - Attribute list declaration
  - Entity declaration
- Consider a simple example taken from an XHTML DTD:
<!ELEMENT html (head, body)>
<!ATTLIST html
lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED
dir (ltr|rtl) #IMPLIED
id ID #IMPLIED
xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
<!ENTITY gt ">">
Element Type Declaration:
<!ELEMENT html (head, body)>
- Element type declarations specify the set of all valid elements in the language defined by the DTD.
- The XHTML DTD contains exactly one element type declaration for each element in the language.
- The string immediately following ELEMENT is the name of the element type being declared, in this case html.
- The information following the element type name is known as the content specification for the element; it describes the valid content of the element type being declared.
- In the example, the html element must have exactly two children: a head element followed by a body element.
- Several basic XML content specifications are shown in Table 2.6.
- Example: EMPTY declares an element with no content, as for the br element, and ANY permits any content. The p declaration here only illustrates the ANY keyword; the XHTML DTD itself declares p with a mixed content model.
<!ELEMENT br EMPTY>
<!ELEMENT p ANY>
- The keyword #PCDATA ("Parsed Character DATA"), used in defining the character data and mixed content types, represents any string of characters excluding less-than (<) and ampersand (&), which are excluded because they are the start characters for markup.
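Because < and & start markup, character data containing them has to be escaped before it is placed in an element. The following is a minimal Python sketch using only the standard library; the sample string and variable names are illustrative assumptions, not part of the notes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "a < b & c"  # illustrative character data (assumption)

# Placed directly into an element, the raw '<' is read as the start of
# markup, so the document is not well-formed and the parser rejects it.
try:
    ET.fromstring("<p>" + raw_text + "</p>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# escape() replaces '&', '<' and '>' with entity references.
escaped = escape(raw_text)

# After escaping, the parser accepts the element and gives back the
# original characters as parsed character data.
parsed_text = ET.fromstring("<p>" + escaped + "</p>").text

print(raw_is_well_formed)
print(escaped)
print(parsed_text)
```

The round trip shows that the restriction applies to the raw characters only: the same text can still appear in #PCDATA content once it is written with entity references.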
- More sophisticated specifications may be formed by appending one of the iterator characters of Table 2.7 to the basic sequence and choice content specification types.
- Example: <!ELEMENT select (optgroup|option)+>
  - This is a choice specification with the + iterator character.
  - A select element may contain any number of optgroup and option elements in any order, as long as at least one of them appears.
- Sequence and choice specifications can be nested, and an element name within a sequence or choice may have an iterator character suffixed to it.
- Example:
<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
- A table may optionally begin with a caption, followed optionally by either a sequence of col elements or a sequence of colgroup elements, then optionally by a thead, then optionally by a tfoot, and finally by either one or more tbody elements or one or more tr elements.
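Sequence, choice, and iterator characters behave much like a regular expression over the list of child element names. The sketch below makes that analogy concrete in Python. It is an illustration under stated assumptions, not how a real validator works: the helper names are hypothetical, and it handles element-only content models (no #PCDATA, EMPTY, or ANY).

```python
import re

# An element name, or a run of commas/whitespace between names.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[\s,]+")

def content_model_to_regex(model):
    """Translate an element-only content model into a compiled regex.

    Each child is encoded as its name followed by one space, so the
    children caption, tr, tr become the string "caption tr tr ".
    """
    def replace(match):
        token = match.group(0)
        if token[0].isalpha() or token[0] == "_":
            return "(?:" + re.escape(token) + " )"  # one child element
        return ""  # a comma means "followed by": plain concatenation
    # Parentheses, |, ?, * and + keep the same meaning in the regex.
    return re.compile(TOKEN.sub(replace, model))

def children_match(model, children):
    """Check a list of child element names against a content model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None

HTML = "(head, body)"
SELECT = "(optgroup|option)+"
TABLE = "(caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))"

print(children_match(HTML, ["head", "body"]))
print(children_match(HTML, ["body", "head"]))
print(children_match(SELECT, ["option", "optgroup", "option"]))
print(children_match(SELECT, []))
print(children_match(TABLE, ["caption", "col", "col", "thead", "tbody"]))
print(children_match(TABLE, ["tbody", "tr"]))
```

The last call is rejected because (tbody+|tr+) is a choice between two repetitions, so tbody and tr children cannot be mixed in the same table.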
Attribute List Declaration:
<!ATTLIST html
lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED
dir (ltr|rtl) #IMPLIED
id ID #IMPLIED
xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
- An attribute list declaration is included in the DTD for each element that has attributes.
- An attribute list declaration
  - begins with the keyword ATTLIST,
  - followed by an element type name, and
  - then, for each attribute of that element, three items:
    - the attribute name, which names one valid attribute of the element,
    - the attribute type, which specifies the kind of data, or the set of valid values, that may be used as the attribute value,
    - the default declaration.
- In the example, the html element has five attributes: lang, xml:lang, dir, id, and xmlns.
- Table 2.8 gives the attribute types used in the definition of XHTML.
- NMTOKEN (name token): a string of characters representing a name ("word").
  - The ASCII characters that can be used in an NMTOKEN are letters, digits, and the four characters period (.), hyphen (-), underscore (_), and colon (:).
- Enumerated attribute type: the allowable values are separated by OR (|) symbols. The attribute can be assigned only one of the specified values.
- ID attribute type: supplies an identifying name for its element. The value must begin with a letter, underscore, or colon, and must be unique within the document.
- IDREF attribute type (an id reference): the value of the attribute must be identical to the value of the ID attribute of some element of the document. It is used to link one element with another.
- IDREFS attribute type: similar to IDREF, except that it allows a white-space-separated list of id values rather than a single id value.
- CDATA attribute type: any string of characters that excludes less-than (<), ampersand (&), and the quote character (" or ') used to enclose the attribute value.
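The NMTOKEN, ID, IDREF, and IDREFS rules above can be approximated with a few checks. The Python sketch below is a rough illustration only. It does not read a DTD, so the attribute names treated as IDREF ("for") and IDREFS ("headers") are passed in by hand as assumptions, the NMTOKEN pattern covers ASCII characters only, and the sample form is made up.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of NMTOKEN, following the character list above.
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def is_nmtoken(value):
    return NMTOKEN.fullmatch(value) is not None

def check_ids(root, idref_attrs=("for",), idrefs_attrs=("headers",)):
    """Return a list of problems with ID, IDREF and IDREFS values."""
    problems = []
    seen = set()
    # First pass: every id value must be unique within the document.
    for elem in root.iter():
        ident = elem.get("id")
        if ident is None:
            continue
        if ident in seen:
            problems.append("duplicate id: " + ident)
        seen.add(ident)
    # Second pass: every reference must name an id that exists.
    for elem in root.iter():
        for attr in idref_attrs:
            ref = elem.get(attr)
            if ref is not None and ref not in seen:
                problems.append("dangling IDREF: " + ref)
        for attr in idrefs_attrs:
            for ref in (elem.get(attr) or "").split():
                if ref not in seen:
                    problems.append("dangling IDREFS entry: " + ref)
    return problems

# Hypothetical fragment with one repeated id and one unresolved reference.
doc = ET.fromstring(
    "<form>"
    '<label for="name">Name</label><input id="name"/>'
    '<label for="email">Email</label><input id="name"/>'
    "</form>"
)
problems = check_ids(doc)
print(problems)
print(is_nmtoken("en-US"), is_nmtoken("en US"))
```

A validating parser performs these checks from the DTD itself; the point here is only that ID values form a document-wide set and that IDREF values are looked up in that set.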
- The default declaration for an attribute specifies what value, if any, should be used if no value is specified for the attribute in an element of the document.
- A value that is assigned but does not conform to the attribute's type is a validity error in XML; a validating parser reports it rather than substituting the default.
- The default declaration for an attribute can take one of the forms shown in Table 2.9.
- #IMPLIED
  - The attribute need not be assigned a value in the start tag for the element, and the DTD does not define a default value for it.
  - The application reading the XML document (for example, a browser) may assign a default value of its choice to the attribute.
- #FIXED
  - The DTD supplies the value of the attribute, and the document is not allowed to override it.
- Default provided by the DTD
  - The DTD itself can also supply a default value for an attribute, which the document author can override.
  - Example:
<!ATTLIST form
…
method (get|post) "get"
…
>
- #REQUIRED
  - A value must be specified for the attribute whenever the element containing it appears in a valid document.
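The effect of a DTD-supplied default can be observed with Python's standard-library parser, which applies defaults declared in a document's internal DTD subset. This is a small sketch: the one-element document is an illustrative assumption, and only the method declaration comes from the example above. The parser is non-validating, so it supplies defaults but does not report a missing #REQUIRED attribute.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset carrying the method declaration from the example;
# the surrounding document is an illustrative assumption.
DOC = """<!DOCTYPE form [
  <!ELEMENT form EMPTY>
  <!ATTLIST form method (get|post) "get">
]>
<form{attrs}/>"""

# No method attribute in the start tag: the DTD default is filled in.
omitted = ET.fromstring(DOC.format(attrs=""))

# An explicit value in the document overrides the DTD default.
explicit = ET.fromstring(DOC.format(attrs=' method="post"'))

print(omitted.get("method"))
print(explicit.get("method"))
```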
-
Entity Declaration:
- An entity declaration begins with the keyword ENTITY, followed by an entity name and its replacement text.
- An entity declaration is essentially a macro definition.
- In the example, the declaration associates the name gt (an entity) with the string >.
- An application reading a document that contains an entity reference, such as &gt;, replaces the reference with the string the entity represents and then recursively processes that string.
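The macro-like replacement, including the recursive step, can be seen by declaring entities in a document's internal DTD subset and parsing it. In this Python sketch the entity names course and welcome are hypothetical, chosen so that one entity's replacement text refers to another.

```python
import xml.etree.ElementTree as ET

# 'welcome' refers to 'course', so expanding &welcome; requires the
# replacement text itself to be processed for further references.
DOC = """<!DOCTYPE p [
  <!ENTITY course "Web Technologies">
  <!ENTITY welcome "Welcome to &course;!">
]>
<p>&welcome; 1 &gt; 0</p>"""

# The application sees only the expanded character data.
text = ET.fromstring(DOC).text
print(text)
```

The reference &gt; needs no declaration here because XML predefines it; the XHTML DTD declares it explicitly anyway.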
- XML also provides a different type of entity that can be referenced from within DTDs but not from documents. Such entities are called parameter entities.
- A parameter entity declaration is indicated in the DTD by following the ENTITY keyword with a percent sign (%). A parameter entity is referenced as %name; rather than &name;.
- Example:
<!ENTITY % URI "CDATA">
- Using this entity, the XHTML attribute list declaration for the html element can be written as
<!ATTLIST html
lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED
dir (ltr|rtl) #IMPLIED
id ID #IMPLIED
xmlns %URI; #FIXED 'http://www.w3.org/1999/xhtml'>
- This is equivalent to the version of this declaration given earlier.
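The equivalence can be checked by treating parameter entities as plain text substitution. The Python sketch below is deliberately simplified and rests on assumptions: the function name and regular expressions are hypothetical, only double-quoted declarations are recognised, and the finer rules of real parameter entity processing are ignored.

```python
import re

DTD = """<!ENTITY % URI "CDATA">
<!ATTLIST html
  lang NMTOKEN #IMPLIED
  xml:lang NMTOKEN #IMPLIED
  dir (ltr|rtl) #IMPLIED
  id ID #IMPLIED
  xmlns %URI; #FIXED 'http://www.w3.org/1999/xhtml'>"""

# <!ENTITY % name "value"> declarations and %name; references.
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>\s*')
PE_REF = re.compile(r"%([\w.-]+);")

def expand_parameter_entities(dtd_text):
    """Textually replace %name; references with their declared values."""
    entities = dict(PE_DECL.findall(dtd_text))
    without_decls = PE_DECL.sub("", dtd_text)
    return PE_REF.sub(lambda m: entities[m.group(1)], without_decls)

expanded = expand_parameter_entities(DTD)
print(expanded)
```

After expansion the xmlns line reads xmlns CDATA #FIXED, matching the earlier form of the declaration.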
DTD Files
- Example document type declaration for XHTML:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- The string immediately following the PUBLIC keyword is called the formal public identifier for the DTD.
- The URL at the end of the declaration is known as the system identifier for the DTD. It gives the location of a copy of the DTD for the document instance that follows the document type declaration.
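Both identifiers can be read back from a parsed document. The Python sketch below uses the standard library's minidom; the small page wrapped around the declaration is an illustrative assumption. The parser only records the identifiers in this example and is not expected to fetch the DTD from the system identifier.

```python
from xml.dom import minidom

# The DOCTYPE is the one from the notes; the page content is made up.
PAGE = """<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>"""

doctype = minidom.parseString(PAGE).doctype

print(doctype.name)      # name of the root element type
print(doctype.publicId)  # formal public identifier
print(doctype.systemId)  # system identifier (location of the DTD)
```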