NJ TIDE Center - Overview

Overview: In the hierarchical data model all data are described using parent-child relationships. Hence, each piece of data only takes on its full significance when seen in context.

In general, to describe data using the hierarchical model we define classes (sometimes called segments), their attributes (sometimes called fields), and the relationships between classes.

For example, data about trains are naturally hierarchical. A train has a consist (which is a group of locomotives), one or more cars, one or more cabooses, and a crew (which is a group of employees). This is illustrated in UML below:

(Note: It is important not to confuse this "has-a" relationship with the "is-a" relationship. A flatcar is a car (as is a boxcar and a refrigerated car). A flatcar has a container or a trailer on it.)
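The distinction between "has-a" and "is-a" can be sketched in code. The following is only an illustration, assuming Python dataclasses; the class names, field names, and sample values (e.g., "Locomotive 4400") are hypothetical and are not part of the model described above. Inheritance expresses "is-a", and fields that hold other objects express "has-a".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Car:
    """Base class: every kind of car is-a Car."""
    car_id: str


@dataclass
class Boxcar(Car):
    """is-a: a boxcar is a car."""


@dataclass
class Flatcar(Car):
    """is-a: a flatcar is a car. has-a: it has containers or trailers on it."""
    loads: List[str] = field(default_factory=list)


@dataclass
class Train:
    """has-a: a train has a consist, cars, cabooses, and a crew."""
    consist: List[str] = field(default_factory=list)
    cars: List[Car] = field(default_factory=list)
    cabooses: List[str] = field(default_factory=list)
    crew: List[str] = field(default_factory=list)


# Hypothetical sample data.
train = Train(
    consist=["Locomotive 4400"],
    cars=[Boxcar("B-1"), Flatcar("F-7", loads=["container"])],
    crew=["John Smith"],
)

# A flatcar is a car, but a train is not a car (it merely has cars).
print(isinstance(train.cars[1], Car), isinstance(train, Car))
```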

Operations on Hierarchical Data: We typically perform the following operations on hierarchical data:

Get	Direct retrieval
Get Next	Sequential retrieval across parents
Get Within	Sequential retrieval within a parent
Insert	Add an object
Delete	Delete an object (and its children)
Replace	Replace an object

Of course, we may also need to perform operations on the data that have little or nothing to do with their hierarchical nature.
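As a rough sketch, the six operations listed above can be mapped onto an in-memory tree. The example assumes Python's standard xml.etree.ElementTree module and a made-up Train/Car/Load document; the mapping from each operation to a particular call is one possible interpretation, not a definition of the operations.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: a train with two cars, each with one or more loads.
train = ET.fromstring(
    "<Train>"
    "<Car id='1'><Load>grain</Load><Load>corn</Load></Car>"
    "<Car id='2'><Load>steel</Load></Car>"
    "</Train>"
)

# Get: direct retrieval of one object.
car1 = train.find("Car[@id='1']")
car2 = train.find("Car[@id='2']")

# Get Next: sequential retrieval across parents (every Load in the train).
all_loads = [load.text for load in train.iter("Load")]

# Get Within: sequential retrieval within a single parent.
loads_in_car1 = [load.text for load in car1.findall("Load")]

# Insert: add an object under a parent.
ET.SubElement(car2, "Load").text = "coal"

# Replace: put a new object in the position of an existing one.
replacement = ET.Element("Load")
replacement.text = "wheat"
car1[0] = replacement

# Delete: removing an object also removes its children.
train.remove(car2)
remaining_ids = [car.get("id") for car in train.findall("Car")]
remaining_loads = [load.text for load in train.iter("Load")]
```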

Organizing Hierarchical Data: When organizing hierarchical data you need to think about:

Concepts
Associations between concepts
Attributes of concepts

Typical concepts include physical objects, places, organizations, events, records, and containers. Typical associations include "part-of", "contained-in", "member-of", and "related-to".

It is important not to confuse concepts and attributes. In general, if you think of it as numbers or letters, it is probably an attribute; otherwise, it is probably a concept. For example:

In other words, attributes should, in general, be simple/pure/primitive data types (i.e., types for which unique identity is not meaningful). The most important exceptions to this are when units are important (e.g., distances) and when delimiters are useful (e.g., phone numbers, social security numbers).

It is also important to think about the possible uses of the data. For example:

Which do you think is better? When?

Extensible Markup Language: The Extensible Markup Language (XML) is a simplified subset of the Standard Generalized Markup Language (SGML). As such, it is really a meta-language (i.e., it is used to define other languages). This is very different from HTML, which should be viewed as an application of SGML (i.e., a language defined using SGML), just as XHTML is an application of XML.

XML is simply a way of describing hierarchical data. Unlike HTML, which can only be used to describe documents (which are hierarchical in nature), XML can be used to describe hierarchical data of any kind.

Most XML elements have the following syntax:

<tag [attribute="value"] ...> [component ...] </tag>

In the following example:

    <Employee job="Engineer">John Smith</Employee>

The tag is Employee, the single attribute is job, the value of the job attribute is set to "Engineer", and the component is the string "John Smith". For elements that have no components, you can also use the following abbreviated syntax:

<tag [attribute="value"] ... />

For example:

    <Caboose id="857-931" type="limited" />
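To make the terminology concrete, the two example elements can be parsed and their parts inspected. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module, in which the tag, the attributes, and the text component are exposed as tag, attrib, and text.

```python
import xml.etree.ElementTree as ET

employee = ET.fromstring('<Employee job="Engineer">John Smith</Employee>')
caboose = ET.fromstring('<Caboose id="857-931" type="limited" />')

# The tag, the attributes, and the (text) component of the first element.
print(employee.tag)
print(employee.attrib)
print(employee.text)

# An element written with the abbreviated syntax has no component.
print(caboose.attrib["id"], caboose.text)
```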

In a well-formed XML document:

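All elements must have a start tag and a close tag (or must use the abbreviated syntax for empty tags);
Elements must be properly nested (i.e., a child element must be closed before its parent is closed);
There must be a single root element; and
The XML declaration <?xml version="1.0" ?>, if present, must be the first thing in the document.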

Note that XML tags and attributes are case-sensitive.

    
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="null" ?>
<GRADES>
<STUDENT>
     <NAME>Kim Maschino</NAME>
     <ID>321-00-0021</ID>
     <GRADE>A+</GRADE>
</STUDENT>
<STUDENT>
     <NAME>Doug Sideler</NAME>
     <ID>001-31-9998</ID>
     <GRADE>B</GRADE>
</STUDENT>
<STUDENT>
     <NAME>Wayne Tromer</NAME>
     <ID>100-23-4167</ID>
     <GRADE>C</GRADE>
</STUDENT>
</GRADES>
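One way to see these rules in action is to hand a few documents to a parser and observe which ones it rejects. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the shortened sample strings are made up for illustration. Note that it checks only well-formedness, not conformance to any DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<GRADES><STUDENT><NAME>Kim Maschino</NAME></STUDENT></GRADES>"
missing_close = "<GRADES><STUDENT><NAME>Kim Maschino</NAME></GRADES>"
two_roots = "<STUDENT/><STUDENT/>"
wrong_case = "<GRADES><Name>Kim Maschino</name></GRADES>"

for label, text in [("good", good), ("missing close tag", missing_close),
                    ("two roots", two_roots), ("mismatched case", wrong_case)]:
    print(label, is_well_formed(text))
```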

XML Namespaces: Imagine that you had developed a payment database that contained the following element:

    
    <payment>
      <amount>21.35</amount>
      <check bank="157-1" >150</check>
    </payment>

and that someone at another organization developed a validation database that included the following element:

    
    <validation>
      <check location="store">
        <checker>
	  Mary Jones
        </checker>
      </check>
    </validation>

Now, suppose you wanted to combine the two systems so that payments could be validated. Unfortunately, a problem would arise. In particular, both databases use the "check" tag, but they use it in a very different way.

In order to ensure that tags and attributes are unique, XML allows the use of namespaces through the xmlns attribute. In fact, XML allows you to use namespaces in two different ways, unqualified and qualified.

An unqualified namespace applies to the element that has the xmlns attribute and all of its children. In the following example:

    <syllabus xmlns="http://www.cs.jmu.edu/users/bernstdh/">
      <header>
        <coursename>Programming for the WWW</coursename>
        <coursenumber>CS685</coursenumber>
      </header>
      <body>
        <title>Course Overview</title>
      </body>
    </syllabus>

all of the tags are within the given namespace.

A qualified namespace, on the other hand, allows you to apply the namespace to particular elements. In the following example:

    <dhb:syllabus xmlns:dhb="http://www.cs.jmu.edu/users/bernstdh/">
      <dhb:header>
        <coursename>Programming for the WWW</coursename>
        <coursenumber>CS685</coursenumber>
      </dhb:header>
      <dhb:body>
        <dhb:title>Course Overview</dhb:title>
      </dhb:body>
    </dhb:syllabus>

the given namespace does not apply to the coursename and coursenumber elements since they do not include the qualifier.

The two types of namespaces can be used in a single document.
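The effect of the two styles can be observed by parsing small documents and looking at the resulting element names. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports a namespaced tag as {namespace}name. The abbreviated syllabus documents, the order root element, and the urn:example:payment and urn:example:validation namespaces are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.cs.jmu.edu/users/bernstdh/"

# Unqualified: the namespace applies to the element and all of its children.
unqualified = ET.fromstring(
    f'<syllabus xmlns="{NS}"><header>'
    "<coursenumber>CS685</coursenumber>"
    "</header></syllabus>"
)

# Qualified: the namespace applies only to elements that carry the qualifier.
qualified = ET.fromstring(
    f'<dhb:syllabus xmlns:dhb="{NS}"><dhb:header>'
    "<coursenumber>CS685</coursenumber>"
    "</dhb:header></dhb:syllabus>"
)

unqualified_tags = [element.tag for element in unqualified.iter()]
qualified_tags = [element.tag for element in qualified.iter()]

# The two "check" elements no longer collide once each has its own namespace.
combined = ET.fromstring(
    '<order xmlns:pay="urn:example:payment" xmlns:val="urn:example:validation">'
    '<pay:check bank="157-1">150</pay:check>'
    '<val:check location="store"/>'
    "</order>"
)
check_tags = [element.tag for element in combined]
print(check_tags)
```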

Providing Structure with DTDs: One way to impose structure on an XML document is to use a Document Type Definition (DTD). You associate a DTD with an XML document by including a DOCTYPE declaration in the XML document.

The DOCTYPE declaration (in the XML document) has the following syntax:

<!DOCTYPE root SYSTEM "dtd">

where root is the root element of the XML document and dtd is the URL of the DTD. For example:

    <!DOCTYPE Train SYSTEM "train.dtd" >
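A parser makes the two parts of this declaration available to a program. The following sketch assumes Python's standard xml.dom.minidom module and a one-element document made up for illustration. The parser records the root name and the DTD location but, as used here, does not fetch train.dtd or validate against it.

```python
from xml.dom import minidom

document = minidom.parseString('<!DOCTYPE Train SYSTEM "train.dtd"><Train/>')

doctype = document.doctype
print(doctype.name)      # the declared root element
print(doctype.systemId)  # the URL of the DTD

# In a valid document the declared root matches the actual root element.
print(doctype.name == document.documentElement.tagName)
```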

A DTD typically has three types of declarations:

DOCTYPE	The root declaration.
ELEMENT	Declares an element type.
ATTLIST	Declares an attribute list for an element type.

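When the declarations are embedded directly in the XML document (as an internal DTD subset) rather than stored in a separate file, the DOCTYPE declaration has the following syntax:

<!DOCTYPE root [ declarations ]>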

where root is the root element of the XML document and declarations are the ELEMENT and ATTLIST declarations that provide the structure.

The ELEMENT declaration has the following syntax:

<!ELEMENT tag (contents) >

where tag is the tag that will be used for this element and contents is a description of the allowed contents. For example:

    <!ELEMENT Train (Consist, Car, Caboose, Crew) >

The ATTLIST declaration has the following syntax:

<!ATTLIST tag attribute values default >

where tag is the relevant tag; attribute is the name of the attribute being declared; values is a description of the allowed values (or CDATA for any text); and default is one of #REQUIRED (i.e., the attribute must be supplied), #IMPLIED (i.e., optional with no default), a quoted default value, or #FIXED followed by a quoted value that the attribute must always have. Note that #REQUIRED and #IMPLIED cannot be followed by a default value. For example:

    <!ATTLIST Car Owner CDATA "JMU" >

When describing the allowable contents of an element (i.e., the allowed child elements) you can use the following operators:

+	The element can appear one or more times.
*	The element can appear zero or more times.
?	The element can appear zero or one times.

In the following example:

    <!ELEMENT Train (Consist, Car+, Caboose*, Crew) >

a train must have exactly one consist, one or more cars, zero or more cabooses, and exactly one crew.
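Because the operators +, *, and ? mean the same thing in regular expressions, a simple content model can be checked by translating it into a regular expression over the names of an element's children. The sketch below is only an illustration of that idea, assuming Python's standard re and xml.etree.ElementTree modules; the function names are made up, and it handles only a flat, comma-separated sequence (no nested groups and no | alternatives). It is not a DTD validator.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate a flat content model such as (A, B+, C*) into a regex."""
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        if item[-1] in "+*?":
            name, operator = item[:-1], item[-1]
        else:
            name, operator = item, ""
        # Each child name is followed by a space so that names cannot run together.
        parts.append(f"(?:{re.escape(name)} ){operator}")
    return re.compile("".join(parts))


def children_match(element: ET.Element, model: str) -> bool:
    """Check the sequence of child tags of element against the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None


MODEL = "(Consist, Car+, Caboose*, Crew)"
with_cars = ET.fromstring("<Train><Consist/><Car/><Car/><Crew/></Train>")
without_cars = ET.fromstring("<Train><Consist/><Crew/></Train>")

print(children_match(with_cars, MODEL), children_match(without_cars, MODEL))
```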

In addition, the following keywords can be used:

#PCDATA	The element can contain any text (i.e., parsed character data).
EMPTY	The element has no content.
ANY	The element can have any type of content.

Finally, lists of possible elements/values can be included if the items are separated by | characters. For example:

    <!ATTLIST Car Type (boxcar | flatcar | refrigerated | tanker) #REQUIRED >

     
<!ELEMENT timetable (train*) >
  <!ATTLIST timetable subtitle CDATA #IMPLIED >
  <!ATTLIST timetable title CDATA #IMPLIED >

<!ELEMENT station (#PCDATA) >

<!ELEMENT stop (station, time*) >

<!ELEMENT time (#PCDATA) >
  <!ATTLIST time status (Ar | Dp) #IMPLIED >

<!ELEMENT train (stop+) >
  <!ATTLIST train businessclass (YES | NO) "NO" >
  <!ATTLIST train normaldays (Daily | Mo-Fr | Sa | Su) #REQUIRED >
  <!ATTLIST train number CDATA #REQUIRED >
  <!ATTLIST train reservations (YES | NO) "NO" >
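To illustrate what the attribute declarations for the train element mean in practice, the sketch below transcribes them by hand into a Python table and applies them to sample elements. It assumes Python's standard xml.etree.ElementTree module, which does not validate against DTDs by itself; the names TRAIN_ATTRIBUTES and check_train and the sample train data are hypothetical. A validating parser would perform these checks (and more) directly from the DTD.

```python
import xml.etree.ElementTree as ET

# Transcribed by hand from the ATTLIST declarations for train:
# name -> (allowed values, or None for CDATA; required?; default, or None)
TRAIN_ATTRIBUTES = {
    "businessclass": (("YES", "NO"), False, "NO"),
    "normaldays": (("Daily", "Mo-Fr", "Sa", "Su"), True, None),
    "number": (None, True, None),
    "reservations": (("YES", "NO"), False, "NO"),
}


def check_train(train):
    """Return (attribute values with defaults applied, list of problems)."""
    values, problems = {}, []
    for name, (allowed, required, default) in TRAIN_ATTRIBUTES.items():
        value = train.get(name, default)
        if value is None:
            if required:
                problems.append(f"missing required attribute {name}")
            continue
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {allowed}")
        values[name] = value
    return values, problems


# Hypothetical sample data.
good = ET.fromstring(
    '<train number="95" normaldays="Daily">'
    "<stop><station>Newark</station></stop></train>"
)
bad = ET.fromstring('<train normaldays="Weekly"/>')

print(check_train(good))
print(check_train(bad)[1])
```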

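Providing Structure with Schemas: DTDs are somewhat difficult to use because their syntax is very different from that of XML itself. Some people have argued that structure should instead be specified in XML, and this led to the development of schemas. A schema plays the same role as a DTD but is itself written in XML. (The schema dialect described here is Microsoft's XML-Data Reduced, as the urn:schemas-microsoft-com namespaces in the examples indicate; the W3C XML Schema language uses different element names.) To link a schema to an XML document you include an xmlns attribute in the root element of the XML document as follows: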

<root_tag xmlns="x-schema:url">

where url denotes the URL of the schema.

A schema can be modeled (in UML) as follows:

The Schema element plays the same role as the DOCTYPE in a DTD. The xmlns attribute specifies the default namespace for the elements and attributes and the xmlns:prefix specifies the namespace for the data type attributes in the Schema. (Note: It is not necessary to use the string "prefix" - any string can be used as long as it is used consistently. A common string to use is "dt" which is short for data type.) For example:

    <Schema name="timetable_schema"
            xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes">
      .
      .
      .
    </Schema>

The ElementType element plays the same role as the ELEMENT in a DTD. It contains description elements, element elements, group elements, AttributeType elements, and attribute elements. The content attribute can be either "empty", "textOnly", "eltOnly" (i.e., elements only), or "mixed". The model attribute can be either "open" (i.e., undefined content can appear) or "closed" (i.e., only content defined in the schema can be used). The order attribute defines how the child elements can appear. It can be either "one" (i.e., only one of the listed elements can appear), "seq" (i.e., they must appear in the specified order), or "many" (i.e., none, any or all of the elements can appear in any order). For example:

    <ElementType name="station" content="textOnly" 
                 dt:type="string" model="closed" />

The AttributeType element plays the same role as the ATTLIST in a DTD. It contains description elements and datatype elements. The prefix:values attribute contains an enumerated list of possible values. The required attribute can be either "yes" or "no". For example:

    <AttributeType name="status" required="no" 
                   dt:type="enumeration" dt:values="Ar Dp"/>

The group element is used to "combine" several elements so that they can be assigned a sequence.
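Because a schema is itself an XML document, it can be read with the same tools as the data it describes. The sketch below assumes Python's standard xml.etree.ElementTree module and combines the fragments shown above into one small schema; the variable names are made up for illustration. It only reads the declarations and does not validate anything.

```python
import xml.etree.ElementTree as ET

XDR = "urn:schemas-microsoft-com:xml-data"
DT = "urn:schemas-microsoft-com:datatypes"

schema = ET.fromstring(f"""
<Schema name="timetable_schema" xmlns="{XDR}" xmlns:dt="{DT}">
  <ElementType name="station" content="textOnly"
               dt:type="string" model="closed" />
  <AttributeType name="status" required="no"
                 dt:type="enumeration" dt:values="Ar Dp" />
</Schema>
""")

# Prefixes used in the search paths below (any prefix would do).
prefixes = {"x": XDR}

# Each declared element type and its content attribute.
element_types = {
    declaration.get("name"): declaration.get("content")
    for declaration in schema.findall("x:ElementType", prefixes)
}

# Attributes with the dt prefix are reported as {namespace}name.
status = schema.find("x:AttributeType[@name='status']", prefixes)
allowed_values = status.get(f"{{{DT}}}values").split()

print(element_types, allowed_values)
```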

Aside: Style Sheets for XML Documents: Though it's not really relevant to this discussion of hierarchical data, it is interesting to note that you can attach a CSS style sheet to an XML document. To do so, just include an xml-stylesheet directive in the XML document as follows:

<?xml-stylesheet type="text/css" href="url" ?>

where url denotes the URL of the stylesheet.

    
<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="reference.ss" ?>

<bibliography>

<reference>
  <author>
    <firstname>C.E.</firstname>
    <lastname>White</lastname>
  </author>,
  <author>
    <firstname>D.</firstname>
    <lastname>Bernstein</lastname>
  </author>and
  <author>
    <firstname>A.L.</firstname>
    <lastname>Kornhauser</lastname>
  </author>
  (<year>2000</year>)
  "<title>Some Map-Matching Algorithms</title>",
  <journal>Transportation Research C</journal>.
</reference>


<reference>
  <author>
    <firstname>D.</firstname>
    <lastname>Bernstein</lastname>
  </author>and
  <author>
    <firstname>I.</firstname>
    <lastname>El Sanhouri</lastname>
  </author>
  (<year>1999</year>)
  "<title>Driver Information and Congestion Pricing Systems</title>"in
  <book>Impacts of Driver Information Systems</book>
  (<editor>
    <firstname>P.</firstname>
    <lastname> Nijkamp</lastname>
  </editor>and
  <editor>
    <firstname>R.</firstname>
    <lastname>Emerink</lastname>
  </editor>, eds.)
  <publisher>John Wiley and Sons</publisher>,
  <pages>371-392</pages>.
</reference>



<reference>
  <author>
    <firstname>J.</firstname>
    <lastname>Padmos</lastname>
  </author>and
  <author>
    <firstname>D.</firstname>
    <lastname>Bernstein</lastname>
  </author>
  (<year>1997</year>)
  "<title>Personal Travel Assistants and the WWW</title>"
  <journal>Transportation Research Record</journal>,
  <volume>1573</volume>:
  <pages>52-56</pages>.
</reference>


<reference>
  <author>
    <firstname>R.</firstname>
    <lastname>Guensler</lastname>
  </author>and
  <author>
    <firstname>D.</firstname>
    <lastname>Bernstein</lastname>
  </author>
  (<year>1996</year>)
  "<title>Transportation Resources on the Internet</title>",
  <journal>ITE Journal</journal>,
  <volume>66</volume>:
  <pages>42-47</pages>.
</reference>



<reference>
  <author>
    <firstname>I.</firstname>
    <lastname>El Sanhouri</lastname>
  </author>and
  <author>
    <firstname>D.</firstname>
    <lastname>Bernstein</lastname>
  </author>
  (<year>1994</year>)
  "<title>Driver Information Systems with Congestion Pricing</title>",
  <journal>Transportation Research Record</journal>,
  <volume> 1450</volume>:
  <pages>44-52</pages>.
</reference>


<reference>
  <author>
    <firstname>D.</firstname>
    <lastname>Bernstein</lastname>
  </author>and
  <author>
    <firstname>A.</firstname>
    <lastname>Kanaan</lastname>
  </author>
  (<year>1993</year>)
  "<title>Automatic Vehicle Identification</title>",
  <journal>IVHS Journal</journal>,
  <volume> 1</volume>:
  <pages>191-204</pages>.
</reference>

</bibliography>

    
bibliography {

    font-family: serif; 
    font-size: 12pt;
}


journal {

    font-style: italic;
}


reference {

    display: block;
    margin-bottom: 1em;
    margin-left: 2em;
    text-indent: -2em;
}


volume {

    font-weight: bold;
}
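The xml-stylesheet directive is an ordinary processing instruction, so a program can discover which style sheet a document asks for. The sketch below assumes Python's standard xml.dom.minidom module and a bibliography document reduced to an empty root element for brevity; it only locates the directive and does not apply any styles.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<?xml version="1.0" ?>'
    '<?xml-stylesheet type="text/css" href="reference.ss" ?>'
    "<bibliography/>"
)

# Processing instructions that precede the root element are children of the document.
stylesheets = [
    node.data
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]

print(stylesheets)
```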