Tutorial 9 Case Study 2 Xml

Processing XML with jQuery

Build dynamic, XML-based UI apps with jQuery, XML, DOM, and Ajax

Aleksandar Kolundzija
Published on February 01, 2011

Before you start

While this tutorial is useful to seasoned developers looking to pick up or sharpen their jQuery and XML processing skills, it also provides a practical overview of essential DOM scripting concepts, which can bring even the most junior JavaScript coders up to speed and allow them to grasp the full scope of the tutorial.

About this tutorial

Frequently used acronyms
  • Ajax: Asynchronous JavaScript + XML
  • DOM: Document Object Model
  • HTML: Hypertext Markup Language
  • JSON: JavaScript Object Notation
  • UI: User interface
  • W3C: World Wide Web Consortium
  • XML: Extensible Markup Language

As advanced media- and data-rich web applications grow in popularity within the browser, technologies such as XML and jQuery become important components in their architecture due to their wide adoption and flexibility. In this tutorial, you explore DOM processing within the browser, narrowing the focus to how these paradigms apply to XML in particular and how the jQuery library can speed up development and increase robustness.

The tutorial covers these specific topics:

  • Introduction to the DOM
  • XML and JavaScript in the browser
  • jQuery and XML
  • Case study: LiveXMLEditor


This tutorial assumes that you have a basic understanding of HTML and JavaScript. To follow along with the code you need the following:

  • Your favorite text editor for writing and editing code.
  • The jQuery library. You can either download it and serve it locally, or include and serve it directly from the Google CDN.
  • A good browser. While most browsers in use today are supported, review the jQuery Browser Compatibility page for recommended browsers. Many UI engineers choose Firefox for development due to its useful plug-ins, of which the most popular is Firebug. Firebug is not required for this tutorial, but it is highly recommended.
  • Familiarity with a server-side language (PHP in particular) helps with specific sections, but it is not essential.

See Related topics for links to all the tool downloads.

Introduction to the Document Object Model (DOM)

Before you dig into jQuery and XML, let's go over the basics of the concepts that you explore in this tutorial.

The DOM is the structure of an HTML or XML file represented in a way that allows it to be modified programmatically. In this section, you take a basic HTML document and explore DOM traversal and manipulation methods with JavaScript through a few simple examples.

Brief introduction to DOM manipulation in JavaScript

JavaScript provides a complete set of simple (though verbose) methods for accessing, traversing, and manipulating the DOM. I do not go into great detail on this topic (see Related topics for a list of recommended resources and tutorials), but will quickly go over a couple of simple examples.

Assume that you are working with a very basic HTML document that looks like Listing 1.

Listing 1. A simple HTML document
<!DOCTYPE html>
<html>
<head>
  <title>This is the page Title</title>
</head>
<body class="signed-out">
  <div id="header">
    <ul id="nav">
      <li><a href="/home" class="home">Home</a></li>
      <li><a href="/about" class="about">About</a></li>
      <li><a href="/auth" id="auth" class="auth">Sign In</a></li>
    </ul>
  </div>
  <div id="article">
    <h1>A Plain DOM</h1>
    <h2>An sample <b>HTML</b> document.</h2>
    <div id="section"></div>
  </div>
</body>
</html>

In Listing 1, a header contains a list of navigation links. To get a JavaScript object representation of the DIV tag (or element) with id article, you can run the code in Listing 2.

Listing 2. Getting a DOM element by id
<script type="text/javascript">
  var article = document.getElementById("article");
</script>

The getElementById() method is the fastest way to retrieve a DOM element in JavaScript. The id attribute of an HTML element provides the browser with direct access to that element. As browsers provide no warning for duplicate ids, and as Microsoft® Internet Explorer® treats the name attribute as an id, avoiding duplicates and watching out for the Internet Explorer collisions is your responsibility. That said, in practice these issues are generally simple to avoid and therefore not a big concern.

A second method of DOM element retrieval to look at is getElementsByTagName(). This more versatile method is essential for processing XML because with that format you do not have the luxury of relying on element id attributes. Look at how getElementsByTagName() works. To retrieve the contents of the H1 node you might execute the code in Listing 3.

Listing 3. Getting a set of elements by tag name
<script type="text/javascript">
  var headers = document.getElementsByTagName("h1"); // returns an array-like collection of H1 elements
  alert(headers[0].innerHTML); // alerts inner HTML of the first H1 tag: "A Plain DOM"
</script>

Note two interesting things here. First, getElementsByTagName() returns an array-like collection of elements, as the name implies. Because this example has only a single H1 element, you can retrieve it at index 0. That's almost never a safe assumption to make, though: if no element matches, headers[0] is undefined, and reading a property from it throws an error. Instead, always check that the element exists before you attempt to access its properties (or methods).
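
The existence check can be wrapped in a small helper. The following is a minimal sketch, not part of any library: the names firstOrNull and firstInnerHTML are hypothetical, and the functions accept any array-like collection, such as the one returned by getElementsByTagName().

```javascript
// Returns the first item of an array-like collection, or null if it is empty.
// firstOrNull is a hypothetical helper name, not a DOM or jQuery API.
function firstOrNull(collection) {
  return (collection && collection.length > 0) ? collection[0] : null;
}

// Reads innerHTML only when the first element exists; otherwise returns the fallback.
function firstInnerHTML(collection, fallback) {
  var first = firstOrNull(collection);
  return first ? first.innerHTML : fallback;
}

// In a browser page:
//   var title = firstInnerHTML(document.getElementsByTagName("h1"), "");
```

Returning a fallback instead of throwing keeps the calling code free of repeated length checks.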

Second, you've likely noticed the innerHTML property. As the name implies, this property provides access to the contents of an element, which in the case above is just a string. Had the H1 element contained other elements (tags), its value would contain those as well as a part of the string (see Listing 4).

Listing 4. Using innerHTML to retrieve value with HTML elements
<script type="text/javascript">
  var subheaders = document.getElementsByTagName("h2"); // returns an array-like collection of H2 elements
  alert(subheaders[0].innerHTML); // "An sample <b>HTML</b> document."
</script>

In addition to innerHTML, browsers also provide a property for retrieving only the text contents of an element. However, this property is named innerText in Internet Explorer and textContent in other browsers. To use it safely across all browsers you might do something similar to Listing 5.

Listing 5. Using innerText and textContent properties across different browsers
<script type="text/javascript">
  var headers = document.getElementsByTagName("h1"); // returns an array-like collection of H1 elements
  var headerText = headers[0].textContent || headers[0].innerText;
</script>

The headerText variable gets the value of textContent if it exists, and the value of innerText otherwise. A more sensible way to treat this task is to create a cross-browser function that does that, but as you see later, jQuery already provides one.
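
Such a cross-browser function might look like the sketch below. The name getElementText is hypothetical. It uses typeof checks rather than the || operator so that an element whose text is an empty string is not mistaken for one that lacks the property.

```javascript
// Returns the text content of an element-like object, or "" if unavailable.
// getElementText is a hypothetical name for the helper the text describes.
function getElementText(elem) {
  if (!elem) {
    return "";
  }
  // Standards-based browsers expose textContent.
  if (typeof elem.textContent === "string") {
    return elem.textContent;
  }
  // Older versions of Internet Explorer expose innerText instead.
  if (typeof elem.innerText === "string") {
    return elem.innerText;
  }
  return "";
}

// In a browser page:
//   var headerText = getElementText(document.getElementsByTagName("h1")[0]);
```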

The HTML page also has a Sign In link. Suppose that the user has logged in using a separate process, and you want to reflect that in the navigation by changing the Sign In label to Sign Out. In an earlier example, you retrieved the contents of a node using innerHTML. This property can be written to as well. The bit of JavaScript in Listing 6 achieves that step.

Listing 6. Updating the innerHTML value of a DOM node
<script type="text/javascript">
  var authElem = document.getElementById("auth"); // returns single element
  authElem.innerHTML = "Sign Out"; // updates element contents
</script>

In addition to updating the values of existing nodes, you can create completely new DOM elements and append them to the DOM using an existing element (see Listing 7).

Listing 7. Creating and injecting a new DOM node
<script type="text/javascript">
  var ulElems = document.getElementsByTagName("ul"); // returns an array-like collection of UL elements
  var ul = ulElems[0]; // get the first element (in this case, the only one)
  var li = document.createElement("li"); // create a new list item element
  var text = document.createTextNode("List Item Text"); // create a text node
  li.appendChild(text); // append text node to li node (still not added to DOM)
  ul.appendChild(li); // append new list node to UL, which inserts it into DOM
</script>

While retrieval methods such as getElementsByTagName() are often executed on the document object, you can also (more efficiently, in fact) execute them on any other element and reduce the retrieval scope to that element's descendants. (getElementById() is the exception: it is available only on the document.) This approach obviously assumes that the elements you are accessing are descendants of the element that the method is called on. Keep this notion of context in mind as it comes up later when you look at processing XML with jQuery.

Removing elements from the DOM is also trivial. To remove a node, first retrieve it, then remove it by referencing it through its parent (see Listing 8). (See more on parentNode and related properties below.)

Listing 8. Removing an element from the DOM
<script type="text/javascript">
  var firstList = document.getElementsByTagName("ul")[0];
  firstList.parentNode.removeChild(firstList);
</script>

Last, let's go over assigning, reading, and removing attributes, using setAttribute(), getAttribute(), and removeAttribute() (see Listing 9).

Listing 9. Setting and removing an element attribute
<script type="text/javascript">
  var firstUL = document.getElementsByTagName("ul")[0];
  firstUL.setAttribute("class", "collapsed");
  firstUL.getAttribute("class"); // returns "collapsed"
  firstUL.removeAttribute("class"); // removes the attribute again
</script>

DOM traversal in JavaScript

Aside from element selection and manipulation, JavaScript provides a complete set of traversal properties. You saw an example use of parentNode in Listing 8. Given an element, you can navigate from it to surrounding elements within the DOM using these basic references (see Listing 10).

Listing 10. JavaScript's DOM traversal properties
firstChild
lastChild
nextSibling
parentNode
previousSibling

See Related topics for a link to a complete node property listing.

A DOM representation of these elements, in reference to a given <node>, looks like Listing 11.

Listing 11. Relationship of related DOM elements
<parent_node>
  <previous_sibling/>
  <node>
    <first_child/>
    ...
    <last_child/>
  </node>
  <next_sibling/>
</parent_node>

Last, consider the tree representation of this relationship as shown in Figure 1. At the top is the parentNode, which contains the node along with its previousSibling and nextSibling elements. The node itself can contain one or more child nodes, the first and last of which are its firstChild and lastChild.

Figure 1. Tree representation of adjacent nodes


A couple of these JavaScript node references come in handy when creating a function for traversing the DOM from a starting node (see Listing 12). You'll revisit this function when you look at parsing an entire XML document later in this tutorial.

Listing 12. JavaScript function for traversing the DOM
<script type="text/javascript">
  function traverseDOM(node, fn){
    fn(node); // execute passed function on current node
    node = node.firstChild; // get node's first child
    while (node){ // while a child exists
      traverseDOM(node, fn); // recursively traverse the child's subtree
      node = node.nextSibling; // move on to the next sibling
    }
  }

  // example: adds "data-visited" attribute set to "true" to every element in the DOM
  traverseDOM(document, function(curNode){
    if (curNode.nodeType === 1){ // setAttribute() only exists on an ELEMENT_NODE (type 1)
      // add HTML5 friendly attribute (with data- prefix)
      curNode.setAttribute("data-visited", "true");
    }
  });
</script>
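
To see the visiting order without a browser, the same traversal can be run over plain objects that mimic DOM nodes. This is only a sketch: makeNode and collectElementNames are hypothetical helpers, and the stand-in nodes carry just the properties the traversal reads (nodeType, nodeName, firstChild, nextSibling).

```javascript
// Same depth-first traversal as Listing 12: visit the node, then each child.
function traverseDOM(node, fn) {
  fn(node);
  node = node.firstChild;
  while (node) {
    traverseDOM(node, fn);
    node = node.nextSibling;
  }
}

// Builds a stand-in node (hypothetical helper, used only so the sketch
// runs without a browser). It wires up firstChild and nextSibling links.
function makeNode(nodeType, nodeName, children) {
  var node = { nodeType: nodeType, nodeName: nodeName, firstChild: null, nextSibling: null };
  children = children || [];
  for (var i = 0; i < children.length; i++) {
    if (i === 0) { node.firstChild = children[i]; }
    if (i > 0) { children[i - 1].nextSibling = children[i]; }
  }
  return node;
}

// Collects the names of all element nodes (nodeType 1) in document order.
function collectElementNames(root) {
  var names = [];
  traverseDOM(root, function (curNode) {
    if (curNode.nodeType === 1) {
      names.push(curNode.nodeName);
    }
  });
  return names;
}

var tree = makeNode(1, "item", [
  makeNode(1, "description", [makeNode(3, "#text")]),
  makeNode(1, "body"),
  makeNode(8, "#comment")
]);
var elementNames = collectElementNames(tree); // ["item", "description", "body"]
```

The text and comment nodes are visited too, but the callback skips them because their nodeType is not 1.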

By now, you should have a good understanding of the basics of DOM traversal and manipulation as they apply to HTML documents. In the next section, you'll look at how this applies to XML documents.

XML DOM and JavaScript in the browser

Before you can process XML, you first need to expose it to JavaScript in the browser. This section covers different ways to achieve that and explores how JavaScript can then process the imported XML DOM.

XML node types

Before digging into processing XML, let's go over the different XML node types and their named constants. While this is an easy topic to ignore when dealing with HTML, it is crucial when processing XML due to that format's extensible, and therefore, unpredictable structure. It is precisely this difference that requires the custom methods that I cover here for the XML processor.

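Here are the 12 different XML node types, numbered by the value of their nodeType constant:

  1. ELEMENT_NODE
  2. ATTRIBUTE_NODE
  3. TEXT_NODE
  4. CDATA_SECTION_NODE
  5. ENTITY_REFERENCE_NODE
  6. ENTITY_NODE
  7. PROCESSING_INSTRUCTION_NODE
  8. COMMENT_NODE
  9. DOCUMENT_NODE
  10. DOCUMENT_TYPE_NODE
  11. DOCUMENT_FRAGMENT_NODE
  12. NOTATION_NODE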

    You can use JavaScript to access an element's nodeType property and check its type. The function in Listing 13 returns true if the passed node is a comment node and false otherwise. Although this function has no jQuery dependencies, you'll explore it further when you look at parsing XML node values.

    Listing 13. JavaScript function for determining if the node element is a comment
    <script type="text/javascript">
      function isNodeComment(node){
        return (node.nodeType === 8);
      }
    </script>

    I do not go into details about each of the node types in this tutorial, but being familiar with the node types is essential for handling nodes and their values accordingly.
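
As an illustration of handling nodes according to their type, the sketch below picks the relevant property for a few common node types. The function name summarizeNode is hypothetical; the numeric values are the standard W3C DOM nodeType constants.

```javascript
// Node type constants used below (values defined by the W3C DOM).
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;
var CDATA_SECTION_NODE = 4;
var COMMENT_NODE = 8;

// Returns a short label for a node, choosing the relevant property by type.
// summarizeNode is a hypothetical helper name.
function summarizeNode(node) {
  switch (node.nodeType) {
    case ELEMENT_NODE:
      // elements are identified by name; they have no nodeValue
      return "element <" + node.nodeName + ">";
    case TEXT_NODE:
    case CDATA_SECTION_NODE:
      // text and CDATA nodes carry their content in nodeValue
      return "text \"" + node.nodeValue + "\"";
    case COMMENT_NODE:
      return "comment \"" + node.nodeValue + "\"";
    default:
      return "other (nodeType " + node.nodeType + ")";
  }
}
```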

    Client-side XML processing with JavaScript

    Many of the same JavaScript methods used previously to process HTML apply directly when working with XML; however, you can't rely on referencing elements by id and should use the more generic method of retrieval by tag name. Note that when processing XML, tag names are case sensitive.

    Assume that you are working with this simple XML file in Listing 14.

    Listing 14. A simple XML file
    <?xml version="1.0" encoding="UTF-8" ?>
    <item content_id="1" date_published="2010-05-25">
      <description></description>
      <body></body>
      <related_items>
        <related_item content_id="2"></related_item>
        <related_item content_id="3"></related_item>
      </related_items>
    </item>

    Exposing XML to JavaScript in the browser

    The first step toward parsing the XML in Listing 14 is exposing it to JavaScript. You can do this several ways:

    1. Server-side rendering of XML as a JavaScript string variable
    2. Server-side rendering of XML into a textarea element
    3. Loading XML into the browser through Ajax

    The detailed steps for each option are:

    1. Server-side rendering of XML as a JavaScript string variable

      Using a server-side programming language such as PHP, you can output the XML string into a JavaScript variable. This isn't the most elegant or even the most practical approach, but it works. The advantage of this approach is that you can load the XML from any URL as well as from the local server (see Listing 15).

      Listing 15. Writing XML into a JavaScript variable from PHP
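      <?php
        $xmlPath = "/path/to/file.xml"; // or http://www.somedomain.com/path/to/file.xml
        $xmlFile = file_get_contents($xmlPath);
      ?>
      <script type="text/javascript">
        // json_encode() emits a quoted, escaped JavaScript string literal,
        // so quotes and line breaks in the XML cannot break the script
        var xmlString = <?php echo json_encode($xmlFile); ?>;
      </script>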
    2. Server-side rendering of XML into a textarea element

      A slightly different approach consists of loading the XML into a <textarea> field (which does not need to be visible). Then, using the field's value property, retrieve the string and expose it to JavaScript.

      You can output the PHP variable ($xmlFile) defined in Listing 15 to an HTML textarea field with an id for easy reference:

      <textarea id="xml"><?php echo htmlspecialchars($xmlFile); ?></textarea>

      Then you can simply pull the value out into a JavaScript variable for further processing (see Listing 16).

      Listing 16. Exposing XML to JavaScript from a textarea element
      <script type="text/javascript">
        // value returns the raw text; innerHTML would return it with < and & escaped as entities
        var xmlString = document.getElementById("xml").value;
      </script>

      Due to the differences in browser support for XML, use the following JavaScript function for creating a DOM from an XML string (see Listing 17).

      Listing 17. Cross-browser JavaScript function for converting an XML string into a DOM object
      <script type="text/javascript">
        /**
         * Converts passed XML string into a DOM document.
         * @param xmlStr {String}
         */
        function getXmlDOMFromString(xmlStr){
          if (window.ActiveXObject && window.GetObject) {
            // for Internet Explorer
            var dom = new ActiveXObject('Microsoft.XMLDOM');
            dom.loadXML(xmlStr);
            return dom;
          }
          if (window.DOMParser){
            // for other browsers
            return new DOMParser().parseFromString(xmlStr, 'text/xml');
          }
          throw new Error('No XML parser available');
        }

        // read the XML string from the textarea with id "xml" (see Listing 16)
        var xmlString = document.getElementById("xml").value;
        var xmlData = getXmlDOMFromString(xmlString);
      </script>

      Let's also look at the function for reversing this process. Given an XML DOM object, the function in Listing 18 returns a string.

      Listing 18. Cross-browser JavaScript function for returning a string representation of an XML DOM object
      <script type="text/javascript">
        /**
         * Returns string representation of passed XML object
         */
        function getXmlAsString(xmlDom){
          return (typeof XMLSerializer !== "undefined")
            ? (new window.XMLSerializer()).serializeToString(xmlDom)
            : xmlDom.xml;
        }
      </script>
    3. Loading XML into the browser through Ajax

      The last method of exposing the XML to JavaScript is through Ajax. Because I use jQuery to perform this operation I cover this method in more detail after introducing that library.

    Processing XML with JavaScript

    Let's see how standard JavaScript methods for DOM processing shown earlier apply to XML. To retrieve the description of the current item and ids of related items you can do something similar to Listing 19.

    Listing 19. XML Processing using JavaScript
    <script type="text/javascript">
      // get value of single node
      var descriptionNode = xmlData.getElementsByTagName("description")[0];
      var description = descriptionNode.firstChild && descriptionNode.firstChild.nodeValue;

      // get values of nodes from a set
      var relatedItems = xmlData.getElementsByTagName("related_item"); // xmlData is an XML doc
      var relatedItemVals = [];
      var tempItemVal;
      for (var i=0,total=relatedItems.length; i<total; i++){
        tempItemVal = relatedItems[i].firstChild ? relatedItems[i].firstChild.nodeValue : "";
        relatedItemVals.push(tempItemVal);
      }

      // set and get attribute of a node (on the element, not on its text value)
      descriptionNode.setAttribute("language", "en");
      descriptionNode.getAttribute("language"); // returns "en"
    </script>

    Look more closely at this code. The method getElementsByTagName(), which you saw before, is essential for processing XML because it allows you to select all XML elements of a given name. (Again, keep in mind that when you process XML it is case sensitive.) You then safely retrieve the description value by first checking if the descriptionNode has a firstChild. If so, you go on to access its nodeValue. When you try to access a specific node's text value, things start to get a little tricky. Although some browsers support the previously covered innerHTML property for XML documents, most do not. You first have to check whether the node has a firstChild (text, comment, or child node) and if it does, retrieve that child's nodeValue. If the value doesn't exist, you set it to an empty string. (You can ignore empty values and only store actual values, but for the purposes of this example let's maintain the number of items and keep the indexes in sync.)

    Last, you see that the setAttribute() and getAttribute() methods work as they did with an HTML file.
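
The firstChild check repeated in the loop can be factored into helpers. This is a minimal sketch with hypothetical names (firstChildValue, firstChildValues); it works on any objects that expose firstChild and nodeValue, which includes real XML DOM nodes.

```javascript
// Returns the nodeValue of a node's first child, or "" when there is no
// first child or the child has no value (an element child, for example).
// firstChildValue is a hypothetical helper name.
function firstChildValue(node) {
  return (node && node.firstChild && node.firstChild.nodeValue != null)
    ? node.firstChild.nodeValue
    : "";
}

// Maps an array-like node collection to an array of first-child values,
// keeping one entry per node so that indexes stay in sync.
function firstChildValues(nodes) {
  var values = [];
  for (var i = 0, total = nodes.length; i < total; i++) {
    values.push(firstChildValue(nodes[i]));
  }
  return values;
}

// With the XML document from the previous listings:
//   var relatedItemVals = firstChildValues(xmlData.getElementsByTagName("related_item"));
```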

    You've now seen how to process HTML and XML Document Object Models using plain old JavaScript. In the next section, I introduce jQuery and show how this powerful library not only simplifies processing but also increases your control over all aspects of DOM interaction.

    jQuery and XML

    Likely the main reasons for jQuery's huge popularity are its fast and simple traversal engine and its slick selector syntax. (Excellent documentation also really helps.) And although its primary use is HTML processing, in this section you explore how it works and how to apply it to processing XML files as well.

    DOM manipulation and traversal with jQuery

    To access any of jQuery's features you first need to make sure that the file jquery.js is included on the page. Having done that, you simply call jQuery() or the shorthand version $() and pass it a selector as the first argument. A selector is usually a string that specifies an element, or a collection of elements if more than one element matches the given selector. Listing 20 shows some basic jQuery selectors.

    Listing 20. Basic jQuery selectors
    <script type="text/javascript">
      var allImages = $("img"); // all IMG elements
      var allPhotos = $("img.photo"); // all IMG elements with class "photo"
      var curPhoto = $("img#currentPhoto"); // IMG element with id "currentPhoto"
    </script>

    Keep in mind that the jQuery function always returns a jQuery object. This object is what allows the chaining of methods (see Listing 21), a feature it shares with a few other popular JavaScript frameworks (likely influenced by the Ruby programming language).

    Listing 21. Basic jQuery operation with chained method calls
    <script type="text/javascript">
      $("img").css({"padding":"1px", "border": "1px solid #333"})
              .wrap("<div class='img-wrap'/>");
    </script>

    This code selects all images, sets padding and border on each of them, then wraps each in a DIV with class img-wrap. As you can tell, that's quite a bit of cross-browser functionality reduced to a single chained statement. For thorough information on jQuery selectors and methods, check out the excellent documentation on the jQuery website (see Related topics).

    Listing 22 shows how jQuery simplifies examples from the previous section.

    Listing 22. Creating and injecting a DOM node with jQuery
    <script type="text/javascript">
      alert($("h1:first").html()); // .text() also works and might be better suited here
      $("#auth").text("Sign Out");
      var $li = $("<li>List Item Text</li>"); // $ is used as var prefix to indicate jQuery object
      $("ul#nav").append($li);
    </script>

    Processing XML with jQuery

    I mentioned that the first argument passed to the $() function is the string selector. The less common second argument allows you to set the context, or starting node, for jQuery to use as a root when making the selection. By default, jQuery uses the document element as the context, but you can optimize code by restricting the context to a more specific (and therefore smaller) subset of the document. To process XML, you want to set the context to the root XML document (see Listing 23).

    Listing 23. Retrieving values from an XML document with jQuery
    <script type="text/javascript">
      // get value of single node (with jQuery)
      var description = $("description", xmlData).text(); // xmlData was defined in previous section

      // get values of nodes from a set (with jQuery)
      var relatedItems = $("related_item", xmlData);
      var relatedItemVals = [];
      $.each(relatedItems, function(i, curItem){
        // curItem is a plain DOM node, so wrap it before calling jQuery methods
        relatedItemVals.push($(curItem).text());
      });
    </script>

    That code cleans things up quite a bit. By passing the node name to the core $() function and setting the context, xmlData, you quickly get access to the node set you want. Getting the value of the node, though, is something that needs some exploration.

    As the innerHTML property does not work for non-HTML documents, you cannot rely on jQuery's html() method to retrieve the contents of a node. jQuery also provides a text() method for cross-browser retrieval of the text of an HTML node. The text() method, as mentioned earlier, is a cross-browser wrapper for the text properties covered earlier, but even it behaves inconsistently across browsers when processing XML. Internet Explorer, for example, ignores what it considers empty node values (spaces, tabs, breaks) in the contents of a node. This approach might seem more intuitive than Firefox's handling of the same, which interprets the whitespace between elements in the sample XML file as text nodes alongside the element nodes. To get around this inconsistency, create custom methods for treating text nodes consistently. In doing so (see Listing 24) you make use of a few handy jQuery methods.

    Listing 24. Cross-browser JavaScript functions for accurate text value retrieval of a node
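
The helpers can be sketched roughly as follows. This version is a stand-in written under assumptions, not the tutorial's own listing: it avoids jQuery so that it runs on any object exposing childNodes, nodeType, and nodeValue, and only the function names getTextNodes and getNodeValue are taken from the surrounding text. Unlike the jQuery-based helper that Listing 25 expects, this getTextNodes returns a plain array, so textNodes.get(0) would be written textNodes[0].

```javascript
// Removes leading and trailing whitespace (spaces, tabs, line breaks).
function trimText(value) {
  return String(value).replace(/^\s+|\s+$/g, "");
}

// Returns the non-blank text and CDATA child nodes of a node as a plain array.
// Assumption: node.childNodes is array-like, as on real DOM nodes.
function getTextNodes(node) {
  var result = [];
  var children = (node && node.childNodes) || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    var isText = child.nodeType === 3 || child.nodeType === 4; // TEXT_NODE or CDATA_SECTION_NODE
    if (isText && trimText(child.nodeValue) !== "") {
      result.push(child); // whitespace-only nodes are skipped
    }
  }
  return result;
}

// Returns the comment text for a comment node, the trimmed value of the
// first non-blank text node otherwise, or "" when there is no text.
function getNodeValue(node) {
  if (node && node.nodeType === 8) { // COMMENT_NODE
    return node.nodeValue;
  }
  var textNodes = getTextNodes(node);
  return textNodes.length ? trimText(textNodes[0].nodeValue) : "";
}
```

Skipping whitespace-only text nodes gives the same result whether or not the browser's parser keeps them.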

    Now look at how to set the node value (see Listing 25). Keep two things in mind. First, this operation is potentially destructive, as setting the text value of the root node overwrites all of its children. Second, if a specific node has no prior text value, instead of setting it using nodeValue, set it with textContent because Internet Explorer doesn't like the first method (the property doesn't exist when blank).

    Listing 25. Cross-browser JavaScript function for accurate setting of the text value of a node
    <script type="text/javascript">
      function setNodeValue(node, value){
        var textNodes = getTextNodes(node);
        if (textNodes.get(0)){
          textNodes.get(0).nodeValue = value;
        } else {
          node["textContent"] = value;
        }
      }
    </script>

    DOM attributes and jQuery

    Processing attributes of DOM elements is already pretty straightforward with plain old JavaScript as shown in examples from the previous section. As expected, jQuery provides simple equivalents for these, but furthermore, attributes can be used in selectors—a very powerful feature (see Listing 26).

    Listing 26. Getting and setting DOM element attributes with jQuery
    <script type="text/javascript">
      var item = $("item[content_id='1']", xmlData); // select item node with content_id attribute set to 1
      var pubDate = item.attr("date_published"); // get value of date_published attribute
      item.attr("archive", "true"); // set new attribute called archive, with value set to true
    </script>

    As you can see, jQuery's attr() method supports both the retrieval and setting of attributes. More importantly, jQuery provides excellent access to element retrieval by allowing attributes in selectors. In the example above, you selected the item with the content_id attribute set to 1, from the xmlData context.

    Loading XML through Ajax with jQuery

    As you probably already know, Ajax is a web technology for asynchronous retrieval of XML (or text) from the server using JavaScript. Ajax itself relies on the XMLHttpRequest (XHR) API to send a request to and receive a response from the server. In addition to providing excellent DOM traversal and manipulation methods, jQuery also offers thorough, cross-browser Ajax support. That said, the loading of XML through Ajax is as native as Ajax gets, so you're on familiar ground. The way this works in jQuery is shown in Listing 27.

    Listing 27. Loading an external XML file with jQuery's Ajax method

    The $.ajax() method has a number of additional options and can also be called indirectly through shortcut methods such as $.getScript(), which imports and executes a JavaScript file, $.getJSON(), which loads a JSON data file and makes it available to the success script, and so on. When requesting a file of type XML, though, you're stuck with the core $.ajax() method, which has the advantage of forcing you to know only its syntax for any circumstance. In the example above, you simply request file /path/to/data.xml, specifying that the dataType is "xml" and that the request method is GET. After the browser receives a response from the server, it triggers either the success or the error callback function accordingly. In this example, a success callback alerts the total number of nodes. jQuery's star selector (*) matches all nodes. The key point is to note that the success callback function receives the data from the server as the first argument. The name of the variable is up to you, and as described earlier, that value becomes the context passed to any jQuery call intended to process the XML.
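
A request matching that description might be set up as in the sketch below. It is an illustration under assumptions rather than the tutorial's listing: buildXmlRequest is a hypothetical helper, the alert messages are made up, and the request is only sent where jQuery is actually loaded. The setting names (type, url, dataType, success, error) are standard $.ajax() settings.

```javascript
// Builds the settings object for an XML request.
// buildXmlRequest is a hypothetical helper, not a jQuery API.
function buildXmlRequest(url, onSuccess, onError) {
  return {
    type: "GET",        // HTTP request method
    url: url,           // file to request
    dataType: "xml",    // ask jQuery to parse the response as XML
    success: onSuccess, // receives the parsed XML document as first argument
    error: onError      // called if the request or the parsing fails
  };
}

var settings = buildXmlRequest(
  "/path/to/data.xml",
  function (xml) {
    // the XML document becomes the context for jQuery selections
    alert("Total nodes: " + jQuery("*", xml).length);
  },
  function () {
    alert("Unable to load or parse the XML file.");
  }
);

// Only issue the request where jQuery is loaded (a browser page).
if (typeof jQuery !== "undefined") {
  jQuery.ajax(settings);
}
```

Keeping the settings in a separate function makes it easy to reuse the same success and error handling for several XML files.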

    An important thing to keep in mind when processing Ajax in general is the cross-domain restriction, which prevents retrieval of files from different domains. The previously covered methods of server-side XML retrieval might be viable alternatives in your application.

    Processing external XHTML as XML

    Because valid XHTML is also valid XML, there's no reason why you can't process it the same way you process XML. Why exactly you would want to is a separate topic, but the point is that you could. For instance, scraping a (valid) XHTML page and extracting data from it is perfectly doable using this technique, even though I encourage a more robust approach.

    While primarily intended for HTML DOM traversal and manipulation, jQuery can also be used for processing XML, though it requires the additional step of getting the file to the browser. The topics covered in this section explain the different ways to do that and provide the methods essential for processing the XML effectively.

    Case Study: Live XML Edit

    In this section, you apply what you learned to create a browser-based XML editor.

    Live XML Edit

    While I don't recommend editing XML by hand, I can think of too many cases where precisely that approach was taken. So, in part as an academic exercise and in part as a useful tool, I set out to build a browser-based XML editor. A primary goal was to process the XML directly, rather than convert it to a different format such as JSON, make the updates, then transform back to XML. Making the edits live ensures that the only affected parts of the file are those that were actually edited, which means less room for error and faster processing. The techniques covered in this tutorial were essential in putting this together. Take a closer look at how they applied.

    Figure 2 shows Live XML Edit.

    Figure 2. Live XML Edit


    Uploading and loading XML through Ajax

    LiveXMLEdit uses Ajax to get the XML into the page. The user is required to upload the XML file they want to edit, which is then saved on the server and brought in using $.ajax() as described in Listing 27. A reference to the original XML object is saved and edited directly. This approach means that after the user is finished editing the file no transformations are necessary as the updated DOM already exists.

    Rendering collapsible and editable HTML tree representation of the XML

    After the XML is available to JavaScript, it is traversed using the traverseDOM() method (see Listing 12), and each of the nodes along with their attributes is rendered as nested unordered lists (UL). jQuery is then used to assign handlers for expanding and collapsing of elements, which simplifies the display and editing of larger documents. Also rendered are action buttons that provide editing features.
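
The rendering step can be pictured with the sketch below, which turns an element and its element descendants into nested UL markup. It is not the code used by the application: the function names, class names, and markup are assumptions, and the nodes only need to expose nodeType, nodeName, attributes, and childNodes, as real DOM elements do.

```javascript
// Escapes characters that are special in HTML text.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders an element node and its element descendants as a nested list item.
// Non-element nodes (text, comments) are skipped in this simplified sketch.
function renderNodeAsListItem(node) {
  if (!node || node.nodeType !== 1) {
    return "";
  }
  var html = "<li><span class=\"name\">" + escapeHtml(node.nodeName) + "</span>";
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    html += " <span class=\"attr\">" + escapeHtml(attrs[i].name) + "=" +
      escapeHtml(attrs[i].value) + "</span>";
  }
  var childHtml = "";
  var children = node.childNodes || [];
  for (var j = 0; j < children.length; j++) {
    childHtml += renderNodeAsListItem(children[j]);
  }
  if (childHtml) {
    html += "<ul>" + childHtml + "</ul>"; // nested list for child elements
  }
  return html + "</li>";
}

// Wraps the rendered root element in the outermost list.
function renderTree(root) {
  return "<ul>" + renderNodeAsListItem(root) + "</ul>";
}
```

Because the output is a plain HTML string, it can be inserted with a single DOM write, after which handlers for expanding and collapsing can be attached.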

    Adding methods to handle (and store) live edits

    Along with rendering buttons for editing and deleting nodes and making updates for edited fields, the success handler of the Ajax call that loads the XML also assigns handlers for processing various user interactions and events. jQuery provides different means of assigning handlers, but for unpredictably large DOMs by far the most efficient is the live() method or its younger (and even more performant) sibling, delegate(). Rather than catch events at the target element, these methods handle events at the document or the specified element, respectively. This approach has a number of benefits: faster binding and unbinding, and support for existing as well as future nodes that match the selector (key in this case because users can create new XML nodes that should behave just like existing ones).
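
The idea behind handling events at a container can be sketched without jQuery. The helper names findDelegateTarget and delegateHandler are hypothetical, and the nodes only need a parentNode property. With jQuery loaded, the equivalent call would be along the lines of $(container).delegate(selector, "click", handler).

```javascript
// Walks up from the event target toward the container and returns the first
// node accepted by `matches`, or null if none is found before the container.
function findDelegateTarget(target, container, matches) {
  var node = target;
  while (node && node !== container) {
    if (matches(node)) {
      return node;
    }
    node = node.parentNode;
  }
  return null;
}

// Creates a single handler for the container that serves every matching
// descendant, including descendants added after the handler was created.
function delegateHandler(container, matches, handler) {
  return function (event) {
    var match = findDelegateTarget(event.target, container, matches);
    if (match) {
      handler.call(match, event); // `this` is the matched node, as in jQuery
    }
  };
}

// In a browser page, the returned function would be attached once, for example:
//   container.onclick = delegateHandler(container, isEditButton, onEdit);
// where isEditButton and onEdit are functions you supply.
```

Only one handler is bound no matter how many nodes the tree contains, which is why this approach scales to large documents.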

    Server-side script to save the updated file

    Although server-side processing is beyond the scope of this article, it is necessary for saving the edited file. For the code sample, check out the entire application on GitHub (see Related topics), but as far as the browser processing is concerned, simply convert the updated XML DOM to a string and post it to a server script. The script itself retrieves the post and saves it as a file.


    The DOM provides a powerful means of activating HTML and XML structures within the browser. You saw how to do this with plain old JavaScript and also how jQuery's fast, robust, and well supported feature set can greatly simplify development while ensuring cross-browser compatibility. You also reviewed different ways to expose the XML to JavaScript in the browser and methods for accurately processing it with the help of jQuery. The list of resources helped me put together both this article and the Live XML Edit application.


    Related topics

    • DOM objects and methods tutorial (Mark "Tarquin" Wilton-Jones, January 2009): Find a comprehensive listing of all properties, collections, and methods of the W3C DOM.
    • The Mozilla Developer Center: Visit a great resource for web developers.
    • LiveXMLEditor: View more information about LiveXMLEditor.
    • JavaScript and DOM basics: View Siarhei Barysiuk's excellent slides.
    • jQuery JavaScript Library: Visit jQuery's thorough documentation website.
    • Process XML in the browser using jQuery (Uche Ogbuji, developerWorks, December 2009): Find key information for processing namespaced XML and navigate some major pitfalls to gain the benefits of the popular Web application API.
    • JavaScript and the Document Object Model (Nicholas Chase, developerWorks, July 2002): Look at the JavaScript approach to DOM and the building of a web page to which the user can add notes and edit note content.
    • JavaScript tutorial: Learn how to use the scripting language of the web.
    • jQuery: Download the jQuery JavaScript library, available under either the MIT or GNU Public License. You can either download it and serve it locally, or include and serve it directly from the Google CDN.
    • jQuery Browser Compatibility: Visit this page for a list of recommended browsers.
    • Firebug: Get the essential debugging tool for Firefox users.
    • LiveXMLEditor: Try the XML editor created by the author of this tutorial.
    • PHP: Hypertext Preprocessor: Get the widely-used scripting language that is well suited for web development and can be embedded into HTML. This tutorial uses PHP 5.2 or higher.
    • IBM certification: Find out how you can become an IBM-Certified Developer.
    • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks. Also, read more XML tips.

    Introduction to XML

    Doug Tidwell
    Published on August 07, 2002

    About this tutorial

    Should I take this tutorial?

    This newly revised tutorial discusses what XML is, why it was developed, and how it's shaping the future of electronic commerce. Along the way, it also takes a look at several XML standards and programming interfaces, shows how you can get started with this technology, and describes how a couple of companies have built XML-based solutions to simplify and streamline their enterprises.

    In this tutorial, you'll learn:

    • Why XML was created
    • The rules of XML documents
    • How to define what an XML document can and cannot contain
    • Programming interfaces that work with XML documents
    • What the main XML standards are and how they work together
    • How companies are using XML in the real world

    What is XML?


    XML, or Extensible Markup Language, is a markup language that you can use to create your own tags. It was created by the World Wide Web Consortium (W3C) to overcome the limitations of HTML, the Hypertext Markup Language that is the basis for all Web pages. Like HTML, XML is based on SGML -- Standard Generalized Markup Language. Although SGML has been used in the publishing industry for decades, its perceived complexity intimidated many people who otherwise might have used it (SGML also stands for "Sounds great, maybe later"). XML was designed with the Web in mind.

    Why do we need XML?

    HTML is the most successful markup language of all time. You can view the simplest HTML tags on virtually any device, from palmtops to mainframes, and you can even convert HTML markup into voice and other formats with the right tools. Given the success of HTML, why did the W3C create XML? To answer that question, take a look at this document:

    <p><b>Mrs. Mary McGoon</b>
    <br>
    1401 Main Street
    <br>
    Anytown, NC 34829</p>

    The trouble with HTML is that it was designed with humans in mind. Even without viewing the above HTML document in a browser, you and I can figure out that it is someone's postal address. (Specifically, it's a postal address for someone in the United States; even if you're not familiar with all the components of U.S. postal addresses, you could probably guess what this represents.)

    As humans, you and I have the intelligence to understand the meaning and intent of most documents. A machine, unfortunately, can't do that. While the tags in this document tell a browser how to display this information, the tags don't tell the browser what the information is. You and I know it's an address, but a machine doesn't.

    Rendering HTML

    To render HTML, the browser merely follows the instructions in the HTML document. The paragraph tag tells the browser to start rendering on a new line, typically with a blank line beforehand, while the two break tags tell the browser to advance to the next line without a blank line in between. While the browser formats the document beautifully, the machine still doesn't know this is an address.

    Figure 1. HTML address

    Processing HTML

    To wrap up this discussion of the sample HTML document, consider the task of extracting the postal code from this address. Here's an (intentionally brittle) algorithm for finding the postal code in HTML markup:

    If you find a paragraph with two <br> tags, the postal code is the second word after the first comma in the text that follows the second break tag.

    Although this algorithm works with this example, there are any number of perfectly valid addresses worldwide for which this simply wouldn't work. Even if you could write an algorithm that found the postal code for any address written in HTML, there are any number of paragraphs with two break tags that don't contain addresses at all. Writing an algorithm that looks at any HTML paragraph and finds any postal codes inside it would be extremely difficult, if not impossible.
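
    To make the brittleness concrete, here is a minimal Python sketch of the rule above. The function name and the two extra test strings (a UK-style address and a paragraph that is not an address at all) are made up for illustration; only the first string comes from this tutorial.

```python
import re


def brittle_postal_code(html):
    """Apply the brittle rule: in a paragraph with exactly two <br> tags,
    return the second word after the first comma that follows the second <br>."""
    parts = re.split(r"<br\s*/?>", html, flags=re.IGNORECASE)
    if len(parts) != 3:  # exactly two break tags give three pieces
        return None
    last_line = re.sub(r"<[^>]+>", "", parts[2])  # drop leftover tags such as </p>
    if "," not in last_line:
        return None
    words = last_line.split(",", 1)[1].split()
    return words[1] if len(words) > 1 else None


us_address = "<p><b>Mrs. Mary McGoon</b> <br> 1401 Main Street <br> Anytown, NC 34829</p>"
uk_address = "<p><b>Mr. John Smith</b> <br> 10 High Street <br> London SW1A 1AA</p>"
poem = "<p>Roses are red <br> violets are blue <br> and so, it goes on</p>"

print(brittle_postal_code(us_address))  # 34829
print(brittle_postal_code(uk_address))  # None (no comma, so the rule finds nothing)
print(brittle_postal_code(poem))        # goes (a false positive: this is not an address)
```

    The rule succeeds only on the one layout it was written for: it misses a valid address and reports a "postal code" for text that has none.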

    A sample XML document

    Now let's look at a sample XML document. With XML, you can assign some meaning to the tags in the document. More importantly, it's easy for a machine to process the information as well. You can extract the postal code from this document by simply locating the content surrounded by the <postal-code> and </postal-code> tags, technically known as the <postal-code> element.

    <address>
      <name>
        <title>Mrs.</title>
        <first-name> Mary </first-name>
        <last-name> McGoon </last-name>
      </name>
      <street> 1401 Main Street </street>
      <city>Anytown</city>
      <state>NC</state>
      <postal-code> 34829 </postal-code>
    </address>
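
    As a rough illustration of how little code that takes, the following Python sketch parses the sample document with the standard library's xml.etree.ElementTree module and reads the <postal-code> element directly. The choice of Python and of this particular module is an assumption made for the example; any XML parser would do.

```python
import xml.etree.ElementTree as ET

document = """
<address>
  <name>
    <title>Mrs.</title>
    <first-name> Mary </first-name>
    <last-name> McGoon </last-name>
  </name>
  <street> 1401 Main Street </street>
  <city>Anytown</city>
  <state>NC</state>
  <postal-code> 34829 </postal-code>
</address>
"""

root = ET.fromstring(document)

# Ask for the element by name; no guessing about commas or line breaks.
postal_code = root.findtext("postal-code").strip()
print(postal_code)  # 34829
```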

    Tags, elements, and attributes

    There are three common terms used to describe parts of an XML document: tags, elements, and attributes. Here is a sample document that illustrates the terms:

    <address>
      <name>
        <title>Mrs.</title>
        <first-name> Mary </first-name>
        <last-name> McGoon </last-name>
      </name>
      <street> 1401 Main Street </street>
      <city state="NC">Anytown</city>
      <postal-code> 34829 </postal-code>
    </address>

    • A tag is the text between the left angle bracket (<) and the right angle bracket (>). There are starting tags (such as <name>) and ending tags (such as </name>).
    • An element is the starting tag, the ending tag, and everything in between. In the sample above, the <name> element contains three child elements: <title>, <first-name>, and <last-name>.
    • An attribute is a name-value pair inside the starting tag of an element. In this example, state is an attribute of the <city> element; in earlier examples, <state> was an element (see A sample XML document).
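
    The same three terms show up in most programming interfaces. This short Python sketch (again using xml.etree.ElementTree, chosen only for illustration) parses the sample document and prints an element's name, its child elements, and its attributes.

```python
import xml.etree.ElementTree as ET

document = """
<address>
  <name>
    <title>Mrs.</title>
    <first-name> Mary </first-name>
    <last-name> McGoon </last-name>
  </name>
  <street> 1401 Main Street </street>
  <city state="NC">Anytown</city>
  <postal-code> 34829 </postal-code>
</address>
"""

root = ET.fromstring(document)

# An element knows its name and its child elements.
name = root.find("name")
child_tags = [child.tag for child in name]
print(name.tag, child_tags)  # name ['title', 'first-name', 'last-name']

# Attributes are name-value pairs attached to an element's start tag.
city = root.find("city")
print(city.text, city.attrib)  # Anytown {'state': 'NC'}
```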

    How XML is changing the Web

    Now that you've seen how developers can use XML to create documents with self-describing data, let's look at how people are using those documents to improve the Web. Here are a few key areas:

    • XML simplifies data interchange. Because different organizations (or even different parts of the same organization) rarely standardize on a single set of tools, it can take a significant amount of work for applications to communicate. Using XML, each group creates a single utility that transforms their internal data formats into XML and vice versa. Best of all, there's a good chance that their software vendors already provide tools to transform their database records (or LDAP directories, or purchase orders, and so forth) to and from XML.
    • XML enables smart code. Because XML documents can be structured to identify every important piece of information (as well as the relationships between the pieces), it's possible to write code that can process those XML documents without human intervention. The fact that software vendors have spent massive amounts of time and money building XML development tools means writing that code is a relatively simple process.
    • XML enables smart searches. Although search engines have improved steadily over the years, it's still quite common to get erroneous results from a search. If you're searching HTML pages for someone named "Chip," you might also find pages on chocolate chips, computer chips, wood chips, and lots of other useless matches. Searching XML documents for <first-name> elements that contained the text Chip would give you a much better set of results.

    I'll also discuss real-world uses of XML in Case studies.

    XML document rules

    Overview: XML document rules

    If you've looked at HTML documents, you're familiar with the basic concepts of using tags to mark up the text of a document. This section discusses the differences between HTML documents and XML documents. It goes over the basic rules of XML documents, and discusses the terminology used to describe them.

    One important point about XML documents: The XML specification requires a parser to reject any XML document that doesn't follow the basic rules. Most HTML parsers will accept sloppy markup, making a guess as to what the writer of the document intended. To avoid the loosely structured mess found in the average HTML document, the creators of XML decided to enforce document structure from the beginning.

    (By the way, if you're not familiar with the term, a parser is a piece of code that attempts to read a document and interpret its contents.)

    Invalid, valid, and well-formed documents

    There are three kinds of XML documents:

    • Invalid documents don't follow the syntax rules defined by the XML specification. If a developer has defined rules for what the document can contain in a DTD or schema, and the document doesn't follow those rules, that document is invalid as well. (See Defining document content for a proper introduction to DTDs and schemas for XML documents.)
    • Valid documents follow both the XML syntax rules and the rules defined in their DTD or schema.
    • Well-formed documents follow the XML syntax rules but don't have a DTD or schema.

    The root element

    An XML document must be contained in a single element. That single element is called the root element, and it contains all the text and any other elements in the document. In the following example, the XML document is contained in a single element, the <greeting> element. Notice that the document has a comment that's outside the root element; that's perfectly legal.

    <?xml version="1.0"?>
    <!-- A well-formed document -->
    <greeting>
      Hello, World!
    </greeting>

    Here's a document that doesn't contain a single root element:

    <?xml version="1.0"?>
    <!-- An invalid document -->
    <greeting>
      Hello, World!
    </greeting>
    <greeting>
      Hola, el Mundo!
    </greeting>

    An XML parser is required to reject this document, regardless of the information it might contain.
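
    You can watch a parser enforce this rule. The Python sketch below feeds both documents to xml.etree.ElementTree; the helper name is_well_formed is invented for the example, and the exact error message depends on the underlying parser, so the code reports only whether parsing succeeded.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the document, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


one_root = """<?xml version="1.0"?>
<!-- A well-formed document -->
<greeting>
  Hello, World!
</greeting>
"""

two_roots = """<?xml version="1.0"?>
<!-- An invalid document -->
<greeting>
  Hello, World!
</greeting>
<greeting>
  Hola, el Mundo!
</greeting>
"""

print(is_well_formed(one_root))   # True
print(is_well_formed(two_roots))  # False
```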

    Elements can't overlap

    XML elements can't overlap. Here's some markup that isn't legal:

    <!-- NOT legal XML markup -->
    <p>
      <b>I <i>really love</b> XML. </i>
    </p>

    If you begin an <i> element inside a <b> element, you have to end it there as well. If you want the text XML to appear in italics, you need to add a second <i> element to correct the markup:

    <!-- legal XML markup -->
    <p>
      <b>I <i>really love</i></b>
      <i>XML.</i>
    </p>

    An XML parser will accept only this markup; the HTML parsers in most Web browsers will accept both.
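
    The difference in strictness is easy to reproduce. In this Python sketch, the standard library's html.parser module stands in for a forgiving HTML parser (it is only a tokenizer, not a browser's tree builder, which is an important simplification), and xml.etree.ElementTree stands in for an XML parser. The class name TagLogger is invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagLogger(HTMLParser):
    """Record every start and end tag the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


overlapping = "<p><b>I <i>really love</b> XML.</i></p>"

# The HTML tokenizer reports the overlapping tags without complaint.
logger = TagLogger()
logger.feed(overlapping)
logger.close()
print(logger.events)  # ['<p>', '<b>', '<i>', '</b>', '</i>', '</p>']

# The XML parser refuses the same markup.
try:
    ET.fromstring(overlapping)
    xml_result = "accepted"
except ET.ParseError as err:
    xml_result = "rejected: " + str(err)
print(xml_result)
```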

    End tags are required

    You can't leave out any end tags. In the first example below, the markup is not legal because there are no end paragraph (</p>) tags. While this is acceptable in HTML (and, in some cases, SGML), an XML parser will reject it.

    <!-- NOT legal XML markup -->
    <p>Yada yada yada...
    <p>Yada yada yada...
    <p>...

    If an element contains no content at all, it is called an empty element; the HTML break (<br>) and image (<img>) elements are two examples. In empty elements in XML documents, you can put the closing slash in the start tag. The two break elements and the two image elements below mean the same thing to an XML parser:

    <!-- Two equivalent break elements -->
    <br></br>
    <br />

    <!-- Two equivalent image elements -->
    <img src="../img/c.gif"></img>
    <img src="../img/c.gif" />

    Elements are case sensitive

    XML elements are case sensitive. In HTML, <h1> and <H1> are the same; in XML, they're not. If you try to end an <h1> element with a </H1> tag, you'll get an error. In the example below, the heading at the top is illegal, while the one at the bottom is fine.

    <!-- NOT legal XML markup -->
    <h1>Elements are case sensitive</H1>

    <!-- legal XML markup -->
    <h1>Elements are case sensitive</h1>

    Attributes must have quoted values

    There are two rules for attributes in XML documents:

    • Attributes must have values
    • Those values must be enclosed within quotation marks

    Compare the two examples below. The markup at the top is legal in HTML, but not in XML. To do the equivalent in XML, you have to give the attribute a value, and you have to enclose it in quotes.

    <!-- NOT legal XML markup -->
    <ol compact>

    <!-- legal XML markup -->
    <ol compact="yes">

    You can use either single or double quotes, just as long as you're consistent.

    If the value of the attribute contains a single or double quote, you can use the other kind of quote to surround the value (as in name="Doug's car"), or use the entities &quot; for a double quote and &apos; for a single quote. An entity is a symbol, such as &quot;, that the XML parser replaces with other text, such as ".

    XML declarations

    Most XML documents start with an XML declaration that provides basic information about the document to the parser. An XML declaration is recommended, but not required. If there is one, it must be the first thing in the document.

    The declaration can contain up to three name-value pairs (many people call them attributes, although technically they're not). The version is the version of XML used; currently this value must be 1.0. The encoding is the character set used in this document. The ISO-8859-1 character set referenced in this declaration includes all of the characters used by most Western European languages. If no encoding is specified, the XML parser assumes that the characters are in the UTF-8 set, a Unicode standard that supports virtually every character and ideograph from the world's languages.

    <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

    Finally, standalone, which can be either yes or no, defines whether this document can be processed without reading any other files. For example, if the XML document doesn't reference any other files, you would specify standalone="yes". If the XML document references other files that describe what the document can contain (more about those files in a minute), you could specify standalone="no". Because standalone="no" is the default, you rarely see standalone in XML declarations.
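
    The encoding value has a visible effect on how a parser reads the bytes of a document. The following Python sketch (the greeting text is made up for the example) encodes the same content as ISO-8859-1 twice, once with a declaration that names the encoding and once without. Without the declaration, the parser falls back to UTF-8 and rejects the byte that represents the accented letter.

```python
import xml.etree.ElementTree as ET

body = "<greeting>Hola, se\u00f1or</greeting>"  # \u00f1 is the letter n with a tilde

# The same ISO-8859-1 bytes, with and without a declaration naming the encoding.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
undeclared = body.encode("iso-8859-1")

declared_text = ET.fromstring(declared).text
print(declared_text)  # the greeting, with the accented letter intact

try:
    ET.fromstring(undeclared)
    undeclared_result = "parsed"
except ET.ParseError:
    # The lone 0xF1 byte is not valid UTF-8, so the document is not well formed.
    undeclared_result = "rejected"
print(undeclared_result)  # rejected
```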

    Other things in XML documents

    There are a few other things you might find in an XML document:

    • Comments: Comments can appear anywhere in the document; they can even appear before or after the root element. A comment begins with <!-- and ends with -->. A comment can't contain a double hyphen (--) except at the end; with that exception, a comment can contain anything. Most importantly, any markup inside a comment is ignored; if you want to remove a large section of an XML document, simply wrap that section in a comment. (To restore the commented-out section, simply remove the comment tags.) Here's some markup that contains a comment:
    <!-- Here's a PI for Cocoon: -->
    <?cocoon-process type="sql"?>
    • Processing instructions: A processing instruction is markup intended for a particular piece of code. In the example above, there's a processing instruction (sometimes called a PI) for Cocoon, an XML processing framework from the Apache Software Foundation. When Cocoon is processing an XML document, it looks for processing instructions that begin with cocoon-process, then processes the XML document accordingly. In this example, the type="sql" attribute tells Cocoon that the XML document contains a SQL statement.
    <!-- Here's an entity: -->
    <!ENTITY dw "developerWorks">
    • Entities: The example above defines an entity for the document. Anywhere the XML processor finds the string &dw;, it replaces the entity with the string developerWorks. The XML spec also defines five entities you can use in place of various special characters. The entities are:
      • &lt; for the less-than sign
      • &gt; for the greater-than sign
      • &quot; for a double-quote
      • &apos; for a single quote (or apostrophe)
      • &amp; for an ampersand.
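
    When you generate XML from a program, you rarely type these entities by hand. As a sketch of how a library can handle them, the Python standard library's xml.sax.saxutils module offers escape, unescape, and quoteattr, and xml.etree.ElementTree expands an entity declared in a document's internal DTD subset. The sample strings and the <note> element below are invented for the example.

```python
from xml.sax.saxutils import escape, unescape, quoteattr
import xml.etree.ElementTree as ET

# escape() handles &, <, and > by default.
print(escape("AT&T <fast> service"))  # AT&amp;T &lt;fast&gt; service

# Quotes are escaped only if you ask for them with an extra mapping.
quote_entities = {'"': "&quot;", "'": "&apos;"}
print(escape('He said "hi"', quote_entities))  # He said &quot;hi&quot;

# quoteattr() picks whichever quote character the value does not contain.
print(quoteattr("Doug's car"))        # "Doug's car"
print(quoteattr('a "quoted" word'))   # 'a "quoted" word'

# unescape() reverses the default replacements.
print(unescape("&lt;b&gt; &amp; more"))  # <b> & more

# An entity declared in the internal DTD subset is expanded by the parser.
doc = '<!DOCTYPE note [<!ENTITY dw "developerWorks">]><note>Read &dw; today</note>'
note_text = ET.fromstring(doc).text
print(note_text)  # Read developerWorks today
```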


    XML's power comes from its flexibility, the fact that you and I and millions of other people can define our own tags to describe our data. Remember the sample XML document for a person's name and address? That document includes the <title> element for a person's courtesy title, a perfectly reasonable choice for an element name. If you run an online bookstore, you might create a <title> element for the title of a book. If you run an online mortgage company, you might create a <title> element for the title to a piece of property. All of those are reasonable choices, but all of them create elements with the same name. How do you tell if a given <title> element refers to a person, a book, or a piece of property? With namespaces.

    To use a namespace, you define a namespace prefix and map it to a particular string. Here's how you might define namespace prefixes for our three elements:

    <?xml version="1.0"?>
    <customer_summary
      xmlns:addr="http://www.xyz.com/addresses/"
      xmlns:books="http://www.zyx.com/books/"
      xmlns:mortgage="http://www.yyz.com/title/"
    >
    ... <addr:name><title>Mrs.</title> ... </addr:name> ...
    ... <books:title>Lord of the Rings</books:title> ...
    ... <mortgage:title>NC2948-388-1983</mortgage:title> ...

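    In this example, the three namespace prefixes are addr, books, and mortgage. Notice that a prefix declared on an element is in scope for that element and all of its descendants, which is why every element inside <customer_summary> can use the three prefixes. Scope is not the same as membership, though: an element belongs to a namespace only if it carries that namespace's prefix or falls under a default namespace declared with a plain xmlns attribute. The first <title> element has no prefix and no default namespace applies, so it is in no namespace even though its parent element, <addr:name>, is in the addr namespace; write <addr:title> to place it there.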

    One final point: The string in a namespace definition is just a string. Yes, these strings look like URLs, but they're not. You could define xmlns:addr="mike" and that would work just as well. The only thing that's important about the namespace string is that it's unique; that's why most namespace definitions look like URLs. The XML parser does not go to http://www.zyx.com/books/ to search for a DTD or schema; it simply uses that text as a string. It's confusing, but that's how namespaces work.
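
    A namespace-aware parser makes these points visible. The Python sketch below uses a small, well-formed version of the fragment above (the elided parts are simply left out) and prints each element's name the way xml.etree.ElementTree reports it, with the namespace string in braces. Under the Namespaces in XML rules, an unprefixed element such as the <title> inside <addr:name> is reported without a namespace, because no default namespace is declared.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<customer_summary
    xmlns:addr="http://www.xyz.com/addresses/"
    xmlns:books="http://www.zyx.com/books/"
    xmlns:mortgage="http://www.yyz.com/title/">
  <addr:name><title>Mrs.</title></addr:name>
  <books:title>Lord of the Rings</books:title>
  <mortgage:title>NC2948-388-1983</mortgage:title>
</customer_summary>
"""

root = ET.fromstring(document)

# Each tag is reported as {namespace-string}local-name, or just local-name.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# The prefix is only shorthand: a query can use any prefix ("b" here)
# as long as it maps to the same namespace string.
book = root.findtext("b:title", namespaces={"b": "http://www.zyx.com/books/"})
print(book)  # Lord of the Rings
```

    Nothing in this code contacts www.zyx.com; the parser compares the namespace strings character by character and nothing more.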

    Defining document content

    Overview: Defining document content

    So far in this tutorial you've learned about the basic rules of XML documents; that's all well and good, but you need to define the elements you're going to use to represent data. You'll learn about two ways of doing that in this section.

    • One method is to use a Document Type Definition, or DTD. A DTD defines the elements that can appear in an XML document, the order in which they can appear, how they can be nested inside each other, and other basic details of XML document structure. DTDs are part of the original XML specification and are very similar to SGML DTDs.
    • The other method is to use an XML Schema. A schema can define all of the document structures that you can put in a DTD, and it can also define data types and more complicated rules than a DTD can. The W3C developed the XML Schema specification a couple of years after the original XML spec.

    Document Type Definitions

    A DTD allows you to specify the basic structure of an XML document. The next couple of sections look at fragments of DTDs. First of all, here's a DTD that defines the basic structure of the address document example in the section What is XML?:

    <!-- address.dtd -->
    <!ELEMENT address (name, street, city, state, postal-code)>
    <!ELEMENT name (title?, first-name, last-name)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT first-name (#PCDATA)>
    <!ELEMENT last-name (#PCDATA)>
    <!ELEMENT street (#PCDATA)>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT state (#PCDATA)>
    <!ELEMENT postal-code (#PCDATA)>

    This DTD defines all of the elements used in the sample document. It defines three basic things:

    • An <address> element contains a <name>, a <street>, a <city>, a <state>, and a <postal-code>. All of those elements must appear, and they must appear in that order.
    • A <name> element contains an optional <title> element (the question mark means the title is optional), followed by a <first-name> and a <last-name> element.
    • All of the other elements contain text. (#PCDATA stands for parsed character data; you can't include another element in these elements.)

    Although the DTD is pretty simple, it makes it clear what combinations of elements are legal. An address document that has a <postal-code> element before the <state> element isn't legal, and neither is one that has no <last-name> element.

    Also, notice that DTD syntax is different from ordinary XML syntax. (XML Schema documents, by contrast, are themselves XML, which has some interesting consequences.) Despite the different syntax for DTDs, you can still put an ordinary comment in the DTD itself.

    Symbols in DTDs

    There are a few symbols used in DTDs to indicate how often (or whether) something may appear in an XML document. Here are some examples, along with their meanings:

    • <!ELEMENT address (name, city, state)>
      The <address> element must contain a <name>, a <city>, and a <state> element, in that order. All of the elements are required. The comma indicates a list of items.

    • <!ELEMENT name (title?, first-name, last-name)>
      This means that the <name> element contains an optional <title> element, followed by a mandatory <first-name> and a <last-name> element. The question mark indicates that an item is optional; it can appear once or not at all.

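    • <!ELEMENT addressbook (address+)>
      An <addressbook> element contains one or more <address> elements. You can have as many <address> elements as you need, but there has to be at least one. The plus sign indicates that an item must appear at least once, but can appear any number of times.

    • <!ELEMENT private-addresses (address*)>
      A <private-addresses> element contains zero or more <address> elements. The asterisk indicates that an item can appear any number of times, including zero.

    • <!ELEMENT name (title?, first-name, (middle-initial | middle-name)?, last-name)>
      A <name> element contains an optional <title> element, followed by a <first-name> element, possibly followed by either a <middle-initial> or a <middle-name> element, followed by a <last-name> element. In other words, both <middle-initial> and <middle-name> are optional, and you can have only one of the two. Vertical bars indicate a list of choices; you can choose only one item from the list. Also notice that this example uses parentheses to group certain elements, and it uses a question mark against the group.

    • <!ELEMENT name ((title?, first-name, last-name) | (surname, mother-name, given-name))>
      The <name> element can contain one of two sequences: An optional <title>, followed by a <first-name> and a <last-name>; or a <surname>, a <mother-name>, and a <given-name>.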
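
    If these symbols remind you of regular expressions, that is no accident: a DTD content model describes a pattern over the sequence of child element names. The Python sketch below makes the analogy literal by translating a content model into a regular expression and testing lists of child names against it. It is a teaching aid under stated assumptions, not a DTD validator: the function names are invented, the element names in the examples are illustrative, and #PCDATA and mixed content are not handled.

```python
import re

# A token is either an element name or one of the content-model symbols.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|?*+]")


def content_model_to_regex(model):
    """Translate a DTD content model such as "(title?, first-name)" into a regex."""
    pieces = []
    for token in TOKEN.findall(model):
        if token == "(":
            pieces.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in {")", "|", "?", "*", "+"}:
            pieces.append(token)  # these symbols mean the same thing in both notations
        else:
            # Each child name is matched together with a trailing space separator.
            pieces.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(pieces))


def children_match(model, child_names):
    """Return True if the sequence of child element names satisfies the model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


name_model = "(title?, first-name, (middle-initial | middle-name)?, last-name)"

print(children_match(name_model, ["first-name", "last-name"]))                 # True
print(children_match(name_model, ["title", "first-name", "middle-name", "last-name"]))  # True
print(children_match(name_model, ["first-name", "middle-initial", "middle-name", "last-name"]))  # False
print(children_match("(address+)", []))  # False: at least one is required
print(children_match("(address*)", []))  # True: zero is allowed
```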

    A word about flexibility

    Before going on, a quick note about designing XML document types for flexibility. Consider the sample name and address document type; I clearly wrote it with U.S. postal addresses in mind. If you want a DTD or schema that defines rules for other types of addresses, you would have to add a lot more complexity to it. Requiring a <state> element might make sense in Australia, but it wouldn't in the UK. A Canadian address might be handled by the sample DTD in Document Type Definitions, but adding a <province> element is a better idea. Finally, be aware that in many parts of the world, concepts like title, first name, and last name don't make sense.

    The bottom line: If you're going to define the structure of an XML document, you should put as much forethought into your DTD or schema as you would if you were designing a database schema or a data structure in an application. The more future requirements you can foresee, the easier and cheaper it will be for you to implement them later.

    Defining attributes

    This introductory tutorial doesn't go into great detail about how DTDs work, but there's one more basic topic to cover here: defining attributes. You can define attributes for the elements that will appear in your XML document. Using a DTD, you can also:

    • Define which attributes are required
    • Define default values for attributes
    • List all of the valid values for a given attribute

    Suppose that you want to change the DTD to make state an attribute of the <city> element. Here's how to do that:

    <!ELEMENT city (#PCDATA)>
    <!ATTLIST city state CDATA #REQUIRED>

    This defines the <city> element as before, but the revised example also uses an ATTLIST declaration to list the attributes of the element. The name city inside the attribute list tells the parser that these attributes are defined for the <city> element. The name state is the name of the attribute, and the keywords CDATA and #REQUIRED tell the parser that the attribute contains text and is required (if it's optional, #IMPLIED will do the trick).

    To define multiple attributes for an element, write the ATTLIST like this:

    <!ELEMENT city (#PCDATA)>
    <!ATTLIST city state CDATA #REQUIRED
                   postal-code CDATA #REQUIRED>

    This example defines both state and postal-code as attributes of the <city> element.

    Finally, DTDs allow you to define default values for attributes and enumerate all of the valid values for an attribute:

    <!ELEMENT city (#PCDATA)>
    <!ATTLIST city state (AZ|CA|NV|OR|UT|WA) "CA">

    The example here indicates that it only supports addresses from the states of Arizona (AZ), California (CA), Nevada (NV), Oregon (OR), Utah (UT), and Washington (WA), and that the default state is California. Thus, you can do a very limited form of data validation. While this is a useful function, it's a small subset of what you can do with XML schemas (see XML schemas).
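
    How much of this a parser does for you depends on the parser. The Python sketch below places the declarations in an internal DTD subset and parses three hypothetical <city> elements with xml.etree.ElementTree. That parser is non-validating: it fills in the default value from the ATTLIST declaration, but it does not enforce the enumerated list, so the check against ALLOWED_STATES is written by hand here. A validating parser would reject the third document on its own.

```python
import xml.etree.ElementTree as ET

# The declarations from the text, placed in an internal DTD subset.
DOCTYPE = """<!DOCTYPE city [
  <!ELEMENT city (#PCDATA)>
  <!ATTLIST city state (AZ|CA|NV|OR|UT|WA) "CA">
]>
"""

# Hand-written copy of the enumeration; the non-validating parser ignores it.
ALLOWED_STATES = {"AZ", "CA", "NV", "OR", "UT", "WA"}


def read_state(city_markup):
    """Return the state attribute and whether it is one of the allowed values."""
    city = ET.fromstring(DOCTYPE + city_markup)
    state = city.get("state")
    return state, state in ALLOWED_STATES


print(read_state("<city>Sacramento</city>"))          # ('CA', True): default applied
print(read_state('<city state="NV">Reno</city>'))     # ('NV', True)
print(read_state('<city state="NC">Anytown</city>'))  # ('NC', False): parsed, but not allowed
```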

    XML schemas

    With XML schemas, you have more power to define what valid XML documents look like. They have several advantages over DTDs:

    • XML schemas use XML syntax. In other words, an XML schema is an XML document. That means you can process a schema just like any other document. For example, you can write an XSLT style sheet that converts an XML schema into a Web form complete with automatically generated JavaScript code that validates the data as you enter it.
    • XML schemas support datatypes. While DTDs do support datatypes, it's clear those datatypes were developed from a publishing perspective. XML schemas support all of the original datatypes from DTDs (things like IDs and ID references). They also support integers, floating point numbers, dates, times, strings, URLs, and other datatypes useful for data processing and validation.
    • XML schemas are extensible. In addition to the datatypes defined in the XML schema specification, you can also create your own, and you can derive new datatypes based on other datatypes.
    • XML schemas have more expressive power. For example, with XML schemas you can define that the value of any <state> attribute can't be longer than 2 characters, or that the value of any <postal-code> element must match the regular expression [0-9]{5}(-[0-9]{4})?. You can't do either of those things with DTDs.

    A sample XML schema

    Here's an XML schema that matches the original name and address DTD. It adds two constraints: The value of the <state> element must be exactly two characters long and the value of the <postal-code> element must match the regular expression [0-9]{5}(-[0-9]{4})?. Although the schema is much longer than the DTD, it expresses more clearly what a valid document looks like. Here's the schema:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

      <xsd:element name="address">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element ref="name"/>
            <xsd:element ref="street"/>
            <xsd:element ref="city"/>
            <xsd:element ref="state"/>
            <xsd:element ref="postal-code"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>

      <xsd:element name="name">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element ref="title" minOccurs="0"/>
            <xsd:element ref="first-name"/>
            <xsd:element ref="last-name"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>

      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="first-name" type="xsd:string"/>
      <xsd:element name="last-name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>

      <xsd:element name="state">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:length value="2"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>

      <xsd:element name="postal-code">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>

    </xsd:schema>

    Defining elements in schemas

    The XML schema in A sample XML schema defined a number of XML elements with the <xsd:element> element. The first two elements defined, <address> and <name>, are composed of other elements. The <xsd:sequence> element defines the sequence of elements that are contained in each. Here's an example:

    <xsd:element name="address">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="name"/>
          <xsd:element ref="street"/>
          <xsd:element ref="city"/>
          <xsd:element ref="state"/>
          <xsd:element ref="postal-code"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

    As in the DTD version, the XML schema example defines that an <address> contains a <name>, a <street>, a <city>, a <state>, and a <postal-code> element, in that order. Notice that the schema actually defines a new datatype with the <xsd:complexType> element.

    Most of the elements contain text; defining them is simple. You merely declare the new element, and give it a datatype of xsd:string:

    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="first-name" type="xsd:string"/>
    <xsd:element name="last-name" type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>

    Defining element content in schemas

    The sample schema defines constraints for the content of two elements: The content of a <state> element must be two characters long, and the content of a <postal-code> element must match the regular expression [0-9]{5}(-[0-9]{4})?. Here's how to do that:

    <xsd:element name="state">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:length value="2"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>

    <xsd:element name="postal-code">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>

    For the <state> and <postal-code> elements, the schema defines new data types with restrictions. The first case uses the <xsd:length> element, and the second uses the <xsd:pattern> element to define a regular expression that this element must match.
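
    To see what those two restrictions mean in practice, here is a hand-written Python sketch that applies the same two checks to a parsed document. It is not schema validation (the Python standard library does not include an XML Schema validator, and the function name check_address is invented); it only mirrors the length and pattern facets. Like xsd:string, the checks do not trim whitespace, so a value with surrounding spaces fails the pattern.

```python
import re
import xml.etree.ElementTree as ET

# The same regular expression as the xsd:pattern facet. Schema patterns must
# match the whole value, which corresponds to fullmatch() in Python.
POSTAL_CODE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def check_address(xml_text):
    """Return a list of problems found in the state and postal-code elements."""
    address = ET.fromstring(xml_text)
    problems = []

    state = address.findtext("state", default="")
    if len(state) != 2:  # mirrors <xsd:length value="2"/>
        problems.append("state must be exactly 2 characters, got %r" % state)

    postal_code = address.findtext("postal-code", default="")
    if POSTAL_CODE.fullmatch(postal_code) is None:  # mirrors <xsd:pattern .../>
        problems.append("postal-code does not match the pattern, got %r" % postal_code)

    return problems


good = "<address><state>NC</state><postal-code>34829-1234</postal-code></address>"
bad = "<address><state>N. Carolina</state><postal-code> 34829 </postal-code></address>"

print(check_address(good))  # []
for problem in check_address(bad):
    print(problem)
```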

    This summary only scratches the surface of what XML schemas can do; there are entire books written on the subject. For the purpose of this introduction, suffice to say that XML schemas are a very powerful and flexible way to describe what a valid XML document looks like.

    XML programming interfaces

    Overview: XML programming interfaces

    This section takes a look at a variety of programming interfaces for XML. These interfaces give developers a consistent interface for working with XML documents. There are many APIs available; this section looks at four of the most popular and generally useful ones: the Document Object Model (DOM), the Simple API for XML (SAX), JDOM, and the Java API for XML Parsing (JAXP).

    The Document Object Model

    The Document Object Model, commonly called the DOM, defines a set of interfaces to the parsed version of an XML document. The parser reads in the entire document and builds an in-memory tree, so your code can then use the DOM interfaces to manipulate the tree. You can move through the tree to see what the original document contained, you can delete sections of the tree, you can rearrange the tree, add new branches, and so on.

    The DOM was created by the W3C, and is an Official Recommendation of the consortium.
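
    As a small illustration of the tree operations described above, the following Python sketch uses xml.dom.minidom, a lightweight DOM implementation in the Python standard library (chosen here only as an example; the interfaces are the same W3C ones in other languages). It reads a value from the tree, deletes one branch, and adds another.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<address><name><title>Mrs.</title><first-name>Mary</first-name>"
    "<last-name>McGoon</last-name></name><city>Anytown</city></address>"
)
root = document.documentElement

# Move through the tree to see what the original document contained.
first_name = root.getElementsByTagName("first-name")[0]
print(first_name.firstChild.data)  # Mary

# Delete a section of the tree.
title = root.getElementsByTagName("title")[0]
title.parentNode.removeChild(title)

# Add a new branch.
state = document.createElement("state")
state.appendChild(document.createTextNode("NC"))
root.appendChild(state)

# Serialize the modified tree back to markup.
result = root.toxml()
print(result)
```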

    DOM issues

    The DOM provides a rich set of functions that you can use to interpret and manipulate an XML document, but those functions come at a price. As the original DOM for XML documents was being developed, a number of people on the XML-DEV mailing list voiced concerns about it:

    • The DOM builds an in-memory tree of an entire document. If the document is very large, this requires a significant amount of memory.
    • The DOM creates objects that represent everything in the original document, including elements, text, attributes, and whitespace. If you only care about a small portion of the original document, it's extremely wasteful to create all those objects that will never be used.
    • A DOM parser has to read the entire document before your code gets control. For very large documents, this could cause a significant delay.

    These are merely issues raised by the design of the Document Object Model; despite these concerns, the DOM API is a very useful way to parse XML documents.

    The Simple API for XML

    To get around the DOM issues, the XML-DEV participants (led by David Megginson) created the SAX interface. SAX has several characteristics that address the concerns about the DOM:

    • A SAX parser sends events to your code. The parser tells you when it finds the start of an element, the end of an element, text, the start or end of the document, and so on. You decide which events are important to you, and you decide what kind of data structures you want to create to hold the data from those events. If you don't explicitly save the data from an event, it's discarded.
    • A SAX parser doesn't create any objects at all; it simply delivers events to your application. If you want to create objects based on those events, that's up to you.
    • A SAX parser starts delivering events to you as soon as the parse begins. Your code will get an event when the parser finds the start of the document, when it finds the start of an element, when it finds text, and so on. Your application starts generating results right away; you don't have to wait until the entire document has been parsed. Even better, if you're only looking for certain things in the document, your code can throw an exception once it's found what it's looking for. The exception stops the SAX parser, and your code can do whatever it needs to do with the data it has found.

    Having said all of these things, both SAX and DOM have their place. The remainder of this section discusses why you might want to use one interface or the other.
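
    To make the event model concrete, here is a small sketch of a SAX handler that keeps only the data it cares about and throws an exception to stop the parser once it has found it. The sample document, the class names (SaxSketch, SkuFinder, FoundIt), and the sku attribute are hypothetical; only the SAX and JAXP types come from the standard Java library.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<order><item sku=\"A1\">Pen</item><item sku=\"B2\">Ink</item>"
      + "<item sku=\"C3\">Paper</item></order>";

    // Hypothetical marker exception used to stop the parse early.
    static class FoundIt extends SAXException {
        FoundIt() { super("target element found"); }
    }

    // Saves the text of one <item>; every other event is simply discarded.
    static class SkuFinder extends DefaultHandler {
        final String wantedSku;
        final StringBuilder text = new StringBuilder();
        boolean capturing = false;
        int elementsSeen = 0;

        SkuFinder(String wantedSku) { this.wantedSku = wantedSku; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
            capturing = "item".equals(qName) && wantedSku.equals(atts.getValue("sku"));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (capturing) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (capturing) throw new FoundIt(); // we have what we need; stop the parser
        }
    }

    static SkuFinder find(String xml, String sku) {
        SkuFinder handler = new SkuFinder(sku);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (FoundIt stop) {
            // Expected: the handler stopped the parse on purpose.
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        SkuFinder result = find(XML, "B2");
        System.out.println(result.text + " after " + result.elementsSeen + " elements");
    }
}
```

    In this sketch the third item is never reported to the handler, because the parse ends as soon as the second item closes.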

    SAX issues

    To be fair, SAX parsers also have issues that can cause concern:

    • SAX events are stateless. When the SAX parser finds text in an XML document, it sends an event to your code. That event simply gives you the text that was found; it does not tell you what element contains that text. If you want to know that, you have to write the state management code yourself.
    • SAX events are not permanent. If your application needs a data structure that models the XML document, you have to write that code yourself. If you need to access data from a SAX event, and you didn't store that data in your code, you have to parse the document again.
    • SAX is not controlled by a centrally managed organization. Although this has not caused a problem to date, some developers would feel more comfortable if SAX were controlled by an organization such as the W3C.
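
    The state management mentioned in the first point might look like the following sketch, which keeps a stack of open element names so that each text event can be matched to the element that contains it. The class names and the sample document are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxStateSketch {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<order><customer>Ann</customer><item>Pen</item></order>";

    static class PathTracker extends DefaultHandler {
        // The state SAX does not keep for you: which elements are currently open.
        final Deque<String> open = new ArrayDeque<>();
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            open.addLast(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.removeLast();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one run of text in several chunks;
            // this sketch records each non-blank chunk as it arrives.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                seen.add(String.join("/", open) + ": " + text);
            }
        }
    }

    static List<String> textWithPaths(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        for (String line : textWithPaths(XML)) {
            System.out.println(line);
        }
    }
}
```

    The list built here is also the only record of the document that survives the parse; anything the handler does not store is gone once its event has been delivered.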


    JDOM

    Frustrated by the difficulty in doing certain tasks with the DOM and SAX models, Jason Hunter and Brett McLaughlin created the JDOM package. JDOM is a Java technology-based, open source project that attempts to follow the 80/20 rule: Deliver what 80% of users need with 20% of the functions in DOM and SAX. JDOM works with SAX and DOM parsers, so it's implemented as a relatively small set of Java classes.

    The main feature of JDOM is that it greatly reduces the amount of code you have to write. Although this introductory tutorial doesn't discuss programming topics in depth, JDOM applications are typically one-third as long as DOM applications, and about half as long as SAX applications. (DOM purists, of course, suggest that learning and using the DOM is good discipline that will pay off in the long run.) JDOM doesn't do everything, but for most of the parsing you want to do, it's probably just the thing.

    The Java API for XML Parsing

    Although DOM, SAX, and JDOM provide standard interfaces for most common tasks, there are still several things they don't address. For example, the process of creating a parser object in a Java program differs from one DOM parser to the next. To fix this problem, Sun has released JAXP, the Java API for XML Parsing (since renamed the Java API for XML Processing). This API provides common interfaces for processing XML documents using DOM, SAX, and XSLT.

    JAXP provides classes such as DocumentBuilderFactory and DocumentBuilder that give your code a standard interface to different parsers. There are also methods that allow you to control whether the underlying parser is namespace-aware and whether it uses a DTD or schema to validate the XML document.
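
    As a rough sketch of how this looks in code, the example below asks JAXP for whatever DOM parser is configured, without naming a parser implementation, and switches namespace awareness on or off through the factory. The class name JaxpSketch, the helper method, and the urn:example:po namespace are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class JaxpSketch {
    // Hypothetical sample document with a made-up namespace.
    static final String XML = "<po:order xmlns:po=\"urn:example:po\"/>";

    // Returns "namespaceURI | localName" for the root element.
    static String rootInfo(String xml, boolean namespaceAware) {
        try {
            // The factory hides which parser implementation is actually used.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.setValidating(false); // set to true to request DTD validation
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return root.getNamespaceURI() + " | " + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootInfo(XML, true));  // urn:example:po | order
        System.out.println(rootInfo(XML, false)); // no namespace information is reported
    }
}
```

    Because the code never names a concrete parser class, swapping parsers becomes a deployment decision rather than a code change.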

    Which interface is right for you?

    To determine which programming interface is right for you, you need to understand the design points of all of the interfaces, and you need to understand what your application needs to do with the XML documents you're going to process. Consider these questions to help you find the right approach.

    • Will your application be written in Java? JAXP works with DOM, SAX, and JDOM; if you're writing your code in Java, you should use JAXP to isolate your code from the implementation details of various parsers.
    • How will your application be deployed? If your application is going to be deployed as a Java applet, and you want to minimize the amount of downloaded code, keep in mind that SAX parsers are smaller than DOM parsers. Also be aware that using JDOM requires a small amount of code in addition to the SAX or DOM parser.
    • Once you parse the XML document, will you need to access that data many times? If you need to go back to the parsed version of the XML file, DOM is probably the right choice. When a SAX event is fired, it's up to you (the developer) to save it somehow if you need it later. If you need to access an event you didn't save, you have to parse the file again. DOM saves all of the data automatically.
    • Do you need just a few things from the XML source? If you only need a few things out of the XML source, SAX is probably the right choice. SAX doesn't create objects for everything in the source document; you can decide what's important. With SAX, you can look at each event to see if it's relevant to your needs, then process it appropriately. Even better, once you've found what you're looking for, your code can throw an exception to stop the SAX parser altogether.
    • Are you working on a machine with very little memory? If so, SAX is your best choice, despite all the other factors that you might consider.

    Be aware that XML APIs exist for other languages; the Perl and Python communities in particular have very good XML tools.

    XML standards

    Overview: XML standards

    A variety of standards exist in the XML universe. In addition to the base XML standard, other standards define schemas, style sheets, links, Web services, security, and other important items. This section covers the most popular standards for XML, and points you to references to find other standards.

    The XML specification

    This spec, located at w3.org/TR/REC-xml, defines the basic rules for XML documents. All of the XML document rules discussed earlier in this tutorial are defined here.

    In addition to the basic XML standard, the Namespaces spec is another important part of XML. You can find the namespaces standard at the W3C as well: w3.org/TR/REC-xml-names/.

    XML Schema

    The XML Schema language is defined in three parts:

    • A primer, located at w3.org/TR/xmlschema-0, that gives an introduction to XML schema documents and what they're designed to do;
    • A standard for document structures, located at w3.org/TR/xmlschema-1, that illustrates how to define the structure of XML documents;
    • A standard for data types, located at w3.org/TR/xmlschema-2, that defines some common data types and rules for creating new ones.

    This tutorial discussed schemas briefly in Defining document content; if you want the complete details on all the things you can do with XML schemas, the primer is the best place to start.

    XSL, XSLT, and XPath

    The Extensible Stylesheet Language, XSL, defines a set of elements (called formatting objects) that describe how data should be formatted. For clarity, this standard is often referred to as XSL-FO to distinguish it from XSLT. Although it's primarily designed for generating high-quality printable documents, you can also use formatting objects to generate audio files from XML. The XSL-FO standard is at w3.org/TR/xsl/.

    The Extensible Stylesheet Language for Transformations, XSLT, is an XML vocabulary that describes how to convert an XML document into something else. The standard is at w3.org/TR/xslt (no closing slash).
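
    A minimal sketch of such a conversion, using the XSLT support in the standard Java library, appears below. The stylesheet, the input document, and the class name XsltSketch are invented for this example; the stylesheet turns a one-element XML document into plain text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical input document and style sheet, used only for illustration.
    static final String XML = "<greeting>Hello</greeting>";
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/greeting\">"
      + "<xsl:value-of select=\".\"/><xsl:text>, world</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the style sheet to the XML document and returns the result as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL)); // Hello, world
    }
}
```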

    XPath, the XML Path Language, is a syntax that describes locations in XML documents. You use XPath in XSLT style sheets to describe which portion of an XML document you want to transform. XPath is used in other XML standards as well, which is why it is a separate standard from XSLT. XPath is defined at w3.org/TR/xpath (no closing slash).
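
    XPath can also be used on its own from program code. The sketch below evaluates two XPath expressions against a small document using the javax.xml.xpath package; the document, the sku attribute, and the class name XPathSketch are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<order><item sku=\"A1\">Pen</item><item sku=\"B2\">Ink</item></order>";

    // Evaluates an XPath expression against the document and returns the string result.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // A location path with a predicate: the <item> whose sku attribute is B2.
        System.out.println(evaluate(XML, "/order/item[@sku='B2']")); // Ink
        // XPath also defines functions, such as count().
        System.out.println(evaluate(XML, "count(/order/item)"));     // 2
    }
}
```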


    DOM

    The Document Object Model defines how an XML document is converted to an in-memory tree structure. The DOM is defined in a number of specifications at the W3C:

    • The Core DOM defines the DOM itself, the tree structure, and the kinds of nodes and exceptions your code will find as it moves through the tree. The complete spec is at w3.org/TR/DOM-Level-2-Core/.
    • Events defines the events that can happen to the tree, and how those events are processed. This specification is an attempt to reconcile the differences in the object models supported by Netscape and Internet Explorer since Version 4 of those browsers. This spec is at w3.org/TR/DOM-Level-2-Events/.
    • Style defines how XSLT style sheets and CSS style sheets can be accessed by a program. This spec is at w3.org/TR/DOM-Level-2-Style/.
    • Traversals and Ranges define interfaces that allow programs to traverse the tree or define a range of nodes in the tree. You can find the complete spec at w3.org/TR/DOM-Level-2-Traversal-Range/.
    • Views defines an interface for the document itself. See w3.org/TR/DOM-Level-2-Views/ for more information.

    SAX, JDOM, and JAXP

    The Simple API for XML defines the events and interfaces used to interact with a SAX-compliant XML parser. You can find the complete SAX specification at www.saxproject.org.

    The JDOM project was created by Jason Hunter and Brett McLaughlin and lives at jdom.org/. At the JDOM site, you can find code, sample programs, and other tools to help you get started. (For developerWorks articles on JDOM, see Related topics.)

    One significant point about SAX and JDOM is that both of them came from the XML developer community, not a standards body. Their wide acceptance is a tribute to the active participation of XML developers worldwide.

    Linking and referencing

    There are two standards for linking and referencing in the XML world, XLink and XPointer:

    • XLink, the XML Linking Language, defines a variety of ways to link different resources together. You can do normal point-to-point links (as with the HTML <a> element) or extended links, which can include multipoint links, links through third parties, and rules that define what it means to follow a given link. The XLink standard is at w3.org/TR/xlink/.
    • XPointer, the XML Pointer Language, uses XPath as a way to reference other resources. It also includes some extensions to XPath. You can find the spec at www.w3.org/TR/xptr/.


    Security

    There are two significant standards that address the security of XML documents. One is the XML Digital Signature standard (w3.org/TR/xmldsig-core/), which defines an XML document structure for digital signatures. You can create an XML digital signature for any kind of data, whether it's an XML document, an HTML file, plain text, binary data, and so on. You can use the digital signature to verify that a particular file wasn't modified after it was signed. If the data you're signing is an XML document, you can embed the XML document in the signature file itself, which makes processing the data and the signature very simple.

    The other standard addresses encrypting XML documents. While it's great that XML documents can be written so that a human can read and understand them, this could mean trouble if a document fell into the wrong hands. The XML Encryption standard (w3.org/TR/xmlenc-core/) defines how parts of an XML document can be encrypted.

    Using these standards together, you can use XML documents with confidence. I can digitally sign an important XML document, generating a signature that includes the XML document itself. I can then encrypt the document (using my private key and your public key) and send it to you. When you receive it, you can decrypt the document with your private key and my public key; that lets you know that I'm the one who sent the document. (If need be, you can also prove that I sent the document.) Once you've decrypted the document, you can use the digital signature to make sure the document has not been modified in any way.

    Web services

    Web services are an important new kind of application. A Web service is a piece of code that can be discovered, described, and accessed using XML. There is a great deal of activity in this space, but the three main XML standards for Web services are:

    • SOAP: Originally the Simple Object Access Protocol, SOAP defines an XML document format that describes how to invoke a method of a remote piece of code. My application creates an XML document that describes the method I want to invoke, passing it any necessary parameters, and then it sends that XML document across a network to that piece of code. The code receives the XML document, interprets it, invokes the method I requested, then sends back an XML document that describes the results. Version 1.1 of the SOAP spec is at w3.org/TR/SOAP/. Visit w3.org/TR/ to see all of the W3C's SOAP-related activities.
    • WSDL: The Web Services Description Language is an XML vocabulary that describes a Web service. It's possible to write a piece of code that takes a WSDL document and invokes a Web service it's never seen before. The information in the WSDL file defines the name of the Web service, the names of its methods, the arguments to those methods, and other details. You can find the latest WSDL spec at w3.org/TR/wsdl (no closing slash).
    • UDDI: The Universal Description, Discovery, and Integration protocol defines a SOAP interface to a registry of Web services. If you have a piece of code that you'd like to deploy as a Web service, the UDDI spec defines how to add the description of your service to the registry. If you're looking for a piece of code that provides a certain function, the UDDI spec defines how to query the registry to find what you want. The source of all things UDDI is uddi.org.

    Other standards

    A number of other XML standards exist that I don't go into here. In addition to widely applicable standards like Scalable Vector Graphics (www.w3.org/TR/SVG/) and SMIL, the Synchronized Multimedia Integration Language (www.w3.org/TR/smil20/), there are many industry-specific standards. For example, the HR-XML Consortium has defined a number of XML standards for Human Resources; you can find those standards at hr-xml.org.

    Finally, for a good source of XML standards, visit Cover Pages for information on many XML schemas and other resources. This site features standards for a wide variety of industries.

    Case studies

    Real-world examples

    By this point, I hope you're convinced that XML has tremendous potential to revolutionize the way eBusiness works. While potential is great, what really counts are actual results in the marketplace. This section describes three case studies in which organizations have used XML to streamline their business processes and improve their results.

    All of the case studies discussed here come from IBM's jStart program. The jStart team exists to help customers use new technologies to solve problems. When a customer agrees to a jStart engagement, the customer receives IBM consulting and development services at a discount, with the understanding that the resulting project will be used as a case study. If you'd like to see more case studies, including case studies involving web services and other new technologies, visit the jStart web page.

    Be aware that the jStart team is no longer doing engagements for XML projects; the team's current focus is Web services engagements. Web services use XML in a specialized way, typically through the SOAP, WSDL, and UDDI standards mentioned earlier in Web services.

    Province of Manitoba

    Figure 2. Province of Manitoba

    The government of the Province of Manitoba created the Personal Property Registry to provide property owners with state-of-the-art Internet services around the clock. The main benefits of the application were faster and more convenient access to property data, fewer manual steps in the property management process, and fewer calls to the government's call center. In other words, the application gave customers better service while saving the government money and reducing its workload.

    Application design

    The application was designed as an n-tiered application, with the interface separated from the back-end logic. The data for each transaction needed to be transformed a number of different ways, depending on how it needed to be rendered on a device, presented to an application, or formatted for the back-end processing system. In other words, the application was a perfect opportunity to use XML.

    As with any application, the user interface to the application was extremely important. To simplify the first implementation, the necessary XML data was transformed into HTML. This gave users a browser interface to the application. The registry was built with VisualAge for Java, specifically the Visual Servlet Builder component. It also used Enterprise JavaBeans (EJBs), including Session beans and Entity beans.

    Generating multiple user interfaces with XML

    In addition to the HTML interface, a Java client interface and a B2B electronic interface were planned as well. For all of these interfaces, the structured XML data is transformed into the appropriate structures and documents. The initial rollout of the service allowed one business partner, Canadian Securities Registration Systems, to submit XML transaction data using the Secure Sockets Layer. The XML transaction data was transformed into the appropriate format for the back-end transactions.

    The end result is that the Province of Manitoba was able to create a flexible new application and their end users could access the property registry more easily and quickly. Because the province uses XML as the data format, the government IT team has a great deal of flexibility in designing new interfaces and access methods. Best of all, the back-end systems didn't have to change at all.

    First Union banks on XML

    Figure 3. First Union banks on XML

    First Union National Bank, one of the largest banks in the U.S., is in the process of reengineering many of its applications using Java and XML. Like most large companies, its environment is heterogeneous, with OS/390, AIX, Solaris, HP/9000, and Windows NT servers and Windows NT, Windows 98, Solaris, and AIX clients. Given this environment, First Union chose Java for platform-independent code and XML for platform-independent data.

    A messaging-based system

