The eXtensible Markup Language (XML) was created to define and store complex, hierarchically structured data for exchange and storage. An XML document's hierarchy begins at a single root node, and all other elements branch from this document root.
The Document Type Definition (DTD) is optional and defines the structure permitted in an XML document: which elements and attributes may appear and how they nest. It is often used to verify the data for completeness and adherence to rules.
XML Schema (XSD) is a newer and more complete data definition language with definable data types. XSD competes with DTD as the format for data definition, especially when defining complex relationships and data types.
XML parsers fall into three major categories:
- DOM: (Document Object Model) Import/parse all data into a data structure in memory for query. The data is held as nodes in a data tree which can be traversed. While this is often easier to program than SAX invocations, it generally uses more memory and can run slower on large documents.
- SAX: (Simple API for XML) Parse on the fly to look for the data requested. This is event driven: callbacks are invoked as elements are encountered during parsing. The programmer writes the callbacks, typically a custom handler for each document type. This is generally considered to be the fastest way to parse a file, since no tree is built in memory.
- XPath: (XML Path Language) Search data with a path expression. Very easy to use. Usage is similar to a query, but with path expressions rather than regular expressions. A node list is returned which matches the XPath expression. It is usually implemented as an extension to DOM.
DTD:
Number of children:
- ? Zero or one element permitted (the element is optional).
- * Zero or more elements permitted, i.e.: <!ELEMENT name (first, middle*, last?)>
- + One or more elements permitted.
- No indicator: exactly one element required.
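As a rough illustration of the occurrence indicators above, the following sketch checks a child count against an indicator. The functions occurrence_ok and name_children_ok are hypothetical helpers written for this example, not part of any DTD library. The check ignores element order, which a real validating parser also enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does `count` occurrences of a child element satisfy
 * the DTD occurrence indicator? Pass '\0' when the name has no indicator. */
bool occurrence_ok(char indicator, size_t count)
{
    switch (indicator) {
    case '?': return count <= 1;      /* zero or one  */
    case '*': return true;            /* zero or more */
    case '+': return count >= 1;      /* one or more  */
    default:
        assert(indicator == '\0');    /* no other indicator exists */
        return count == 1;            /* exactly one  */
    }
}

/* Counts-only check for: <!ELEMENT name (first, middle*, last?)> */
bool name_children_ok(size_t first, size_t middle, size_t last)
{
    return occurrence_ok('\0', first)
        && occurrence_ok('*', middle)
        && occurrence_ok('?', last);
}
```

With this sketch, name_children_ok(1, 3, 1) is accepted, while name_children_ok(1, 0, 2) is rejected because last? allows at most one occurrence.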
Attributes:
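CDATA #REQUIRED | Character data; the attribute must be present. |
CDATA #IMPLIED | Character data; the attribute is optional. |
CDATA | Character Data |
PCDATA | Parsed character Data. Used in element content declarations as (#PCDATA), not as an attribute type. |
NMTOKEN | A single name token. No whitespaces. |
NMTOKENS | One or more name tokens separated by white space |
ENUMERATION | The value must be one of a declared list of choices, i.e. the month in <date month="January" day="27" year="2004"/> |
ENTITY | Name of an unparsed entity declared in the DTD. |
ENTITIES | One or more entity names separated by white space |
ID | Unique XML name specified: <!ATTLIST xml_name1 xml_name2 ID #REQUIRED> The attribute xml_name2 is required on element xml_name1. |
IDREF | Attribute refers to an ID |
IDREFS | One or more ID references separated by white space |
NOTATION | Name of a notation declared in the DTD. |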
- XML names may include the characters "_", "-" and ".", but may not begin with "-" or ".".
- When HTML text is included use &lt;, &amp;, &gt; and &quot; to represent <, &, >, and " respectively.
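The substitution described above can be sketched in plain C as follows. The function xml_escape is a hypothetical helper written for this example and is not part of libxml2, which escapes content itself when it builds a document. The sketch handles only the four characters listed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of text with
 * <, &, > and " replaced by their entity references.
 * The caller frees the result. Returns NULL if allocation fails. */
char *xml_escape(const char *text)
{
    assert(text != NULL);

    /* Worst case: every character becomes "&quot;" (6 bytes). */
    size_t len = strlen(text);
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const char *s = text; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        default:  break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);      /* copy the entity reference */
            p += n;
        } else {
            *p++ = *s;              /* ordinary character, copy as-is */
        }
    }
    *p = '\0';
    return out;
}
```

The ampersand is handled character by character, so text that already contains an entity reference is escaped a second time; the function should only be applied to raw text.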
File: testLibXml2.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AppConfigData [
  <!ELEMENT AppConfigData (DisplayX+)>
  <!ELEMENT DisplayX (AlternateName*,FieldLength,TextFont?)>
  <!ATTLIST DisplayX name CDATA #REQUIRED>
  <!ATTLIST DisplayX type CDATA #REQUIRED>
  <!ELEMENT AlternateName (#PCDATA)>
  <!ATTLIST AlternateName type CDATA #REQUIRED>
  <!ELEMENT FieldLength (#PCDATA)>
  <!ELEMENT TextFont (#PCDATA)>
  <!ATTLIST TextFont type CDATA #REQUIRED>
]>
<AppConfigData>
  <DisplayX name="DisplayText_A" type="Type1">
    <AlternateName type="Type1">DisplayText_a</AlternateName>
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
  <DisplayX name="DisplayText_B" type="Type2">
    <FieldLength>30</FieldLength>
    <TextFont type="Arial"/>
  </DisplayX>
  <DisplayX name="DisplayText_C" type="Type1">
    <AlternateName type="Type1">DisplayText_c</AlternateName>
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
  <DisplayX name="DisplayText_D" type="Type2">
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
</AppConfigData>
Note: The DTD is not required for use with the Gnome LibXml2 API. If using this API to generate XML, the DTD will not be generated.
Prerequisite (RPM) packages: pkgconfig, libxml2-devel, gnome-libs-devel
#include <stdio.h>
#include <stdlib.h>
#include <gtk/gtk.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(int argc, char **argv)
{
    xmlNode *cur_node, *child_node;
    xmlChar *fieldLength, *alternateName;
    // xmlGetProp() returns xmlChar *, so the attribute values use that type too
    xmlChar *DisplayXName, *DisplayXType, *altProp, *textFont;

    // --------------------------------------------------------------------------
    // Open XML document
    // --------------------------------------------------------------------------
    xmlDocPtr doc;
    doc = xmlParseFile("testLibXml2.xml");
    if (doc == NULL)
    {
        printf("error: could not parse file testLibXml2.xml\n");
        return EXIT_FAILURE;
    }

    // --------------------------------------------------------------------------
    // XML root.
    // --------------------------------------------------------------------------
    /* Get the root element node */
    xmlNode *root = NULL;
    root = xmlDocGetRootElement(doc);

    // --------------------------------------------------------------------------
    // Must have root element, a name and the name must be "AppConfigData"
    // --------------------------------------------------------------------------
    if( !root || !root->name ||
        xmlStrcmp(root->name, (const xmlChar *) "AppConfigData") )
    {
        xmlFreeDoc(doc);
        return EXIT_FAILURE;
    }

    // --------------------------------------------------------------------------
    // AppConfigData children: For each DisplayX
    // --------------------------------------------------------------------------
    for(cur_node = root->children; cur_node != NULL; cur_node = cur_node->next)
    {
        if ( cur_node->type == XML_ELEMENT_NODE &&
             !xmlStrcmp(cur_node->name, (const xmlChar *) "DisplayX") )
        {
            printf("Element: %s \n", cur_node->name);

            DisplayXName = xmlGetProp(cur_node, (const xmlChar *) "name");
            if(DisplayXName) printf(" name=%s\n", DisplayXName);

            DisplayXType = xmlGetProp(cur_node, (const xmlChar *) "type");
            if(DisplayXType) printf(" type=%s\n", DisplayXType);

            // For each child of DisplayX: i.e. AlternateName, FieldLength
            for(child_node = cur_node->children; child_node != NULL; child_node = child_node->next)
            {
                // Test the child's own node type (skips text and comment nodes)
                if ( child_node->type == XML_ELEMENT_NODE &&
                     !xmlStrcmp(child_node->name, (const xmlChar *) "FieldLength") )
                {
                    printf(" Child=%s\n", child_node->name);
                    fieldLength = xmlNodeGetContent(child_node);
                    if(fieldLength) printf(" Length: %s\n", fieldLength);
                    xmlFree(fieldLength);
                }

                if ( child_node->type == XML_ELEMENT_NODE &&
                     !xmlStrcmp(child_node->name, (const xmlChar *) "AlternateName") )
                {
                    printf(" Child=%s\n", child_node->name);
                    alternateName = xmlNodeGetContent(child_node);
                    if(alternateName) printf(" Name: %s\n", alternateName);
                    altProp = xmlGetProp(child_node, (const xmlChar *) "type");
                    if(altProp) printf(" type=%s\n", altProp);
                    xmlFree(altProp);
                    xmlFree(alternateName);
                }

                if ( child_node->type == XML_ELEMENT_NODE &&
                     !xmlStrcmp(child_node->name, (const xmlChar *) "TextFont") )
                {
                    printf(" Child=%s\n", child_node->name);
                    textFont = xmlGetProp(child_node, (const xmlChar *) "type");
                    if(textFont) printf(" type=%s\n", textFont);
                    xmlFree(textFont);
                }
            }

            xmlFree(DisplayXType);
            xmlFree(DisplayXName);
        }
    }

    // --------------------------------------------------------------------------
    /* Free the document */
    xmlFreeDoc(doc);

    /*
     * Free the global variables that may
     * have been allocated by the parser.
     */
    xmlCleanupParser();

    return 0;
}
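Compile: gcc -g -Wall -o testLibXml2 testLibXml2.c `xml2-config --cflags --libs` `gnome-config --cflags --libs gnome gnomeui xml`
(The source file is listed before the libraries because some linkers only resolve symbols from libraries that appear after the files which use them.)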
[Potential Pitfall]: The order of the directory paths referenced matters. Reference the libxml2 include path directories before the Gnome directory paths; reversing the order can result in a compilation error.
Components:
- LibXML: xml2-config --cflags --libs (Reference this first.)
- Gtk: pkg-config --cflags --libs gtk+-2.0
- Gnome: gnome-config --cflags --libs gnome gnomeui xml
Results:
$ testLibXml2
Element: DisplayX
 name=DisplayText_A
 type=Type1
 Child=AlternateName
 Name: DisplayText_a
 type=Type1
 Child=FieldLength
 Length: 30
 Child=TextFont
 type=Courier
Element: DisplayX
 name=DisplayText_B
 type=Type2
 Child=FieldLength
 Length: 30
 Child=TextFont
 type=Arial
Element: DisplayX
 name=DisplayText_C
 type=Type1
 Child=AlternateName
 Name: DisplayText_c
 type=Type1
 Child=FieldLength
 Length: 30
 Child=TextFont
 type=Courier
Element: DisplayX
 name=DisplayText_D
 type=Type2
 Child=FieldLength
 Length: 30
 Child=TextFont
 type=Courier
- XSL family: has various subsets to describe XML encoded data. (W3C: XSL family)
  - XSL: (Extensible Stylesheet Language) describes XML encoded data. (W3C: XSL)
  - XSLT: (XSL Transformations) maps an XML document from one form to another. XSLT stylesheets are not procedural and often include a template to define the output. (W3C: XSLT)
  - XSL-FO: (XSL Formatting Objects) defines the visual formatting of an XML document. (XML.com: Using XSL-FO)
  - XPath: (XML Path Language) non-XML language used to find data (XML query) within an XML document. i.e.
    - Find root element: /*
    - Find all elements: //*
- XQuery: XML query language which includes XPath and procedural programming features. (W3C XQuery)
- XPointer: addresses components of an XML document. i.e. element(el1/2/1) (Sun patent.)
- XML Editors:
- kXMLeditor: KDE based XML editor
- Conglomerate
- XMLtype: Console based editor.
- XML Design Tools:
- ActiveState: Komodo - XML editor and IDE
- University of Edinburgh: XED - Schema-sensitive XML data editor [Java]
- IBM: Xeena - Schema-sensitive XML data editor
- XML:
- Gnome LibXML home page
- IBM: - Example