This academic paper examines how new-generation databases have evolved with the use of XML technology. It argues that traditional relational databases alone are no longer enough and that the need of the hour is increasingly met by hybrid databases that use XML. When HTML became the standard language for communication over the Internet, it gave users a rich set of features, such as placing data and metadata in a single document. But as the world of databases evolved, a strong need arose to decouple the representation of data from the representation of the interface. XML's flexibility filled this gap. XML's DTD (Document Type Definition) feature can be used to define an entirely new markup language. XML lets users define how tags may be used in documents, tailored to the semantics of specific applications, because the rules that connect tags with their content are under the designer's control. This conception of XML raises questions about how XML documents carry data and how that data is read, stored and retrieved. In short, this paper explores how XML is used as a database management tool.
What is XML?
XML stands for Extensible Markup Language. It has become a well-known standard for marking up text so that the data can be read by both machines and humans. XML's syntax is very similar to HTML's, and documents are organized so that the relationships among data elements are visually obvious.
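To illustrate that XML is readable by machines as well as by people, the following Python sketch parses a small XML document with the standard-library `xml.etree.ElementTree` module. The `library` and `book` elements and their contents are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# A small, human-readable XML document (hypothetical sample data).
DOC = """
<library>
  <book id="b1">
    <title>XML Basics</title>
    <year>2001</year>
  </book>
</library>
"""

root = ET.fromstring(DOC)  # parse the text into an element tree

# A program walks the same nested structure a human reads by eye.
for book in root.findall("book"):
    print(book.get("id"), book.findtext("title"), int(book.findtext("year")))
    # prints: b1 XML Basics 2001
```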
XML Vs. Relational Databases
Traditional relational databases, in which data is stored in tables and a single database holds one or more tables, remain the primary form of database management in use because of their powerful data storage and retrieval capabilities. Relational databases come to the user's rescue when handling extremely large volumes of data, whereas XML does a far better job of delivering a self-describing data format in SOA (Service-Oriented Architecture) and other Web services.
The following points showcase the differences between XML data and relational data:
1. Relational data is modeled as logical relationships between tables, whereas XML data is hierarchical.
An XML document is structured so that it carries information about how data items relate to each other in the form of a hierarchy. In a relational database, by contrast, the only relationship that can be defined between tables is a parent-child link (a primary key referenced by a foreign key).
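The difference can be sketched in Python. In the hypothetical document below, each customer's orders are nested inside the customer element; flattening ("shredding") the document into two relational-style tables forces that implicit parent-child link to become an explicit `customer_id` column. The element names and the `shred` helper are illustrative assumptions, not part of any particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchical document: orders are nested inside their customer.
DOC = """<customers>
  <customer id="c1"><name>Asha</name>
    <order id="o1"><total>20</total></order>
    <order id="o2"><total>35</total></order>
  </customer>
  <customer id="c2"><name>Ben</name>
    <order id="o3"><total>12</total></order>
  </customer>
</customers>"""

def shred(xml_text):
    """Flatten the nested XML into parent and child 'tables' (lists of tuples).

    In XML the customer-order relationship is implied by nesting; in the
    relational form it must be stated explicitly with a customer_id column.
    """
    root = ET.fromstring(xml_text)
    customers, orders = [], []
    for cust in root.findall("customer"):
        cid = cust.get("id")
        customers.append((cid, cust.findtext("name")))
        for order in cust.findall("order"):
            # customer_id links each child row back to its parent row
            orders.append((order.get("id"), cid, int(order.findtext("total"))))
    return customers, orders

customers, orders = shred(DOC)
print(customers)  # [('c1', 'Asha'), ('c2', 'Ben')]
print(orders)     # [('o1', 'c1', 20), ('o2', 'c1', 35), ('o3', 'c2', 12)]
```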
2. Relational data is not self-describing, whereas XML is.
An XML document tags its data, which makes it easier for the user to understand what the data is about. A single XML document can also hold multiple types of data. In a relational database, on the other hand, the meaning of the data is given only by the definitions of its table columns, and all values in a column must be of the same type.
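A small Python sketch makes the contrast concrete, assuming a hypothetical `book` record: the XML version carries its field names in its own tags, while the equivalent relational row is a bare tuple that can only be interpreted with a separately known column list.

```python
import xml.etree.ElementTree as ET

# Relational side: a row is a bare tuple; its meaning comes from the table definition.
columns = ("id", "title", "year")  # known only from the schema, not from the row
row = ("b1", "XML Basics", 2001)
row_fields = dict(zip(columns, row))  # cannot be interpreted without `columns`

# XML side (hypothetical record): the tags label every value inside the data itself.
record = ET.fromstring("<book><id>b1</id><title>XML Basics</title><year>2001</year></book>")
xml_fields = {child.tag: child.text for child in record}  # values arrive as text

print(xml_fields)  # {'id': 'b1', 'title': 'XML Basics', 'year': '2001'}
```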
3. Relational data has no inherent ordering, whereas XML data does.
In an XML document, the order of the data is the order in which it appears in the document; there is no other way to specify it. In the relational model, by contrast, the order in which rows are returned is not guaranteed unless the query uses an ORDER BY clause.
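The sketch below, using Python's built-in `sqlite3` and `xml.etree.ElementTree` modules with hypothetical recipe steps, shows the two approaches: the XML elements come back in document order, while the relational version needs an explicit `pos` column and an `ORDER BY` clause to return the same sequence.

```python
import sqlite3
import xml.etree.ElementTree as ET

# XML: document order is the data order (hypothetical sample).
doc = ET.fromstring("<steps><step>mix</step><step>bake</step><step>cool</step></steps>")
xml_order = [s.text for s in doc.findall("step")]  # always mix, bake, cool

# Relational: rows have no inherent order, so the order must be stored and requested.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE steps (pos INTEGER, name TEXT)")
con.executemany("INSERT INTO steps VALUES (?, ?)", [(2, "bake"), (3, "cool"), (1, "mix")])
# Without ORDER BY, SQL does not guarantee any particular row order.
sql_order = [name for (name,) in con.execute("SELECT name FROM steps ORDER BY pos")]
con.close()

print(xml_order, sql_order)
```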
4. Relational databases offer less flexibility in data design than XML.
If the data design of an application changes frequently, XML is a better choice, because relational schemas are rigid: normalizing a large table into several related tables, or denormalizing many tables into one, is very difficult.
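As a rough sketch of the flexibility argument, assuming hypothetical `book` records: XML records written before and after a design change can coexist, and a reader can treat a missing element as optional, whereas a relational table must be altered before rows with the new field can be stored. This only shows the simplest kind of change, adding a field; restructuring tables is considerably more involved.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Older and newer XML records coexist: the newer one simply adds an <isbn> element.
records = [
    "<book><title>XML Basics</title></book>",
    "<book><title>Schemas in Practice</title><isbn>000-0</isbn></book>",  # hypothetical values
]
books = [ET.fromstring(r) for r in records]
# findtext() returns the default when an element is absent, so old records still work.
isbns = [b.findtext("isbn", default="unknown") for b in books]

# In a relational table, the schema has to change before the new data fits.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE book (title TEXT)")
con.execute("INSERT INTO book VALUES ('XML Basics')")
con.execute("ALTER TABLE book ADD COLUMN isbn TEXT")  # existing rows get NULL
con.execute("INSERT INTO book VALUES ('Schemas in Practice', '000-0')")
rows = con.execute("SELECT title, isbn FROM book ORDER BY title").fetchall()
con.close()

print(isbns)  # ['unknown', '000-0']
print(rows)   # [('Schemas in Practice', '000-0'), ('XML Basics', None)]
```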
5. Relational databases perform better.
If performance is the primary concern, relational databases should be the first choice, because serializing and interpreting XML data adds overhead.
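6. XML offers only limited support for referential integrity.
XML has nothing equivalent to the foreign key constraints that a relational database enforces across tables; DTD ID/IDREF attributes and XML Schema key/keyref constraints can only check references within a single document. Because of this limitation, relational databases are often preferred.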
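The contrast can be sketched in Python with SQLite, assuming a hypothetical customer/order schema. SQLite rejects an order that references a missing customer once foreign-key enforcement is switched on, whereas a well-formed XML document that is not validated against any schema accepts the same dangling reference, leaving the application to detect it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational: a foreign key rejects an order that points at a missing customer.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.execute("CREATE TABLE customer (id TEXT PRIMARY KEY)")
con.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT REFERENCES customer(id))")
con.execute("INSERT INTO customer VALUES ('c1')")
try:
    con.execute("INSERT INTO orders VALUES ('o1', 'c9')")  # customer c9 does not exist
    sql_rejected = False
except sqlite3.IntegrityError:
    sql_rejected = True
con.close()

# XML: a well-formed document happily holds the same dangling reference.
doc = ET.fromstring('<data><customer id="c1"/><order id="o1" customer="c9"/></data>')
# The check has to be written by hand in application code.
known = {c.get("id") for c in doc.findall("customer")}
dangling = [o.get("id") for o in doc.findall("order") if o.get("customer") not in known]

print(sql_rejected, dangling)  # True ['o1']
```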
7. XML handles frequent updates poorly.
When data is stored as whole XML documents, updating a single value typically means replacing the entire document, which makes frequent updates tedious.
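A minimal Python sketch of the difference, assuming hypothetical book prices: in SQLite a single `UPDATE` changes one value in place, while a document stored as a single XML string has to be parsed, modified and serialized again in full, even though only one element changed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational: one UPDATE touches one column of one row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE book (id TEXT PRIMARY KEY, price REAL)")
con.executemany("INSERT INTO book VALUES (?, ?)", [("b1", 10.0), ("b2", 15.0)])
con.execute("UPDATE book SET price = 12.0 WHERE id = 'b1'")
sql_price = con.execute("SELECT price FROM book WHERE id = 'b1'").fetchone()[0]
con.close()

# XML stored as one document: parse it all, change one value, write it all back.
stored = ('<books><book id="b1"><price>10.0</price></book>'
          '<book id="b2"><price>15.0</price></book></books>')
root = ET.fromstring(stored)
root.find("book[@id='b1']/price").text = "12.0"
stored = ET.tostring(root, encoding="unicode")  # the entire document is re-serialized

print(sql_price)
print(stored)
```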
Integration of XML and Databases
On one side, XML can be used in a document-centric model, where it serves as a tool for creating semi-structured documents with highly irregular content. On the other side, it is argued that XML's main value can only be tapped in a data-centric model, in which XML defines a storage format, the data appears in a regular structure, and applications access it using schemas. In both models, XML is stored in a repository.
1. Data-centric model
In this model, data is stored in a relational database and extracted as XML. Middleware performs this job, e.g. Microsoft ADO.NET or IBM's Database DOM. An alternative to middleware is to represent the data as XML and store it as XML in an XML-enabled database, which understands how to convert relational data into XML and vice versa. Data can also be stored and fetched with stored procedures.
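The middleware step can be sketched in a few lines of Python. The hypothetical `table_to_xml` function below publishes the rows of an SQLite table as XML (one element per row, one child element per column), and `xml_to_rows` maps the result back. It is a toy illustration of the idea under these naming assumptions, not how ADO.NET or any other product is implemented or called.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(con, table):
    """Publish every row of a table as XML: one <row> per row, one child per column.

    The element naming scheme is an assumption made for this sketch.
    """
    cur = con.execute(f"SELECT * FROM {table}")  # table name assumed trusted here
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_rows(xml_text):
    """Reverse mapping: turn the published XML back into dicts (values come back as text)."""
    return [{child.tag: child.text for child in row} for row in ET.fromstring(xml_text)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE book (id TEXT, title TEXT)")
con.execute("INSERT INTO book VALUES ('b1', 'XML Basics')")
xml_text = table_to_xml(con, "book")
con.close()

print(xml_text)  # <book><row><id>b1</id><title>XML Basics</title></row></book>
print(xml_to_rows(xml_text))
```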
2. Document-centric model
The most common examples of XML used under this model are content management systems, which can provide version, access or revision control of documents; software with web publishing capabilities; applications that require good indexing and search; and applications that must support documents from varied editors such as MS Word, Excel, PowerPoint and Adobe FrameMaker under one roof.
XML-based systems also help users create and manage web content in a modular way, letting web developers and content writers do their work with less dependence on each other.
3. Hybrid model
Where an application needs both of the above models, hybrid XML databases are used. As the name suggests, such a database combines two or more database types or models. Data can be stored and fetched both as XML and as relational data, both types can be accessed through queries, and both models can work together in a single application as needed.
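One common way to combine the two models is a table that holds ordinary relational columns alongside a column containing an XML document. The Python sketch below, assuming a hypothetical `product` table in SQLite, filters on the relational `price` column in SQL and then inspects each row's XML `spec` with ElementTree. Databases with native XML support can evaluate the XML part inside the engine (for example through SQL/XML functions); here it is done in application code for simplicity.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical hybrid table: relational columns plus a column holding an XML document.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL, spec TEXT)")
con.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, 9.5, "<spec><color>red</color><size>S</size></spec>"),
        (2, 30.0, "<spec><color>red</color></spec>"),
        (3, 12.0, "<spec><color>green</color><size>L</size></spec>"),
    ],
)

# Relational part of the query: filter and sort on a typed column in SQL.
rows = con.execute("SELECT id, spec FROM product WHERE price < 20 ORDER BY id").fetchall()
# XML part of the query: inspect the semi-structured document stored in each row.
red_ids = [pid for pid, spec in rows if ET.fromstring(spec).findtext("color") == "red"]
con.close()

print(red_ids)  # [1]
```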
How XML is structured for databases
DTDs (Document Type Definitions) and XML Schemas are the two methods most commonly used to define which elements a valid XML document may contain, the order in which those elements may occur, and the constraints that apply to them. An XML document that complies with a DTD or XML Schema is a valid XML document. Here is a sample XML segment:
1. DTDs (Document Type Definitions)
Originally, XML's structure was defined with DTDs. A DTD uses a syntax different from that of the XML document itself, and each declaration in a DTD specifies the order and occurrence of elements in an XML document. Given here is a sample:
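A DTD content model is essentially a regular expression over the sequence of child elements. The Python sketch below imitates that idea for a hypothetical declaration `<!ELEMENT book (title, author+, price?)>`, checking only child-element order and counts. It is an illustration, not a DTD validator: Python's built-in XML parsers are non-validating, so a real project would rely on a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD declaration: <!ELEMENT book (title, author+, price?)>
# The content model is a regular expression over child element names,
# approximated here with Python's re module (one space-terminated token per child).
CONTENT_MODEL = re.compile(r"(title )(author )+(price )?")

def matches_content_model(element):
    """Return True if the element's children appear in the declared order and counts."""
    sequence = "".join(child.tag + " " for child in element)
    return CONTENT_MODEL.fullmatch(sequence) is not None

ok = ET.fromstring("<book><title>T</title><author>A</author><author>B</author></book>")
bad = ET.fromstring("<book><author>A</author><title>T</title></book>")  # wrong order
print(matches_content_model(ok), matches_content_model(bad))  # True False
```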
2. XDR (XML Data Reduced)
DTDs were criticized because they use a syntax different from XML and have no support for data types. XDR, a schema language from Microsoft Corporation based on the XML-Data proposal it submitted to the W3C, addresses both issues. XDR supports numerous data types and is itself written in XML, which lets it act as a tool for integrating relational and XML-based databases. Below is a sample XML schema built using XDR:
3. XSD (XML Schema Definitions)
XSD is a more sophisticated way of describing the structure of, and the constraints on, the entire XML content of a document. W3C XSD supports many more data types than XDR. It lets the user define custom data types and supports object-oriented concepts such as inheritance (type derivation by extension and restriction) and polymorphism (substitution groups). Because of these capabilities, XSD is the most widely used of the three today. Here is an instance of XSD:
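The practical difference from a DTD is that XSD can type the text inside elements. The Python sketch below hand-codes checks that approximate two hypothetical XSD declarations (shown in the comments): a user-defined `price` type derived by restriction from `xs:decimal`, and an element of type `xs:date`. It does not read or process an actual schema; a schema-aware library would be used for that in practice.

```python
from datetime import date
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET

# Hypothetical XSD fragment imitated by hand (no schema library is used here):
#   <xs:simpleType name="price">
#     <xs:restriction base="xs:decimal"><xs:minInclusive value="0"/></xs:restriction>
#   </xs:simpleType>
#   <xs:element name="published" type="xs:date"/>

def check_price(text):
    """Approximate xs:decimal restricted by minInclusive=0."""
    try:
        return Decimal(text) >= 0
    except InvalidOperation:
        return False

def check_date(text):
    """Approximate xs:date in its plain YYYY-MM-DD form (time zones are ignored)."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

book = ET.fromstring("<book><price>-5</price><published>2001-02-30</published></book>")
errors = [tag for tag, check in (("price", check_price), ("published", check_date))
          if not check(book.findtext(tag))]
print(errors)  # ['price', 'published']
```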
Extensions to XML
To extract the necessary data from XML documents, or rather XML databases, a number of languages are used, e.g. XPath, XQL, XQuery, XML-QL, UnQL, XDuce etc. When building applications, XML data often needs to be fetched via stored procedures and exported to or imported from other applications. Some database management systems also provide proprietary tools to map XML data to relational structures, or to use the XML schema to determine how XML data should be formatted on output.
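XPath is the most widely supported of these languages. Python's `xml.etree.ElementTree` implements a limited subset of it, which is enough for the simple queries sketched below against a hypothetical catalog; anything beyond that subset, such as numeric comparisons, has to be done in ordinary code or with a full XPath or XQuery processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document.
CATALOG = ET.fromstring("""<catalog>
  <book id="b1" lang="en"><title>XML Basics</title><price>10</price></book>
  <book id="b2" lang="de"><title>XML Grundlagen</title><price>12</price></book>
  <book id="b3" lang="en"><title>Schemas in Practice</title><price>25</price></book>
</catalog>""")

# Path steps, descendant search (//) and attribute predicates are in ElementTree's subset.
english_titles = [t.text for t in CATALOG.findall(".//book[@lang='en']/title")]
# Predicates on child text are supported too: the book whose <title> equals a value.
b2 = CATALOG.find("book[title='XML Grundlagen']").get("id")
# Numeric comparisons such as price > 11 are outside the subset, so filter in Python.
pricey = [b.get("id") for b in CATALOG.findall("book") if int(b.findtext("price")) > 11]

print(english_titles, b2, pricey)
```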
By and large, XML can be used to create a data format that is easily delivered to programs that need data and between the many components of distributed systems. Generating XML documents from relational databases and applying XML schemas to them is the preferred way to combine the two technologies.