XML-Based Standards: Promise and Pitfalls

by Robert Worden

Published in November 1999

The Promise

The greatest driver of business change today is E-business, brought about by the Internet. In the near term, this will have most impact on business-to-business transactions. The recent simultaneous announcements by Ford and General Motors that they both intend to use E-commerce for all their supply procurements – in Ford's case worth $80 billion per annum – have been marked by some commentators as the day when E-business came of age.

To conduct E-business transactions, companies need a common language to exchange structured information between their computer systems. HTML, the first-generation language of the Internet, is not suited for this, as it defines only the formatting of information, not its meaning. Enter Extensible Markup Language – XML. Like HTML, XML consists of text delimited by 'tags'; so it is easily conveyed over the Internet. In XML the tags can define the meaning and structure of the information, enabling computer tools to use that information directly.
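As a minimal sketch of what "tags that define meaning" allows, the Python below reads a small purchase-order message and computes its value directly from the tagged fields. The message layout and every tag name (purchaseOrder, item, quantity, unitPrice) are invented for illustration and are not taken from any published standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order message; the tag names are assumptions made
# for this example, not part of any real XML message standard.
ORDER_XML = """
<purchaseOrder number="PO-1001">
  <buyer>Example Motors</buyer>
  <item partNumber="BRK-220">
    <description>Brake disc</description>
    <quantity>400</quantity>
    <unitPrice currency="USD">18.50</unitPrice>
  </item>
</purchaseOrder>
"""


def order_total(xml_text: str) -> float:
    """Sum quantity * unit price over every <item> element in the order."""
    root = ET.fromstring(xml_text)
    total = 0.0
    for item in root.findall("item"):
        quantity = int(item.findtext("quantity"))
        unit_price = float(item.findtext("unitPrice"))
        total += quantity * unit_price
    return total


if __name__ == "__main__":
    print(order_total(ORDER_XML))
```

The program can find the quantity and the price only because the sender and the receiver agree on what those tag names mean; XML itself supplies the syntax, not that agreement.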

XML has been embraced enthusiastically by all the major IT suppliers and user groups. Its standardisation and rapid take-up have been the major development in IT over the past two years. Industry enemies like IBM, Microsoft, Sun and Oracle all support the core XML 1.0 standard, are developing major products based on it, and collaborate to develop related standards. XML is now the world standard platform for E-business transactions. However, in any business application, XML itself is not the answer. It is only a standard foundation, on which answers can be built. Therein lie the pitfalls.

Pitfalls

XML today is in a position like relational databases twenty years ago. The relational data model does not pre-define how you will store your data; it gives you a standard foundation, leaving you to choose how to use it to store data. You must define the tables and columns. What relational databases do for information storage, XML does for transmission of information over the Net. XML does not of itself define how information will be structured, or what it can mean. You define the tags and their meanings, which will give structure and meaning to the data. Both XML and relational databases provide a framework for organising data which is simpler, more flexible and more powerful than what went before.

To understand the future impact of XML, we must understand the past and present impact of relational databases. Because relational databases are so much better than what went before them – simpler, more powerful, more accessible – when they arrived around 1980 we started building many more new databases. New applications with new databases proliferated, grew, and became indispensable. Then we began to see the confusion we were creating. As companies built tens and hundreds of new databases, the same data were stored in many different databases with redundancies, overlaps and inconsistencies. The result today inside many big companies is information chaos – a corporate data spaghetti of system-to-system links for data exchange, application integration costs as high as 40% of the total IT budget, and above all, delays in building vital new applications.

The problem with system-to-system interchanges between relational databases is this: if you have N databases, the number of possible data interchange links can grow as N squared. With even thirty different databases – and most companies have far more than that – there are nearly 1000 possible links (30 × 29 = 870, counting each direction separately). If even a small fraction of these interfaces have to be built, maintained and understood, this is far too much complexity to be manageable. Many companies have lost this data complexity battle and are paying a heavy price.

Twenty years after the start of the relational era, we have still not solved the complexity problems it created. The potential pitfall of XML is this: by widespread use of XML we will create a greater data complexity battle – this time across whole industries rather than individual companies – and again will lose it, suffering even greater consequences.
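The link-count arithmetic can be checked with a few lines of Python. This is only a sketch of the counting argument; the function names are hypothetical, and it assumes that a link from A to B is a separate interface from a link from B to A.

```python
def one_way_links(n: int) -> int:
    """Number of possible directed interchange links between n systems."""
    return n * (n - 1)


def two_way_pairs(n: int) -> int:
    """Number of distinct pairs of systems, ignoring direction."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, one_way_links(n), two_way_pairs(n))
```

For thirty databases this gives 870 one-way links, or 435 pairs if each pair shares a single two-way interface. Either way the count grows with the square of N, so doubling the number of systems roughly quadruples the number of possible interfaces.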

As companies first venture into E-business, this may not seem to be a big problem. Surely, they say, we can adopt one of the emerging XML-based message standards such as FIN-XML for financial transactions, or C-XML for diverse commercial transactions, and simply build the interfaces between this message standard and our own core systems? Any one of these XML-based E-commerce standards is as complex as a medium-large relational database schema. Building interfaces between one of these standards and one of your company's large IT systems is a large piece of work – but feasible. Unfortunately there is not just one XML-based standard emerging, but many – for different industry sectors, and even several within the same sector. There are already XML-based standards wars between different industry groupings. It will not be possible to interface to just one of the standards, for many reasons: your company will not be operating in just one market sector; one standard does not address all your business needs; your business partners may back different standards; standards wars will continue, and you need to back the winners. As the new standards are used, they will grow and evolve. You will have to build and maintain interfaces between multiple systems and multiple XML standards, just to stay in E-business.

XML-based message standards are today proliferating across the world, just as relational databases proliferated from the 1980s within individual companies. The result will be the same data confusion as happened before, this time played out on a worldwide scale. Each company will have a patchwork of application databases (as before) and interfaces to many different XML message formats. This will be like its previous patchwork of relational databases, only more complex and without even the ability to control its own destiny. The costs of interfacing many packages, business processes and legacy systems to the many XML standards are multiplicative, and may soon be the key inhibitor which holds back your company from exploiting new business models and processes. This is a complexity trap at least as large, and as dangerous, as the complexity trap in multiple relational databases. In twenty years we failed to solve the relational complexity trap. How will we fare with the much bigger XML complexity trap?

There are two possible ways forward – to back some supra-standards repository 'framework' such as Microsoft's BizTalk, or to manage the XML interfaces properly within your own company. While they are not mutually exclusive, I shall discuss them separately.

BizTalk to the Rescue?

The BizTalk framework, promoted by Microsoft and partners, aims to make it easier for individual companies to mix and match XML message formats from different vendors and standards groupings, picking out the sets which best meet their business needs and application mix.

It does so in three ways. First, it sets out a "canonical form" in which any application-specific set of XML message formats can be defined. Second, it provides a public repository at http://www.BizTalk.org where BizTalk-conformant XML message format sets can be validated, lodged, retrieved and freely used. Third, the creators of BizTalk-conformant message standards are encouraged to lodge XSL-based translations between their own formats and others' standard formats. (XSL is the W3C's Extensible Stylesheet Language, which can be used to translate from one XML format to another.)

The theory is this: your company subscribes to BizTalk-conformant standard A, so you build interfaces from your IT systems to that message format. Your business partner subscribes to a different standard B, which is also BizTalk-conformant. Using the XSL translation between A and B, available from the BizTalk repository, you can send XML messages in standard A which you or your partner translates from A to B, so that your partner can understand your messages. If enough standards come under the BizTalk 'umbrella', then you can freely exchange messages with any business partner who uses it.

Is this, then, the way forward for your company? Should you bet that the industry strength of Microsoft and its partners is enough to drive all the package solutions you and your business partners need into the BizTalk framework? Can you just format messages into any BizTalk-conformant standard, and then rely on BizTalk translations to do the rest?

The history of relational databases suggests not, because the BizTalk translation framework does not solve the N-squared problem. Just as for databases, if there are N different 'standard' message formats, up to N(N-1) translations may be required. Currently the number of XML-based message standards defined by industry groupings (N) is well over 100, and growing. If you were the creator of one of these XML-based standards, would you spend the time to understand all the other standards – your competitors – in enough detail to create and maintain all the necessary translations? There is enough work just to keep your own XML formats abreast of changing business needs; maintaining thirty or fifty XSL translations as well would be a massive extra workload, for a limited payback.

For this reason, do not hold your breath for plug-and-play application compatibility via BizTalk XML. And do not bet your corporate IT strategy on it. If you are to avoid the N-squared trap of incompatible message formats and legacy systems, the solution lies within your own company.

Your Own Gold Standard

There is a viable way forward for individual companies, which is to take control of the problem themselves. The key is in this observation: Nobody else is in just the same business as you are. Therefore you need to build a single technology-independent logical model of all the information needed to drive your business – your own gold standard for business information – and then to map all the different technology pieces (your own IT systems, and external XML message formats) onto that logical model. Define message formats and translations from the logical model. Any data translation between system A and message format B is done not directly, but in two steps via the logical business model. By doing every data translation in two steps, you will abolish the N-squared complexity barrier. For each new IT system or XML message format, you only have to define one translation (to your logical business model), rather than N translations to other systems and message formats. The steps required to do this are:

  1. Build a single logical model of the information needed to drive your business;
  2. Map your main IT systems and XML message formats onto the logical model;
  3. Define common XML message formats based on the logical business model;
  4. Define XML message translations into and out of the common message format.

Having done this, you can then translate between any two data models or XML message formats via the common message format. The hard steps here are (1) and (2); once they are done, steps (3) and (4) are largely mechanical. Charteris have developed tools and techniques to help you through these steps, which have been proven to work for large complex enterprises. If you can succeed in this endeavour, the prize is well worth having – a coherent information architecture, to insulate your company from a growing industry-wide data spaghetti, enabling you to adapt rapidly to new business models and data needs. In the era of E-commerce, the winners will be those agile companies who can move rapidly to new successful business models. A coherent, understandable corporate information architecture is a key to that agility.
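The two-step translation via a logical model can be sketched in Python as follows. This is an illustration under simplifying assumptions, not a description of any particular tool: each format is reduced to a flat set of field names, each mapping is a plain rename, and all format and field names (standard_a, standard_b, legacy_db, order_id and so on) are hypothetical.

```python
from typing import Dict

Message = Dict[str, str]


class LogicalModelHub:
    """Translate between formats in two steps via one shared logical model."""

    def __init__(self) -> None:
        # format name -> {field name in that format: field name in the model}
        self._mappings: Dict[str, Dict[str, str]] = {}

    def register(self, format_name: str, field_to_model: Dict[str, str]) -> None:
        """Add one format by defining a single mapping onto the logical model."""
        self._mappings[format_name] = dict(field_to_model)

    def to_model(self, format_name: str, message: Message) -> Message:
        """Step 1: rename the format's fields to logical model fields."""
        mapping = self._mappings[format_name]
        return {mapping[f]: v for f, v in message.items() if f in mapping}

    def from_model(self, format_name: str, record: Message) -> Message:
        """Step 2: rename logical model fields to the target format's fields."""
        inverse = {m: f for f, m in self._mappings[format_name].items()}
        return {inverse[m]: v for m, v in record.items() if m in inverse}

    def translate(self, source: str, target: str, message: Message) -> Message:
        return self.from_model(target, self.to_model(source, message))

    def mapping_count(self) -> int:
        """Mappings maintained: one per format, not one per pair of formats."""
        return len(self._mappings)


hub = LogicalModelHub()
hub.register("standard_a", {"OrderNo": "order_id", "Qty": "quantity"})
hub.register("standard_b", {"po_number": "order_id", "amount_ordered": "quantity"})
hub.register("legacy_db", {"ORD_ID": "order_id", "ORD_QTY": "quantity"})

if __name__ == "__main__":
    print(hub.translate("standard_a", "standard_b", {"OrderNo": "PO-1001", "Qty": "400"}))
    print(hub.mapping_count())
```

With three formats the hub holds three mappings, where direct pairwise translation would need up to 3 × 2 = 6. Real mappings are far richer than field renames, which is why steps (1) and (2) are the hard ones, but the counting advantage is the same: adding a format means adding one mapping.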

The Way Forward

The BizTalk initiative is not the only attempt to bring order to the proliferation of industry sector XML applications. Others, such as the XML/EDI group at http://www.xmledi.com/repository/, are proposing public repositories of XML message definitions, attempting to link all definitions to a common business vocabulary so as to ease the N-squared translation problem. For instance, the XML/EDI group are proposing a large set of semantically neutral 'Bizcodes' as an intermediary for all XML translations.

Each of these 'supra-standards' repositories will help to manage the complexity of different XML dialects amongst those who subscribe to it. However, as with the XML dialects themselves, it is not at all clear which repository initiative will establish the earliest momentum, or win out in the end. None of the XML repositories will solve the N-squared translation problem for all businesses, unless it can establish a common model of all business information, agreed between all parties, which can then act as an Interlingua for all XML translations. The chances of such a massive information model being developed consistently and completely, agreed across all countries and industry sectors, and then maintained effectively, are remote.

Therefore no company can afford to wait for these cross-industry initiatives to succeed. Instead, each company can establish its own model of business information, based on its own business needs, and perform all XML translations via this model. This is a feasible undertaking, which will start to deliver results in months, and will then vastly simplify the problems of interfacing with a changing and unpredictable outside world. At the same time, it does not preclude you from taking advantage of BizTalk or any other XML repository initiative, if and when a winner emerges.