XML-Based Standards: Promise and Pitfalls
by Robert Worden
Published in November 1999
The Promise
The greatest driver of business change today is E-business,
brought about by the Internet. In the near term, this
will have most impact on business-to-business transactions.
The recent simultaneous announcements by Ford and General
Motors that they both intend to use E-commerce for all
their supply procurements – in Ford's case worth $80
billion per annum – have been marked by some commentators
as the day when E-business came of age.
To conduct E-business transactions, companies need
a common language to exchange structured information
between their computer systems. HTML, the first-generation
language of the Internet, is not suited for this, as
it defines only the formatting of information, not its
meaning. Enter Extensible Markup Language – XML. Like
HTML, XML consists of text delimited by 'tags'; so it
is easily conveyed over the Internet. In XML the tags
can define the meaning and structure of the information,
enabling computer tools to use that information directly.
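To make the contrast with HTML concrete, here is a minimal sketch in Python. The purchase order, its tag names (order, item, quantity, unitPrice) and the figures in it are invented for illustration and belong to no real standard. The point is only that a program can pick out values by name, rather than by their position on a formatted page.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order: tag names and values are illustrative only.
ORDER_XML = """
<order number="1001">
  <buyer>Example Motors</buyer>
  <item>
    <partNumber>BRK-220</partNumber>
    <quantity>40</quantity>
    <unitPrice currency="USD">12.50</unitPrice>
  </item>
  <item>
    <partNumber>FLT-015</partNumber>
    <quantity>100</quantity>
    <unitPrice currency="USD">3.20</unitPrice>
  </item>
</order>
"""


def order_total(xml_text):
    """Sum quantity * unit price over every <item> in the order."""
    root = ET.fromstring(xml_text)
    total = 0.0
    for item in root.findall("item"):
        quantity = int(item.findtext("quantity"))
        unit_price = float(item.findtext("unitPrice"))
        total += quantity * unit_price
    return total


if __name__ == "__main__":
    print(order_total(ORDER_XML))
```

The same figures in an HTML table would tell a program only how to lay them out. Here the tags say which number is a quantity and which is a price.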
XML has been embraced enthusiastically by all the major IT
suppliers and user groups. Its standardisation and rapid
take-up have been the major development in IT over the
past two years. Industry enemies like IBM, Microsoft,
Sun and Oracle all support the core XML 1.0 standard,
are developing major products based on it, and collaborate
to develop related standards. XML is now the world standard
platform for E-business transactions. However, in any
business application, XML itself is not the answer. It
is only a standard foundation, on which answers can be
built. Therein lie the pitfalls.
Pitfalls
XML today is in a position like relational databases twenty
years ago. The relational data model does not pre-define
how you will store your data; it gives you a standard
foundation, leaving you to choose how to use it to store
data. You must define the tables and columns. What relational
databases do for information storage, XML does for transmission
of information over the Net. XML does not of itself define
how information will be structured, or what it can mean.
You define the tags and their meanings, which will give
structure and meaning to the data. Both XML and relational
databases provide a framework for organising data which
is simpler, more flexible and more powerful than what
went before. To understand the future impact of XML, we
must understand the past and present impact of relational
databases. Because relational databases are so much better
than what went before them – simpler, more powerful, more
accessible – when they arrived around 1980 we started
building many more new databases. New applications with
new databases proliferated, grew, and became indispensable.
Then we began to see the confusion we were creating. As
companies built tens and hundreds of new databases, the
same data were stored in many different databases with
redundancies, overlaps and inconsistencies. The result
today inside many big companies is information chaos –
a corporate data spaghetti of system-to-system links for
data exchange, application integration costs as high as
40% of the total IT budget, and above all, delays in building
vital new applications. The problem with system-to-system
interchanges between relational databases is this: if
you have N databases, the number of possible data interchange
links can grow as N squared. With even thirty different
databases – and most companies have far more than that
– there are nearly 1000 possible links. If even a small
fraction of these interfaces have to be built, maintained
and understood, this is far too much complexity to be
manageable. Many companies have lost this data complexity
battle and are paying a heavy price. Twenty years after
the start of the relational era, we have still not solved
the complexity problems it created. The potential pitfall
of XML is this: by widespread use of XML we will create
a greater data complexity battle – this time across whole
industries rather than individual companies – and again
will lose it, suffering even greater consequences.
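The N-squared growth described above can be checked with a few lines of Python. This is a sketch under one assumption of mine: that every pair of databases might need a link, counted either once per direction or once per pair. The function names are illustrative.

```python
def directed_links(n):
    """Possible one-way interchange links between n systems: n * (n - 1)."""
    return n * (n - 1)


def undirected_pairs(n):
    """Possible pairs of systems, ignoring direction: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, directed_links(n), undirected_pairs(n))
    # 10 90 45
    # 30 870 435
    # 100 9900 4950
```

Thirty databases give 870 possible one-way links, the same order as the figure quoted in the text. Tripling the number of databases to about one hundred multiplies the possible links by more than ten.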
As companies first venture into E-business, this may not
seem to be a big problem. Surely, they say, we can adopt
one of the emerging XML-based message standards such as
FIN-XML for financial transactions, or C-XML for diverse
commercial transactions, and simply build the interfaces
between this message standard and our own core systems?
Any one of these XML-based E-commerce standards is as
complex as a medium-large relational database schema.
Building interfaces between one of these standards and
one of your company's large IT systems is a large piece
of work – but feasible. Unfortunately there is not just
one XML-based standard emerging, but many – for different
industry sectors, and even several within the same sector.
There are already XML-based standards wars between different
industry groupings. It will not be possible to interface
to just one of the standards, for many reasons: your company
will not be operating in just one market sector; one standard
does not address all your business needs; your business
partners may back different standards; standards wars
will continue, and you need to back the winners. As the
new standards are used, they will grow and evolve. You
will have to build and maintain interfaces between multiple
systems and multiple XML standards, just to stay in E-business.
XML-based message standards are today proliferating across the world,
just as relational databases proliferated from the 1980s
within individual companies. The result will be the same
data confusion as happened before, this time played out
on a worldwide scale. Each company will have a patchwork
of application databases (as before) and interfaces to
many different XML message formats. This will be like
its previous patchwork of relational databases, only more
complex and without even the ability to control its own
destiny. The costs of interfacing many packages, business
processes and legacy systems to the many XML standards
are multiplicative, and may soon be the key inhibitor
which holds back your company from exploiting new business
models and processes. This is a complexity trap at least
as large, and as dangerous, as the complexity trap in
multiple relational databases. In twenty years we failed
to solve the relational complexity trap. How will we fare
with the much bigger XML complexity trap?
There are two possible ways forward – to back some supra-standards
repository 'framework' such as Microsoft's BizTalk, or
to manage the XML interfaces properly within your own
company. While they are not mutually exclusive, I shall
discuss them separately.
BizTalk to the Rescue?
The BizTalk framework, promoted by Microsoft and partners,
aims to make it easier for individual companies to mix
and match XML message formats from different vendors and
standards groupings, picking out the sets which best meet
their business needs and application mix.
It does so in three ways. First, it sets out a "canonical
form" in which any application-specific set of XML
message formats can be defined. Second, it provides a
public repository at http://www.BizTalk.org where BizTalk-conformant
XML message format sets can be validated, lodged, retrieved
and freely used. Third, the creators of BizTalk-conformant
message standards are encouraged to lodge XSL-based translations
between their own formats and others' standard formats.
(XSL is the W3C standard Extensible Stylesheet Language, which
can be used to translate from one XML format to another.)
The theory is this: your company subscribes to BizTalk-conformant
standard A, so you build interfaces from your IT systems
to that message format. Your business partner subscribes
to a different standard B, which is also BizTalk-conformant.
Using the XSL translation between A and B, available from
the BizTalk repository, you can send XML messages in standard
A, which you or your partner translates from A to B, so that
your partner can understand your messages. If enough standards come
under the BizTalk 'umbrella', then you can freely exchange
messages with any business partner who uses it. Is this,
then, the way forward for your company? Should you bet
that the industry strength of Microsoft and its partners
is enough to drive all the package solutions you and your
business partners need into the BizTalk framework? Can
you just format messages into any BizTalk-conformant standard,
and then rely on BizTalk translations to do the rest?
The history of relational databases suggests not, because
the BizTalk translation framework does not solve the N-squared
problem. Just as for databases, if there are N different
'standard' message formats, up to N(N-1) translations
may be required. Currently the number of XML-based message
standards defined by industry groupings (N) is well over
100, and growing. If you were the creator of one of these
XML-based standards, would you spend the time to understand
all the other standards – your competitors – in enough
detail to create and maintain all the necessary translations?
There is enough work just to keep your own XML formats
abreast of changing business needs; maintaining thirty
or fifty XSL translations as well would be a massive extra
workload, for a limited payback.
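To show what one such pairwise translation involves, here is a minimal sketch. It is written in Python rather than XSL, and it only renames tags; formats A and B and all of their tag names are hypothetical. A real translation would also have to restructure elements and reconcile units, code lists and optional fields, which is where the maintenance effort goes.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag correspondence from invented format A to invented format B.
# A separate table like this is needed for every ordered pair of formats.
A_TO_B = {
    "order": "PurchaseOrder",
    "buyer": "Customer",
    "partNumber": "SKU",
    "quantity": "Qty",
}


def translate(element, tag_map):
    """Return a copy of element with every tag renamed through tag_map.

    Raises KeyError for a tag the map does not cover.
    """
    renamed = ET.Element(tag_map[element.tag], dict(element.attrib))
    renamed.text = element.text
    renamed.tail = element.tail
    for child in element:
        renamed.append(translate(child, tag_map))
    return renamed


message_a = ET.fromstring(
    "<order><buyer>Example Motors</buyer>"
    "<partNumber>BRK-220</partNumber><quantity>40</quantity></order>"
)
message_b = translate(message_a, A_TO_B)

if __name__ == "__main__":
    print(ET.tostring(message_b, encoding="unicode"))
```

Whenever either format changes, the table must be revisited, and the owner of format A would need one such table for each of the other formats.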
For this reason, do not hold your breath for plug-and-play
application compatibility via BizTalk XML. And do not
bet your corporate IT strategy on it. If you are to avoid
the N-squared trap of incompatible message formats and
legacy systems, the solution lies within your own company.
Your Own Gold Standard
There is a viable way forward for individual companies, which
is to take control of the problem themselves. The key
is in this observation: Nobody else is in just the same
business as you are. Therefore you need to build a single
technology-independent logical model of all the information
needed to drive your business – your own gold standard
for business information – and then to map all the different
technology pieces (your own IT systems, and external XML
message formats) onto that logical model. Define message
formats and translations from the logical model. Any data
translation between system A and message format B is done
not directly, but in two steps via the logical business
model. By doing every data translation in two steps, you
will abolish the N-squared complexity barrier. For each
new IT system or XML message format, you only have to
define one translation (to your logical business model),
rather than N translations to other systems and message
formats. The steps required to do this are:
(1) Build a single logical model of the information needed to drive your business;
(2) Map your main IT systems and XML message formats onto the logical model;
(3) Define common XML message formats based on the logical business model;
(4) Define XML message translations into and out of the common message format.
Having done this, you can then translate between any
two data models or XML message formats via the common
message format. The hard steps here are (1) and (2);
once they are done, steps (3) and (4) are largely mechanical.
Charteris have developed tools and techniques to help
you through these steps, which have been proven to work
for large complex enterprises. If you can succeed in
this endeavour, the prize is well worth having – a coherent
information architecture, to insulate your company from
a growing industry-wide data spaghetti, enabling you
to adapt rapidly to new business models and data needs.
In the era of E-commerce, the winners will be those
agile companies who can move rapidly to new successful
business models. A coherent, understandable corporate
information architecture is a key to that agility.
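The two-step translation via a logical model can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular tool: the formats A and B, their field names, and the three-field "logical model" are all invented, and messages are shown as Python dictionaries rather than XML to keep the sketch short.

```python
# Each format supplies exactly one mapping to the logical model, written here
# as a pair of functions (into the model, out of the model).

def a_to_model(msg):
    return {"customer": msg["buyer"], "part": msg["partNumber"],
            "quantity": int(msg["quantity"])}


def model_to_a(rec):
    return {"buyer": rec["customer"], "partNumber": rec["part"],
            "quantity": str(rec["quantity"])}


def b_to_model(msg):
    return {"customer": msg["Customer"], "part": msg["SKU"],
            "quantity": int(msg["Qty"])}


def model_to_b(rec):
    return {"Customer": rec["customer"], "SKU": rec["part"],
            "Qty": str(rec["quantity"])}


ADAPTERS = {"A": (a_to_model, model_to_a), "B": (b_to_model, model_to_b)}


def translate(message, source, target):
    """Translate in two steps: source format -> logical model -> target format."""
    to_model, _ = ADAPTERS[source]
    _, from_model = ADAPTERS[target]
    return from_model(to_model(message))


def mappings_needed(n):
    """One-way translations needed for n formats, direct versus via the model."""
    return {"direct": n * (n - 1), "via_model": 2 * n}


if __name__ == "__main__":
    order_a = {"buyer": "Example Motors", "partNumber": "BRK-220", "quantity": "40"}
    print(translate(order_a, "A", "B"))
    print(mappings_needed(100))
```

Adding a third format C means writing one new pair of functions and one new entry in ADAPTERS; nothing written for A or B has to change. For one hundred formats the count is 200 one-way functions instead of 9900.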
The Way Forward
The BizTalk initiative is not the only attempt to bring order to the proliferation
of industry sector XML applications. Others, such as the
XML/EDI group at http://www.xmledi.com/repository/ , are
proposing public repositories of XML message definitions,
attempting to link all definitions to a common business
vocabulary so as to ease the N-squared translation problem.
For instance, the XML/EDI group are proposing a large
set of semantically neutral 'Bizcodes' as an intermediary
for all XML translations.
Each of these 'supra-standards'
repositories will help to manage the complexity of different
XML dialects amongst those who subscribe to it. However,
as with the XML dialects themselves, it is not at all
clear which repository initiative will establish the earliest
momentum, or win out in the end. None of the XML repositories
will solve the N-squared translation problem for all businesses,
unless it can establish a common model of all business
information, agreed between all parties, which can then
act as an Interlingua for all XML translations. The chances
of such a massive information model being developed consistently
and completely, agreed across all countries and industry
sectors, and then maintained effectively, are remote.
Therefore no company can afford to wait for these cross-industry initiatives to
succeed. Instead, each company can establish its own
model of business information, based on its own business
needs, and perform all XML translations via this model.
This is a feasible undertaking, which will start to deliver
results in months, and will then vastly simplify the problems
of interfacing with a changing and unpredictable outside
world. At the same time, it does not preclude you from
taking advantage of BizTalk or any other XML repository
initiative, if and when a winner emerges.