Robin Berjon

Introduction

XML Bad Practices

At the XML Prague 2009 conference I presented a paper on "Designing XML/Web Languages: A Review of Common Mistakes". Since much of the subject matter presented there is still the topic of heated discussion amongst specialists, I thought it a good idea to make a pass through that paper, updating it based on feedback I have received and newer examples, and to post it in blog form. I will post each section here as I go through this process. Today, we start simply with the introduction.

XML is now over ten years old and can euphemistically be dubbed a success. That being said, I don't believe I need convince readers that not all of its uses have been successful. Over time, many bright minds have attempted to describe how to best make use of it when designing vocabularies, but I believe it is safe to say that those efforts, no matter how excellent, have not been sufficient in ensuring that all applications of XML are produced in an entirely sane manner.

Part of the reason for that is education and outreach: people will often just grab XML and run, without digging around for best practices. But a larger problem is that XML combines simplicity and flexibility in such a way that a set of best practices only gets one so far in avoiding pitfalls. This does not mean that we are doomed to repeat mistakes over and over again, simply that we need to learn from our experience.

That is why this paper does not try to define a nice and simple manual as an amulet against poor vocabulary design, but rather intends to show some mistakes so that we may learn from them. As such, its organisation is more that of a shopping list than that of a treatise on XML.

Several errors outlined here use SVG as their source. This does not mean that SVG is the only language to make those mistakes, neither does it mean that SVG is a bad XML vocabulary; in fact, SVG rocks. While the issues are not at all SVG-specific, I have several reasons for appealing to it as an example in several instances:

  • Knowledge of SVG is quite widespread, the specifications are openly accessible to all, and hundreds of thousands of examples are available on the Web, all of which makes verification easy.
  • SVG is a rather successful language. This shows that there is a distinction between making mistakes in vocabulary design at the syntax level and at the application level. It is of course ideal to get both right and I look forward to SVG addressing some of its issues as it moves forward, but good vocabulary design is no substitute for getting everything else right.
  • Being one of the earlier major Web languages to be created after XML came into existence, it stumbled upon many of the issues that should be avoided. Being such a "seasoned" language means it has also seen many of the potential errors. In fact, what is most surprising is that the many vocabularies that more or less copy parts of SVG (XAML, Flex) tend to keep the poor decisions and introduce new ones rather than the other way around.
  • All of its specifications were done by a group of smart people, and reviewed not only by a large community but also by other W3C groups (in fact by many non-W3C standards groups too) including the XML working groups. This shows that there is no shame in making some of these mistakes, only shame in not learning from them.
  • Finally, I was personally deeply involved in some stages of SVG's inception. I also use it on a very regular basis. This means that not only do I know it well, but also that when I point fingers and laugh, I know that I have my share of responsibility in some of those decisions.

As a final note before we delve into these mistakes, I would like to make it clear that this domain does not deal in absolutes. There are cases in which one may consider these mistakes to be good solutions; and a few of what I describe as bad practices may even be controversial to the point that they are considered by some as the right option in all situations. I do not see that as an issue: as a community we can discuss and disagree. What matters is that when choosing one way of designing a language over another, one be informed of the discussion so as to make one's own decision.

Share and Enjoy!

Table of Contents

Namespace Issues
XML Is For Humans
Language Issues
Wishful Thinking and Doe-Eyed Beliefs
Miscellaneous