HTML to XHTML

This Page is Obsolete

Since the introduction of HTML5, WordPress and most themes support HTML5. XHTML is considered obsolete.

What is and what isn't XHTML

WordPress, as a system, is based on documents written in the XHTML markup language. XHTML 1.0 (which is currently the most widely supported version and stands for eXtensible HyperText Markup Language) became a W3C recommendation in the year 2000, and was intended to serve as an interim technology until XHTML 2.0 could be finalized. Eight years later XHTML 2.0 still wasn't finished, and the W3C eventually stopped work on it in 2009. This document therefore uses the phrase XHTML to refer to XHTML 1.0 only.

XHTML is very similar to HTML, as both are descendants of a language called SGML. However, XHTML is also descended from XML, a markup language with much stricter grammar rules than HTML, and XHTML has inherited some of that discipline. XHTML is mainly differentiated from HTML by its use of a new MIME type and by some new syntax rules, which are explained below.

Why Should I use XHTML

WordPress prints XHTML from all its internal functions, so all themes are now written in XHTML, and so are most plugins. If you want to use WordPress, you should buckle down and learn some XHTML, as that's where it is right now.

What are the differences between XHTML and HTML

If you are familiar with HTML, you will be glad to know that the majority of what you know about HTML is still relevant in XHTML. The main differences are that XHTML forces webpage authors to be more consistent and to write more legible code. There are a few syntax and grammar differences and a few HTML tags have been dropped and, really, that's about it. If you know HTML then you'll be surprised at how easy it is to switch to XHTML, and the new XHTML rules will force you to become a better programmer too!

So how do I write XHTML?

Well, here's a quick check list of the important requirements of XHTML and the differences between it and HTML. This is NOT a comprehensive XHTML language reference. For that, see the W3C's XHTML 1.0 specification.

In these examples some code has been omitted for clarity.

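All tag and attribute names must be written in lowercase.

Attribute values may contain uppercase letters, but predefined values such as type="submit" are case-sensitive in XHTML and must be written in lowercase too.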

Right:

<a href="www.kilroyjames.co.uk" >

Wrong:

<A HREF="www.kilroyjames.co.uk" >
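
Because XML is case-sensitive, uppercase names are easy to detect mechanically. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name names_with_uppercase is made up for this illustration and is not part of WordPress.

```python
import xml.etree.ElementTree as ET


def names_with_uppercase(markup):
    """Return the element and attribute names that contain uppercase letters.

    Assumes `markup` is a well-formed fragment with a single root element
    and no namespace declaration (an illustrative simplification).
    """
    offenders = []
    for element in ET.fromstring(markup).iter():
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for attribute in element.attrib:
            if attribute != attribute.lower():
                offenders.append(attribute)
    return offenders


print(names_with_uppercase('<a href="www.kilroyjames.co.uk">link</a>'))  # []
print(names_with_uppercase('<A HREF="www.kilroyjames.co.uk">link</A>'))  # ['A', 'HREF']
```

Note that the closing tag was written as </A> in the second call. An XML parser treats <A> and </a> as different names, so mixing cases would be rejected outright.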

All attribute values must be within quotes

Right:

<a href="www.kilroyjames.co.uk">

Wrong:

<a href=www.kilroyjames.co.uk>

All tags must be properly nested

Right:

<em>this emphasis just keeps getting <strong>stronger and stronger</strong></em>

Wrong:

<em>this emphasis just keeps getting <strong>stronger and stronger</em></strong>
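
The nesting rule boils down to "last opened, first closed", which is exactly how a stack behaves. The following is a rough sketch of that idea in Python. The function name is_properly_nested and the simple regular expression are assumptions made for this example; a real validator uses a full parser, and this pattern will be fooled by a ">" inside an attribute value.

```python
import re

# Matches opening tags, closing tags and self-closed tags such as <br />.
TAG_PATTERN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")


def is_properly_nested(markup):
    """Return True if every closing tag matches the most recently opened tag."""
    open_tags = []
    for match in TAG_PATTERN.finditer(markup):
        closing, name, self_closed = match.groups()
        if self_closed:
            continue  # <br /> opens and closes in one go
        if closing:
            # The tag being closed must be the one opened last.
            if not open_tags or open_tags.pop() != name:
                return False
        else:
            open_tags.append(name)
    return not open_tags  # anything left on the stack was never closed


right = "<em>this emphasis just keeps getting <strong>stronger and stronger</strong></em>"
wrong = "<em>this emphasis just keeps getting <strong>stronger and stronger</em></strong>"
print(is_properly_nested(right))  # True
print(is_properly_nested(wrong))  # False
```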

All XHTML documents must carry a DOCTYPE definition

The DOCTYPE is an intimidating-looking piece of code that must appear at the start of every XHTML document. It tells the browser how to render the document.

Rules for the DOCTYPE declaration:

  • It must come first in the document, before the <html> tag
  • The DOCTYPE is not actually part of the XHTML document so don't add a closing slash
  • It should point to a valid definition file called a DTD that tells the browser how to read the document
  • You must write the DOCTYPE correctly, otherwise your document will explode* (into little pieces of HTML called "tag soup") and cannot be validated.

* I am, of course, perfectly serious...


There are three types of valid XHTML 1.0 document: Strict, Transitional, and Frameset. If you can get your document to validate as "Strict" then do so. However, some legacy tags and attributes aren't allowed in Strict, so if you depend on them you can use "Transitional" instead.

Note: you might have a problem getting WordPress to validate as Strict because, as of version 2.6.2, some of the internally generated <form> elements still use the "name" attribute, which the Strict DTD does not allow on the <form> element itself, e.g. <form name="my_form">. (Form controls such as <input name="my_button" /> may still carry a "name" attribute under Strict.)

Note also that using a Transitional DTD takes most browsers out of full "Standards" mode and into a slightly more lenient "almost standards" mode. It is trickier to get your web pages to look consistent across different browsers when the browsers are not in Standards mode. I'm not going to explain the minutiae of the DOCTYPE declaration as it gets deeper and more complicated; just know that for best results you should use one of the following, preferably the first one (Strict):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

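If you want to check which of the XHTML 1.0 flavours a page declares, the public identifier inside the DOCTYPE is the part to look at. This is a small sketch only; the function name xhtml_flavour is invented for this example, and the pattern assumes the DOCTYPE is written exactly in the style shown above.

```python
import re

# Captures the flavour name from an XHTML 1.0 public identifier.
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+'
    r'"-//W3C//DTD XHTML 1\.0 (Strict|Transitional|Frameset)//EN"'
)


def xhtml_flavour(document):
    """Return 'Strict', 'Transitional' or 'Frameset', or None if no match."""
    match = DOCTYPE_PATTERN.search(document)
    return match.group(1) if match else None


strict_page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
)
print(xhtml_flavour(strict_page))          # Strict
print(xhtml_flavour("<html></html>"))      # None
```
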
The HTML tag must contain an XMLNS attribute

You don't need to understand the "XML namespace" attribute, except to know that it is required in all XHTML documents. Here is an example of how to write it:

<html xmlns="http://www.w3.org/1999/xhtml">

Documents must be properly formed with HTML, HEAD, TITLE and BODY tags

In HTML it is possible to write a webpage that contains none of the above tags; in XHTML it is not. The above tags must be included and they must be nested and ordered correctly, as follows (the DOCTYPE has been omitted):

<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title></title>
 </head>

 <body>
  <p>
   See how the TITLE must be placed in the document HEAD; the TITLE is considered
   to be a "required child" element of the HEAD.
   Notice that the HEAD must also appear before the document BODY.
   Notice also how both the HEAD and the BODY must be contained
   within the HTML tag. Again, HEAD and BODY are "required child"
   elements of the HTML tag. Finally, notice that this text is
   written within a &lt;p&gt; (paragraph) tag; under the Strict DTD you may
   not write text directly in the BODY tag without using a suitable
   container tag, such as &lt;p&gt; or &lt;div&gt;.
  </p>
 </body>
</html>
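
The skeleton above can be checked mechanically. Here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper has_required_structure is a made-up name for this example, and it only tests the handful of rules described in this section (namespace, HEAD before BODY, TITLE inside HEAD); it is not a substitute for a real validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def qualified(name):
    """ElementTree reports namespaced tags as '{namespace}name'."""
    return "{%s}%s" % (XHTML_NS, name)


def has_required_structure(document):
    """Check for html (with xmlns) containing head then body, and head > title."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False  # not even well-formed
    if root.tag != qualified("html"):
        return False  # wrong root element or missing xmlns attribute
    if [child.tag for child in root] != [qualified("head"), qualified("body")]:
        return False  # head and body must both be present, in that order
    return root[0].find(qualified("title")) is not None


good = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Hello</title></head>"
    "<body><p>Some text</p></body>"
    "</html>"
)
no_namespace = good.replace(' xmlns="http://www.w3.org/1999/xhtml"', "")
no_title = good.replace("<title>Hello</title>", "")

print(has_required_structure(good))          # True
print(has_required_structure(no_namespace))  # False
print(has_required_structure(no_title))      # False
```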

All tags must be closed, even single tags

<p>Mary had a little lamb
<p>Its fleece was white as snow

This code is not valid XHTML as the closing </p> tags have been left out. The following example is correct.

<p>Mary had a little lamb</p>
<p>Its fleece was white as snow</p>

In XHTML even single tags have to be closed - absolutely NO tag may be left open.

<p>
 Mary had a little lamb <br>
 Its fleece was white as snow
</p>

Therefore the above example is wrong because the <br> tag is not closed. To close a single tag like <br> and <hr> you simply add a forward slash before the final bracket, like so: <br /> and <hr /> (the white space is optional). To correct the above example we'd write:

<p>
 Mary had a little lamb <br />
 Its fleece was white as snow
</p>

This is correct XHTML.
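
Unclosed tags are one of the things any XML parser will refuse outright, so a parser makes a handy first check. The sketch below assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name, and each fragment must have a single outer element for the check to make sense.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


unclosed = "<p>Mary had a little lamb <br>Its fleece was white as snow</p>"
closed = "<p>Mary had a little lamb <br />Its fleece was white as snow</p>"

print(is_well_formed(unclosed))  # False
print(is_well_formed(closed))    # True

# The same check also catches unquoted attribute values.
print(is_well_formed("<a href=www.kilroyjames.co.uk>link</a>"))  # False
```

Bear in mind that passing this check only means the markup is well-formed XML. It does not mean the page is valid XHTML, which also depends on the DOCTYPE and the DTD rules.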

Attribute minimisation isn't allowed

In HTML, an attribute that acts as an on/off switch can be written as a bare keyword with no value, e.g. <dl compact>. This is called attribute minimisation. In XHTML that is not allowed; every attribute must be given an explicit value, which for these attributes is simply the attribute's own name, i.e.

<dl compact="compact">
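
Minimised attributes are easy to hunt down in old HTML before converting it. This sketch leans on Python's standard html.parser module, which reports an attribute written without a value as having the value None. The class and function names are invented for this example.

```python
from html.parser import HTMLParser


class MinimisedAttributeFinder(HTMLParser):
    """Collects (tag, attribute) pairs for attributes written without a value."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:  # no ="..." part was present
                self.found.append((tag, name))


def find_minimised_attributes(markup):
    finder = MinimisedAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


print(find_minimised_attributes("<dl compact>"))            # [('dl', 'compact')]
print(find_minimised_attributes('<dl compact="compact">'))  # []
```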

ID and NAME attributes

In HTML, the NAME attribute was used to identify certain elements, such as anchors and images, so that links and scripts could refer to them. XHTML 1.0 formally deprecates NAME for this purpose on the a, applet, form, frame, iframe, img and map elements; use ID instead. Form controls such as <input>, <select> and <textarea> are an exception: they still need NAME, because that is what the browser sends to the server when the form is submitted. e.g.

correct HTML

<a name="top"></a>

and now the correct XHTML version

<a id="top"></a>

STYLE is all in your HEAD

XHTML does not allow STYLE elements within the body of a document; they must be placed in the document HEAD instead.

Entity references

Write all literal ampersands as &amp; or they will be assumed to be the start of an entity reference, e.g. &reg; is the entity reference for the symbol ®. Therefore M&S is invalid XHTML, because &S is not a complete entity reference; you must write it as M&amp;S.
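
If your text comes from somewhere else (a database, a form), it is safer to let a library do the escaping than to do it by hand. A short sketch, assuming Python's standard html.escape function and xml.etree.ElementTree module; the variable names are just for illustration.

```python
from html import escape
import xml.etree.ElementTree as ET

raw_text = "M&S"

# quote=False leaves quotation marks alone and escapes only &, < and >.
safe_text = escape(raw_text, quote=False)
print(safe_text)  # M&amp;S

# The escaped version parses, and the parser hands back the original text.
paragraph = ET.fromstring("<p>" + safe_text + "</p>")
print(paragraph.text)  # M&S

# The unescaped version is rejected by an XML parser.
try:
    ET.fromstring("<p>" + raw_text + "</p>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)  # False
```

Only escape raw text once. Running already-escaped text through the function again would turn &amp; into &amp;amp;.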

Conclusion

As was previously mentioned, this is not a comprehensive reference but it should be enough to get you up and running with XHTML pretty quickly. Good luck!

Problems with XHTML

Most people don't realise that to use XHTML properly it must be served using the new MIME type "application/xhtml+xml". A MIME type is simply a description that the web server sends to a browser to tell it what sort of document is coming. For instance a JPG image is sent with a MIME type of "image/jpeg" and an HTML document with a MIME type of "text/html". Sending an XHTML document with a MIME type of "text/html" results in the document being parsed as HTML and not, as you would no doubt hope, as XHTML. You must use the correct MIME type if you want to use XHTML; otherwise you are simply using non-standard HTML.
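
One common workaround is for the server to look at the Accept header the browser sends and only use the XHTML MIME type when the browser says it understands it. The following is a deliberately simplified sketch of that decision in Python. The function name choose_content_type is an assumption for this example, it is not something WordPress provides, and it ignores the "q" quality values that a complete implementation would honour.

```python
def choose_content_type(accept_header):
    """Pick the MIME type to send, based on the browser's Accept header.

    Simplification: any mention of application/xhtml+xml counts as support,
    regardless of its q-value.
    """
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/xhtml+xml" in accepted:
        return "application/xhtml+xml"
    return "text/html"  # fallback: the page will be parsed as HTML


modern = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
legacy = "text/html, image/jpeg, */*"

print(choose_content_type(modern))  # application/xhtml+xml
print(choose_content_type(legacy))  # text/html
```

Remember that whenever the fallback is used, everything said above still applies: the browser treats the page as HTML, not XHTML.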

HTML5

Because of seemingly intractable problems with the development of XHTML (mainly that XHTML 2 is incompatible with previous versions of XHTML and HTML, and also the MIME type issue), a competing standard, HTML5, has become the new favourite to succeed the old HTML 4.01 standard. It is supported by Mozilla (Firefox), Apple (Safari), Opera, Microsoft (Internet Explorer) and some other key Internet players.

HTML5 was published as a working draft by the W3C in January 2008, became a candidate recommendation in December 2012, and was finalised as a W3C Recommendation in October 2014.

Resources