Serving XML: Le Divorce

About a week ago I hooked up an HttpModule that set Content-Type to application/xhtml+xml for those user agents that “get” it. Given the decent size of this site, problems reared their heads really fast. Having solved a number of challenges, I ran into a dead end and had to pull the plug on application/xhtml+xml.
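To make the “user agents that get it” part concrete, here is a minimal sketch of the decision such a module has to make, written in JavaScript rather than the module's actual language. The function name chooseContentType is made up for illustration, and the sketch only looks for an explicit application/xhtml+xml entry in the Accept header; it is not a full content negotiation implementation.

```javascript
// Hypothetical helper (not the site's actual HttpModule): pick a Content-Type
// from the request's Accept header.
function chooseContentType(acceptHeader) {
  const XHTML = "application/xhtml+xml";
  const HTML = "text/html";
  if (!acceptHeader) return HTML; // no Accept header: play it safe

  for (const part of acceptHeader.split(",")) {
    const [type, ...params] = part.trim().split(";").map((s) => s.trim());
    if (type.toLowerCase() !== XHTML) continue;

    // A missing q parameter means q=1; q=0 means "not acceptable".
    const qParam = params.find((p) => p.toLowerCase().startsWith("q="));
    const q = qParam ? parseFloat(qParam.slice(2)) : 1;
    return q > 0 ? XHTML : HTML;
  }
  // Wildcards such as */* are deliberately ignored: a browser that only
  // sends */* has not said it can handle XHTML served as XML.
  return HTML;
}
```

A browser that lists application/xhtml+xml explicitly gets the XML content type; everything else, including agents that only send */*, falls back to text/html.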

The biggest strength and yet the biggest weakness of ASP.NET in this respect is server controls. They are the best thing since sliced bread, yet you are stuck with the markup they produce. And if they produce incorrect markup… yes, you’re stuck with it.

One good example is the <asp:ValidationSummary> control. If you have a form with a bunch of fields, some of them required, some that have to pass RegEx rules, etc., the ValidationSummary control can collect all data entry errors and display a nice summary. I use this control in Tools and The Hall of Fame. It’s a great idea, but the control renders markup that breaks everything when served as XML.

For example, with one required field on a form the ValidationSummary control produces the following HTML:

<table id="_ctl0_vs" class="vsummary" cellpadding="0" 
       cellspacing="0" border="0" width="100%">
<tr><td>
<font color="Red">Enter site URL<br></font>
</td></tr>
</table>

As you can see, a font tag is thrown into this sauce, as well as an unclosed <br> tag. Both Opera and Mozilla choke on this snippet: they parse the page as XML, and the open <br> has no matching close. A workaround would be… to ditch the validation summary control. But… it’s so helpful.
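To show why an XML parser gives up here, the following is a rough sketch of a tag-balance check in JavaScript. It is an illustration under heavy assumptions, not a real XML parser: the name findUnclosedTag is invented, and the check ignores comments, CDATA sections, entities, and everything else a conforming parser must handle.

```javascript
// Hypothetical tag-balance check. Returns the name of the first tag that
// breaks nesting, or null if every open tag is closed in order.
function findUnclosedTag(markup) {
  // Matches <name ...>, </name>, and <name ... />; quoted attribute values
  // are skipped so a ">" inside them does not end the tag early.
  const tagPattern = /<(\/?)([A-Za-z][\w:.-]*)((?:[^>"']|"[^"]*"|'[^']*')*)>/g;
  const open = [];
  let m;
  while ((m = tagPattern.exec(markup)) !== null) {
    const [, slash, name, rest] = m;
    if (slash) {
      // Closing tag: it must match the most recently opened element.
      const expected = open.pop();
      if (expected !== name) return expected || name;
    } else if (!rest.trim().endsWith("/")) {
      // Opening tag that is not self-closed: remember it.
      open.push(name);
    }
  }
  return open.length ? open[open.length - 1] : null;
}
```

Fed the font line from the markup above, this reports br as the offender, because </font> arrives while <br> is still open. With the tag written as <br /> the same line passes.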

Another problem I ran into when working on the polling control had to do with postbacks. If a control on the page posts back and requires handling, a special snippet of JavaScript is injected:

<script type="text/javascript">
<!--
 function __doPostBack(eventTarget, eventArgument) {
 var theform = document.getElementById ('__aspnetForm');
 theform.__EVENTTARGET.value = eventTarget.split("$").join(":");
 theform.__EVENTARGUMENT.value = eventArgument;
 theform.submit();
 }
// -->
</script>

Well, it looks a little different in ASP.NET 1.0 and 1.1. Back in February I explained what I did to “fix” it. Anyway, Mozilla hates it. Pardon my ignorance, but it seems that the comment tags throw it off: to an XML parser, everything between <!-- and --> is a real comment, so the function never reaches the script engine. No postback happens, and all you get is “__doPostBack is not defined” even though it’s there.
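One commonly suggested alternative to the comment wrapper is a CDATA section whose markers are hidden behind JavaScript line comments, so the block survives both HTML and XML parsing. Below is a minimal sketch of a helper that builds such a block; wrapScriptForXhtml is a hypothetical name, and nothing in ASP.NET 1.x is known to emit its scripts this way.

```javascript
// Hypothetical helper: wrap a script body so that it works whether the page
// is parsed as HTML or as XML.
function wrapScriptForXhtml(scriptBody) {
  // "]]>" would end the CDATA section early, so refuse it rather than guess
  // at an escaping scheme.
  if (scriptBody.includes("]]>")) {
    throw new Error('script body must not contain "]]>"');
  }
  return [
    '<script type="text/javascript">',
    "//<![CDATA[", // HTML parsers see a JS comment; XML parsers open CDATA
    scriptBody,
    "//]]>", // closes the CDATA section, again hidden in a JS comment
    "</script>",
  ].join("\n");
}
```

The script engine receives the body either way: in HTML the two marker lines are plain comments, and in XML the CDATA section protects characters such as < and & inside the code.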

If I switch Content-Type back to text/html, Mozilla likes it again. Not having postbacks is a bummer. A big one. Pretty much everything about building custom server controls revolves around postbacks. Interestingly enough, Opera doesn’t mind it.

If you dig around server controls with Reflector I’m sure you’ll find a couple more examples along the same lines. So what does all this mean?

We’re Not There Yet

ASP.NET isn’t ready yet to produce markup that can be served as XML with the application/xhtml+xml content type. No, it’s not a sin unto death and, please, don’t list us with the sons of perdition. I didn’t expect Microsoft folks to worry about it in the first pass anyway, so I’m willing to cut them lotsa slack. Until ASP.NET 2.0. Nevertheless, I think there’s a bigger issue here, which brings me to my next point.

Don’t Serve Web Applications as XML

It is my deepest conviction that we’re well over the hill with “web sites” of the dot com era. Businesses face much more complex tasks these days and simple HTML sites don’t cut it. The only worthy application for a strictly-HTML site I can think of is a brochure site. Everything else requires server-side processing.

I never thought of AspNetResources.com as a web site. This is a web application. It follows the now traditional n-tier architecture with a presentation layer and a data layer. I excluded a business logic layer because there’s no strong need for it yet. Five HTTP modules handle pre- and post-processing. Everything feeds off of SQL Server via stored procedures. Searching is handled by Full-Text Search. A number of user controls and custom server controls handle presentation. This is far from the biggest and most complex web app, but it’s easy enough to slip up.

When you serve your content as XML there’s no margin for error. You deviate just a little and the browser throws a parser error. It’s supposed to. However, therein lies a nasty side effect—you can’t trap the error and present a nicer message. I’m a big believer in handling server-side errors but a client-side error renders you helpless. You can’t even know something went kaboom. This is pretty ugly. User agents need to do better than this.

Imagine for a second Citibank, Allstate or Yahoo blowing up with a red-on-beige error that a mismatched tag was encountered. None of those companies would risk busting their business for the sake of the noble cause of markup purity.

On the other hand, if you run a web app in complete isolation and under your God-like control it is a perfect ground for XML. A blog or a personal site is a perfect candidate.

XML Is a Contract

Let me digress. The idea of sending some data from a Publisher to a Subscriber is not a new notion. The idea of web services, therefore, is nothing new. What is new is that we’re finally agreeing on certain protocols (read “contracts”) that make it happen. SOAP and WSDL are a good example. Being XML based, they simply define contracts a sender and a receiver should stick to. You have to stick to this contract 100%. This is where closing all tags becomes of paramount importance.

On the web there are fewer contracts. HTML is a contract but it’s being violated constantly. XHTML is a contract but as long as it says that something “may get deprecated” it will be violated.

People are nowhere near perfection. You can’t demand it from them. You can’t be vigilant about properly closing all tags or quoting attributes 24x7. The moment you turn your work over and let others drive your CMS you can forget about purity. The moment someone leaves a comment on your site you take chances.

How many times have you seen a web site you developed go to complete crap because the people in charge copy and paste chunks of text from MS Word with horrendous markup? Or they simply don’t bother typing in correct HTML and why should they? Does everybody need to know how many belts they need to change in their cars and how to get to their timing belt? Which oil do you put in your engine? 5W-30, 10W-30 or 10W-40? Is everyone fluent in typing correct XHTML comments? :)

All these problems stem from a simple fact: nobody ever agreed to your contract. Add comment validators and processors but all they do is bend others to your contract and yours alone.

DOCTYPEs Do Matter

I side with DOCTYPEs rather than with serving content as XML because I think the business case for DOCTYPEs is stronger. The purism case is weaker, but this ain’t no Zion. This is the Matrix as we know it. I’m glad we have DOCTYPE switching because it imposes stricter rules on coding practices and shows “a more excellent way.” I’m also glad it fixes a number of interpretation inconsistencies in Internet Explorer. Use DOCTYPEs (XHTML, if possible), validate, fix, validate again.

If you’re in the business of developing web applications, ditch application/xhtml+xml, forget it, and don’t look back. Don’t hurt your business. Back to content-type=text/html.

Comments

Comment permalink 1 Darren Syzling |
I think for the Javascript to be allowed on Mozilla in xhtml strict you'll have to wrap it within a CDATA section.

You could take a look at the DOM validator controls here:
www.asp.net/ControlGallery/ControlDetail.aspx?Control=596&tabindex=2

These controls are a replacement for the default validation controls to resolve problems with use of non compliant w3c dom model javascript. Not sure if they also resolve the markup issues for the summary control - possibly not.

Unfortunately this problem is going to get worse unless control developers pay attention to this stuff. I've tried two controls recently that wouldn't even work in IE 'standards' mode - they would only work with quirks mode. One of the vendors claimed this was Microsoft's recommendation to support IE5 and 5.5. I had to explain that the same control could have been rendered in IE5+ and Mozilla in standards mode with the correct CSS and markup. If this kind of thing happens, what chance have we got for controls to produce strict XHTML? Maybe when the ASP.NET controls provide greater accessibility and XHTML support, control vendors will pay more attention to these issues.


Darren
Comment permalink 2 Milan Negovan |
Yep, I've seen a couple of controls around. One problem is I don't have a budget (no budget, that is) to purchase them. The other problem is with the idea of replacing the entire server side form control, for example, and committing to it.

I agree that there's VERY little understanding among ASP.NET control vendors about web standards. If control developers target only IE they might be out of business pretty soon, and it's a disservice on their part. The only folks who do it seriously, that I know of, are XHTML Web Controls.NET.
Comment permalink 3 Anne |
So the next most important question, and the most obvious one is: why use XHTML?

DOCTYPEs only matter in an HTML world. And since we live in such a world, they are quite important. The difference is that people think that by using an XHTML DOCTYPE they actually use XHTML. From that point of view, DOCTYPEs are not important. If you want to use XHTML, you need to use the correct MIME type. And when you do that, DOCTYPEs are a lot less important than before.
Comment permalink 4 Milan Negovan |
:)
Comment permalink 5 Asbjørn Ulsberg |
I have some experience with serving ASP.NET with the XHTML MIME type. It's hard, but possible. First, you need to replace all the server controls that produce braindead and invalid code (the best example being the validator controls). This is done in the Page.Render() method, which I usually write in a base Page class which all ASPX pages inherit from.

Second, you need to dynamically serve them as 'application/xhtml+xml' or 'text/html' depending on what the UA 'Accept's. This is also written into the base Page class. Third, you need a validation mechanism. You can either do it proactively per request, by loading the contents of base.Render(..) into an XmlTextReader or something similar, or more reactively with a validation service that crawls the site and reports validation errors as it finds them.

There are (huge) caveats with both validation methods. The proactive one makes pages render at least twice as slow. If pages are cached and there is enough juice to serve them this way, this is the best validation method. It gets absolutely all errors, and is able to revert the content type if something goes wrong, so that the page is still displayed.
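The "revert the content type" step of the proactive approach can be sketched in a few lines of JavaScript. This is only an illustration of the control flow: pickResponse is an invented name, and isWellFormed stands in for whatever XML check is available (an XmlTextReader pass in ASP.NET, for instance), so it is passed in as a parameter rather than implemented here.

```javascript
// Hypothetical sketch: serve as XML only when the client accepts it AND the
// rendered markup passed the well-formedness check; otherwise fall back.
function pickResponse(markup, clientAcceptsXhtml, isWellFormed) {
  // The check is skipped entirely for clients that would get text/html anyway.
  const asXml = clientAcceptsXhtml && isWellFormed(markup);
  return {
    contentType: asXml ? "application/xhtml+xml" : "text/html",
    body: markup, // the body is unchanged; only the label on it differs
  };
}
```

The cost mentioned above is visible in the signature: every request that could be served as XML pays for one extra pass over the rendered page.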

The reactive validation mechanism works as a service, running on a computer. This service crawls the website like any user agent, but will of course use the knowledge of the internal structure of the site to most efficiently crawl the site and also be sure to crawl every possible URI the site has to offer. If this service finds an erroneous page, it can't fix it. It can report the error, but damage will already most likely have been done, since the problem isn't fixed before the user gets the page served.

There is actually a third mechanism that is less likely to be deployable on most ASP.NET-based sites, which involves creating static pages. The way to validate then is on creation of each page. Each XML fragment the page consists of is validated before it is stored (wherever that is), and the full page is validated when it is «compiled». This way, validation errors are caught before the page is served to the user, and the site doesn't suffer from having to validate on each request. Additionally, this mechanism can be combined with service-based validation, to be even more secure.

ASP.NET doesn't give enough power to do this efficiently, unless you do static page serving. But few people do that, because then there's no real use for a dynamic server language for serving the pages in the first place. I hope Microsoft can think of something to clean up the mess the server controls are generating today, as well as a good framework to validate pages either pro- or re-actively.
Comment permalink 6 Mitchell |
Interesting article. I think there are two issues here...and they are serious problems in .NET...

First, yeah, for the next year or so, until IE 7 (or whatever it will be called) is released and fixes the problem with Web Standards support and XHTML, don't bother serving your XHTML (which is XML by definition) in web pages as application/xhtml+xml, period, either via the MIME type on the server or, if possible, the meta tags (which generally don't matter). The reason is that IE doesn't render the page correctly using that MIME type or DOCTYPE. To serve it as XML you need to add the XML prologue, which has to come before the DOCTYPE in the code; that defeats DOCTYPE switching in pages with an XHTML DOCTYPE, so you end up with IE 6 (the world's current dominant browser) rendering in IE 5 quirks mode. Only IE has this major problem... most of the other vendors get it right or close to right. It's a mess! Thanks, Microsoft!

Second, ASP.NET web forms/controls are another giant mess of non-compliance, forced inline styles, mishandled "style objects", forced javascripted validation functions (also non-web-standards-compliant), and html attributes that will not allow your pages to "validate as XHTML" at this time using this site, validator.w3.org. Not good!

After diving into ASP.NET and Visual Studio, I feel the application is basically useless for web application design. Why? Because of all the garbled and "old-fashioned" HTML that its web forms spit out. I can't tell you how many classes I've attended and sites I've developed with .NET people who blindly drag-and-drop this HTML garbage on their pages. The designers should have done what the Macromedia/Allaire/Java people have done, and that's simply to not force bad markup and proprietary junkyard code down your throat, and to let the designers control that. It's bad! Unless you go research what Web Standards are and get a good understanding of that practice, what Mozilla has done with it in their browser, and where the new CSS and XHTML standards are moving (including XHTML 2.0, which is now published as a working draft), you will not know how bad Microsoft's products are in supporting web page markup, period. I wish they had completely gutted their objects into back-end server-side objects free of front-end manipulation; this would have freed up web designers and standards people to take the aspx pages and simply link in styles and markup as needed. But instead they chose to put programmers with NO UNDERSTANDING OF HTML AND STYLE SHEETS in charge of designing the mess we now have to deal with in this product. Sure, it might look close to good, but it's not good, and anyone serious about implementing the new Web Standards in their web applications (which is the future) had better be cautious about Visual Studio, SharePoint, FrontPage, and .NET. They don't work with XHTML, CSS or accessibility standards either.

-Mitchell
Comment permalink 7 Milan Negovan |
Mitchell, I agree with you on all points. In regard to your last statement: that's why I'm here. :)
Comment permalink 8 Dwight Vietzke |
Agree totally with the MS shortcomings listed above. Even worse, IE 6.x will not render valid XHTML (as checked at W3C) using the application/xhtml+xml HTTP server content type. Best I could get was a display of the page as XML... What's a poor web dev to do? Oh, by the way, your page is missing alt info for several images ;)
Comment permalink 10 Alex |
Nice article, thanks. BTW, I have noticed that strict XHTML with the right DOCTYPE actually renders really FASTER in the browser... That's one big reason I chose XHTML over HTML.
TrackBacks

1 Web Standards Project BUZZ  |   
ASP.Net & Standards Part II

Wow. My post on ASP.Net and standards seems to have touched a nerve. I received a pile of feedback via...