
Microformats and what’s wrong with them

I just came from a morning-long presentation on microformats by André Luís, and I must confess I was not very impressed. That’s no fault of André’s, who’s a great guy, obviously knows his stuff, and managed to give a clear presentation while still fending off some hard questions.

I was less-than-impressed by the microformats idea itself.

For a while, I couldn’t quite put my finger on what was really wrong with microformats and then it hit me.

Microformats remind me of the mid-90s, when people started using tables to build layouts in HTML. We looked at boring block-based webpages and felt frustrated that we could not get the simplest two-column layout out of HTML. And then we noticed tables, and how you could hide the borders and place stuff inside the cells, thus creating all sorts of different page layouts.

That’s what we did for years on the web until, of course, CSS-based layouts became the norm. As it turned out, using tables to build layouts was just plain wrong, but it was the only way web designers and developers found, at the time, to do it.

It was a hack.

Years later, we’re still fighting for people to leave that behind, to build their content and presentation separately, and to stop using tables for anything other than displaying tabular data.

Microformats seem to be doing exactly the same thing: using pre-existing HTML elements and trying to give them some other meaning in order to create semantic value for the content. It all seems very slapdash and confusing.

Now that everyone’s used to using classes to hook up CSS properties, you can have certain special class names that mean something in a microformat context. But the values aren’t reserved, so you’re never quite sure whether a given class was meant for styling or for semantics.
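Here’s a rough sketch of what I mean, in Python. It is not a conforming microformats parser, just a toy that collects text from elements whose class matches a handful of hCard property names inside a `vcard` block. The names `HCardSketch` and `extract_hcards`, the sample markup and the person in it are all made up for illustration, and the sketch assumes well-formed markup with no void elements such as `<br>`.

```python
from html.parser import HTMLParser

# A small, illustrative subset of hCard property class names.
HCARD_PROPERTIES = {"fn", "org", "title", "tel", "url"}


class HCardSketch(HTMLParser):
    """Toy extractor: reads hCard-style class names inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []  # one dict per vcard block found
        self._open = []  # per open element: (is_vcard_root, [property names])

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        inside_card = any(is_root for is_root, _ in self._open)
        is_root = "vcard" in classes and not inside_card
        if is_root:
            self.cards.append({})
        # Class names only count as properties inside a vcard block.
        if inside_card or is_root:
            props = [c for c in classes if c in HCARD_PROPERTIES]
        else:
            props = []
        self._open.append((is_root, props))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.cards:
            return
        card = self.cards[-1]
        for _, props in self._open:
            for name in props:
                card[name] = (card.get(name, "") + " " + text).strip()


def extract_hcards(html):
    parser = HCardSketch()
    parser.feed(html)
    return parser.cards


# Hypothetical markup: the designer used class="title" purely as a styling hook.
SAMPLE = """
<div class="vcard">
  <h2 class="title">Contact details</h2>
  <span class="fn">Jane Doe</span>
  <span class="org">Example Org</span>
</div>
"""

cards = extract_hcards(SAMPLE)
print(cards)
```

In this toy, the heading text ends up stored under `title`, which in hCard means the person’s job title. Nothing in the markup says whether that class was meant for the stylesheet or for the microformat.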

You’re also never quite sure of what might happen later on, when someone decides to change property values or names around. I think that maybe, some time from now, people will be going around telling us not to use classes and links to try to define relationships and meaning in content.

On the other hand, I think there’s a huge barrier when it comes to user-generated content. People just do not have the patience to tag their content silly with meaningful semantic markers. You have to build them specific, unobtrusive backend tools to mark up their content for them.

Just as WYSIWYG editors write your HTML for you, you’d need something that would write microformats for the users, invisibly. But that’s almost impossible.

If you have the intelligence to ‘read’ what the user’s writing and automagically tag stuff, then you have the intelligence to build a crawler that can interpret human-written text, and you have no need for microformats in the first place.

While I can clearly see the need for something like microformats to exist to face that lack of intelligence, I don’t think the current implementation is simple or practical for anything other than coders playing code-masturbation.

Users, common, everyday users, don’t care. And in the end, that’s who we’re always working for.

6 replies on “Microformats and what’s wrong with them”

Pedro, first of all, thanks for the kind words.

It’s not always easy to take a step back and actually look at what we’re doing… but I’ve tried to do just that, several times. I can totally relate to the clunkiness of the specs of some formats… it’s not perfect. Still, to my knowledge, using the tools available today, it’s the best way we can do these kinds of tricks. The scenario is bound to evolve…

I’ve seen current engines trying to extract semantic meaning off of webpages… it currently sucks. Not only is it usually tightly coupled with the English language, but it also fails considerably to provide trustworthy data. Data you can work on.

What you guys want, in terms of implementation, I believe can be achieved by RDF and RDFa… it’s a different world though, where everyone and their mother can write their own ontologies and whichever gets picked up, wins. That’s why it’s easier to provide simple actions like these (add event, add contact, check relationships, etc.) through microformats. RDF certainly has its place and it’s definitely something that’s going to be a part of the web, hopefully even wider than RSS… but in the meantime, the only trustworthy way of adding semantic value to the content we’re publishing is by giving it a lil’ push, by sprinkling these classnames and other attributes around. If users need widgets to help them add content, so be it… it’s not too high a price to pay, is it?

Of course users won’t give a rat’s ass about what an hCard is… but if they can see their needs met, like adding a show straight from the webpage of a tv schedule into their mobile phones, and they understand how to do it easily (that’s the hard part, really), I actually believe they can be positively affected by all this.

Now… microformats are not, nor do they claim to be, a solution to every problem. They’re helping increase both semantics and awareness… if people start there and evolve to RDF or RDFa, so much the better. But it’s certainly better than pure HTML, or waiting for the W3C to agree on a new spec and getting it implemented in browsers, is it not?

I didn’t think microformats were trying to solve every problem, nor do I think they’re a completely bad idea, in essence.

I’m just worried that we might be doing something that we’ll have to be taking apart a couple of years from now and starting from scratch again.

What if that deconstructing is, or will be, automatic? Since the metadata is recognizable by agents, we can convert it to any format we want, and that’s already being done. It might even be a format that will emerge in the future.
(example: hCard + XFN + xFolk/Atom = FOAF!)

It surely beats extracting this knowledge from pure html… 🙂 no?
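To give an idea of that kind of conversion, here’s a minimal sketch in Python that turns already-extracted hCard properties and XFN links into FOAF, written out as Turtle. The function names `quote` and `hcard_to_foaf`, the `<#me>` subject, and the example person and URLs are all made up for illustration, and mapping every XFN relationship to `foaf:knows` is a lossy simplification of my own, not something any spec prescribes. It also assumes the URLs are already valid IRIs.

```python
FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> ."


def quote(text):
    """Escape a string so it can be written as a Turtle literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def hcard_to_foaf(card, xfn_links=()):
    """Build a FOAF description in Turtle.

    card: dict using hCard property names ('fn', 'url'); both are optional.
    xfn_links: URLs taken from links carrying an XFN rel value
               such as rel="friend" or rel="colleague".
    """
    statements = ["a foaf:Person"]
    if "fn" in card:
        statements.append("foaf:name " + quote(card["fn"]))
    if "url" in card:
        statements.append("foaf:homepage <" + card["url"] + ">")
    for href in xfn_links:
        # Lossy: the specific XFN relationship is flattened to foaf:knows.
        statements.append("foaf:knows [ foaf:homepage <" + href + "> ]")
    return FOAF_PREFIX + "\n\n<#me> " + " ;\n    ".join(statements) + " .\n"


# Hypothetical input, as it might come out of an hCard/XFN extractor.
print(hcard_to_foaf(
    {"fn": "Jane Doe", "url": "http://example.org/jane"},
    ["http://example.org/joe"],
))
```

The point is only that once the data has been pulled out of the page, writing it back out in another vocabulary is mechanical.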

I agree with Pedro up to a point: HTML was not made with microformats in mind. That’s why microformats don’t fit so well. Maybe in the future the evolution of HTML will bring uF inside, or just provide a better solution than uF or RDF.

Until then, uFs are pretty cool 🙂 And can be used right now!

The issue is, I believe, how we can add semantics to the Web. The truth is, there are several ways to do this. From my point of view, going from the hackiest to the cleanest solution:

microformats: subvert the semantics of html/xhtml
rdf: doesn’t really subvert html/xhtml, but mixes document-semantics (provided by html/xhtml) with domain-semantics (whatever your document is about)
owl: provides only domain-semantics. html/xhtml keeps being used only for document-semantics (more complex than the previous options, but my personal favourite)

Filipe, funny… that’s the order of complexity as well, uFs being the simplest while OWL is the most complicated. And we need to learn how to crawl before we learn how to jump.

Hmmm… Why do you say uFs subvert the semantics of html/xhtml? They don’t remove any of the semantics provided by those specs. They add on top of them… no?

Oh, one more thing… this doesn’t have to be an A OR B war… we can have both microformats and RDF(a) in the same page. Provided your document is XHTML. 😉

As for OWL, I studied it at college and even though its power was obvious, it’s quite a different concept from the web we have today. People have to evolve, and so do the tools. A hypertext browser probably isn’t the right tool to navigate through OWL… is it?
