A Semantic Web Overview for the Web-Literate

The “What”

Like the Worldwide Web, the Semantic Web is a model for how we use the Internet to share and consume information & services. It isn’t a single project or company or technology. Where our current Worldwide Web is all about humans reading text, the Semantic Web is about software reading data. The buzz-phrase “machine-readable” would cause much less confusion if instead it were “software-readable”… which is more accurate anyway.

For example, if you’re looking at the “About” page on any given website, you, the user, can read and interpret the contact information on that page. But to your browser and your PC, that contact information is just a string of text, no different from the strings of text on the “Privacy Policy” page. Nothing about its HTML formatting identifies it as information about a person, with an address, phone number, and email address. So your browser, or any other software reading that page, can only treat it as a blob of text.

A Semantic Web approach would be to format contact information in a standard, XML-based “microformat” specifically designed to contain contact information. The stylesheet for the page would instruct your browser on how to format this data for a web-browsing experience, but the content itself would also be available, and legible, to any other application that knows the “Contact” microformat. You could point your desktop address book application to the URL and let it scan the page for valid contact information. The program could find the contacts, ignore everything else, and offer to update your address book by adding each contact it found.
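To make the address book idea concrete, here is a minimal sketch in Python. It assumes the page marks contacts up with the class names used by hCard, a real contact microformat ("vcard", "fn", "tel", "email"). The sample page, the people in it, and the names `ContactScanner` and `find_contacts` are all made up for illustration; a real address book would need a far more forgiving parser.

```python
from html.parser import HTMLParser

# Hypothetical sample page. The class names follow the hCard convention;
# the company and the people are invented for this example.
SAMPLE_PAGE = """
<html><body>
  <h1>About us</h1>
  <p>We sell widgets.</p>
  <div class="vcard">
    <span class="fn">Jane Example</span>
    <span class="tel">555-0100</span>
    <a class="email" href="mailto:jane@example.com">jane@example.com</a>
  </div>
  <div class="vcard">
    <span class="fn">Sam Sample</span>
    <span class="tel">555-0101</span>
  </div>
</body></html>
"""

# The only fields this sketch looks for.
FIELDS = {"fn", "tel", "email"}


class ContactScanner(HTMLParser):
    """Collects one dict per element marked class="vcard".

    Simplification: assumes every tag inside a vcard has a closing tag,
    so unclosed void tags such as <br> would throw off the depth counter.
    """

    def __init__(self):
        super().__init__()
        self.contacts = []    # finished contacts
        self._current = None  # contact being built, or None outside a vcard
        self._depth = 0       # nesting depth inside the current vcard
        self._field = None    # field whose text is being read right now

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._current is None:
            # Outside a contact: everything is ignored until a vcard starts.
            if "vcard" in classes:
                self._current = {}
                self._depth = 1
            return
        self._depth += 1
        for name in classes:
            if name in FIELDS:
                self._field = name

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and self._field and text:
            self._current[self._field] = text

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._field = None
        self._depth -= 1
        if self._depth == 0:
            # The vcard element itself just closed.
            self.contacts.append(self._current)
            self._current = None


def find_contacts(html_text):
    """Return a list of contact dicts found in the page text."""
    scanner = ContactScanner()
    scanner.feed(html_text)
    return scanner.contacts


if __name__ == "__main__":
    for contact in find_contacts(SAMPLE_PAGE):
        print(contact)
```

The heading and the "We sell widgets" paragraph are skipped entirely, because nothing marks them as contact data. That is the whole point: the program does not have to guess which text is a phone number.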

The “contact” example is popular because it is accessible, but other useful formats exist, and others will emerge.

The “Who”

The Semantic Web isn’t a single project being conducted by a single entity (neither was the Worldwide Web). There are a few pioneers who espouse and employ Semantic Web techniques [Tim Berners-Lee], and there are a handful of companies/projects which could be considered “Semantic Web plays” [Examples to follow].

The “When”

We’ve already witnessed the first major success in the Semantic Web movement: RSS.

RSS is an open standard format for syndicating news stories: any application can read, interpret, and act on any RSS document on the Web. RSS newsreaders can be web-based or client-based, and an application can use any piece of any RSS document to accomplish its purpose, which may well extend beyond simply displaying it for users’ consumption.
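As a rough illustration of software reading an RSS document, the sketch below uses Python's standard `xml.etree.ElementTree` module to pull the title and link out of each item. The feed text, its headlines, and the function names `read_items` and `links_matching` are invented for the example; only the element names (`rss`, `channel`, `item`, `title`, `link`) come from RSS 2.0 itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the titles and URLs are made up.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Made-up headlines</description>
    <item>
      <title>Widgets on sale</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>New gadget announced</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
"""


def read_items(feed_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


def links_matching(feed_text, keyword):
    """Return links of items whose title mentions the keyword.

    A small example of acting on the data rather than just displaying it.
    """
    keyword = keyword.lower()
    return [
        link
        for title, link in read_items(feed_text)
        if title and keyword in title.lower()
    ]


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_FEED):
        print(title, "->", link)
```

Because every feed uses the same element names, the same few lines work on any publisher's feed, whether the program is a newsreader or something that only wants to watch for a keyword.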

The “How”

We’ve already begun using a display technology that will be necessary to integrate the web of today with the Semantic Web: Cascading Style Sheets, which decouple web data from its display instructions.

In “original” HTML, the content itself was encapsulated within the markup that described how the content should be displayed. For example, on a page listing a number of products, all the product names might be contained within table cells, bolded, and slightly larger than the other text on the page. The browser didn’t need to know that it was the name of the product; it only needed to know how to display it, because only a human user would be reading it.

Today, instead of surrounding content with display cues, we describe it within our well-formatted XHTML content. We might have an internal structure for “product”, which would have properties like “name”, “description”, and “price”. Then, we apply stylesheets which tell browsers to treat all product names in one way, prices in another way, etc. It’s more convenient for developers because all product formatting can be changed in one place, with one change, rather than having to update it in multiple places.
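Here is a small sketch of what that descriptive markup buys a program. It assumes a site that labels its product data with the class names "product", "name", "description", and "price"; those class names, the sample products, and the function `read_products` are hypothetical, a stand-in for whatever internal structure a given site uses. Because the fragment is well-formed, an ordinary XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment. The class names are one site's own
# convention, not a standard; the products are made up.
PRODUCT_PAGE = """
<div>
  <h1>Our products</h1>
  <div class="product">
    <span class="name">Blue Widget</span>
    <span class="description">A small blue widget.</span>
    <span class="price">9.99</span>
  </div>
  <div class="product">
    <span class="name">Red Widget</span>
    <span class="description">A large red widget.</span>
    <span class="price">14.50</span>
  </div>
</div>
"""

PRODUCT_FIELDS = ("name", "description", "price")


def read_products(xhtml_text):
    """Return one dict per element marked class="product".

    Simplification: expects each field to be a direct child of the
    product element and to carry exactly one class name.
    """
    root = ET.fromstring(xhtml_text)
    products = []
    for node in root.iter():
        if node.get("class") != "product":
            continue
        product = {}
        for child in node:
            field = child.get("class")
            if field in PRODUCT_FIELDS:
                product[field] = (child.text or "").strip()
        products.append(product)
    return products


if __name__ == "__main__":
    for product in read_products(PRODUCT_PAGE):
        print(product["name"], product["price"])
```

With the old bold-text-in-a-table-cell markup there would be nothing for `read_products` to look for. Note that this reader only works on a site that happens to use these particular class names.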

This recent modernization has been more about design control and simplicity/organization of code, but it does get us closer to the Semantic Web. In the Semantic Web model, rather than having proprietary formats for describing our content (each site with its own structure for “product” data), we would apply standard formats (“microformats”). The stylesheet-based display instructions would be moved to a separate document. This is often done today, but in many cases at least some (or maybe all) stylesheet code is contained within the HTML document (though not usually interspersed with the content, as in the old days).

So today, in terms of readiness, we’re about halfway between the original Web and the Semantic Web.

The “Why”

I’ll save the Why for another blog post. The promise and potential of the Semantic Web, like our original WWW, is enormous. And like the original Worldwide Web, for good and bad, the reality will diverge sharply and dramatically from the academic vision. Much fodder for further commentary and discussion!

