The (Realistic) Future of the Semantic Web

I’ve written recently on the Semantic Web, mostly to explain what it is. There’s no shortage of articles—academic and otherwise—describing the possibilities and virtues of the Semantic Web. A very rosy picture is painted in those texts. I’ll tell you why they’re mostly wrong.

There are at least three big problems facing the development of a semantic web for the masses:

  • Financial disincentives to publishers
  • Standards challenges
  • Chicken-and-egg problems for apps & content

Publishers’ Disincentives

The World Wide Web was envisioned as millions of text-based pages all linked together via hyperlinks. The vision was predicated on the idea that, rather than duplicate content, a publisher would simply link away to another person’s article. The problem was, and still is, that publishers don’t make money when you click away to another site (only when you click through to an advertiser). So the spirit of the original vision for the Web was in conflict with the Web’s only realistic revenue model for publishers. Thus, the Web developed along different lines.

The same pitfall awaits the Semantic Web, only several times larger. A central tenet of the Semantic Web is that web content should be discoverable, and so must be marked up in such a way as to be legible, and therefore useful, to multiple applications without human intervention. With an ad-supported model (and we’re still a long way from reasonable subscription-supported web services), publishers must keep control over how their content is displayed so that they can deliver ads alongside it. If all meaningful content is marked up for easy ingestion, then publishers lose their revenue streams. Any other publisher or service provider can very easily lift the content and repurpose it, index it, catalog it, analyze it, or do anything else with it. So publishers will specifically avoid marking up their content in Semantic Web-friendly ways.
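To make the "easy to lift" point concrete, here is a minimal sketch in Python. The page and its element names (`page`, `ad`, `article`, and so on) are invented for illustration and don't follow any real Semantic Web vocabulary; the only assumption is that the publisher has labeled the meaningful parts of the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up page. The element names are made up for this
# example and are not taken from any real standard.
PAGE = """
<page>
  <ad>Buy our widgets!</ad>
  <article>
    <title>Local Team Wins Final</title>
    <author>Jane Doe</author>
    <body>The home side won 2-1 on Saturday.</body>
  </article>
  <ad>Subscribe today!</ad>
</page>
"""

def lift_article(xml_text):
    """Return only the labeled article fields, skipping the rest of the page."""
    article = ET.fromstring(xml_text).find("article")
    return {child.tag: child.text for child in article}

lifted = lift_article(PAGE)
print(lifted)
```

Once the content is labeled, a third party needs only a few lines to take the article and leave the ads behind, which is exactly the publisher's problem.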

To see evidence of the underlying problem we can look at the world as it is today. Think of the existing Web as a dramatically simplified version of the Semantic Web… which it is. To Google, the Web is a collection of resources with a single, simple content type (text) which is machine readable (to web crawlers, which index the text).

Google studiously avoids creating content itself. Its very mission, to organize the world’s information, contains the admission that the content doesn’t belong to Google. And the conflict that exists today between Google and content publishers is an example, in microcosm, of what we can expect in a semantic web scenario. For its main consumer product, the Google search engine, Google crawls the Web, discovering and indexing others’ content, and there are plenty of companies who resent the power Google has amassed on the back of their content. Google controls discoverability, targeted advertising on the way in, and behavioral ads based on expressed interests (the DoubleClick deal is an antitrust problem that the FTC and EU really didn’t adequately explore), and that’s just from search. Some of Google’s other apps are far more invasive into publishers’ content (e.g. Google News Search, Google Reader).

In a semantic web, publishers’ content can magically float off to whatever application is able to ingest it. No publisher will voluntarily mark up their content to make it easier for others to repurpose. Content isn’t really king right now, and a semantic web demotes it still further.

Standards Challenges

The Semantic Web will require agreement on a myriad of different standards. With no 800 lb. gorilla to mandate standards, all the would-be standards creators, large and small, will fight it out and come up with multiple, conflicting, parallel standards. Note what’s happened with RSS. We have RSS and Atom, and there are still others. RSS-reading applications must support both.

Multiple standards create a barrier for application developers: parallel-support and upgrade headaches. And multiple standards create a disincentive to publishers: choosing one standard cuts out applications that operate on other standards. There are problems on both sides of the equation.
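Here is a rough sketch of what parallel support looks like from the developer's side, using RSS 2.0 and Atom as the example. The function name `entry_titles` and the two sample feeds are my own, made up for illustration; a real reader would also have to handle dates, links, encodings, and the older RSS variants.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace; RSS 2.0 elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

def entry_titles(feed_xml):
    """Return the entry titles from either an RSS 2.0 or an Atom feed."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS 2.0 layout: <rss><channel><item><title>
        return [item.findtext("title") for item in root.iter("item")]
    if root.tag == ATOM_NS + "feed":
        # Atom layout: <feed><entry><title>, all namespaced
        return [entry.findtext(ATOM_NS + "title")
                for entry in root.iter(ATOM_NS + "entry")]
    raise ValueError(f"unsupported feed format: {root.tag}")

# Two tiny hand-written feeds carrying the same single entry.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title></item></channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>First post</title></entry></feed>"""

rss_titles = entry_titles(RSS_SAMPLE)
atom_titles = entry_titles(ATOM_SAMPLE)
print(rss_titles, atom_titles)
```

Even for something as trivial as pulling out titles, every format needs its own branch, and every new competing standard adds another one to write and maintain.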

Chicken and Egg for Apps and Content

Lastly, and most obviously, if there’s no killer app there’s no incentive for publishers to mark up their new or existing content to be Semantic Web-friendly. And if there’s no content, there’s no incentive for developers to spend energy creating apps for content that doesn’t exist.

This isn’t the highest hurdle. In fact, I have a project idea in my pocket which I believe could overcome this and other challenges (!). But for now this is a barrier to the organic growth of the Semantic Web.

History Repeats

Like the Semantic Web, our current World Wide Web had its roots in an idealized vision, and one from the same guy who “invented” (envisioned, really) the Semantic Web: Tim Berners-Lee. The Web we actually developed diverged dramatically from the original vision, I’d say mostly for the better. For the Semantic Web, we don’t yet know exactly what it will look like. It may well exceed the vision, or it could fall short of expectations (or never materialize). But I think we can at least be confident that it won’t look anything like the original, idealized vision that’s touted by Semantic Web purists today.

