Back in 2001, Sir Tim Berners-Lee, inventor of the World Wide Web, envisioned a future in which the web is not just a linked set of human-readable documents but a web of machine-understandable data. He called this new web the Semantic Web. Now, nearly ten years later, we are finally starting to see his vision realized, mostly through the semantic search capabilities being built and adopted by Google, Bing, Yahoo! SearchMonkey, and Wolfram Alpha. Now is the time for companies to start seeing their content as data and their catalogs as semantic information that can become part of this new web.
Mark up your content!
How can you get started with the Semantic Web? The first step is to mark up your content as structured semantic data. That way, when search agents and other machines crawl your site, you can tell them explicitly that a page contains a product with a price and a category. This is in contrast to the older Web 1.0 and 2.0 model, where you relied on Google's indexing in the wild to pick up on keywords and meta descriptions.
Companies have gone to great lengths to implement content management systems that separate content from code using well-defined content types, only to publish that content back to the web as unstructured data. Now is the time to identify the products, people, events, reviews, and videos on your site and mark them up with RDFa or Microformats. This allows the new semantic search engines to handle your results differently and group them accordingly, helping people get to what they are searching for.
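To make the idea concrete, here is a minimal sketch of what marking up a product with RDFa might look like when the HTML is generated from structured fields. It assumes the GoodRelations e-commerce vocabulary purely as an example; the function name `product_to_rdfa`, its parameters, and the sample values are hypothetical, and a real template would carry more properties than this.

```python
from html import escape

# Namespaces used in the generated markup.
GR_NS = "http://purl.org/goodrelations/v1#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def product_to_rdfa(name, price, currency, category):
    """Render one product as an HTML fragment carrying RDFa attributes.

    Hypothetical helper for illustration; the arguments stand in for
    whatever fields your content management system stores per product.
    """
    amount = f"{price:.2f}"
    return "\n".join([
        f'<div xmlns:gr="{GR_NS}" xmlns:xsd="{XSD_NS}" typeof="gr:Offering">',
        f'  <h2 property="gr:name">{escape(name)}</h2>',
        f'  <p property="gr:category">{escape(category)}</p>',
        '  <div rel="gr:hasPriceSpecification">',
        '    <div typeof="gr:UnitPriceSpecification">',
        f'      <span property="gr:hasCurrency" content="{escape(currency)}"></span>',
        f'      <span property="gr:hasCurrencyValue" datatype="xsd:float" '
        f'content="{amount}">{amount} {escape(currency)}</span>',
        '    </div>',
        '  </div>',
        '</div>',
    ])


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(product_to_rdfa("Digital Camera", 199.0, "USD", "Cameras"))
```

A person visiting the page still sees a name, a category, and a price, while a crawler that understands RDFa can read the same fragment as an offering with a typed price.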
Publish your catalog
The modern Linked Data movement is the realization of Tim Berners-Lee’s vision. The movement centers on identifying data on the web and assigning it web identifiers (URIs). These uniquely identified datasets can then be joined with other public datasets to combine web knowledge. In eCommerce, GoodRelations is an ontology that is gaining momentum among retailers publishing their catalogs as RDF. Best Buy has begun publishing its catalog of more than 30,000 products in GoodRelations RDF. Companies looking for other avenues of traffic to their sites should consider publishing their products in RDF so that others can integrate those products, mash-up style, into their own applications.
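As a rough illustration of what a published catalog could look like, the sketch below serializes a few product records as RDF in Turtle syntax using GoodRelations terms. The function names, the dictionary keys, and the example.com URIs are all assumptions made for this example, and it builds the text by hand rather than with an RDF library, so treat it as a starting point rather than a finished exporter.

```python
def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def catalog_to_turtle(products):
    """Serialize product dicts (hypothetical keys: uri, name, price, currency) as Turtle."""
    lines = [
        "@prefix gr: <http://purl.org/goodrelations/v1#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for product in products:
        amount = f"{product['price']:.2f}"
        lines += [
            # Each product gets its own URI so other datasets can link to it.
            f"<{product['uri']}> a gr:Offering ;",
            f"    gr:name {turtle_literal(product['name'])} ;",
            "    gr:hasPriceSpecification [",
            "        a gr:UnitPriceSpecification ;",
            f"        gr:hasCurrency {turtle_literal(product['currency'])} ;",
            f'        gr:hasCurrencyValue "{amount}"^^xsd:float',
            "    ] .",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up catalog entries for the example.
    catalog = [
        {"uri": "http://example.com/products/1001", "name": "Digital Camera",
         "price": 199.0, "currency": "USD"},
        {"uri": "http://example.com/products/1002", "name": '42" Television',
         "price": 549.99, "currency": "USD"},
    ]
    print(catalog_to_turtle(catalog))
```

Giving every product a stable URI is the important part: it is what lets someone else join your catalog with their own data.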
Enhance your metadata
OK, so now you are committed to publishing RDFa inside your current web site. How do you get started? How do you go about identifying all the types of entities in your content? This is where OpenCalais can help. OpenCalais is a free (with terms) web service offered by Thomson Reuters that lets you post content and receive a response, in RDF, containing all of the identifiable entities within the page along with an associated confidence rating for each. For example, if I were to post this blog article, I would get back entities representing companies (Best Buy, Google, Thomson Reuters) and people (Tim Berners-Lee), as well as many industry terms and social tags that I could integrate into my page. If you have a content management system, OpenCalais is an excellent service to integrate into a publishing workflow to ensure you have proper metadata upon publishing.
The Semantic Web has long been discussed, but it is finally at a point where companies are starting to realize the benefits through search engine optimization and enhanced metadata. Of course, there are many more opportunities for SemWeb benefits once you start to look at internal data sources and at leveraging public datasets on the web. In the short run, RDFa is an excellent way to begin semantic adoption within your company and to start joining the Linked Data community. Now is the time to deliver meaningful information to agents while still providing a rich user experience.
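One way a publishing workflow might use such a response is to keep only the entities it is reasonably sure about and group them by type before they become page metadata. The sketch below assumes the RDF response has already been parsed into a simple list of dictionaries; that shape, the `select_entities` name, the `min_confidence` threshold, and the sample scores are all hypothetical and are not the actual OpenCalais response format.

```python
from collections import defaultdict


def select_entities(entities, min_confidence=0.5):
    """Group entity names by type, dropping low-confidence matches.

    `entities` is a hypothetical, already-parsed list of dicts with the
    keys "type", "name", and "confidence" (a number between 0 and 1).
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["confidence"] >= min_confidence:
            grouped[entity["type"]].append(entity["name"])
    # Deduplicate and sort so the resulting metadata is stable between runs.
    return {kind: sorted(set(names)) for kind, names in grouped.items()}


if __name__ == "__main__":
    # Made-up entities and scores, loosely based on this article.
    parsed = [
        {"type": "Company", "name": "Best Buy", "confidence": 0.92},
        {"type": "Company", "name": "Google", "confidence": 0.88},
        {"type": "Person", "name": "Tim Berners-Lee", "confidence": 0.95},
        {"type": "IndustryTerm", "name": "search", "confidence": 0.21},
    ]
    print(select_entities(parsed, min_confidence=0.5))
```

The threshold is a judgment call: set it too low and noisy tags end up in your markup, set it too high and you lose entities an editor would have kept.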