The semantic web as a term is over 15 years old now. How close are we to its realization? How much longer will it take? Justin Gilbreath, the managing director of 30 Digits, provides insight into the development of the semantic web and into web extraction, a practical alternative that is available now.
One could say that, in some ways, we are closer than ever before. Tools and standards such as RDF, OWL, and SPARQL, among many others, have been developed that make its implementation possible. Yet only the few and the brave have actually gone down this road. The problem is that the technology is still mastered by only a few, and Indeed’s job trend analysis suggests this is unlikely to change any time soon. Engineers and programmers may take a different view, but consider the question "Is semantic web a dead on arrival project?" on Stack Overflow, the very site where these experts go to get answers. The biggest nail in the coffin, though, is that most content creators do not see enough benefit to justify the extra effort of making their content machine readable. If that were not enough, consider what Richard Padley of Semantico discusses in his article (Triple bypass – What does the death of the semantic web mean for publishers?): Yahoo, Google, and Microsoft announced schema.org, which essentially gives them a way to use content through a few simple labeling standards in HTML5, leaving the semantic web out in the cold.
Then where do these wonderful sites (mashups) full of information from multiple sources come from? How do they get their data? What about companies that gather insights from aggregated information across the web? In some cases, these benefits come from technologies that are predecessors of the semantic web. The most useful and prevalent of these are RSS feeds and APIs.
That still leaves all the content on the Internet that is not accessible via RSS or an API. There is also the issue of making sense of social media, which has overtaken the semantic web in many ways, as discussed in this article (Quora: Has Social Trumped the Semantic Web?) and many like it. That is where web data extraction comes in. Web extraction, often called web scraping or web harvesting, is a process in which a tool (often called a spider) crawls the target sites, extracts the precise content needed, and delivers that data in a unified format that another system can process.
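The crawl-extract-deliver cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the 30 Digits Web Extractor or any particular product: the in-memory `PAGES` dictionary stands in for real HTTP fetches, and the `example.com` URLs, the `title`/`date` CSS classes, and the `PageParser`/`crawl` names are all hypothetical. The spider follows links breadth-first within one site, pulls out the tagged fields from each page, and emits them as uniform JSON records.

```python
import json
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" standing in for real HTTP fetches.
PAGES = {
    "https://example.com/": '<a href="/a">A</a><a href="/b">B</a>',
    "https://example.com/a": '<h1 class="title">Plant closure</h1><span class="date">2012-05-01</span>',
    "https://example.com/b": '<h1 class="title">New hotel review</h1>'
                             '<span class="date">2012-05-02</span><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects outgoing links and the text of elements whose class is a target field."""

    def __init__(self, base_url, fields):
        super().__init__()
        self.base_url = base_url
        self.fields = fields      # maps CSS class -> output field name (exact match only)
        self.links = []
        self.record = {}
        self._current = None      # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        if attrs.get("class") in self.fields:
            self._current = self.fields[attrs["class"]]

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current:
            self.record[self._current] = self.record.get(self._current, "") + data.strip()


def crawl(start_url, fetch, fields, max_pages=100):
    """Breadth-first crawl from start_url; returns one record per page that yielded data."""
    seen, queue, records, visited = {start_url}, deque([start_url]), [], 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        parser = PageParser(url, fields)
        parser.feed(html)
        if parser.record:
            records.append({"url": url, **parser.record})
        for link in parser.links:
            # Stay on the same site and never queue a page twice.
            if link not in seen and link.startswith(start_url):
                seen.add(link)
                queue.append(link)
    return records


records = crawl("https://example.com/", PAGES.get, {"title": "title", "date": "date"})
print(json.dumps(records, indent=2))
```

In a real deployment `fetch` would issue HTTP requests (for example via `urllib.request.urlopen`) and would need to respect robots.txt and rate limits; JavaScript-rendered pages and shifting layouts are where simple scrapers like this one tend to break down.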
Many services already make the benefits of the semantic web possible before it actually exists. Most of them are meant for small companies grabbing data from a few pages on a few sites. Others can gather larger amounts of data but struggle with more complex sites. When a company needs data for its business at scale, with reliability and precision, the 30 Digits Web Extractor delivers this functionality to meet the most demanding requirements.
The use cases are continually growing. The 30 Digits Web Extractor provides its services and products to companies that cannot wait for the Semantic Web. It is used for Open Source Intelligence (OSINT) to inform companies of dangers in countries where they have people, plants, and resources worth millions. It supplies social web data to leading brand management providers serving major brands in the hospitality industry. Other companies rely on the data to gather the market intelligence they need to compete in fast-moving markets.
The Semantic Web has great promise for the future (should it arrive), but would it not be better to realize the benefits today?