What is Semantic Web?
The Semantic Web can be defined as an extension of the World Wide Web that enables people to share content beyond the limits of applications and websites. It can also be defined as a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C). The Semantic Web aims to convert the current web, dominated by unstructured and semi-structured documents, into a “web of data” by encouraging the inclusion of semantic content in web pages.
The concept of the Semantic Network Model was formed in the early sixties as a way to represent semantically structured knowledge. The Semantic Web applies this idea to the Web: it extends the network of hyperlinked human-readable web pages by inserting machine-readable metadata about the pages and how they are related to each other, enabling automated agents to access the Web more intelligently and perform tasks on behalf of users.
The main purpose of the Semantic Web is to drive the evolution of the current Web by enabling users to find, share, and combine information more easily. The Semantic Web, as originally envisioned, is a system that enables machines to “understand” and respond to complex human requests based on their meaning. Such an “understanding” requires that the relevant information sources be semantically structured.
The ultimate goal of the Web of data is to enable computers to do more efficient work in order to develop systems that can support trusted interactions over the network. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.
History of Semantic Web
The term “Semantic Web” was coined by Tim Berners-Lee, the inventor of the World Wide Web and director of the W3C, for a web of data that can be processed by machines.
Web 2.0 refers to a supposed second generation of Internet-based services, such as social networking sites, wikis, communication tools, and folksonomies, that emphasize online collaboration and sharing among users.
Web 3.0 refers to a supposed third generation of Internet-based services that collectively comprise what might be called “the intelligent Web”. These services use technologies such as the semantic web, microformats, natural language search, data mining, machine learning, recommendation agents, and artificial intelligence, and they emphasize machine-facilitated understanding of information in order to provide a more productive and intuitive user experience.
Structure of Semantic Web
While the specific nature of Web 3.0 technologies is difficult to define precisely, the outline of emerging applications has become clear over the past year. Key enablers are a maturing infrastructure for integrating Web data resources and the increased use of, and support for, the languages developed in the W3C Semantic Web Activity. As Figure 1 shows, the application of these technologies, integrated with the Web frameworks that power the better-known Web 2.0 applications, is generally becoming the accepted definition of the Web 3.0 generation.
Web 3.0 applications build on the Resource Description Framework (RDF), which provides a means to link data from multiple websites or databases. With the SPARQL query language, a SQL-like standard for querying RDF data, applications can use native graph-based RDF stores and extract RDF data from traditional databases. Once the data is in RDF form, the use of uniform resource identifiers (URIs) for merging and mapping data from different resources facilitates the development of multisite mash-ups. RDF Schema (RDFS) and the Web Ontology Language (OWL) provide the ability to infer relationships between data in different applications or in different parts of the same application. These Semantic Web languages allow for the assertion of relationships between data elements, which developers can use, via custom code or an emerging toolset, to enhance the URI-based direct merging of data into a single RDF store. In RDF, if two data elements are recognized as having the same URI, they can be joined in a merged graph.
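The URI-based merging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modeled as plain tuples rather than with an RDF library, and every `example.org` URI, the `worksFor` property, and the helper names `merge_graphs` and `describe` are hypothetical.

```python
# Each graph is a set of (subject, predicate, object) triples.
# All example.org URIs and the worksFor property are hypothetical.
ALICE = "http://example.org/person/alice"
ACME = "http://example.org/org/acme"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
WORKS_FOR = "http://example.org/vocab/worksFor"

# Data about the same person published by two different sites.
site_a = {(ALICE, FOAF_NAME, "Alice")}
site_b = {
    (ALICE, WORKS_FOR, ACME),
    (ACME, FOAF_NAME, "Acme"),
}


def merge_graphs(*graphs):
    """Merge graphs by set union; nodes with the same URI become one node."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return all (predicate, object) pairs known about one subject URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge_graphs(site_a, site_b)
for predicate, obj in describe(merged, ALICE):
    print(predicate, "->", obj)
```

Because both sites use the same URI for Alice, the merged graph holds her name and her employer under a single node without any site-specific join logic.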
Figure 1. Web 3.0 extends current Web 2.0 applications using Semantic Web technologies and graph-based, open data.
Web 3.0 might be defined as a third generation of the Web enabled by the convergence of several key emerging technology trends:
- Broadband adoption
- Mobile Internet access
- Mobile devices
- Software-as-a-service business models
- Web services interoperability
- Distributed computing (P2P, grid computing, hosted “cloud computing” server farms such as Amazon S3)
- Open APIs and protocols
- Open data formats
- Open-source software platforms
- Open data (Creative Commons, Open Data License, etc.)
- Open identity (OpenID)
- Open reputation
- Portable identity and personal data (for example, the ability to port your user account and search history from one service to another)
The Intelligent Web
- Semantic Web technologies (RDF, OWL, SWRL, SPARQL, Semantic application platforms, and statement-based datastores such as triplestores, tuplestores and associative databases)
- Distributed databases — or what I call “The World Wide Database” (wide-area distributed database interoperability enabled by Semantic Web technologies)
- Intelligent applications (natural language processing, machine learning, machine reasoning, autonomous agents)
Currently, the World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms. Metadata tags provide a method by which computers can categorise the content of web pages, for example:
<meta name="keywords" content="computing, computer studies, computer" />
<meta name="description" content="Cheap widgets for sale" />
<meta name="author" content="John Doe" />
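As a rough sketch of how a program might read such metadata, the following Python uses the standard library's `html.parser` module to collect the `name` and `content` attributes of meta tags. The class name `MetaTagCollector`, the function `extract_meta`, and the sample page are illustrative assumptions, not part of any standard.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.meta[name] = attributes.get("content", "")


def extract_meta(html_text):
    """Return a dict mapping each meta tag's name to its content."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


# Sample page reusing the tags shown above.
page = """<html><head>
<meta name="keywords" content="computing, computer studies, computer" />
<meta name="description" content="Cheap widgets for sale" />
<meta name="author" content="John Doe" />
</head><body></body></html>"""

meta = extract_meta(page)
print(meta["author"])
```

The program can categorise the page by its keywords, but it still has no way of knowing what a "widget" is or what it costs, which is the limitation discussed next.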
Semantic HTML refers to the traditional HTML practice of markup following intention, rather than specifying layout details directly. An example is the use of <em> to denote “emphasis” rather than <i>, which specifies italics. Layout details are left up to the browser, in combination with Cascading Style Sheets (CSS), a style sheet language for describing the look and formatting of a document written in a markup language. But this practice falls short of specifying the semantics of objects such as items for sale or prices.
Microformats extend HTML syntax to create machine-readable semantic markup about objects including people, organisations, events and products. Similar initiatives include RDFa, Microdata and Schema.org.
The Semantic Web takes the solution further. It involves publishing in languages specifically designed for data: Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible Markup Language (XML). HTML describes documents and the links between them. RDF, OWL, and XML, by contrast, can describe arbitrary things such as people, meetings, or airplane parts.
These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest itself as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout or rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e., to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and helping computers to perform automated information gathering and research.
An example of a tag that would be used in a non-semantic web page:
Encoding similar information in a semantic web page might look like this:
Tim Berners-Lee calls the resulting network of Linked Data the Giant Global Graph, in contrast to the HTML-based World Wide Web. Berners-Lee posits that if the past was document sharing, the future is data sharing. His answer to the question of “how” provides three points of instruction. One, a URL should point to the data. Two, anyone accessing the URL should get data back. Three, relationships in the data should point to additional URLs with data.
The term “Semantic Web” is often used more specifically to refer to the formats and technologies that enable it. The collection, structuring, and retrieval of linked data are enabled by technologies that provide a formal description of concepts, terms, and relationships within a given knowledge domain. These technologies, most of which are specified as W3C standards, include:
- Resource Description Framework (RDF), a general method for describing information
- RDF Schema (RDFS)
- Simple Knowledge Organization System (SKOS)
- SPARQL, an RDF query language
- Notation3 (N3), designed with human-readability in mind
- N-Triples, a format for storing and transmitting data
- Turtle (Terse RDF Triple Language)
- Web Ontology Language (OWL), a family of knowledge representation languages
- Rule Interchange Format (RIF), a framework of web rule language dialects supporting rule interchange on the Web
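To give a feel for one of these formats, the sketch below writes triples in the line-based N-Triples style, where each line holds a subject, a predicate, and an object followed by a period. It is a simplified illustration, not a complete serializer: it assumes that any term starting with `http://` or `https://` is an IRI and everything else is a plain string literal, and it ignores blank nodes, datatypes, and language tags. The function names and the `example.org` URIs are hypothetical.

```python
def is_iri(term):
    """Simplifying assumption: terms that look like HTTP(S) URLs are IRIs."""
    return term.startswith(("http://", "https://"))


def nt_term(term):
    """Format one term: IRIs in angle brackets, literals in double quotes."""
    if is_iri(term):
        return f"<{term}>"
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


# Hypothetical data; a list keeps the output order predictable.
triples = [
    ("http://example.org/person/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    (
        "http://example.org/person/alice",
        "http://xmlns.com/foaf/0.1/knows",
        "http://example.org/person/bob",
    ),
]

print(to_ntriples(triples))
```

Turtle and Notation3 express the same triples more compactly, for instance by abbreviating long IRIs with prefixes.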
The Semantic Web Stack illustrates the architecture of the Semantic Web. The functions and relationships of the components can be summarized as follows:
- XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within. XML is not at present a necessary component of Semantic Web technologies in most cases, as alternative syntaxes exist, such as Turtle, which became a W3C Recommendation in 2014.
- XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
- RDF is a simple language for expressing data models, which refer to objects (“web resources”) and their relationships. An RDF-based model can be represented in a variety of syntaxes, e.g., RDF/XML, N3, Turtle, and RDFa. RDF is a fundamental standard of the Semantic Web.
- RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized-hierarchies of such properties and classes.
- OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
- SPARQL is a protocol and query language for semantic web data sources.
- RIF is the W3C Rule Interchange Format. It’s an XML language for expressing Web rules that computers can execute. RIF provides multiple versions, called dialects, including the RIF Basic Logic Dialect (RIF-BLD) and the RIF Production Rules Dialect (RIF-PRD).
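Two of the layers above, RDF Schema inference and SPARQL-style querying, can be illustrated with a small Python sketch. It is a toy model, not an RDFS reasoner or a SPARQL engine: `infer_types` applies only the RDFS rule that an instance of a class is also an instance of its superclasses, and `match` supports a single triple pattern with `None` as a wildcard. The `rdf:type` and `rdfs:subClassOf` IRIs are the standard ones; the `example.org` classes and both function names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

graph = {
    (EX + "Student", RDFS_SUBCLASS_OF, EX + "Person"),
    (EX + "Person", RDFS_SUBCLASS_OF, EX + "Agent"),
    (EX + "alice", RDF_TYPE, EX + "Student"),
}


def infer_types(triples):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    inferred = set(triples)
    while True:
        new = {
            (s, RDF_TYPE, superclass)
            for s, p, cls in inferred
            if p == RDF_TYPE
            for sub, q, superclass in inferred
            if q == RDFS_SUBCLASS_OF and sub == cls
        }
        if new <= inferred:
            return inferred
        inferred |= new


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None matches any value."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


closed = infer_types(graph)
for _, _, cls in match(closed, s=EX + "alice", p=RDF_TYPE):
    print(cls)
```

Only one type was asserted for `alice`, yet the query over the inferred graph also returns the superclasses `Person` and `Agent`. This is the kind of relationship that RDFS and OWL let applications derive rather than store explicitly.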
While its critics have questioned its feasibility, proponents argue that applications in industry, biology and human sciences research have already proven the validity of the original concept. Scholars have explored the social potential of the semantic web in the business and health sectors, and for social networking. Most of all, the Semantic Web has inspired and engaged many people in creating innovative semantic technologies and applications. semanticweb.org is the common platform for this community.
Jim Hendler, 2009. Web 3.0 Emerging. IEEE Computer Society 0018-9162/09. Available at: http://alumnos.elo.utfsm.cl/~egarcia/memoria/files/publicaciones/Junio/web%203.0%20emerging.pdf