Dean Allemang

Semantic Web for the Working Ontologist


refer not to the historical Shakespeare but to the title character of the feature film Shakespeare in Love, which bears very little resemblance to the historical figure. And “Shakespeare” is one of the more stable concepts to appear on the Web; consider the range of referents for a name like “Washington” or “Bordeaux.” To merge graphs in a Semantic Web setting, we have to be more specific: In what sense do we mean the word Shakespeare?

Subject            Predicate   Object
Scotland           part Of     The UK
England            part Of     The UK
Wales              part Of     The UK
Northern Ireland   part Of     The UK
Channel Islands    part Of     The UK
Isle of Man        part Of     The UK

Subject       Predicate   Object
Shakespeare   wrote       As You Like It
Shakespeare   wrote       Henry V
Shakespeare   wrote       Love’s Labour’s Lost
Shakespeare   wrote       Measure for Measure
Shakespeare   wrote       Twelfth Night
Shakespeare   wrote       The Winter’s Tale
Shakespeare   wrote       Hamlet
Shakespeare   wrote       Othello

      RDF borrows its solution to this problem from foundational Web technology, in particular the URI. The basic syntax and format of a URI are familiar even to casual users of the Web today because of the special, but typical, case of the URL (for example, http://www.workingontologist.org/Examples/Chapter3/Shakespeare#Shakespeare). But the significance of the URI as a global identifier for a Web resource is often not appreciated. A URI provides a global identification for a resource that is common across the Web. This stipulation is not particular to the Semantic Web; it holds for the Web in general, where global naming leads to global network effects. Of course, in the jungle that is the Web, we can’t expect that every data source that refers to Shakespeare will use the same URI. In Chapter 5 we will explore when and why we might use different URIs for the same individual, and what capabilities the Semantic Web provides to manage them.

      Figure 3.5 Graphic representation of triples describing (a) Shakespeare’s plays and (b) parts of the UK.

      Figure 3.6 Combined graph of all triples about Shakespeare and the UK.

      URIs and URLs look exactly the same, and, in fact, a URL is just a special case of the URI. Why does the Web have both of these ideas? Simplifying somewhat, the URI is an identifier with global (i.e., “World Wide” in the “World Wide Web” sense) scope. Any two Web applications in the world can refer to the same thing by referencing the same URI. But the syntax of the URI makes it possible to “dereference” it, that is, to use all the information in the URI (which specifies things like server name, protocol, port number, and file name) to locate a file (or a location in a file) on the Web [1]. This dereferencing succeeds if all these parts work: the protocol locates the specified server running on the specified port, and so on. When this is the case, we can say that the URI is not just a URI, but an effective HTTP URI. From the point of view of modeling, the distinction is not important. But from the point of view of having a model on the Semantic Web, the fact that a URI can potentially be dereferenced allows the models to participate in a global Web infrastructure, as we will see in Chapter 5.
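
      To make the parts of a URI concrete, here is a minimal Python sketch that splits the example URI from this chapter into the pieces a dereferencing client would use. It relies only on the standard library function urllib.parse.urlsplit; the variable names are our own, and nothing is actually fetched from the Web.

```python
from urllib.parse import urlsplit

# Example URI taken from the text of this chapter.
uri = "http://www.workingontologist.org/Examples/Chapter3/Shakespeare#Shakespeare"

parts = urlsplit(uri)

print(parts.scheme)    # the protocol
print(parts.hostname)  # the server name
print(parts.port)      # None when the URI gives no explicit port
print(parts.path)      # the file name on that server
print(parts.fragment)  # a location within the file
```

      Whether the URI can really be dereferenced depends on the server, not on the syntax; the split above only shows what information the identifier carries.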

      The URI can be generalized further as an Internationalized Resource Identifier, or IRI. Where a URI is restricted to a subset of ASCII characters, an IRI can use the character representations for all languages on the Web, so it can include characters with accents or indeed characters from any language that has a standard web encoding.

      RDF applies the notion of the URI to resolve the identity problem in graph merging. The application is quite simple: A node from one graph is merged with a node from another graph exactly if they have the same URI. On the one hand, this may seem disingenuous, “solving” the problem of node identity by relying on another standard to solve it. On the other hand, since issues of identity appear in the Web in general and not just in the Semantic Web, it would be foolish not to use the same strategy to resolve the issue in both cases.
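
      The merge rule can be sketched in a few lines of Python, treating each graph as a set of (subject, predicate, object) triples whose members are URI strings. This is only an illustration under stated assumptions: the example.org namespaces, the livedIn predicate, and the helper function nodes are hypothetical and are not part of any RDF library.

```python
# Hypothetical namespaces, chosen only for this illustration.
LIT = "http://example.org/lit#"
GEO = "http://example.org/geo#"

# Two independently published graphs, each a set of triples.
graph_a = {
    (LIT + "Shakespeare", LIT + "wrote", LIT + "Hamlet"),
    (LIT + "Shakespeare", LIT + "livedIn", GEO + "England"),
}
graph_b = {
    (GEO + "England", GEO + "partOf", GEO + "UK"),
    (GEO + "Scotland", GEO + "partOf", GEO + "UK"),
}

def nodes(graph):
    """Return every URI that appears as a subject or an object."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

# Merging is set union: triples are kept as they are, and two nodes
# become one exactly when their URI strings are equal.
merged = graph_a | graph_b

# The nodes through which the two graphs are joined.
shared = nodes(graph_a) & nodes(graph_b)
print(shared)
```

      Here the two graphs connect only through the node for England, because both sources used the same URI for it. Had one source written a different URI for England, the union would still succeed, but the graphs would remain disconnected.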

       Expressing URIs in print

      URIs work very well for expressing identity on the World Wide Web, but they are typically a bit of a pain to write out in detail when expressing models, especially in print. So for the examples in this book, we use a simplified version of a URI abbreviation scheme called CURIEs (standing for Compact URI). In its simplest form, a URI expressed as a CURIE has two parts: a namespace and an identifier, written with a colon between. So the CURIE representation for the identifier England in the namespace geo is simply geo:England. The RDF standard syntaxes include elaborate rules that allow programmers to map namespaces to other URI representations (such as the familiar http:// notation). For the examples in this book, we will use the simple CURIE form for all URIs. It is important, however, to note that CURIEs are not global identifiers on the Web; only fully qualified URIs (for example, http://www.WorkingOntologist.org/Examples/Chapter3/Shakespeare#Shakespeare) are global Web names. Thus, any representation of a CURIE must, in principle, be accompanied by a declaration of the namespace correspondence.
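
      As a rough sketch of what a namespace declaration does, the following Python function expands a CURIE into a full URI using a table of prefixes. The prefix lit, the example.org namespace for geo, and the function name expand_curie are assumptions made for this illustration; real RDF syntaxes have more elaborate rules.

```python
# Hypothetical namespace declarations. The "lit" entry is chosen so that
# lit:Shakespeare expands to the fully qualified URI quoted in the text;
# the "geo" entry is a placeholder.
NAMESPACES = {
    "lit": "http://www.WorkingOntologist.org/Examples/Chapter3/Shakespeare#",
    "geo": "http://example.org/geo#",
}

def expand_curie(curie, namespaces=NAMESPACES):
    """Expand 'prefix:identifier' into a full URI string."""
    prefix, colon, identifier = curie.partition(":")
    if not colon or prefix not in namespaces:
        # Without a declaration, the CURIE has no global meaning.
        raise ValueError(f"no namespace declared for {curie!r}")
    return namespaces[prefix] + identifier

print(expand_curie("lit:Shakespeare"))
print(expand_curie("geo:England"))
```

      The error case mirrors the point made above: a CURIE whose namespace has not been declared cannot be turned into a global Web name.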

      It is customary on the Web in general to insist that URIs contain no embedded spaces. For example, an identifier “part of” is typically not used on the Web. Instead, we follow the InterCap convention (sometimes called CamelCase), whereby names that are made up of multiple words are transformed into identifiers without spaces by capitalizing each word after the first and running the words together. Thus, “part of” becomes partOf, “Great Britain” becomes GreatBritain, “Measure for Measure” becomes MeasureForMeasure, and so on.
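
      The InterCap convention is easy to mechanize. The Python sketch below is one possible reading of it, not a standard: the function name to_intercap and its lower_first flag are our own, and the flag reflects the observation that partOf begins in lowercase while GreatBritain does not. Punctuation such as apostrophes is left untouched.

```python
def to_intercap(name, lower_first=False):
    """Join the words of `name` without spaces, capitalizing each word.

    With lower_first=True the first letter of the result is lowercased,
    as in "partOf".
    """
    words = name.split()
    # Uppercase only the first letter so that the rest of each word
    # (for example "UK") keeps its original case.
    joined = "".join(word[:1].upper() + word[1:] for word in words)
    if lower_first:
        joined = joined[:1].lower() + joined[1:]
    return joined

print(to_intercap("part of", lower_first=True))
print(to_intercap("Great Britain"))
print(to_intercap("Measure for Measure"))
```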

      There is no limitation on the use of multiple namespaces in a single source of data, or even in a single triple. Selection of namespaces is entirely unrestricted as far as the data model and standards are concerned. It is common practice, however, to refer to related identifiers in a single namespace. For instance, all of the literary or geographical information