
cell by cell. Each machine is responsible for some number of cells in the table. This system combines the flexibility of both of the previous strategies. Two servers can share the description of a single entity (in Figure 3.3, the year and title of Hamlet are stored separately), and they can share the use of a particular property (in Figure 3.3, the Medium values of rows 6 and 7 are stored on different servers).


      Figure 3.2 Distributing data across the Web, column by column.


      Figure 3.3 Distributing data across the Web, cell by cell.

      This flexibility is required if we want our data distribution system to really support the AAA slogan that “Anyone can say Anything about Any topic.” If we take the AAA slogan seriously, any server needs to be able to make a statement about any entity (as is the case in Figure 3.2), but also any server needs to be able to specify any property of an entity (as is the case in Figure 3.1). The solution in Figure 3.3 has both of these benefits.

      Table 3.2 Triples from Figure 3.3.

Subject          Predicate    Object
Row 7            Medium       Poem
Row 2            Title        Hamlet
Row 2            Year         1604
Row 4            Author       Shakespeare
Row 6            Medium       Play

      Table 3.3 Triples about Shakespeare.

Subject          Predicate    Object
Shakespeare      wrote        King Lear
Shakespeare      wrote        Macbeth
Anne Hathaway    married      Shakespeare
Shakespeare      livedIn      Stratford
Stratford        isIn         England
Macbeth          setIn        Scotland
England          partOf       UK
Scotland         partOf       UK

      But this solution also combines the costs of the other two strategies. Not only do we now need a global reference for the column headings, but we also need a global reference for the rows. In fact, each cell has to be represented with three values: a global reference for the row, a global reference for the column, and the value in the cell itself. This third strategy is the strategy taken by RDF. We will see how RDF resolves the issue of global reference later in this chapter, but for now, we will focus on how a table cell is represented and managed in RDF.

      Since a cell is represented with three values, the basic building block for RDF is called the triple. The identifier for the row is called the subject of the triple (following the notion from elementary grammar, since the subject is the thing that a statement is about). The identifier for the column is called the predicate of the triple (since columns specify properties of the entities in the rows). The value in the cell is called the object of the triple. Table 3.2 shows the triples in Figure 3.3 as subject, predicate, and object.
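      The mapping from table cells to triples can be sketched in a few lines of Python. This is only an illustration, not RDF syntax: the `Triple` type and the `table_to_triples` helper are names we introduce here, and the row labels are plain strings rather than the global references discussed later in the chapter. The cell values are those of Table 3.2.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One table cell: row identifier, column identifier, cell value."""
    subject: str    # identifier for the row
    predicate: str  # identifier for the column
    obj: str        # value in the cell


def table_to_triples(rows):
    """Flatten {row_id: {column: value}} into one Triple per cell."""
    return [
        Triple(row_id, column, value)
        for row_id, cells in rows.items()
        for column, value in cells.items()
    ]


# The cells listed in Table 3.2, grouped by row.
table = {
    "Row 2": {"Title": "Hamlet", "Year": "1604"},
    "Row 4": {"Author": "Shakespeare"},
    "Row 6": {"Medium": "Play"},
    "Row 7": {"Medium": "Poem"},
}

triples = table_to_triples(table)
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

      Because every triple carries its own row and column identifiers, the five triples can be stored on different servers in any grouping without losing track of which cell each one describes.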

      Triples become more interesting when more than one triple refers to the same entity, such as in Table 3.3. When more than one triple refers to the same thing, sometimes it is convenient to view the triples as a directed graph in which each triple is an edge from its subject to its object, with the predicate as the label on the edge, as shown in Figure 3.4. The graph visualization in Figure 3.4 expresses the same information presented in Table 3.3, but everything we know about Shakespeare (either as subject or object) is displayed at a single node.
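      The graph view can be sketched in code as well. The following is a minimal illustration using the eight triples of Table 3.3; the `build_graph` function and the `outgoing`/`incoming` indexes are our own names, chosen for this example. Each triple becomes one labeled edge from subject to object, and indexing the edges by node gathers everything known about a node in one place.

```python
from collections import defaultdict

# The eight triples of Table 3.3 as (subject, predicate, object).
triples = [
    ("Shakespeare", "wrote", "King Lear"),
    ("Shakespeare", "wrote", "Macbeth"),
    ("Anne Hathaway", "married", "Shakespeare"),
    ("Shakespeare", "livedIn", "Stratford"),
    ("Stratford", "isIn", "England"),
    ("Macbeth", "setIn", "Scotland"),
    ("England", "partOf", "UK"),
    ("Scotland", "partOf", "UK"),
]


def build_graph(triples):
    """Index each triple as a directed, labeled edge subject -> object."""
    outgoing = defaultdict(list)  # node -> [(predicate, object), ...]
    incoming = defaultdict(list)  # node -> [(predicate, subject), ...]
    for subject, predicate, obj in triples:
        outgoing[subject].append((predicate, obj))
        incoming[obj].append((predicate, subject))
    return outgoing, incoming


outgoing, incoming = build_graph(triples)

# Everything we know about Shakespeare, as subject and as object.
print(outgoing["Shakespeare"])
print(incoming["Shakespeare"])
```

      The two lookups together correspond to the edges that meet at the Shakespeare node in the graph: three edges leaving it and one edge arriving from Anne Hathaway.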


      Figure 3.4 Graph display of triples from Table 3.3. Eight triples appear as eight labeled edges.

      We started off describing RDF as a way to distribute data over several sources. But when we want to use that data, we will need to merge those sources back together again. One value of the triples representation is the ease with which this kind of merger can be accomplished. Since information is represented simply as triples, merging the information from two graphs is as simple as forming the graph of all of the triples from each individual graph, taken together. Let’s see how this is accomplished in RDF.

      Suppose that we had other sources of information relevant to our example from Table 3.3, for instance a list of plays that Shakespeare wrote and a list of parts of the United Kingdom (UK). These would be represented as triples, as in Tables 3.4 and 3.5. Each of these can also be shown as a graph, just like the original table, as shown in Figure 3.5.

      What happens when we merge together the information from these three sources? We simply get the graph of all the triples that show up in Figures 3.4 and 3.5. Merging graphs like those in Figures 3.4 and 3.5 to create a combined graph like the one shown in Figure 3.6 is a straightforward process—but only when it is known which nodes in each of the source graphs match.
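      A minimal sketch of this merge treats each graph as a set of triples and takes their union. The first set below is Table 3.3; the other two are short illustrative stand-ins that we made up for this example, not the actual contents of Tables 3.4 and 3.5. The `merge_graphs` function is likewise our own name.

```python
def merge_graphs(*graphs):
    """Return the set of all triples that appear in any of the graphs."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


# Table 3.3 as a set of (subject, predicate, object) triples.
table_3_3 = {
    ("Shakespeare", "wrote", "King Lear"),
    ("Shakespeare", "wrote", "Macbeth"),
    ("Anne Hathaway", "married", "Shakespeare"),
    ("Shakespeare", "livedIn", "Stratford"),
    ("Stratford", "isIn", "England"),
    ("Macbeth", "setIn", "Scotland"),
    ("England", "partOf", "UK"),
    ("Scotland", "partOf", "UK"),
}

# Illustrative stand-ins (assumed data) for a list of plays and a list of UK parts.
plays = {
    ("Shakespeare", "wrote", "Hamlet"),
    ("Shakespeare", "wrote", "Macbeth"),  # already present in table_3_3
}
uk_parts = {
    ("Wales", "partOf", "UK"),
    ("Scotland", "partOf", "UK"),  # already present in table_3_3
}

merged = merge_graphs(table_3_3, plays, uk_parts)
print(len(merged))
```

      Triples that appear in more than one source collapse into a single edge, so the merged graph here gains only the two new triples. Note that the sketch matches nodes by comparing plain strings, which silently assumes that the same name always denotes the same thing in every source; that assumption is exactly what has to be justified.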

      The essence of the merge process comes down to answering the question “When is a node in one graph the same node as a node in another graph?” In RDF, this issue is resolved through the use of Uniform Resource Identifiers (URIs).
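      The role of URIs in the merge can be sketched as follows, under loud assumptions: the two namespaces below are hypothetical, invented only for this illustration, and the sources are tiny made-up graphs. The point is simply that two nodes count as the same node exactly when their URIs are identical.

```python
# Hypothetical namespaces, assumed for illustration only.
LIT = "http://example.org/literature#"
GEO = "http://example.org/geography#"

# Two sources that use the same URI for Shakespeare.
source_a = {(LIT + "Shakespeare", LIT + "wrote", LIT + "Macbeth")}
source_b = {(LIT + "Shakespeare", LIT + "livedIn", GEO + "Stratford")}

# A third source whose "Shakespeare" is a different resource (different URI).
source_c = {(GEO + "Shakespeare", GEO + "isIn", GEO + "England")}


def nodes(triples):
    """Collect every subject and object that occurs in the triples."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


merged = source_a | source_b | source_c

# Edges from source_a and source_b meet at one node; source_c stays apart.
print(len(nodes(merged)))
print(LIT + "Shakespeare" == GEO + "Shakespeare")
```

      The first two sources contribute edges to a single Shakespeare node because they use the same identifier, while the third source's node remains distinct even though its local name looks the same.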

      In the figures so far, we have labeled the nodes and edges in the graphs with simple names like Shakespeare or Wales. On the Semantic Web, this is not sufficient information to determine whether two nodes are really the same. Why not? Isn’t there just one thing in the universe that everyone agrees to refer to as Shakespeare? When referring to agreement