build a community of students who take the data from this edition and make it their own; add new queries, new ideas and even new data, so that the examples in the book become a seed for a growing set of examples and data to inspire a new generation of Semantic Web students.
Acknowledgments
In the time between the second and third editions, there have been a number of industrial deployments of the Semantic Web stack, which have informed our treatment of the material. Indeed, the adoption of the technology in industry was, to a large extent, what motivated us to release a third edition at all.
In updating the examples for QUDT, we’d like to acknowledge the help we received from Steve Ray in coordinating the second edition of QUDT with the examples in the book. Without his help, our examples would have been out of date as soon as the book hit print. For Schema.org, we’d like to acknowledge Eric Franzon, who helped us coordinate the motivation for Schema.org with the principles of the Semantic Web and Linked Data that we describe in this book. We’d like to acknowledge the leadership at the Enterprise Data Management (EDM) Council for their assistance with the FIBO examples, and the leadership at the United Nations Food and Agriculture Organization (FAO) and Global Open Data for Agriculture and Nutrition (GODAN) for their work on AGROVOC.
All of the figures in the third edition were built using the open-source Cytoscape platform, using a plug-in for data.world. We are grateful to Bryon Jacob of data.world for all the work he put into tailoring the Cytoscape connection to the needs of this book. We also want to thank data.world for hosting all the data and queries in the book, so that we can check that all the answers are correct.
We’d like to thank Tim Beers for copy editing the manuscript before we delivered it to the publisher. It is impossible to copy edit your own writing, so having a fresh pair of eyes was invaluable. We also thank Michele Murray and Jacky Carley of RPI, who provided crucial logistical and administrative support for Jim as he worked on this edition.
Finally, and most importantly, we’d like to thank all the students and readers who have encouraged us over the past decades. The project managers who encouraged their programmers to read the book, the readers who wrote to us pointing out errata, and everyone who has told us that they read and appreciated the previous books all inspired us to put the effort into this third edition.
1 What Is the Semantic Web?
This book is about something we call the Semantic Web. From the name, you can probably guess that it is related somehow to the World Wide Web (WWW) and that it has something to do with semantics. Semantics, in turn, has to do with understanding the nature of meaning, but even the word semantics has a number of meanings. In what sense are we using the word semantics? And how can it be applied to the Web?
This book is for a working ontologist. An ontologist might do their work as part of an Enterprise Knowledge Graph, a data lake, global linked data, graph data, or any of a number of other technological approaches that share the idea that data is more powerful when it comes together in a meaningful way. The aim of this book is not to motivate or pitch the Semantic Web but to provide the tools necessary for working with it. Or, perhaps more accurately, the World Wide Web Consortium (W3C) has provided these tools in the form of standard Semantic Web languages, complete with abstract syntax, model-based semantics, reference implementations, test cases, and so forth. But these are like any tools: there are some basic tools that are all you need to build many useful things, and there are specialized craftsman’s tools that can produce far more specialized outputs. Whichever tools are needed for a particular task, however, one still needs to understand how to use them. In the hands of someone with no knowledge, they can produce clumsy, ugly, barely functional output, but in the hands of a skilled craftsman, they can produce works of utility, beauty, and durability. It is our aim in this book to describe the craft of building Semantic Web systems. We go beyond covering the fundamental tools to show how they can be used together to create semantic models, sometimes called ontologies or vocabularies, that are understandable, useful, durable, and perhaps even beautiful.
1.1 What Is a Web?
The Web architecture was built by standing on the shoulders of giants. Writing in The Atlantic magazine in 1945 [Bush and Wang 1945], Vannevar Bush identified the problems in managing large collections of documents and the links we make between them. Bush’s proposal was to treat this as a scientific problem, and among the ideas he proposed was that of externalizing and automating the storage and management of the association links we make in our reading. He also illustrated his ideas with an imaginary device he called the Memex (“memory extension”) that would assist us in studying, linking, and remembering the documents we work with and the association links we weave between them. Twenty years later, Ted Nelson quoted As We May Think and proposed using a computer to implement the idea, using hypertext and hypermedia structures to link parts of documents together. In the late sixties, Douglas Engelbart and the Augment project provided the mouse and new means of interaction and applied them in particular to hypertext editing and browsing. The beginning of the seventies brought the work of Vinton Cerf and the emergence of the Internet, which connected computers all around the world.
By the end of the eighties, Tim Berners-Lee was able to stand on the shoulders of these giants when he proposed a new breakthrough: an architecture for distributing hypermedia over the Internet, which we now know as the WWW. The Web provides a hypertext infrastructure that links documents across the Internet, that is, connecting documents that are not on the same machine. And so the Web was born. The Web architecture includes two important parts: Web clients, the best known being the Web browser, and Web servers, which serve documents and data to the clients whenever they request them. For this architecture to work, three essential components are needed. First, addresses that allow us to identify and locate a document on the Web; second, communication protocols that allow a client to connect to a server, send a request, and get an answer; and third, representation languages to describe the content of the pages, the documents that are to be transferred. These three components comprise the basic Web architecture described in Jacobs and Walsh [2004], which the Semantic Web standards, described later in this book, extend in order to publish semantic data on the Web.
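To make these three components concrete, here is a minimal sketch in Python (our illustration, not part of any of the architecture documents cited above): the URL is the address, the call to urlopen speaks the HTTP protocol to the server, and the bytes that come back are a document written in a representation language, HTML. The address http://example.org/ is a placeholder used only for illustration.

```python
# A minimal sketch of the three components of the basic Web architecture:
# an address (the URL), a protocol (HTTP), and a representation language (HTML).
from urllib.parse import urlparse
from urllib.request import urlopen

url = "http://example.org/"                    # the address identifies and locates the document
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path)  # -> http example.org /

with urlopen(url) as response:                 # the client connects to the server over HTTP ...
    html = response.read().decode("utf-8")     # ... and receives a document written in HTML

print(html[:80])                               # the representation language describes the page content
```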
The idea of a web of information was once a technical idea accessible only to highly trained, elite information professionals: IT administrators, librarians, information architects, and the like. Since the widespread adoption of the WWW, it is now common to expect just about anyone to be familiar with the idea of a web of information that is shared around the world. Contributions to this web come from every source, and every topic you can think of is covered.
Essential to the notion of the Web is the idea of an open community: anyone can contribute their ideas to the whole, for anyone to see. It is this openness that has resulted in the astonishing comprehensiveness of topics covered by the Web. An information “web” is an organic entity that grows from the interests and energy of the communities that support it. As such, it is a hodgepodge of different analyses, presentations, and summaries of any topic that suits the fancy of anyone with the energy to publish a web page. Even as a hodgepodge, the Web is pretty useful. Anyone with the patience and savvy to dig through it can find support for just about any inquiry that interests them. But the Web often feels like it is a mile wide but an inch deep. How can we build a more integrated, consistent, deep Web experience?
1.2 Communicating with Data
Suppose you are thinking about heading to your favorite local restaurant, Copious, so you ask your automated personal assistant, “What are the hours for Copious?” Your assistant replies that it doesn’t have the hours for Copious. So you go to a web page, look them up, and find the opening hours right there, next to the address and the daily special. How could the webmaster at Copious have told your assistant what was on the web page? Then you would be able to find out not just the opening hours but also the daily special.
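One way the webmaster could do this, developed in detail later in this book, is to publish the same information in a machine-readable form. The sketch below is our own illustration, not an actual Copious page: it uses the rdflib Python library and the Schema.org vocabulary, and the identifier, hours, and daily special shown are invented for the example.

```python
# A sketch of publishing the restaurant's hours and daily special as data,
# using the Schema.org vocabulary, so that an automated assistant could read them.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("http://schema.org/")
copious = URIRef("http://example.org/copious")  # hypothetical identifier for the restaurant

g = Graph()
g.bind("schema", SCHEMA)
g.add((copious, RDF.type, SCHEMA.Restaurant))
g.add((copious, SCHEMA.name, Literal("Copious")))
g.add((copious, SCHEMA.openingHours, Literal("Mo-Su 11:00-22:00")))               # invented hours
g.add((copious, SCHEMA.description, Literal("Daily special: mushroom risotto")))  # invented special

# Print the same facts in Turtle, the notation used throughout this book.
print(g.serialize(format="turtle"))
```

Embedded in or linked from the restaurant’s page, a description like this gives an automated assistant the same facts that a human reader sees.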
Suppose you consult a web page, looking for a major national park, and you find a list of hotels that have branches in the vicinity of the park. You