Gregory Grefenstette

Search-Based Applications



SQL: Structured Query Language, a commonly used language for manipulating relational databases

Structured data: Data organized according to an explicit schema and broken down into discrete units of meaning, with units represented using consistent data types and formats (databases, log files, spreadsheets)

SVM: Support vector machine, used in classification

Table: Part of a relational database, a body of related information. Each row of the table corresponds to one entity, and each column to some attribute of this entity

Taxonomy: A hierarchically typed system of entities, such as mammals being part of animals being part of living beings

TCO: Total cost of ownership, how much an application costs when all implicit and explicit costs are factored in over time

Timestamp: A chronological value indicating when some data was created

Top-k: The k highest-ranked responses in a database system that can rank answers to a query

Transaction: In databases, a sequence of actions that should be performed as an uninterruptible unit, for example, purchasing a seat on a flight

Unstructured data: Data that is not formally or consistently organized, such as textual data (email, reports, documents) and multimedia content

URL: Uniform Resource Locator, the address of a web page

Usability: The desirable quality of being able to be used by a large population of users with little or no training

Vertical application: An application built for a specific domain, such as pharmaceuticals, finance, or manufacturing. A horizontal application, by contrast, could be used in a number of different domains

W3C: World Wide Web Consortium

WYSIWYG: What You See Is What You Get

XML: Extensible Markup Language, a standard for including metadata in a document

YPG: Yellow Pages Group, Canada

      CHAPTER 1

       Search Based Applications

       1.1 INTRODUCTION


      Figure 1.1: Can you see the search engine behind these screens?

      Management of information via computers is undergoing a revolutionary change as the frontier between databases and search engines disappears. Against this backdrop of nascent convergence, a new class of software has emerged that already combines the advantages of both technologies: Search Based Applications.

      Until just a short while ago, the lines were still relatively clear. Database software concentrated on creating, storing, maintaining and accessing structured data, where discrete units of information (e.g., product number, quantity available, quantity sold, date) and their relation to each other were well defined. Search engines were primarily concerned with locating a document or a bit of information within collections of unstructured textual data: short abstracts, long reports, newspaper articles, email, Web pages, etc. (classic Information Retrieval, or IR; see Chapter 3).
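      The contrast can be made concrete with a minimal sketch, which is an illustration under stated assumptions rather than anything taken from a particular product. The table name `stock`, the sample rows, the sample documents and the helper functions `build_inverted_index` and `search` are all hypothetical. The first half answers an exact question over typed fields with SQL; the second half locates free text by the words it contains, using a toy inverted index.

```python
import sqlite3
from collections import defaultdict

# Structured data: every field has a name and a type, so a query can be exact.
# The table and its rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (product_number TEXT, quantity_available INTEGER, "
    "quantity_sold INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?)",
    [
        ("P-100", 12, 3, "2010-01-05"),
        ("P-200", 0, 9, "2010-01-05"),
        ("P-300", 7, 1, "2010-01-06"),
    ],
)
in_stock = [
    row[0]
    for row in conn.execute(
        "SELECT product_number FROM stock "
        "WHERE quantity_available > 0 ORDER BY product_number"
    )
]

# Unstructured data: free text, located by the words it contains.
# These documents are also invented for illustration.
documents = {
    "email-1": "Customer reports the printer jams on thick paper",
    "report-7": "Quarterly report on printer sales in Europe",
    "page-3": "How to replace the paper tray",
}


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_inverted_index(documents)
hits = sorted(search(index, "printer paper"))

print(in_stock)
print(hits)
```

      The SQL query depends on knowing the schema in advance, whereas the keyword search needs no schema at all but can only match on words. A real search engine would add tokenization, ranking and much more; this sketch leaves all of that out.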

      Business applications were built on top of databases, which defined the universe of information available to the end user, and search engines were used for IR on the Web and in the enterprise.


      Figure 1.2: Databases have traditionally been concerned with the world of structured data; search engines with that of unstructured data (some of these data types, like HTML pages and email messages, contain a certain level of exploitable structure, and are consequently sometimes referred to as "semi-structured").

      Such neat distinctions are now falling away as the core architectures, functionality and roles of search engines and databases have begun to evolve and converge. A new generation of non-relational databases, which shares conceptual models and structures with search engines, has emerged from the world of the Web (see Chapter 4), and a new breed of search engine has arisen which provides native functionality akin to both relational and non-relational databases (described in Chapters 3-9 and listed in Chapter 10).

      It is this new-generation engine that supports Search Based Applications, which offer precise, multi-axial information access and analysis that are virtually indistinguishable at the surface from those of database applications, yet are endowed with the usability and massive scalability of Web search.
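      One way to picture multi-axial access is as filtering and counting along several fields at once over an inverted index. The sketch below assumes that reading, which is an interpretation and not a definition given in the text. The records, the field names in `FACET_FIELDS` and the helpers `build_index`, `filter_ids` and `facet_counts` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"id": 1, "category": "printer", "brand": "Acme", "status": "open"},
    {"id": 2, "category": "printer", "brand": "Zenith", "status": "closed"},
    {"id": 3, "category": "scanner", "brand": "Acme", "status": "open"},
    {"id": 4, "category": "printer", "brand": "Acme", "status": "closed"},
]
FACET_FIELDS = ("category", "brand", "status")


def build_index(rows):
    """Index each (field, value) pair as a term pointing at record ids."""
    index = defaultdict(set)
    for row in rows:
        for field in FACET_FIELDS:
            index[(field, row[field])].add(row["id"])
    return index


def filter_ids(index, all_ids, **filters):
    """Intersect the id sets of every requested field=value filter."""
    result = set(all_ids)
    for field, value in filters.items():
        result &= index.get((field, value), set())
    return result


def facet_counts(rows, ids, field):
    """Count the values of one field among the matching records."""
    return Counter(row[field] for row in rows if row["id"] in ids)


index = build_index(records)
all_ids = {row["id"] for row in records}

# Restrict along two axes, then summarize the result along a third.
matching = filter_ids(index, all_ids, category="printer", brand="Acme")
by_status = facet_counts(records, matching, "status")

print(sorted(matching))
print(dict(by_status))
```

      The same index serves both the filtering and the summary, which is roughly why such an interface can look like a database application on the surface while being driven by search-engine structures underneath.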

       Definition: Search Based Application

      A software application that uses a search engine as the primary information access backbone, and whose main purpose is performing a domain-oriented task rather than locating a document. Examples:

      Customer service and support

      Logistical track and trace

      Contextual advertising

      Decision intelligence

      e-Discovery

      Search Based Applications (SBAs) may be used to provide more intuitive, meaningful and scalable access to the content of a single database, hiding the complexity of the database structure as data is extracted and re-purposed by search engine techniques. They may also be used to autonomously and intelligently gather massive volumes of unstructured and structured data from an unlimited number of sources (internal or external) and to make this aggregate data available in real time to a wide base of users for a broad range of purposes.

      "The elements that make search powerful are not necessarily the search box, but the ability to bring together multiple types of information quickly and understandably, in real time, and at massive scale. Databases have been the underpinning for most of the current generation of enterprise applications; search technologies may well be the software backbone of the future."