Baesens Bart

Profit Driven Business Analytics

of data and – to our liking – accurately characterizes data as raw material. Data are to be seen as an input or basic resource that needs further processing before it is of actual use. In a subsequent section of this chapter, we introduce the analytics process model, which describes the iterative chain of processing steps involved in turning data into information or decisions and is actually quite similar to an oil refinery process. Note the subtle but significant difference between the words data and information in the sentence above. Whereas data can fundamentally be defined as a sequence of zeroes and ones, information is essentially the same but additionally implies a certain utility or value to the end user or recipient. So, whether data are information depends on whether the data have utility to the recipient. Typically, for raw data to become information, the data first need to be processed, aggregated, summarized, and compared. In summary, data typically need to be analyzed, and insight, understanding, or knowledge must be added, for data to become useful.

      Applying basic operations to a dataset may already provide useful insight and support the end user or recipient in decision making. These basic operations mainly involve selection and aggregation. Both selection and aggregation may be performed in many ways, leading to a multitude of indicators or statistics that can be distilled from raw data. The following example elaborates a number of sales indicators in a retail setting.

      Providing insight through customized reporting is exactly what the field of business intelligence (BI) is about. Visualizations are typically also adopted to represent indicators and their evolution over time in easy-to-interpret ways. Visualizations help users acquire understanding and insight at a glance. Personalized dashboards, for instance, are widely adopted in industry and are very popular with managers for monitoring and keeping track of business performance. A formal definition of business intelligence is provided by Gartner (http://www.gartner.com/it-glossary):

      Business intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.

      Example

      For managerial purposes, a retailer requires the development of real-time sales reports. Such a report may include a wide variety of indicators that summarize raw sales data. Raw sales data, in fact, consist of transactional data that can be extracted from the online transaction processing (OLTP) system operated by the retailer. Some example indicators, and the selection and aggregation operations required to calculate them, are:

      ◼ Total revenue generated over the last 24 hours: Select all transactions over the last 24 hours and sum the paid amounts, where paid means the price net of promotional offers.

      ◼ Average paid amount in the online store over the last seven days: Select all online transactions over the last seven days and calculate the average paid amount.

      ◼ Fraction of returning customers within one month: Select all transactions over the last month, count the number of customer IDs that appear more than once, and divide this count by the total number of distinct customer IDs.

      Note that calculating these indicators involves basic selection operations on characteristics or dimensions of the transactions stored in the database, as well as basic aggregation operations such as sum, count, and average, among others.
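      To make the selection and aggregation operations of the retail example concrete, the following Python sketch computes the three sales indicators on a handful of in-memory transactions. The `Transaction` record, its field names, the sample data, and the choice of a 30-day window for "one month" are illustrative assumptions; in practice, the raw data would first be extracted from the retailer's OLTP system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:  # hypothetical record layout, for illustration only
    customer_id: str
    timestamp: datetime
    paid_amount: float  # price net of promotional offers
    online: bool        # True if the sale went through the online store


def in_window(t, now, window):
    """Selection on the time dimension: keep transactions in [now - window, now]."""
    return now - window <= t.timestamp <= now


def total_revenue(txs, now):
    # Select the last 24 hours, then aggregate with a sum.
    return sum(t.paid_amount for t in txs if in_window(t, now, timedelta(hours=24)))


def average_online_amount(txs, now):
    # Select online transactions of the last seven days, then aggregate with an average.
    amounts = [t.paid_amount for t in txs
               if t.online and in_window(t, now, timedelta(days=7))]
    return sum(amounts) / len(amounts) if amounts else 0.0


def returning_customer_fraction(txs, now):
    # Select the last month (assumed here to be 30 days) and count transactions per customer ID.
    per_customer = Counter(t.customer_id for t in txs
                           if in_window(t, now, timedelta(days=30)))
    if not per_customer:
        return 0.0
    returning = sum(1 for n in per_customer.values() if n > 1)
    return returning / len(per_customer)


# Hypothetical sample data.
now = datetime(2024, 1, 31, 12, 0)
transactions = [
    Transaction("A", now - timedelta(hours=1), 20.0, True),
    Transaction("B", now - timedelta(hours=2), 35.5, False),
    Transaction("A", now - timedelta(days=3), 10.0, True),
    Transaction("C", now - timedelta(days=10), 50.0, True),
    Transaction("C", now - timedelta(days=20), 5.0, False),
    Transaction("D", now - timedelta(days=40), 99.0, True),
]

print(total_revenue(transactions, now))                # 55.5
print(average_online_amount(transactions, now))        # 15.0
print(returning_customer_fraction(transactions, now))  # 0.6666666666666666
```

Each indicator is a selection (a filter on time window or channel) followed by a single aggregation (sum, average, or count); customers A and C are returning, out of the three distinct customers A, B, and C active in the last 30 days.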

Note that this definition explicitly mentions the required infrastructure and best practices as essential components of BI, which are typically also provided as part of the package or solution offered by BI vendors and consultants. More advanced analysis of data may further support users and optimize decision making. This is exactly where analytics comes into play. Analytics is a catch-all term covering a wide variety of what are essentially data-processing techniques. In its broadest sense, analytics strongly overlaps with data science, statistics, and related fields such as artificial intelligence (AI) and machine learning. Analytics, to us, is a toolbox containing a variety of instruments and methodologies that allow users to analyze data for a diverse range of well-specified purposes. Table 1.1 identifies a number of categories of analytical tools that cover diverse intended uses or, in other words, allow users to complete a diverse range of tasks.

Table 1.1 Categories of Analytics from a Task-Oriented Perspective

The first main group of tasks identified in Table 1.1 concerns prediction. Based on observed variables, the aim is to accurately estimate or predict an unobserved value. The applicable subtype of predictive analytics depends on the type of target variable, which we intend to model as a function of a set of predictor variables. When the target variable is categorical in nature, meaning the variable can only take a limited number of possible values (e.g., churner or not, fraudster or not, defaulter or not), we have a classification problem. When the task concerns the estimation of a continuous target variable (e.g., sales amount, customer lifetime value, credit loss), which can take any value within a certain range, we are dealing with regression. Survival analysis and forecasting explicitly account for the time dimension by predicting either the timing of events (e.g., churn, fraud, default) or the evolution of a target variable over time (e.g., churn rates, fraud rates, default rates). Table 1.2 provides simplified example datasets and analytical models for each type of predictive analytics for illustrative purposes.

Table 1.2 Example Datasets and Predictive Analytical Models

The second main group of analytics comprises descriptive analytics, which, rather than predicting a target variable, aim to identify specific types of patterns. Clustering or segmentation aims to group entities (e.g., customers, transactions, employees) that are similar in nature. The objective of association analysis is to find groups of events that frequently co-occur and therefore appear to be associated. The basic observations analyzed in this problem setting are groups of events of variable size; for instance, the various products bought by a customer in a single transaction. Sequence analysis is similar to association analysis but aims to detect events that frequently occur sequentially, rather than simultaneously as in association analysis. As such, sequence analysis explicitly accounts for the time dimension. Table 1.3 provides simplified examples of datasets and analytical models for each type of descriptive analytics.
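The difference between association and sequence analysis can be sketched in a few lines of Python. The sketch first computes the support and confidence of a single association rule on unordered transactions, and then contrasts co-occurrence with ordered occurrence on customer purchase histories. The baskets, sequences, and function names are hypothetical; in practice, a dedicated frequent-pattern mining algorithm would search for rules and sequences rather than evaluating a hand-picked pair.

```python
# Hypothetical market baskets: each transaction is an unordered set of products.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "diapers"},
    {"bread", "butter"},
    {"bread", "milk", "beer"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Share of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


print(support({"bread", "milk"}, baskets))       # 0.6
print(confidence({"bread"}, {"milk"}, baskets))  # 0.75

# Hypothetical customer histories: ordered lists of purchase events.
sequences = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "laptop bag"],
    ["mouse", "laptop"],
    ["laptop", "mouse"],
]


def cooccurrence_support(a, b, seqs):
    """Ignores order: both events appear somewhere in the history."""
    return sum(1 for s in seqs if a in s and b in s) / len(seqs)


def sequence_support(first, then, seqs):
    """Respects order: `then` occurs after the first occurrence of `first`."""
    hits = sum(1 for s in seqs if first in s and then in s[s.index(first) + 1:])
    return hits / len(seqs)


print(cooccurrence_support("laptop", "mouse", sequences))  # 0.75
print(sequence_support("laptop", "mouse", sequences))      # 0.5
```

Laptop and mouse co-occur in three of the four histories, but a mouse follows a laptop in only two of them: taking the time dimension into account changes which patterns appear frequent.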

Table 1.3 Example Datasets and Descriptive Analytical Models

      Note that Tables 1.1 through 1.3 identify and illustrate categories of approaches that are able to complete a specific task from a technical rather than an applied perspective. These different types of analytics can be applied in quite diverse business and nonbusiness settings and consequently lead to many specialized applications. For instance, predictive analytics and, more specifically, classification techniques may be applied for detecting fraudulent credit-card transactions, for predicting customer churn, for assessing loan applications, and so forth. From an application perspective, this leads to various groups of analytics such as, respectively, fraud analytics, customer or marketing analytics, and credit risk analytics. A wide range of business applications of analytics across industries and business departments is discussed in detail in Chapter 3.

With respect to Table 1.1, note that these different types of analytics apply to structured data. An example of a structured dataset is shown in Table 1.4. The rows in such a dataset are typically called observations, instances, records, or lines, and represent or collect information