Tim Rey

Applied Data Mining for Forecasting Using SAS



3.1.) The definition of metadata (the data about the data) is the basis of data infrastructure design. The cost of maintaining and supporting the internal data infrastructure depends on the internal cost structure determined by corporate IT.
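To make the idea of metadata concrete, the sketch below shows one possible metadata record for a time series. It is a hypothetical illustration, not a schema from the book or from any SAS product: the class `SeriesMetadata`, its field names, and the allowed frequencies are assumptions chosen to match the month, quarter, and year intervals mentioned later in this section.

```python
from dataclasses import dataclass
from datetime import date

# Assumed set of supported time intervals (hypothetical).
ALLOWED_FREQUENCIES = {"month", "quarter", "year"}


@dataclass(frozen=True)
class SeriesMetadata:
    """Hypothetical metadata record: the 'data about the data' for one time series."""
    series_id: str    # unique key used to join series in the forecasting data set
    description: str  # business meaning of the series
    source: str       # "internal" or the name of an external provider
    frequency: str    # time interval: "month", "quarter", or "year"
    units: str        # measurement units, e.g. "tons" or "index"
    start: date       # date of the first available observation

    def __post_init__(self):
        # Reject time intervals the forecasting data set does not support.
        if self.frequency not in ALLOWED_FREQUENCIES:
            raise ValueError(f"unsupported frequency: {self.frequency}")

    def is_external(self) -> bool:
        """True when the series is delivered by an external source."""
        return self.source != "internal"


# Hypothetical catalog entries.
catalog = [
    SeriesMetadata("SALES_NA", "Monthly product sales, North America",
                   "internal", "month", "tons", date(2005, 1, 1)),
    SeriesMetadata("GDP_INDEX", "Quarterly economic activity index",
                   "external provider", "quarter", "index", date(1990, 1, 1)),
]

external_ids = [m.series_id for m in catalog if m.is_external()]
print(external_ids)  # ['GDP_INDEX']
```

A catalog of such records lets the data administrator answer questions like "which series come from outside the company?" without opening the data itself.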

      Usually, the data about potential economic drivers are not available internally and must be obtained from external sources. Examples of such sources are the Bloomberg services⁵, which provide various types of financial data such as equities, commodities, and foreign exchange rates, and the Global Insight services⁶, with more than 30 million time series of various kinds from around the globe, such as prices, economic indicators, and labor costs. External data are generally consistent and collected in a timely manner, and some series include forecast values for a given forecasting horizon. This last feature is very beneficial when these data are used as inputs (X variables) in multivariate forecasting models.
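Why provider forecasts help can be shown with a minimal sketch. It assumes a single external driver and uses ordinary least squares on one input as a stand-in for the multivariate models discussed in the book; all numbers are hypothetical. Because the provider already delivers future values of the driver, those values can be plugged straight into the fitted model, so the driver itself does not have to be forecast internally.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b*x with one driver."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical history of an external driver and of internal sales.
driver_history = [100.0, 102.0, 104.0, 106.0, 108.0]
sales_history = [50.0, 51.0, 52.0, 53.0, 54.0]

# Hypothetical forecast values for the driver, as delivered by the provider.
driver_forecast = [110.0, 112.0, 114.0]

a, b = fit_simple_regression(driver_history, sales_history)
sales_forecast = [a + b * x for x in driver_forecast]
print(sales_forecast)  # [55.0, 56.0, 57.0]
```

If the provider did not supply driver forecasts, the driver would first need its own forecast, and that forecast's error would carry over into the sales forecast.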

      There are two options for delivering external data. The first is to extract the necessary data directly from the key sources. The second is to build an internal database of the most frequently used external data. The advantages of the second approach are synchronized updates of all needed external data, fast search for specific economic drivers, and more reliable maintenance of deployed models. However, this option requires allocating internal resources to design and maintain the database and to train potential users.
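The second option can be sketched with Python's built-in `sqlite3` module. This is an illustration only, not the architecture used in the book: the table layout, the provider names, and the function names `create_store`, `refresh_all`, and `lookup` are hypothetical. The sketch shows the "synchronized update" idea: extracts from all providers are loaded in one transaction, so users see either the complete refresh or none of it.

```python
import sqlite3


def create_store(conn):
    """Create one table holding observations of all external series."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS observations (
               series_id TEXT NOT NULL,
               period    TEXT NOT NULL,
               value     REAL NOT NULL,
               source    TEXT NOT NULL,
               PRIMARY KEY (series_id, period)
           )"""
    )


def refresh_all(conn, extracts):
    """Load the extracts of all providers in a single transaction.

    extracts maps a provider name to rows of (series_id, period, value).
    INSERT OR REPLACE overwrites existing (series_id, period) pairs, so
    revised values replace old ones; if any row fails, nothing is committed.
    """
    with conn:  # commits on success, rolls back on an exception
        for source, rows in extracts.items():
            conn.executemany(
                "INSERT OR REPLACE INTO observations (series_id, period, value, source) "
                "VALUES (?, ?, ?, ?)",
                [(sid, period, value, source) for sid, period, value in rows],
            )


def lookup(conn, series_id):
    """Return (period, value) pairs for one series in time order."""
    cur = conn.execute(
        "SELECT period, value FROM observations WHERE series_id = ? ORDER BY period",
        (series_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
create_store(conn)
# Hypothetical extracts from two providers.
refresh_all(conn, {
    "provider_a": [("OIL_PRICE", "2011-01", 89.0), ("OIL_PRICE", "2011-02", 95.0)],
    "provider_b": [("FX_EUR_USD", "2011-01", 1.34)],
})
# A later refresh revises February and adds March.
refresh_all(conn, {"provider_a": [("OIL_PRICE", "2011-02", 96.0), ("OIL_PRICE", "2011-03", 102.0)]})
print(lookup(conn, "OIL_PRICE"))  # [('2011-01', 89.0), ('2011-02', 96.0), ('2011-03', 102.0)]
```

Keeping one long, narrow table (series, period, value) makes adding a new driver a matter of inserting rows rather than changing the schema, which eases the maintenance burden mentioned above.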

      Figure 3.2 shows an example of integrating different external and internal data sets into a single data set that is appropriate for data mining in forecasting. It includes three external data sets (Bloomberg, Global Insight, and CMAI) and two internal data sets. The data are integrated into the forecasting data set based on a selected starting time and time interval (month, quarter, or year). Time series recorded at different time intervals are expanded or contracted to the selected interval in a previous step, as described in Chapter 6.
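A minimal sketch of the integration step, assuming a quarterly target interval: a monthly internal series is contracted to quarters and then joined with an external quarterly series from a selected starting quarter onward. The aggregation rule (sum for flows such as sales, mean for prices or indexes), the function names, and the data are assumptions for illustration; this is not the method of Chapter 6, and expansion to a finer interval is not shown.

```python
from collections import defaultdict
from statistics import mean


def quarter_label(year, month):
    """Map a calendar month to a label such as '2011Q1'."""
    return f"{year}Q{(month - 1) // 3 + 1}"


def contract_to_quarterly(monthly, agg=mean):
    """Contract a monthly series {(year, month): value} to {quarter: value}.

    agg combines the three months (e.g. sum for flows such as sales,
    mean for prices or indexes). Incomplete quarters are dropped.
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        buckets[quarter_label(year, month)].append(value)
    return {q: agg(vals) for q, vals in buckets.items() if len(vals) == 3}


def integrate(series_by_name, start):
    """Join quarterly series on the quarters they share, from start onward."""
    common = set.intersection(*(set(s) for s in series_by_name.values()))
    # Labels like '2011Q1' sort chronologically as strings (four-digit years).
    quarters = sorted(q for q in common if q >= start)
    return [(q, {name: s[q] for name, s in series_by_name.items()}) for q in quarters]


# Hypothetical data: internal monthly sales and an external quarterly index.
sales_monthly = {
    (2011, 1): 10.0, (2011, 2): 12.0, (2011, 3): 14.0,
    (2011, 4): 9.0, (2011, 5): 9.0, (2011, 6): 12.0,
    (2011, 7): 20.0,  # Q3 is incomplete and will be dropped
}
index_quarterly = {"2010Q4": 98.0, "2011Q1": 100.0, "2011Q2": 101.5, "2011Q3": 103.0}

rows = integrate(
    {"sales": contract_to_quarterly(sales_monthly, agg=sum), "index": index_quarterly},
    start="2011Q1",
)
for quarter, values in rows:
    print(quarter, values)
# 2011Q1 {'sales': 36.0, 'index': 100.0}
# 2011Q2 {'sales': 30.0, 'index': 101.5}
```

Joining only on shared periods keeps the resulting data set rectangular, which most modeling tools expect; the price is that the usable history is limited by the shortest series.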


      The cost of maintaining and supporting the external data infrastructure depends on the cost of subscription services, the cost of developing and maintaining an internal database, and the internal cost of corporate IT.

      This section presents possible ways to build an organizational infrastructure for data mining in forecasting in a business. We briefly discuss how to organize model developers and forecasting users, how to select a proper work process, and how to integrate everything into the corporate IT environment.

      A key strategic business decision for a forecasting organization is how much to invest in people who can develop forecasting models. The type and size of the forecasting development effort depend on the projected demand for forecasting projects in the organization. Other factors to take into account are as follows:

       the available internal personnel in corporate IT who can support forecasting models by managing the data, infrastructure, and operations

       the strategic commitment of key users for time and resources

       the available internal skills in the area of modeling, statistics, data mining, and forecasting

       the level of experience in carrying out forecasting projects

      Below we briefly discuss three ways to organize developers: (1) external consultant services, (2) distributed developers in organizations (key users of forecasting services), and (3) a centralized group of developers.

      External consultant services

      This is the minimum-investment solution when you have low expected demand, no strategic commitment, and a lack of internal resources. Internal resources are allocated only for project management and for interaction with the external consultants. However, even in this case, some basic training in forecasting and statistics is recommended. It is preferable to have a well-prepared test case when you begin the working relationship with the external consultants. (Some suggestions on how to prepare an effective test case are given by Michael Gilliland in his book The Business Forecasting Deal.) The key advantage of this solution is its minimal cost. The key disadvantage is total dependence on external resources.

      Distributed developers

      This organizational structure is appropriate in small or medium-sized businesses when the demand for forecasting services is concentrated in several key users, such as marketing and sales, supply chain, and purchasing. These users often prefer to own the whole model development and deployment process and hire experts with forecasting knowledge. In many cases they do not invest in high-end hardware and software infrastructure, such as SAS Forecast Studio. The key advantage of this solution is the ability to implement forecasting capabilities with internal resources in the appropriate business functions at an affordable cost. The key disadvantage is the limited capacity for growth.

      Centralized developers group

      The best-case scenario for applying data mining in forecasting in larger organizations is to build a centralized group of developers. The group must have the capacity to respond quickly to the growing demand for forecasting projects from various parts of a large corporation. The developers' skill set must properly balance system and data support expertise with modeling capabilities in statistics, data mining, and forecasting. Example key roles in a centralized group for data mining in forecasting are given below.

       The system administrator maintains servers, upgrades software, handles security issues, and interacts with IT.

       The data administrator maintains data integrity, identifies internal and external data sources, and collects and harmonizes data.

       The modeler interacts with clients, identifies the system structure and data, pre-processes the data, performs variable reduction and selection, and develops, validates, implements, and maintains forecasting models.

       The manager manages the group, delivers needed resources, and brings in projects.

      The proper place for this group within a large organization is in the centralized corporate business services. The group serves all potential users so that the return on investment is maximized. The size of the group depends on projected demand; however, at least five to seven developers are needed for the group to be efficient. It is assumed that the group needs at least two to three years to establish itself by building infrastructure, hiring, learning, promoting its services to potential clients, and developing test projects. Funding during this period is centralized and gradually gives way to a self-supporting mode in which projects are funded directly by their clients. The key issue that will determine the fate of this group is whether a sustainable project pipeline can be maintained.

      Forecasting users come from different parts of the organization. Typical clients for statistical forecasting services are the marketing, sales, finance, purchasing, and operations planning departments. Forecasting users can be classified into the following four categories, briefly discussed below: (1) forecasting report users, (2) planners, (3) decision-makers, and (4) top-level managers. (A similar user classification for demand-driven forecasting is described in detail in Charles Chase's book Demand-Driven Forecasting: A Structured Approach to Forecasting.)

      Forecasting report users

      These are the users who passively use the delivered forecasts for information purposes only without making direct business decisions based on specific forecasting results or participating in judgmental forecasting or process planning. Most of the top managers are in this category. Recently many businesses have included forecasts