Vassil Roussev

Digital Forensic Science



phases. It does, however, postulate that the inquiry follow the general scientific method, which typically consists of four phases: Observation, Hypothesis Formulation, Prediction, and Testing & Searching.

      Observation includes running appropriate tools to capture and observe aspects of the system state that are of interest, such as listing files and processes, and rendering the content of files. During Hypothesis Formulation, the investigators combine the observed data with their domain knowledge to formulate hypotheses that can be tested, and potentially falsified, in the history model. In the Prediction phase, the analyst identifies specific evidence that would be consistent with, or would contradict, the hypothesis. Based on the predictions, experiments are performed in the Testing phase, and the outcomes are used to guide further iterations of the process.

       Categories of Forensic Analysis

      Based on the outlined framework, the CHM identifies seven categories of analytical techniques.

      History duration. The sole technique in this category is operational reconstruction, which uses event reconstruction and temporal data from the storage devices to determine when events occurred and at what points in time the system was active. Primary sources for this analysis include log files, as well as the variety of timestamp attributes kept by the operating system and applications.
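
      As a rough illustration of operational reconstruction, the sketch below groups timestamps into intervals during which the system was presumably active. The function name, the 30-minute gap threshold, and the sample timestamps are assumptions made for this example; they are not part of the CHM.

```python
from datetime import datetime, timedelta

def activity_intervals(timestamps, max_gap=timedelta(minutes=30)):
    """Group timestamps into (start, end) intervals of presumed activity.

    Two consecutive timestamps fall into the same interval when they are
    at most `max_gap` apart (an assumed heuristic, not a CHM rule).
    """
    intervals = []
    for ts in sorted(timestamps):
        if intervals and ts - intervals[-1][1] <= max_gap:
            intervals[-1][1] = ts        # extend the current interval
        else:
            intervals.append([ts, ts])   # start a new interval
    return [tuple(iv) for iv in intervals]

# Hypothetical timestamps, as might be collected from logs and file metadata.
events = [
    datetime(2016, 3, 1, 9, 0),
    datetime(2016, 3, 1, 9, 10),
    datetime(2016, 3, 1, 9, 25),
    datetime(2016, 3, 1, 14, 0),
    datetime(2016, 3, 1, 14, 5),
]

intervals = activity_intervals(events)
for start, end in intervals:
    print(start.isoformat(), "to", end.isoformat())
```

      With these inputs the timestamps fall into two intervals, one in the morning and one in the afternoon. A real analysis would also have to account for clock skew and tampered timestamps.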

      Primitive storage system configuration. The techniques in this category define the capabilities of the primitive storage system. These include the names of the storage devices, the number of addresses for each storage device, the domain of each address on each storage device, and when each storage device was connected. Together, these sets and functions define the set of possible states Q of the FSM.

      Primitive event system configuration. Methods in this category define the capabilities of the primitive event system; that is, define the names of the event devices connected, the event symbols for each event device, the state change function for each event device, and when each event device was connected. Together, these sets and functions define the set of event symbols Σ and state change function δ. Since primitive events are almost never of direct interest to an investigation, these techniques are not generally performed.
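
      To make the roles of Q, Σ, and δ concrete, the following sketch builds them for a deliberately tiny, hypothetical system: one storage device with two addresses, each holding a single bit, and one event device with three event symbols. All names here are illustrative assumptions.

```python
from itertools import product

# Hypothetical primitive storage configuration: two addresses, domain {0, 1}.
ADDRESSES = ("a0", "a1")
DOMAIN = (0, 1)

# Q: every possible assignment of domain values to the addresses.
Q = [dict(zip(ADDRESSES, values))
     for values in product(DOMAIN, repeat=len(ADDRESSES))]

# Sigma: the event symbols of a hypothetical event device.
SIGMA = ("set_a0", "clear_a0", "copy_a0_to_a1")

def delta(state, event):
    """State change function: the state that follows `state` after `event`."""
    nxt = dict(state)
    if event == "set_a0":
        nxt["a0"] = 1
    elif event == "clear_a0":
        nxt["a0"] = 0
    elif event == "copy_a0_to_a1":
        nxt["a1"] = state["a0"]
    else:
        raise ValueError(f"unknown event symbol: {event}")
    return nxt

print(len(Q))
print(delta({"a0": 0, "a1": 0}, "set_a0"))
```

      Even this toy machine has four states; the size of Q grows exponentially with the number of addresses.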

      Primitive state and event definition. Methods in this category define the primitive state history (hps) and event history (hes) functions. Five types of techniques can be used to formulate and test this type of hypothesis, and each class has a directional component. Since different approaches can be used to define the same two functions, a hypothesis can be formulated using one technique and tested with another. Overall, these techniques are impractical in real investigations, but are presented below for completeness.

      Observation methods use direct observation of an output device to define its state in the inferred history, and are only applicable to output device controllers; they cannot work for internal devices, such as hard disks.

      Capabilities techniques employ the primitive system capabilities to formulate and test state and event hypotheses. To formulate a hypothesis, the investigator chooses a possible state or event at random; this is impractical for almost all real systems as the state space is enormous.
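
      A back-of-the-envelope calculation shows why picking states at random is hopeless. The sketch below counts the decimal digits in the number of possible states of a storage device, assuming (for illustration only) that each address holds one byte, so that a device with n addresses has 256^n states.

```python
import math

def state_count_digits(num_addresses, values_per_address=256):
    """Number of decimal digits in values_per_address ** num_addresses.

    Uses logarithms so the huge integer itself is never constructed.
    """
    return math.floor(num_addresses * math.log10(values_per_address)) + 1

small = state_count_digits(2)        # 256**2 = 65536, which has 5 digits
one_mib = state_count_digits(2**20)  # a 1 MiB device

print(small)
print(one_mib)
```

      For a device of only 1 MiB, the number of possible states already has more than 2.5 million decimal digits.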

      Sample data techniques extract samples from observations of similar systems or from previous executions of the system being investigated; the results are metrics on the occurrence of events and states. To build a hypothesis, states and events are chosen based on how likely they are to occur. Testing the hypothesis reveals if there is evidence to support the state or event. Note that this is a conceptual class not used in practice as there are no relevant sample data.
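
      Purely as a conceptual illustration, choosing hypotheses by likelihood amounts to ranking candidate states by their observed frequency. The sample observations and state labels below are invented for the example.

```python
from collections import Counter

# Invented observations of a (simplified) system state from earlier executions.
samples = ["idle", "idle", "browser", "idle", "editor", "browser", "idle"]

def ranked_hypotheses(observations):
    """Return (state, relative frequency) pairs, most frequent first."""
    counts = Counter(observations)
    total = sum(counts.values())
    return [(state, n / total) for state, n in counts.most_common()]

ranking = ranked_hypotheses(samples)
for state, freq in ranking:
    print(f"{state}: {freq:.2f}")
```

      The most frequent state would be hypothesized first and then tested against the available evidence.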

      Reconstruction techniques use a known state to formulate and test hypotheses about the event and state that existed immediately prior to the known state. This is not performed in practice, as questions are rarely formulated about primitive events.
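
      The backward-looking nature of reconstruction can be sketched on a toy machine: given a known state, enumerate every (previous state, event) pair that the state change function maps onto it. The two-bit system and its event symbols are assumptions made for this example.

```python
from itertools import product

# Hypothetical toy system: two one-bit storage locations (a0, a1).
STATES = list(product((0, 1), repeat=2))
EVENTS = ("set_a0", "clear_a0", "copy_a0_to_a1")

def delta(state, event):
    """State change function of the toy system."""
    a0, a1 = state
    if event == "set_a0":
        return (1, a1)
    if event == "clear_a0":
        return (0, a1)
    if event == "copy_a0_to_a1":
        return (a0, a0)
    raise ValueError(f"unknown event symbol: {event}")

def predecessors(known_state):
    """All (previous state, event) pairs that lead to known_state in one step."""
    return [(s, e) for s in STATES for e in EVENTS if delta(s, e) == known_state]

candidates = predecessors((1, 1))
for prev_state, event in candidates:
    print(prev_state, event)
```

      Even here, four different (state, event) pairs lead to the known state (1, 1), so a single step backward already yields several competing hypotheses.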

      Construction methods are the forward-looking techniques that use a known state to formulate and test hypotheses about the next event and state. This is not useful in practice as the typical starting point is an end state; further, any hypothesis about the future state would not be testable.

      Complex storage system configuration. Techniques in this category define the complex storage capabilities of the system, and are needed to formulate and test hypotheses about complex states. The techniques define the names of the complex storage types (Dcs), the attribute names for each complex storage type (DATcs), the domain of each attribute (ADOcs), the set of identifiers for the possible instances of each complex storage type (DADcs), the abstraction transformation functions for each complex storage type (ABScs), the materialization transformation functions for each complex storage type (MATcs), and the complex storage types that existed at each time and at each abstraction layer XL(ccs–X).
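
      The relationship between a complex storage type, its attributes, and its abstraction and materialization transformations can be illustrated with a small sketch. The record layout below (a 4-byte magic value, a 16-bit version, and a 32-bit length, little-endian) is a hypothetical type invented for the example, not a format discussed in the text.

```python
import struct

# Hypothetical complex storage type: a fixed-size record header.
# Attributes and domains: magic (4 bytes), version (0..65535), length (0..2**32-1).
HEADER = struct.Struct("<4sHI")

def abstract(raw):
    """Abstraction transformation: lower-level bytes -> named attributes."""
    magic, version, length = HEADER.unpack(raw)
    return {"magic": magic, "version": version, "length": length}

def materialize(attrs):
    """Materialization transformation: named attributes -> lower-level bytes."""
    return HEADER.pack(attrs["magic"], attrs["version"], attrs["length"])

raw = b"DEMO" + bytes([2, 0]) + bytes([16, 0, 0, 0])
record = abstract(raw)
print(record)
print(materialize(record) == raw)
```

      In this sketch the two transformations are exact inverses; forensic tools largely implement the abstraction direction, turning raw bytes into the structures an analyst reasons about.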

      Two types of hypotheses are formulated in this category: the first defines the names of the complex storage types and the states at which they existed; the second defines the attributes, domains, and transformation functions for each complex storage type. As discussed earlier, complex storage locations are program data structures. Consequently, enumerating the complex storage types in existence at a particular point in time requires reconstructing the state of the computer, so that the program state can be analyzed.

      Identification of existing programs can be accomplished in one of two ways: program identification—by searching for programs on the system to be subsequently analyzed; and data type observation—by inferring the presence of complex storage types that existed based on the data types that are found. This latter technique may give false positives in that a complex type may have been created elsewhere and transferred to the system under investigation.
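
      Data type observation can be approximated by matching well-known file signatures. The sketch below is a minimal, assumption-laden illustration: it checks only four signatures and looks only at the start of each blob, whereas real tools use far larger signature sets and deeper validation.

```python
# A few well-known file signatures ("magic numbers").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container",
    b"SQLite format 3\x00": "SQLite database",
}

def observed_types(blobs):
    """Return the set of data types whose signature starts any of the blobs."""
    found = set()
    for blob in blobs:
        for magic, name in SIGNATURES.items():
            if blob.startswith(magic):
                found.add(name)
    return found

# Hypothetical content fragments recovered from a system under investigation.
fragments = [
    b"%PDF-1.4\n...",
    b"SQLite format 3\x00" + bytes(84),
    b"plain text with no signature",
]
types_found = observed_types(fragments)
print(sorted(types_found))
```

      As noted above, finding a data type shows only that such data is present; the file may have been created on another system and copied over.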

      Three classes of techniques can be used to define the attributes, domains, and transformation functions for each complex storage type: (a) complex storage specification observation, which uses a specification to define a program’s complex storage types; (b) complex storage reverse engineering, which uses design recovery reverse engineering to define complex storage locations; (c) complex storage program analysis, which uses static, or dynamic, code analysis of the programs to identify the instructions creating, or accessing, the complex storage locations and to infer their structure.

      It is both impractical and unnecessary to fully enumerate the data structures used by programs; investigative tools support only the most relevant and most frequently used ones, and identifying them is part of the tool development process.

      Complex event system configuration. These methods define the capabilities of the complex event system: the names of the programs that existed on the system (Dce), the names of the abstraction layers (L), the symbols for the complex events in each program (DSYce–X), the state change functions for the complex events (DCGce–X), the abstraction transformation functions (ABSce), the materialization transformation functions (MATce), and the set of programs that existed at each time (cce).

      Inferences about events are more difficult than those about storage locations. Storage locations are both abstracted and materialized, and tend to be long-lived because of backward compatibility; events are usually designed top-down, and backward compatibility is a much lesser concern.

      Three types of hypotheses can be tested in this category: (a) the existence of programs, including the period of their existence; (b) the abstraction layers, event symbols, and state change functions for each program; (c) the materialization and abstraction transformation functions between the layers.

      With respect to (a), both program identification and data type observation can be used in the forms already described.

      For hypotheses in regard to (b), there are two relevant techniques—complex event specification observation and complex event program analysis. The former uses a specification of the program to determine the complex events that it could cause. The latter works directly with the program to observe the events; depending on the depth of the analysis, this could be as simple as running the program under specific