Vassil Roussev

Digital Forensic Science



forensics.

      In this section we discuss three models of forensic analysis; each considers a different aspect of the analysis and uses different methods to describe the process. Garfinkel’s differential analysis approach (Section 3.3.1) formalizes a common logical inference technique (similar, for example, to differential diagnosis in medicine) for the case of computer systems. In this context, differential analysis is an incremental technique to reason about the likely prior state and/or subsequent events of individual artifacts (e.g., a file has been copied).

      Carrier’s computer history model (Section 3.3.2) takes a deeper mathematical approach in describing forensics by viewing the computer system under investigation as a finite state machine. Although it has few direct practical implications, it is a conceptually important model for the field. Some background in formal mathematical reasoning is needed to fully appreciate its contribution.

      The final model of Pirolli and Card (Section 3.3.3) does not come from the digital forensics literature, but from cognitive studies performed on intelligence analysts. It is included because we believe that the analytical process is very similar and requires the same type of skills. Understanding how analysts perform the cognitive tasks is of critical importance to designing usable tools for the practice. It also helps in understanding and modeling the differences in the level of abstraction at which the three groups of experts—forensic researchers/developers, analysts, and lawyers—operate.

      The vast majority of existing forensic techniques can be described as special cases of differential analysis: the comparison of two objects, A and B, in order to identify the differences between them. The ultimate goal is to infer the sequence of events that have (likely) transformed A into B (A precedes B in time). In the context of digital forensics, this fundamental concept has only recently been formalized by Garfinkel et al. [75], and the rest of this section introduces the formal framework they put forward.

       Terminology

      Historically, differencing tools (such as the venerable diff) have been applied to a wide variety of artifacts, especially text and program code, long before they were employed for forensic use. The following definitions are introduced to formally generalize the process.

      • Image. A byte stream from any data-carrying device representing the object under analysis. This includes all common evidence sources—disk/filesystem images, memory images, network captures, etc.

      Images can be physical or logical. The former reflect (at least partially) the physical layout of the data on the data store. The latter consist of a collection of self-contained objects (such as files) along with the logical relationships among them, without any reference to their physical storage layout.

      • Baseline image, A. The image first acquired at time TA.

      • Final image, B. The last acquired image, taken at time TB.

      • Intermediary images, I_n. Zero or more images recorded between the baseline and final images; I_n is the nth image acquired.

      • Common baseline is a single image that is a common ancestor to multiple final images.

      • Image delta, B – A, is the set of differences between two images, typically between the baseline image and the final image.

      • The differencing strategy defines the rules for identifying and reporting the differences between two, or more, images.

      • Feature, f, is a piece of data that is either directly extracted from the image (file name/size), or is computed from the content (crypto hash).

      • Feature in image, (A, f). Features are found in images; in this case, feature f is found in image A.

      • Feature name, NAME(A, f). Every feature may have zero, one, or multiple names. For example, for a file content feature, we could use any of the file names and aliases under which it may be known in the host filesystem.

      • Feature location, Loc(f), describes the address ranges from which the content of the particular feature can be extracted. The locations may be either physical, or logical, depending on the type of image acquired.

      • A feature extraction function, F(), performs the extraction/computation of a feature based on its location and content.

      • Feature set, F(A), consists of the features extracted from an image A, using the extraction function F().

      • The feature set delta, F(B) – F(A), contains the differences between the feature sets extracted from two images; the delta is not necessarily symmetric.

      • Transformation sequence, R, consists of the sequence of operations that, when applied to A, produce B. For example, the Unix diff program can generate a patch file that can be used to transform a text file in this fashion. In general, R is not unique and there can be an infinite number of transformations that can turn A into B.
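      The notion of a transformation sequence, and the fact that it is not unique, can be illustrated with a minimal Python sketch. It uses the standard difflib module in place of the Unix diff program; the helper names transformation_sequence and apply_sequence, and the representation of R as a list of edit steps, are assumptions made for illustration and are not part of the framework in [75].

```python
import difflib

def transformation_sequence(a, b):
    """Return one possible R as a list of (tag, start, end, replacement) steps.

    Indices refer to positions in A; 'equal' regions need no step.
    """
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    steps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            steps.append((tag, i1, i2, b[j1:j2]))
    return steps

def apply_sequence(a, steps):
    """Apply the steps to a copy of A.

    Steps are applied from the end so that earlier indices stay valid.
    """
    result = list(a)
    for _tag, i1, i2, replacement in reversed(steps):
        result[i1:i2] = replacement
    return result

# Two toy "images", each modeled as a list of lines.
A = ["alpha", "beta", "gamma"]
B = ["alpha", "gamma", "delta"]

R = transformation_sequence(A, B)

# A different, equally valid R: replace all of A with all of B.
R_alt = [("replace", 0, len(A), B)]

assert apply_sequence(A, R) == B
assert apply_sequence(A, R_alt) == B
```

      Both R and R_alt turn A into B, yet they describe different operations, which is why a computed transformation sequence should be read as one plausible explanation rather than a record of what actually happened.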

       Generalized Differential Analysis

      As per [75], each feature has three pieces of metadata:

      • Location: A mandatory attribute describing the address of the feature; each feature must have at least one location associated with it.

      • Name: A human-readable identifier for the feature; this is an optional attribute.

      • Timestamp(s) and other metadata: Features may have one or more timestamps associated with them, such as times of creation, modification, last access, etc. In many cases, other pieces of metadata (key-value pairs) are also present.

      Given this framework, differential analysis is performed not on the data images A and B, but on their corresponding feature sets, F(A) and F(B). The goal is to identify the operations which transform F(A) into F(B). These are termed change primitives, and seek to explain/reproduce the feature set changes.
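      As a rough illustration of working on feature sets rather than on raw images, the following Python sketch models a logical image as a dictionary of paths and contents. The extraction function extract_features, the choice of SHA-256, and the sample data are all assumptions made for this example; [75] does not prescribe them.

```python
import hashlib

def extract_features(image):
    """Hypothetical F(): map a logical image {path: content bytes}
    to a set of (content digest, path) features."""
    return {
        (hashlib.sha256(content).hexdigest(), path)
        for path, content in image.items()
    }

# Toy logical images acquired at two points in time (invented data).
image_a = {"/docs/report.txt": b"draft", "/tmp/scratch": b"temp"}
image_b = {"/docs/report.txt": b"final", "/docs/notes.txt": b"notes"}

F_A = extract_features(image_a)
F_B = extract_features(image_b)

# The feature set delta is a set difference, so it is not symmetric.
delta_b_minus_a = F_B - F_A  # features found only in B
delta_a_minus_b = F_A - F_B  # features found only in A

for digest, path in sorted(delta_b_minus_a, key=lambda f: f[1]):
    print("only in B:", path, digest[:12])
```

      Here F(B) – F(A) contains the new notes file and the new version of the report, while F(A) – F(B) contains the old version of the report and the scratch file, so the two deltas differ.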

      In the general case, such changes are not unique as the observation points may fail to reflect the effects of individual operations which are subsequently overridden (e.g., any access to a file will override the value of the last access time attribute). A simple set of change inference rules is defined (Table 3.1) and formalized (Table 3.2) in order to bring consistency to the process. The rules are correct in that they transform F(A) into F(B) but do not necessarily describe the actual operations that took place. This is a fundamental handicap for any differential method; however, in the absence of complete operational history, it is the best that can be accomplished.

      If A and B are from the same system and TA < TB, it would appear that all new features in the feature set delta F(B) – F(A) should be timestamped after TA. In other words, if B were to contain features that predate TA, or postdate TB, then this would rightfully be considered an inconsistency. An investigation should detect such anomalies and provide a sound explanation based on knowledge of how the target system operates. There is a range of possible explanations, such as:



If something did not exist and now it does, it was created
If it is in a new location, it was moved
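      The two rules above can be sketched in Python as follows. This is a simplified illustration, not the formalization given in [75]: it assumes each feature is identified by a content digest and has exactly one location, and the function name infer_changes and the tuple layout of the results are invented for the example.

```python
def infer_changes(features_a, features_b):
    """Apply two change inference rules to a pair of feature sets.

    features_a, features_b: dicts mapping a feature identifier (for
    example, a content digest) to its single location. Returns a list of
    (primitive, feature, old_location, new_location) tuples.
    """
    changes = []
    for feature, loc_b in sorted(features_b.items()):
        if feature not in features_a:
            # Rule: it did not exist and now it does, so it was created.
            changes.append(("created", feature, None, loc_b))
        elif features_a[feature] != loc_b:
            # Rule: it is in a new location, so it was moved.
            changes.append(("moved", feature, features_a[feature], loc_b))
    return changes

# Invented feature sets: h1, h2, h3 stand in for content digests.
F_A = {"h1": "/a/x.txt", "h2": "/a/y.txt"}
F_B = {"h1": "/a/x.txt", "h2": "/b/y.txt", "h3": "/a/z.txt"}

for change in infer_changes(F_A, F_B):
    print(change)
```

      Note that the output explains the difference between the two feature sets without claiming to recover the real history: a file reported as created may, for instance, be the result of editing an existing file so that its content digest changed.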