due to the quantization of the DCT coefficients, in particular those corresponding to high frequencies. Furthermore, the curve estimated by Ponomarenko et al.’s algorithm (Colom and Buades 2013) differs more from the temporal series curves, because the noise estimation method estimates noise at high frequencies, which are altered or even destroyed by compression.
The noise present in JPEG images is the result of several transformations applied to the initial noise, which follows a Poisson distribution. In the end, the noise in the final image does not follow any predefined model; instead, it depends on many unknown parameters that are set by each manufacturer. The only certainty we have is that noise is intensity dependent and frequency dependent. Therefore, it is preferable to use non-parametric models, so that the noise curves are estimated from the image itself.
1.3.3. Forgery detection through noise analysis
Image tampering, such as external copy–paste (splicing) or internal texture synthesis, can be revealed by inconsistencies in local noise levels, since noise characteristics depend on lighting conditions, camera sensors, ISO setting, and post-processing, as shown in section 1.3.2.
One of the unique features characterizing noise is the photo-response non-uniformity (PRNU), as presented in section 1.3.2. Chen et al. (2008) propose to identify the source digital camera by estimating the PRNU. According to the authors, the PRNU represents a unique fingerprint of the image sensor; hence, regions where this fingerprint is missing or inconsistent would reveal altered images.
One of the most popular algorithms for detecting splicing using noise level traces is proposed by Mahdian and Saic (2009). It consists of dividing the image into blocks and estimating the noise level using wavelets in each block. Blocks are then merged into homogeneous regions, the noise standard deviation being the homogeneity condition. The output of this method is a map showing the segments of the image having a similar noise standard deviation.
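The block-wise estimation step can be illustrated as follows. This is only a minimal sketch, not the authors' implementation: it assumes a Haar wavelet (chosen here for brevity) and the classical median-absolute-deviation estimator on the finest diagonal subband. The function names and the block size of 64 are hypothetical, and the merging of blocks into homogeneous regions is not shown.

```python
import numpy as np

def haar_diagonal_detail(u):
    """Finest-scale diagonal (HH) coefficients of an orthonormal Haar transform."""
    u = np.asarray(u, dtype=np.float64)
    h, w = u.shape
    u = u[:h - h % 2, :w - w % 2]  # drop a trailing odd row/column, if any
    a, b = u[0::2, 0::2], u[0::2, 1::2]
    c, d = u[1::2, 0::2], u[1::2, 1::2]
    return (a - b - c + d) / 2.0

def mad_noise_std(u):
    """Noise standard deviation from the median absolute HH coefficient.

    Assumption: the noise is Gaussian, so that median(|HH|) / 0.6745
    estimates its standard deviation.
    """
    hh = haar_diagonal_detail(u)
    return float(np.median(np.abs(hh)) / 0.6745)

def block_noise_map(u, block=64):
    """One noise estimate per non-overlapping block x block tile (grayscale input)."""
    u = np.asarray(u, dtype=np.float64)
    rows, cols = u.shape[0] // block, u.shape[1] // block
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            tile = u[i * block:(i + 1) * block, j * block:(j + 1) * block]
            out[i, j] = mad_noise_std(tile)
    return out
```

On a spliced image, tiles coming from a source with a different noise level would be expected to stand out in the resulting map, provided that the image content does not dominate the diagonal coefficients.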
A different approach is introduced in Pan et al. (2011), where the noise estimation is based on the kurtosis concentration phenomenon: the kurtosis of natural images is approximately constant across different frequency bands. This allows for the estimation of the noise variance when the noise follows an additive white Gaussian noise (AWGN) model. The method then segments the image into regions based on their noise variance using the k-means algorithm.
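A short derivation may help to see why a constant kurtosis gives access to the noise variance. It is a sketch under stated assumptions, not the exact formulation of Pan et al.: the band index k and the symbols below are notation introduced here, \kappa denotes the excess kurtosis, tildes denote quantities measured on the noisy image, and the noise is assumed to be independent of the image.

```latex
% Band k of the noisy image: y_k = x_k + n_k, with n_k ~ N(0, \sigma^2) independent of x_k.
\tilde{\sigma}_k^2 = \sigma_k^2 + \sigma^2
% Fourth cumulants add for independent variables and vanish for Gaussian ones, so
\tilde{\kappa}_k = \kappa_k \, \frac{\sigma_k^4}{\tilde{\sigma}_k^4}
                 = \kappa_k \left( \frac{\tilde{\sigma}_k^2 - \sigma^2}{\tilde{\sigma}_k^2} \right)^{2}
% Kurtosis concentration: \kappa_k \approx \kappa for every band k.
```

Since \tilde{\kappa}_k and \tilde{\sigma}_k^2 can be measured in each band, only \kappa and \sigma^2 remain unknown, and they can be fitted jointly over the bands.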
The method presented in Yao et al. (2017) makes use of the signal dependency of noise. Instead of a single noise level, it estimates a noise-level function. The image is segmented into edges and flat regions; the noise level is estimated on the flat regions and the camera response function (CRF) on the edges. Noise-level functions are then compared and an empirical threshold is set to detect the salient curves. The main disadvantage of this method is that it assumes that the image has only undergone demosaicing.
We will now describe a recent method based on multi-scale noise analysis for the detection of tampering in JPEG-compressed images. After the complete camera processing chain, noise is not only signal dependent but also frequency dependent, which is mainly due to the correlation introduced by demosaicing and the quantization of DCT coefficients during JPEG compression. In this context, a multi-scale approach is necessary to capture the noise in the low frequencies. Indeed, when successive subscales are considered, the low frequencies become high frequencies due to the contraction of the DCT domain, which makes it possible to estimate the noise in low frequencies with the methods presented in section 1.3.1. One could argue that the noise contained in the low and medium frequencies could be directly estimated without considering subscales, but this is a risky procedure since these frequencies also contain part of the signal. This problem is avoided by the proposed method because, at each scale, the algorithm finds blocks having low variance in the low and medium frequencies to estimate the noise.
Consider the operator S that tessellates the image into non-overlapping blocks of 2 × 2 pixels and replaces each block by a single pixel whose value is the average of the four pixels. We define the nth scale of an image u, denoted by Sn, as the result of applying the operator S to the image u n times.
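The operator S and the nth scale can be sketched in a few lines. The function names are hypothetical, and the handling of odd image sizes (dropping the last row or column) is an assumption, since the text does not specify it.

```python
import numpy as np

def downscale_once(u):
    """Apply S once: average each non-overlapping 2 x 2 block of pixels."""
    u = np.asarray(u, dtype=np.float64)
    # Assumption: a trailing odd row/column is dropped.
    h2, w2 = u.shape[0] - u.shape[0] % 2, u.shape[1] - u.shape[1] % 2
    u = u[:h2, :w2]
    # Axes 1 and 3 index the position inside each 2 x 2 block;
    # any channel axis is left untouched.
    blocks = u.reshape(h2 // 2, 2, w2 // 2, 2, *u.shape[2:])
    return blocks.mean(axis=(1, 3))

def scale(u, n):
    """Return the n-th scale of u, i.e. S applied n times (n = 0 returns u)."""
    out = np.asarray(u, dtype=np.float64)
    for _ in range(n):
        out = downscale_once(out)
    return out
```

For instance, `scale(u, 2)` applied to a 512 × 512 RGB array returns a 128 × 128 RGB array.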
The first step of the method consists of splitting the image into blocks of 512 × 512 pixels with 1/2 overlap, which are called macro-blocks. Noise curves are estimated using Ponomarenko et al.’s method (Colom and Buades 2013) in each of the three RGB channels for each macro-block, as well as for the complete image (global estimation), at scales 0, 1 and 2. For each scale and channel, the noise curves obtained from each macro-block are compared to the global estimation.
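The splitting into macro-blocks can be sketched as below. The function names are hypothetical, and the treatment of image borders is an assumption: blocks that do not fit entirely inside the image are simply dropped, which the text does not specify.

```python
import numpy as np

def macro_block_origins(height, width, size=512, overlap=0.5):
    """Top-left corners (row, col) of the size x size macro-blocks."""
    if height < size or width < size:
        return []  # assumption: images smaller than one macro-block yield none
    step = int(size * (1 - overlap))  # 256 for 512-pixel blocks with 1/2 overlap
    return [(y, x)
            for y in range(0, height - size + 1, step)
            for x in range(0, width - size + 1, step)]

def iter_macro_blocks(image, size=512, overlap=0.5):
    """Yield ((row, col), block) pairs; blocks are views into the image array."""
    image = np.asarray(image)
    origins = macro_block_origins(image.shape[0], image.shape[1], size, overlap)
    for y, x in origins:
        yield (y, x), image[y:y + size, x:x + size]
```

A noise curve would then be estimated on each yielded block and on the whole image, for each channel and each scale.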
Ideally, a non-forged image should exhibit the same noise level function for all of its macro-blocks, as well as for the entire image. However, when estimating noise curves in the presence of textures, noise overestimation is likely to happen (Liu et al. 2006). Therefore, textured macro-blocks are expected to give higher noise levels, even when they are not tampered with. The global estimation thus provides, in fact, a lower bound for the noise curves of the individual macro-blocks. Consequently, any macro-block with lower noise levels than the global estimation is suspicious, as this indicates that the underlying region follows a different noise model than the rest of the image.
For detection purposes, we consider the percentage of bins in the noise curve of the macro-block whose value is below the global estimation, independently for each RGB channel and for each scale. The geometric mean of the percentages obtained for the three channels provides one heat map per scale. These heat maps show the macro-blocks whose noise levels are incompatible with the global estimation at a given scale.
Different criteria can be considered to define detections. One possibility is to consider that a macro-block is detected if, at any scale, the geometric mean of the percentages of bins below the global noise curve is 100%. This means that the noise curve computed on the macro-block, at a certain scale, is entirely below the noise curve of the overall image, for all three RGB channels.
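The heat-map value of one macro-block and the detection criterion described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the macro-block curve and the global curve have already been sampled on the same intensity bins, so that they can be compared bin by bin. Fractions in [0, 1] are used instead of percentages.

```python
import numpy as np

def fraction_below(block_curve, global_curve):
    """Fraction of bins where the macro-block noise level is below the global one."""
    block_curve = np.asarray(block_curve, dtype=float)
    global_curve = np.asarray(global_curve, dtype=float)
    return float(np.mean(block_curve < global_curve))

def heat_value(block_curves, global_curves):
    """Geometric mean over the channels of the per-channel fractions."""
    fractions = [fraction_below(b, g) for b, g in zip(block_curves, global_curves)]
    return float(np.prod(fractions) ** (1.0 / len(fractions)))

def is_detected(heat_values_per_scale):
    """Detection if the geometric mean reaches 100% at any scale."""
    return any(np.isclose(v, 1.0) for v in heat_values_per_scale)
```

Because the geometric mean equals 1 only when every channel has all of its bins below the global curve, `is_detected` matches the criterion stated above. It is also zero as soon as one channel has no bin below the global curve.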
The size of the macro-blocks may appear large compared to other methods that aim to detect forgeries by noise analysis. This choice is made so that S2 can still be reached with a reasonable number of bins, giving a sufficiently accurate estimation. Each sub-scale implies a reduction of the image by a factor of 2 in each direction. In this way, if the macro-blocks are 512 × 512 in S0, they are 256 × 256 in S1 and 128 × 128 in S2.
Figure 1.7 shows an example of an image where an external copy–paste has been performed. The vase on the right has been cropped from the auxiliary image and pasted onto the original image. The results of applying the proposed method to this forged image are shown in Figure 1.8.
At S0, the results do not show any different behavior in the tampered zone. If another scale is considered, S1, we find that the manipulated area has lower noise levels than the rest of the image. In fact, the noise curves corresponding to macro-blocks containing the spliced region have about 80% of their bins below the global noise curve, this percentage being slightly different for each channel. Finally, S2 provides the strongest proof of falsification. Indeed, the noise curves corresponding to forged macro-blocks have all of their bins below the global estimation in all three RGB channels.
Figure 1.7. Example of falsification: the vase in b) has been cut out and copied onto a), which gives c)
COMMENT ON FIGURE 1.7.– The original image was taken with ISO 800 and exposure time 1/8 s. The auxiliary image was taken with ISO 100 and exposure time 1.3 s. Both images were taken with the same Panasonic Lumix DMC-FZ8 camera under the high-quality JPEG compression setting.
This example illustrates the need for a multi-scale approach for noise inconsistency analysis applied to forgery detection.
To conclude, noise inconsistency analysis is a rich source for