Figure 1.6 illustrates the hierarchy of the HVS.
Fred Attneave and Horace Barlow realized that the HVS perceives its surroundings through an ‘economical description’ or ‘economical thought’ that compresses the redundancy in visual stimuli. This point of view suggests that we can extract prior information from the perspective of the HVS. Specifically, in 1961 Barlow proposed the efficient coding hypothesis, grounded in neurophysiological studies, as a theoretical model of sensory coding in the human brain. In the brain, neurons communicate with one another by sending electrical impulses, or spikes (action potentials), which represent and process information about the outside world. Because only a few of the hundreds of millions of neurons in the visual cortex are activated in response to a specific input, Barlow hypothesized that the neural code formed by the spikes represents visual information efficiently; that is, the HVS is capable of sparse representation. The HVS tends to minimize the number of spikes needed to transmit a given signal, which can be modeled as an optimization problem. According to this hypothesis, the brain uses an efficient coding system suited to expressing the visual information of different scenes. Barlow’s model treats the sensory pathway as a communication channel in which neuronal spikes are the sensory signals, and the goal is to maximize the channel capacity by reducing the redundancy of the representation. In this view, the goal of the HVS is to explain natural images with a collection of independent events. To form an efficient representation of natural images, the HVS uses pre-processing operations to remove first- and second-order redundancy. In natural image statistics, the first-order statistic gives the direct current (DC) component, i.e. the average luminance, and the second-order statistics describe the variance and covariance, i.e. the contrast of the image. The underlying heuristic is that image recognition should not be affected by the average luminance or by the contrast scale.
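To make the first- and second-order statistics concrete, here is a minimal Python/NumPy sketch of our own (not taken from the studies cited above). The function name `normalize_patch` and the small constant `eps` are hypothetical choices. The sketch removes a patch's DC component (its mean luminance), divides by its standard deviation (its contrast), and then checks that a patch and a brighter, higher-contrast copy of it become identical. This is exactly the invariance the heuristic asks for.

```python
import numpy as np

def normalize_patch(patch, eps=1e-8):
    """Remove the DC (mean luminance) and rescale to unit contrast (standard deviation)."""
    patch = np.asarray(patch, dtype=float)
    dc = patch.mean()                  # first-order statistic: average luminance
    centered = patch - dc
    contrast = centered.std()          # second-order statistic: contrast
    return centered / (contrast + eps) # eps guards against flat (zero-contrast) patches

rng = np.random.default_rng(0)
patch = rng.random((8, 8))

# Same structure, but with a luminance offset and a contrast scale applied.
brighter_higher_contrast = 3.0 * patch + 50.0

a = normalize_patch(patch)
b = normalize_patch(brighter_higher_contrast)
print(np.allclose(a, b))
```

This per-patch normalization only removes the first- and second-order statistics of each patch in isolation. It does not decorrelate neighbouring pixels from one another.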
Mathematically, this pre-processing can be modeled as zero-phase component analysis (ZCA). Interestingly, the responses of retinal ganglion cells and LGN cells were found to resemble features obtained with natural image statistics techniques such as ZCA.
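ZCA can be sketched in a few lines of Python/NumPy, assuming the data are arranged as an n × m matrix with one sample per column. This is an illustrative sketch, not a reference implementation. The names `zca_whiten` and `eps` are hypothetical, and `eps` is a common practical regularizer rather than part of the definition. With U holding the eigenvectors of the data covariance Σ and S the corresponding eigenvalues, ZCA multiplies the centered data by $\Sigma^{-1/2} = U S^{-1/2} U^{\top}$. The final rotation by U brings the whitened data back to the original pixel coordinates, which makes the whitening matrix symmetric (hence "zero-phase").

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA whitening of X (n dims x m samples); returns whitened data and the whitening matrix."""
    X = X - X.mean(axis=1, keepdims=True)        # zero mean for every dimension
    m = X.shape[1]
    cov = X @ X.T / m                            # n x n covariance matrix
    eigvals, U = np.linalg.eigh(cov)             # eigendecomposition of the symmetric covariance
    W = U @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ U.T   # symmetric ("zero-phase") filter
    return W @ X, W

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0], [1.5, 0.5]])           # mixing matrix that correlates the two dims
X = A @ rng.standard_normal((2, 5000))           # toy correlated data: 2 dims, 5000 samples

X_zca, W = zca_whiten(X, eps=0.0)
cov_zca = X_zca @ X_zca.T / X_zca.shape[1]
print(np.round(cov_zca, 6))                      # approximately the 2 x 2 identity
```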
Inspired by the mechanism of the HVS, researchers have worked to mimic it by reducing the redundancy of images so as to represent them efficiently. In this context, machine learning techniques have been used to obtain features similar to those observed in the HVS. Figure 1.7 illustrates the relationship between an artificial neural network (to be explained in chapter 3) and the HVS. Furthermore, in HVS feature extraction and representation, higher-order redundancy is also reduced. Specifically, the receptive field properties can be accounted for by a strategy that sparsifies the output activity in response to natural images. The concept of ‘sparse coding’ was introduced to describe this phenomenon. Building on neurobiological observations, Olshausen and Field used a network to code image patches in an over-complete basis under sparseness constraints in order to capture image structures. They found that the learned features have localized, oriented receptive fields, essentially the same as those in V1. That is to say, the HVS and natural image statistics are closely related, and both are highly relevant to prior information extraction.
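The sparse coding idea can be illustrated with a short Python/NumPy sketch. Note that it swaps in a different technique from Olshausen and Field's original work. They used a smooth sparseness cost and learned the basis from natural image patches. Here, by contrast, we assume a fixed random over-complete dictionary `D` and an L1 penalty, and infer the coefficients with the iterative shrinkage-thresholding algorithm (ISTA). All names (`sparse_code_ista`, `objective`, `lam`, `n_iter`) are hypothetical. The example encodes a 16-dimensional signal built from only three of 64 dictionary atoms, mimicking the few active "neurons" of the efficient coding picture.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x, D, a, lam):
    """Sparse coding cost: 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    r = x - D @ a
    return 0.5 * r @ r + lam * np.abs(a).sum()

def sparse_code_ista(x, D, lam=0.05, n_iter=2000):
    """Infer sparse coefficients a for signal x over dictionary D with ISTA."""
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the smooth term's gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)        # gradient of 0.5 * ||x - D a||^2
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(2)
n, k = 16, 64                            # 16-dim signal, 4x over-complete dictionary
D = rng.standard_normal((n, k))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
a_true = np.zeros(k)
a_true[[3, 20, 41]] = [1.5, -2.0, 1.0]   # only three atoms ("neurons") active
x = D @ a_true

a_hat = sparse_code_ista(x, D)
print("active coefficients:", np.count_nonzero(a_hat), "of", k)
print("relative error:", np.linalg.norm(x - D @ a_hat) / np.linalg.norm(x))
```

Learning the dictionary itself, as Olshausen and Field did, would alternate this inference step with updates of `D`. That part is omitted here.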
Figure 1.7. The relationship between an artificial neural network and the HVS.
In the following sub-sections, we introduce several HVS models and describe how to learn features from natural images in light of findings in visual neurophysiology.
1.1.5 Data decorrelation and whitening
How can natural images be represented in terms of their intrinsic properties? One of the most widely used methods in natural image statistics is principal component analysis (PCA). PCA considers the second-order statistics of natural images, i.e. the variances of and covariances among pixel values. Although PCA is not a sufficient model of the HVS, it is the foundation for other models and is usually applied as a pre-processing step for further analysis (Hyvärinen et al 2009). Through a linear transformation, PCA maps the original data to a representation whose dimensions are mutually decorrelated, thereby identifying the main linear components of the data.
During the linear transformation, we would like the transformed vectors to be as dispersed as possible. Mathematically, the degree of dispersion can be expressed in terms of variance: the larger the variance along a direction, the more information the data carry along it. Therefore, the direction of maximum variance captures the most information, and the projection onto it is defined as the first principal component of the data. After obtaining the first principal component, the next linear feature must be orthogonal to the first one. More generally, each new linear feature must be orthogonal to the existing ones. In this process, the covariance of two vectors represents their linear correlation: when the covariance equals zero, the two vectors are uncorrelated. The goal of PCA is therefore to diagonalize the covariance matrix. That is, PCA makes all off-diagonal elements (the covariances between different components) zero and leaves only the diagonal elements, which are the variances of the individual components. Arranging the diagonal elements from top to bottom in descending order of magnitude yields the principal components in order of importance. In the following, we briefly introduce one realization of the PCA method.
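The sequential procedure just described (maximize the variance, then find the next direction orthogonal to those already found) can be sketched with power iteration and deflation. This is our own illustrative Python/NumPy sketch, not the method used to produce the figures in this section. The function `leading_components` and its parameters are hypothetical. The sketch assumes zero-mean data with distinct leading eigenvalues so that power iteration converges. A practical PCA would use an eigendecomposition or SVD instead, as in the realization that follows.

```python
import numpy as np

def leading_components(X, k, n_iter=500, seed=0):
    """Return k principal directions (rows) of zero-mean X (n x m) and their variances.
    Power iteration finds the max-variance direction; deflation then removes it so that
    the next direction found is orthogonal to the earlier ones."""
    rng = np.random.default_rng(seed)
    C = X @ X.T / X.shape[1]                 # covariance matrix
    comps, variances = [], []
    for _ in range(k):
        w = rng.standard_normal(C.shape[0])
        for _ in range(n_iter):
            w = C @ w
            w /= np.linalg.norm(w)           # keep w a unit vector
        var = w @ C @ w                      # variance of the data projected onto w
        comps.append(w)
        variances.append(var)
        C = C - var * np.outer(w, w)         # deflation: remove the direction just found
    return np.array(comps), np.array(variances)

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))     # random rotation
X = Q @ (np.array([[3.0], [2.0], [1.0]]) * rng.standard_normal((3, 2000)))
X -= X.mean(axis=1, keepdims=True)

W, variances = leading_components(X, k=3)
print(np.round(variances, 3))    # variances in decreasing order
print(np.round(W @ W.T, 6))      # approximately the identity: the directions are orthonormal
```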
Usually, before computing PCA we remove the DC component of the images. The DC component is first-order statistical information and usually carries little structural information in natural images. Let $X \in \mathbb{R}^{n \times m}$ denote the sample matrix with the DC component removed, where n is the data dimension and m is the number of samples. Then the covariance matrix can be computed as follows:
$$\Sigma = \frac{1}{m} X X^{\top}. \tag{1.11}$$
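Equation (1.11) and the DC removal that precedes it can be sketched as follows, assuming 8 × 8 patches are flattened into the columns of X. Because we cannot assume access to a particular test image, the sketch uses a hypothetical stand-in: smoothed random noise, whose neighbouring pixels are correlated as in a natural image. The helper names `extract_patches` and `dc_removed_covariance` are our own. Note that $XX^{\top}/m$ is a true covariance only if every pixel position also has zero mean across patches. For large collections of patches from a statistically stationary image, this holds approximately after per-patch DC removal.

```python
import numpy as np

def extract_patches(img, size=8, m=5000, seed=0):
    """Sample m random size x size patches; each flattened patch is one column (n = size**2)."""
    rng = np.random.default_rng(seed)
    H, W = img.shape
    rows = rng.integers(0, H - size + 1, m)
    cols = rng.integers(0, W - size + 1, m)
    return np.stack([img[r:r + size, c:c + size].ravel() for r, c in zip(rows, cols)], axis=1)

def dc_removed_covariance(X):
    """Subtract each patch's DC (its mean luminance); return (Sigma, X_dc) with Sigma = X_dc X_dc^T / m."""
    X_dc = X - X.mean(axis=0, keepdims=True)     # column mean = DC of each patch
    return X_dc @ X_dc.T / X_dc.shape[1], X_dc

# Hypothetical stand-in for a natural image: random noise smoothed so neighbours are correlated.
rng = np.random.default_rng(4)
img = rng.standard_normal((128, 128))
for _ in range(3):   # crude 2 x 2 box smoothing, applied three times
    img = (img + np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1)
           + np.roll(img, (1, 1), axis=(0, 1))) / 4

X = extract_patches(img)                 # 64 x 5000 sample matrix
Sigma, X_dc = dc_removed_covariance(X)
print(Sigma.shape)                       # (64, 64)
print(Sigma[0, 1] / Sigma[0, 0])         # covariance ratio between horizontal neighbours
```

Because every column of `X_dc` sums to zero, Σ has a (numerically) zero eigenvalue along the all-ones direction. This is an expected side effect of DC removal.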
By singular value decomposition (SVD), the covariance matrix can be expressed as
$$\Sigma = U S V, \tag{1.12}$$
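where U is an n × n orthogonal (real unitary) matrix whose columns are the eigenvectors of Σ, S is an n × n diagonal matrix of eigenvalues, and $V = U^{\top}$. Because Σ is symmetric and positive semi-definite, its SVD coincides with its eigendecomposition, so the singular values are the eigenvalues. The magnitude of each eigenvalue reflects the importance of the corresponding principal component. Arranging the eigenvalues on the diagonal of S in descending order, with the columns of U ordered accordingly, PCA can be realized with the following formula: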
$$X_{\mathrm{PCA}} = U^{\top} X. \tag{1.13}$$
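A direct transcription of equations (1.11) to (1.13) into Python/NumPy might look like the sketch below. It is our own illustration, and `pca` is a hypothetical helper name. It relies on `numpy.linalg.svd`, which returns the singular values already sorted in descending order. Because Σ is symmetric positive semi-definite, the left singular vectors returned by the SVD are eigenvectors of Σ. The check at the end confirms the defining property of PCA: the covariance of the transformed data is diagonal, with the eigenvalues on its diagonal.

```python
import numpy as np

def pca(X):
    """PCA of X (n x m, every row already centered), following eqs. (1.11)-(1.13)."""
    m = X.shape[1]
    Sigma = X @ X.T / m                  # eq. (1.11): covariance matrix
    U, s, _ = np.linalg.svd(Sigma)       # eq. (1.12): Sigma = U S V, singular values descending
    X_pca = U.T @ X                      # eq. (1.13): project onto the principal axes
    return X_pca, U, s

rng = np.random.default_rng(5)
n, m = 6, 4000
X = rng.standard_normal((n, n)) @ rng.standard_normal((n, m))   # correlated toy data
X -= X.mean(axis=1, keepdims=True)                               # zero mean per dimension

X_pca, U, s = pca(X)
cov_pca = X_pca @ X_pca.T / m
print(np.round(s, 3))                        # eigenvalues, largest first
print(np.allclose(cov_pca, np.diag(s)))      # is the covariance of X_pca diagonal?
```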
Figure 1.8 depicts the 64 PCA weight matrices (principal component vectors reshaped to 8 × 8) computed from 8 × 8 pixel patches of the Lena image. They are arranged in descending order of variance, from left to right within each row and from top to bottom across rows. PCA has been widely applied as a handy tool for data compression. Figure 1.9 shows a simple PCA compression experiment. A natural image can be represented well by a number of components that is small relative to its original dimensionality. This means that PCA can remove part of the data redundancy in natural images.
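The compression experiment can be imitated with the sketch below, a Python/NumPy illustration under stated assumptions rather than the code behind figure 1.9. Instead of the Lena image, it uses hypothetical 64-dimensional samples built from five smooth cosine profiles plus a little noise. `pca_compress` is our own helper name. Keeping only the k leading components stores k numbers per sample instead of 64, and the relative reconstruction error shows how quickly a few leading components capture most of the energy.

```python
import numpy as np

def pca_compress(X, k):
    """Keep the k leading principal components of X (n x m, rows centered) and reconstruct."""
    Sigma = X @ X.T / X.shape[1]
    U, s, _ = np.linalg.svd(Sigma)       # s: eigenvalues in descending order
    Uk = U[:, :k]                        # n x k basis of the top-k components
    codes = Uk.T @ X                     # k numbers per sample instead of n
    return Uk @ codes, s

rng = np.random.default_rng(6)
t = np.linspace(0.0, 1.0, 64)
profiles = np.stack([np.cos(np.pi * f * t) for f in range(1, 6)], axis=1)   # 64 x 5
X = profiles @ rng.standard_normal((5, 3000)) + 0.05 * rng.standard_normal((64, 3000))
X -= X.mean(axis=1, keepdims=True)

for k in (1, 5, 20, 64):
    X_hat, s = pca_compress(X, k)
    rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    print(f"k = {k:2d}: relative error {rel_err:.4f}")
```

Because the discarded components are orthogonal to the kept ones, the squared relative error equals the fraction of the eigenvalue sum that is thrown away, $\sum_{i>k}\lambda_i / \sum_i \lambda_i$. A rapidly decaying eigenvalue spectrum therefore permits strong compression.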
Figure 1.8. 64 weighting matrices for Lena image patches of 8 × 8 pixels.
Figure 1.9. Image compressed with PCA. Lena image (©) Playboy Enterprises, Inc.
There is an important pre-processing step related to PCA, called whitening. It removes the first- and second-order information, which represent the average luminance and the contrast, respectively, and thus allows us to focus on the higher-order statistical properties of the original data. Whitening is also a basic processing function of retinal and LGN cells (Atick and Redlich 1992). After whitening, the data have two properties: (i) the features are uncorrelated and (ii) all features have the same variance. In patch-based processing, whitening combines naturally with PCA and other redundancy reduction methods. After PCA, the only remaining step to whiten the data is to normalize the variances of the principal components to unity. Thus, PCA with whitening can be expressed as follows:
$$X_{\mathrm{PCAwhite}} = S^{-1/2} U^{\top} X, \tag{1.14}$$
where $S^{-1/2} = \mathrm{diag}\left(1/\sqrt{\lambda_1}, \ldots, 1/\sqrt{\lambda_n}\right)$ and $\lambda_i$ is the ith eigenvalue of Σ.
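Equation (1.14) translates almost literally into code. In this Python/NumPy sketch, `pca_whiten` is our own hypothetical helper. The optional `eps` regularizer is also our addition, a common safeguard against division by near-zero eigenvalues that is not part of equation (1.14). The whitened data should have an identity covariance matrix, meaning uncorrelated features with equal (unit) variance, and the final line checks this.

```python
import numpy as np

def pca_whiten(X, eps=0.0):
    """PCA whitening, eq. (1.14): S^(-1/2) U^T X for X (n x m) with rows centered.
    eps > 0 (hypothetical safeguard, not in eq. (1.14)) avoids dividing by near-zero eigenvalues."""
    Sigma = X @ X.T / X.shape[1]
    U, lam, _ = np.linalg.svd(Sigma)                  # lam: eigenvalues, descending
    return np.diag(1.0 / np.sqrt(lam + eps)) @ U.T @ X

rng = np.random.default_rng(7)
X = rng.standard_normal((4, 4)) @ rng.standard_normal((4, 5000))   # correlated toy data
X -= X.mean(axis=1, keepdims=True)

X_white = pca_whiten(X)
cov_white = X_white @ X_white.T / X_white.shape[1]
print(np.round(cov_white, 6))    # approximately the 4 x 4 identity matrix
```

With `eps > 0`, the variances become $\lambda_i/(\lambda_i + \epsilon)$, slightly below one. This trades exact whitening for numerical stability.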
After whitening, the second-order information has been nullified. That is, PCA with whitening removes the first- and second-order redundancy of the data. Whitening, unlike