Figure 1.13. Basis functions learned by the sparse coding algorithm. All were normalized, with zero always represented by the same gray level. Reproduced with permission from Olshausen and Field (1997). Copyright 1997 Elsevier.
The second model (Bell and Sejnowski 1997) is based on the principle of independent component analysis (ICA). ICA is an approach to solving the blind source separation (BSS) problem (figure 1.14). In natural image statistics, let X = {xi ∣ i = 1, …, N} represent N independent source signals forming a column vector, let Y = {yj ∣ j = 1, …, M} represent M image patches, also forming a column vector, and let W be the mixing matrix of M × N dimensions. The BSS problem is to invert the measurement Y,
Y = WX,   M ⩾ N,   (1.18)
for both W and X, up to ambiguities in the amplitudes and the ordering (permutation) of the independent source signals. ICA helps us find the basis components X = {xi ∣ i = 1, …, N}, which capture representative features of the image patches.
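The amplitude and permutation ambiguities in equation (1.18) can be illustrated with a minimal NumPy sketch. The sizes, the Laplacian sources, and the random mixing matrix below are illustrative assumptions, not values from the text; the point is only that rescaling and reordering the sources, with a compensating change to the mixing matrix, leaves the measurement Y unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): N sources, M measurements, T samples per signal.
N, M, T = 2, 3, 1000
X = rng.laplace(size=(N, T))   # hypothetical independent, sparse sources
W = rng.normal(size=(M, N))    # hypothetical mixing matrix, M x N
Y = W @ X                      # measurements, as in equation (1.18)

# Reorder and rescale the sources, and compensate in the mixing matrix.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])     # permutation of the two sources
D = np.diag([2.0, -0.5])       # amplitude (and sign) changes
X_alt = D @ P @ X
W_alt = W @ np.linalg.inv(D @ P)

# Both (W, X) and (W_alt, X_alt) explain exactly the same measurement.
same_measurement = np.allclose(Y, W_alt @ X_alt)
print(same_measurement)
```

Because the two factorizations cannot be told apart from Y alone, any BSS solution can only be compared with the true sources after matching their order and scale.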
Figure 1.14. ICA to find both embedded independent components and the mixing matrix that blends the independent components.
ICA is premised on statistical independence among the hidden data sources. In information theory (see appendix A for more details), mutual information measures the relationship between two signals. Let H(X) and H(Y) represent the self-information (entropy) of X and Y, which depends solely on their respective probability density functions. H(X, Y) is the joint information, which represents the amount of information generated when X and Y occur together. I(X, Y) is the mutual information, that is, the information that knowing one of X and Y provides about the other (figure 1.15). For example, if we know Y, we only need an additional H(X) − I(X, Y) of information to determine X completely. When two signals are independent, the mutual information is zero (Chechile 2005).
Figure 1.15. Joint information determined by the signal information H(X) and H(Y) as well as mutual information I(X,Y).
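The relationship among these quantities can be checked numerically for discrete signals. The following is a minimal sketch, assuming the standard relation I(X, Y) = H(X) + H(Y) − H(X, Y) with information measured in bits; the helper names and the two small joint probability tables are hypothetical examples, not data from the text.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (zero cells are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(X, Y) = H(X) + H(Y) - H(X, Y) for a joint table with X on rows, Y on columns."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)   # marginal distribution of X
    py = joint.sum(axis=0)   # marginal distribution of Y
    return entropy(px) + entropy(py) - entropy(joint)

# Independent signals: the joint table is the outer product of the marginals.
independent = np.outer([0.5, 0.5], [0.25, 0.75])

# Fully dependent signals: knowing Y determines X.
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])

print(mutual_information(independent))
print(mutual_information(dependent))
```

In the independent case the mutual information vanishes up to rounding. In the fully dependent case it equals H(X), so H(X) − I(X, Y) is zero and no further information is needed to determine X once Y is known.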
We consider the ICA operation as a system, with Y as the input and X as the output. When the output information of the system reaches its maximum, the mutual information between the output components is at its minimum. That is to say, the output components are as independent of each other as possible, since any non-trivial linear combination of them would compromise their independence. This is a simplified description of the infomax principle.
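One way to make this link explicit is the standard identity that relates the joint information of the N output components to their individual information and their mutual information. It is given here as a supporting sketch rather than as the derivation used by the original authors. The multi-component mutual information I(x_1, …, x_N) is non-negative and vanishes only when the components are independent.

```latex
H(x_1, \dots, x_N) \;=\; \sum_{i=1}^{N} H(x_i) \;-\; I(x_1, \dots, x_N),
\qquad I(x_1, \dots, x_N) \ge 0 .
```

Under the assumption that the individual terms H(x_i) are bounded (for example, by a squashing nonlinearity at the output), pushing the joint information toward its maximum drives the mutual information toward zero. For N = 2, the identity reduces to H(X, Y) = H(X) + H(Y) − I(X, Y).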
In 1997, Bell and Sejnowski applied ICA to natural images using this information-theoretic approach, and found that ICA is a special sparse coding method. They explained the results obtained with the network proposed by Olshausen and Field in the ICA framework. ICA on natural images produces decorrelating filters that are sensitive to both phase and frequency, similar to transforms based on oriented Gabor functions or wavelets. Representative ICA filters generated from natural images are shown in figure 1.16. It can be seen that ICA can also model the Gabor-like receptive fields of simple cells in V1.
Figure 1.16. A matrix of 144 filters obtained using ICA on ZCA-whitened natural images. Reproduced with permission from Bell and Sejnowski (1997). Copyright 1997 Elsevier.
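The two steps named above, ZCA whitening followed by infomax ICA, can be sketched on toy data. This is a minimal illustration under stated assumptions, not the procedure used to produce figure 1.16: the two Laplacian sources, the 2 × 2 mixing matrix, the logistic output nonlinearity, the natural-gradient form of the update, the learning rate, and the iteration count are all choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two sparse (Laplacian) sources and a made-up mixing matrix.
T = 5000
sources = rng.laplace(size=(2, T))
mixing = np.array([[1.0, 0.9],
                   [0.2, 1.0]])
Y = mixing @ sources
Y = Y - Y.mean(axis=1, keepdims=True)    # remove the mean (first-order statistics)

# ZCA whitening: V = E diag(d^(-1/2)) E^T makes the sample covariance the identity.
cov = Y @ Y.T / T
d, E = np.linalg.eigh(cov)
V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
Z = V @ Y

# Infomax with a logistic output nonlinearity, natural-gradient form (assumed variant):
#   B <- B + lr * (I - tanh(U / 2) U^T / T) B,   with U = B Z.
B = np.eye(2)      # unmixing matrix applied to the whitened data
lr = 0.1           # assumed learning rate
for _ in range(1000):
    U = B @ Z
    B += lr * (np.eye(2) - np.tanh(U / 2.0) @ U.T / T) @ B

# Global matrix from the true sources to the estimates; ideally a scaled permutation.
G = B @ V @ mixing
row_abs = np.sort(np.abs(G), axis=1)
crosstalk = float(np.max(row_abs[:, 0] / row_abs[:, 1]))
print(np.round(G, 2))
print(crosstalk)
```

Whitening removes only second-order correlations, so the whitened signals are in general still rotated mixtures of the sources. The infomax iterations remove the remaining higher-order dependence, which shows up as each row of G being dominated by a single entry.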
In this chapter, we have provided a general explanation of how the HVS reduces data redundancy and forms a sparse representation. Multiple types of cells, such as ganglion and LGN cells, are involved in normalizing first- and second-order statistics and removing the associated redundancy. In the HVS, higher-order redundancy is eliminated by simple cells. From the viewpoint of biomimicry, the mechanism of simple cells is the basis for sparse representation. In addition, from the natural image perspective, we can use a sparsifying transform or model to obtain results similar to those observed in the HVS. Deep neural networks (to be formally explained in chapter 3) exhibit workflows similar to those of the HVS, such as multi-resolution analysis. As a second example, whitening is used to pre-process data in both the HVS and machine learning. Yet another example is that the operations that remove higher-order redundancy exhibit Gabor-like characteristics in both the HVS and machine learning. It will become increasingly clear that machine learning imitates the HVS in major ways. We now have the tools to extract features constrained by, or in reference to, natural image statistics. How can we use these features to help solve practical problems? This question leads naturally to the following chapters.
References
Atick J J and Redlich A N 1992 What does the retina know about natural scenes? Neural Comput. 4 196–210
Bakushinsky A B and Kokurin M Y 2004 Iterative Methods for Approximate Solution of Inverse Problems (Berlin: Springer)
Baraniuk R G 2007 Compressive sensing IEEE Signal Process. Mag. 24 118–21
Bell A J and Sejnowski T J 1997 The ‘independent components’ of natural scenes are edge filters Vis. Res. 37 3327–38
Bertero M and Boccacci P 1998 Introduction to Inverse Problems in Imaging (Boca Raton, FL: CRC Press)
Cadieu C, Kouh M, Pasupathy A, Connor C E, Riesenhuber M and Poggio T 2007 A model of V4 shape selectivity and invariance J. Neurophysiol. 98 1733–50
Chechile R A 2005 Independent component analysis: a tutorial introduction J. Math. Psychol. 49 426
Dashti M and Stuart A M 2017 The Bayesian Approach to Inverse Problems (Berlin: Springer)
Hyvärinen A, Hurri J and Hoyer P O 2009 Natural Image Statistics (London: Springer)
Landweber L 1951 An iteration formula for Fredholm integral equations of the first kind Am. J. Math. 73 615–24
Leigh P N, Simmons A, Williams S, Williams V, Turner M and Brooks D 2002 Imaging: MRS/MRI/PET/SPECT: summary Amyotro. Later. Sclero. Other Motor Neur. Disord. 3 S75–80
Olshausen B A and Field D J 1996 Emergence of simple-cell receptive field properties by learning a sparse code for natural images Nature 381 607–9
Olshausen B