Professor Ge Wang

Machine Learning for Tomographic Imaging




      Figure 1.13. Basis functions learned by the sparse coding algorithm. All were normalized, with zero always represented by the same gray level. Reproduced with permission from Olshausen and Field (1997). Copyright 1997 Elsevier.

      The second model (Bell and Sejnowski 1997) is based on the independent component analysis (ICA) principle. ICA is an approach to solving the blind source separation (BSS) problem (figure 1.14). In the context of natural image statistics, let $X = \{x_i \mid i = 1, \ldots, N\}$ denote $N$ independent source signals arranged as a column vector, let $Y = \{y_j \mid j = 1, \ldots, M\}$ denote $M$ image patches, also arranged as a column vector, and let $W$ be the $M \times N$ mixing matrix. The BSS problem is to invert the measurement $Y$,

      $$Y = WX, \quad M \geqslant N, \tag{1.18}$$

      for both $W$ and $X$, up to ambiguities in the amplitudes and the ordering (permutation) of the independent source signals. ICA thus finds the basis components $X = \{x_i \mid i = 1, \ldots, N\}$, which capture representative features of image patches.
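The amplitude and permutation ambiguity can be checked numerically. The following is a minimal sketch assuming NumPy, with randomly generated sources and a random mixing matrix (all sizes and values are hypothetical). Rescaling and reordering the sources, then compensating in the mixing matrix, reproduces exactly the same measurements $Y$, so no method can tell the two factorizations apart from $Y$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 3, 4, 1000            # N sources, M >= N mixtures, T samples (illustrative sizes)
X = rng.laplace(size=(N, T))    # hypothetical independent, super-Gaussian sources
W = rng.normal(size=(M, N))     # M x N mixing matrix, as in Y = WX
Y = W @ X                       # observed mixtures, equation (1.18)

# Reorder and rescale the sources, and compensate in the mixing matrix.
P = np.eye(N)[[2, 0, 1]]        # permutation matrix
D = np.diag([2.0, -0.5, 3.0])   # nonzero scaling, sign flips included
W_alt = W @ np.linalg.inv(D @ P)    # W' = W (DP)^{-1}
X_alt = D @ P @ X                   # X' = D P X

# Both factorizations explain the data equally well.
print(np.allclose(W_alt @ X_alt, Y))   # True
```

This is why ICA results are reported only up to scale and order: the first recovered component here is the third original source multiplied by 2.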


      Figure 1.14. ICA recovers both the embedded independent components and the mixing matrix that blends them.

      The premise of ICA is statistical independence among the hidden data sources. In information theory (see appendix A for more details), mutual information measures the dependence between two signals. Let H(X) and H(Y) denote the entropies (also called self-information) of X and Y, which depend solely on their respective probability density functions. H(X, Y) is the joint information, the amount of information generated when X and Y occur together. I(X, Y) is the mutual information, the information that knowing one of X or Y provides about the other (figure 1.15), so that H(X, Y) = H(X) + H(Y) − I(X, Y). For example, if Y is known, only H(X) − I(X, Y) additional information is needed to determine X completely; this quantity is the conditional entropy H(X|Y). When two signals are independent, the mutual information is zero (Chechile 2005).
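These relations can be verified on small discrete distributions. The sketch below assumes NumPy, discrete variables given as a joint probability table, and entropies measured in bits (log base 2); the helper names and the two toy tables are hypothetical. It computes H(X|Y) directly from the conditional distributions and confirms that it equals H(X) − I(X, Y), and that I(X, Y) vanishes for an independent pair.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array; zero entries contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def info_terms(pxy):
    """H(X), H(Y), joint H(X,Y) and I(X,Y) for a joint table pxy[x, y]."""
    pxy = np.asarray(pxy, dtype=float)
    hx = entropy(pxy.sum(axis=1))      # marginal distribution of X
    hy = entropy(pxy.sum(axis=0))      # marginal distribution of Y
    hxy = entropy(pxy)                 # joint entropy
    return hx, hy, hxy, hx + hy - hxy  # I(X,Y) = H(X) + H(Y) - H(X,Y)

def cond_entropy_x_given_y(pxy):
    """H(X|Y) = sum_y p(y) H(X | Y = y), computed directly from the conditionals."""
    pxy = np.asarray(pxy, dtype=float)
    py = pxy.sum(axis=0)
    return float(sum(py[j] * entropy(pxy[:, j] / py[j])
                     for j in range(pxy.shape[1]) if py[j] > 0))

dep = np.array([[0.4, 0.1],               # hypothetical dependent pair:
                [0.1, 0.4]])              # X and Y agree 80% of the time
ind = np.outer([0.5, 0.5], [0.5, 0.5])    # hypothetical independent pair

hx, hy, hxy, mi = info_terms(dep)
print(round(mi, 3))                                              # 0.278
print(round(cond_entropy_x_given_y(dep), 3), round(hx - mi, 3))  # 0.722 0.722
print(info_terms(ind)[3])                                        # 0.0
```

For the dependent pair, knowing Y reduces the 1 bit needed to specify X to about 0.722 bits; for the independent pair, knowing Y does not help at all.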


      Figure 1.15. Joint information H(X, Y), determined by the individual signal information H(X) and H(Y) together with the mutual information I(X, Y).

      We regard ICA as a system with Y as the input and X as the output. When the information carried by the system's output is maximized (for a suitably chosen output nonlinearity), the mutual information between the output components is minimized. That is, the output components become as independent of each other as possible, since any nontrivial linear mixing of them would reintroduce statistical dependence. This is a simplified description of the infomax principle.
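The infomax idea can be sketched as an iterative update of an unmixing matrix. The code below is a minimal illustration, not the book's implementation: it assumes NumPy, a square (M = N), noise-free mixture of super-Gaussian sources, and uses the natural-gradient form of the Bell and Sejnowski rule with a logistic output nonlinearity g, for which 1 − 2g(u) = −tanh(u/2). The function name `infomax_ica`, the learning rate, the iteration count, and the test data are all hypothetical choices.

```python
import numpy as np

def infomax_ica(Y, lr=0.1, n_iter=1000):
    """Hypothetical helper: estimate an unmixing matrix B such that
    B @ (Y - mean) approximates the sources up to scaling and permutation.
    Y has shape (M, T); assumes a square mixture (M == N)."""
    M, T = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)        # center each mixture
    d, E = np.linalg.eigh(Yc @ Yc.T / T)          # eigen-decompose the covariance
    V = E @ np.diag(d ** -0.5) @ E.T              # ZCA whitening matrix
    Z = V @ Yc                                    # whitened data
    B = np.eye(M)
    for _ in range(n_iter):
        U = B @ Z                                 # current output estimates
        phi = -np.tanh(U / 2.0)                   # equals 1 - 2 g(u) for logistic g
        # Natural-gradient infomax step: dB = (I + E[phi(u) u^T]) B
        B += lr * (np.eye(M) + phi @ U.T / T) @ B
    return B @ V                                  # unmixing matrix for centered Y

rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 10000))                  # hypothetical super-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])            # hypothetical mixing (W in eq. 1.18)
Y = A @ S
B = infomax_ica(Y)
X_hat = B @ (Y - Y.mean(axis=1, keepdims=True))
# Each recovered component should correlate strongly with exactly one source.
C = np.abs(np.corrcoef(np.vstack([X_hat, S]))[:2, 2:])
print(np.round(C, 2))
```

Whitening first reduces the problem to finding a rotation, which is why the update can start from the identity. The logistic nonlinearity suits super-Gaussian (sparse) sources such as the Laplacian ones used here.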

      In 1997, Bell and Sejnowski applied information-theoretic ICA to natural images and found that ICA is a special kind of sparse coding method. They also interpreted the results obtained with the network proposed by Olshausen and Field within the ICA framework. ICA on natural images produces decorrelating filters that are sensitive to both phase and frequency, resembling transforms built on oriented Gabor functions or wavelets. Representative ICA filters learned from natural images are shown in figure 1.16; like sparse coding, ICA can model the Gabor-like receptive fields of simple cells in V1.
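To make "oriented, phase- and frequency-sensitive" concrete, here is a minimal sketch of a 2-D Gabor function, a Gaussian envelope multiplying an oriented sinusoid. It assumes NumPy; the helper `gabor_patch` and its default parameters are hypothetical illustrations, not filters learned by ICA.

```python
import numpy as np

def gabor_patch(size=16, wavelength=6.0, theta=0.0, phase=0.0, sigma=3.0):
    """Hypothetical helper: a size x size Gabor patch, i.e. a Gaussian envelope
    times a sinusoidal carrier whose modulation direction is theta (radians)."""
    coords = np.arange(size) - (size - 1) / 2.0       # pixel grid centred on 0
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)     # coordinate along the modulation
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + phase)
    return envelope * carrier

# A small bank: 4 orientations x 2 phases (even, cosine-like and odd, sine-like).
bank = [gabor_patch(theta=t, phase=p)
        for t in np.linspace(0.0, np.pi, 4, endpoint=False)
        for p in (0.0, np.pi / 2)]
print(len(bank), bank[0].shape)   # 8 (16, 16)
```

Changing `phase` shifts the stripes under a fixed envelope, and changing `wavelength` changes the preferred spatial frequency; this is the sense in which such filters are sensitive to both phase and frequency.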


      Figure 1.16. A matrix of 144 filters obtained using ICA on ZCA-whitened natural images. Reproduced with permission from Bell and Sejnowski (1997). Copyright 1997 Elsevier.
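Figure 1.16 relies on ZCA-whitened images. The following is a minimal sketch of ZCA whitening, assuming NumPy and patches flattened into the rows of a matrix; the helper name, the regularizer `eps`, and the synthetic data (which only imitate the strong correlation between neighboring pixels and are not natural images) are hypothetical. ZCA uses the symmetric matrix C^(-1/2), which removes second-order correlations while keeping the output as close as possible to the original data in the mean-squared sense.

```python
import numpy as np

def zca_whiten(patches, eps=1e-5):
    """Hypothetical helper. patches: (n_samples, n_features).
    Returns the whitened data and the symmetric ZCA matrix C^(-1/2)."""
    Xc = patches - patches.mean(axis=0)                # remove the mean patch
    cov = Xc.T @ Xc / Xc.shape[0]                      # feature covariance C
    d, E = np.linalg.eigh(cov)
    W_zca = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T  # symmetric, so Xc @ W_zca works
    return Xc @ W_zca, W_zca

rng = np.random.default_rng(0)
idx = np.arange(64)
cov_true = 0.8 ** np.abs(idx[:, None] - idx[None, :])  # nearby "pixels" strongly correlated
L = np.linalg.cholesky(cov_true)
data = rng.normal(size=(5000, 64)) @ L.T               # synthetic stand-in for 8x8 patches
white, W_zca = zca_whiten(data)
cov_white = white.T @ white / white.shape[0]
print(np.allclose(cov_white, np.eye(64), atol=1e-3))  # True
```

The small `eps` guards against division by near-zero eigenvalues; with real image patches it also damps high-frequency noise, at the cost of whitening only approximately.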

      In this chapter, we have given a general explanation of how the HVS reduces data redundancy and forms a sparse representation. Multiple types of cells, such as ganglion and LGN cells, are involved in normalizing first- and second-order statistics and removing the associated redundancy. In the HVS, higher-order redundancy is eliminated by simple cells. From the viewpoint of biomimicry, the mechanism of simple cells is the basis for sparse representation. In addition, from the natural image perspective, we can use a sparsifying transform or model to obtain results similar to those observed in the HVS. Several parallels are worth noting. First, deep neural networks (to be formally explained in chapter 3) exhibit workflows similar to those of the HVS, such as multi-resolution analysis. Second, whitening is used to pre-process data in both the HVS and machine learning. Third, the operations that remove higher-order redundancy exhibit Gabor-like characteristics in both the HVS and machine learning. It will become increasingly clear that machine learning imitates the HVS in major ways. We now have the tools to extract features constrained by, or in reference to, natural image statistics. How can we use these features to help solve practical problems? This question leads naturally to the following chapters.
