Professor Ge Wang

Machine Learning for Tomographic Imaging



natural images pixel/voxel-wise, an image can be transformed to a feature space, where feature statistics are gathered to build a prior model. Moreover, unlike pixel values, which depend strongly on one another, the features are independent or nearly independent of each other, which makes the statistical model informative. This concept is the key to all natural image statistics. Any discussion of natural image statistics must introduce the human vision system (HVS), because many natural image statistics and analyses are derived from observations of the HVS. There are two sides to sensing prior information. In the next subsection, we will first briefly introduce the HVS mechanism, and then describe some basic techniques in natural image statistics.

      The human vision system (HVS) is an important part of the central nervous system, which enables us to observe and perceive our surroundings (Hyvärinen et al 2009). After its long-term adaptation to natural scenes, the HVS is highly efficient in working with natural scenes through multi-layer perceptive operations. Here, natural scenes refer to daily-life inputs to the HVS. Visual perception begins at the pupils, which admit light; the information carried by the photons is then processed step by step and finally analyzed for perception in the brain, as depicted in figure 1.2. This pathway consists of neurons. Typically, a neuron consists of a cell body (soma), dendrites to weight and integrate inputs, and an axon to output a signal, also referred to as an action potential. In the following, we briefly introduce the multi-layer structure of the HVS.

image

      Figure 1.2. A schematic of the HVS pathway.

      The first stage involves photons reaching the retina. The retina is the innermost light-sensitive layer of tissue of the eye. It is covered by more than a hundred million photoreceptors, which translate the light into electrical neural impulses. Depending on their function, the photoreceptors can be divided into two types: cone cells and rod cells. Rod cells are mainly distributed in the peripheral area around the fovea; they are highly sensitive to light and can respond even to a single photon. These cells are mainly responsible for vision in a low-light environment, with neither high acuity nor color sensing. In contrast to rod cells, cone cells are concentrated in the fovea region and are responsible for the perception of details and colors in a bright environment, but they are much less sensitive to light.

      In the second stage, the electrical signals are transmitted and processed through neural layers. One of the most important cell types, the ganglion cells, gathers all the information from the other cells and sends the signal out of the eye along long axons. The visual signals are initially processed in this stage. Neurobiologists have found that the receptive field of ganglion cells is usually centralized or circularly symmetric, with the center either excited or inhibited by light. Such light responses can be simulated by the Laplacian of Gaussian (LOG) or zero-phase component analysis (ZCA) operator. We describe two kinds of LOG operator in figure 1.3 from three perspectives: 3D visualization, 2D plane figure, and center profile.

image

      Figure 1.3. Visualization of the LOG operator.
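
      As a rough illustration of the two kinds of LOG operator described above, the following minimal sketch samples a Laplacian of Gaussian on a square grid. The function name `log_kernel`, the grid size, and the value of `sigma` are illustrative assumptions rather than settings taken from the figure. The `on_center` flag switches between a center that is excited by light and one that is inhibited by it.

```python
import numpy as np

def log_kernel(size=21, sigma=3.0, on_center=True):
    """Sample a Laplacian-of-Gaussian (LOG) operator on a size x size grid.

    size, sigma and on_center are illustrative parameters (assumptions),
    not values taken from the text.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r2 = x**2 + y**2
    # Laplacian of an (unnormalized) 2D Gaussian: negative center, positive surround
    k = (r2 - 2.0 * sigma**2) / sigma**4 * np.exp(-r2 / (2.0 * sigma**2))
    # Remove the small residual mean so uniform illumination gives zero response
    k -= k.mean()
    # Negating the kernel gives an excitatory center with an inhibitory surround
    return -k if on_center else k

on_kernel = log_kernel(on_center=True)    # center excited by light
off_kernel = log_kernel(on_center=False)  # center inhibited by light
center_profile = on_kernel[on_kernel.shape[0] // 2]  # 1D cut through the center
```

      The middle row of the array corresponds to the center profile view, and the full array can be passed to any surface or image plotting routine to obtain the 3D and 2D views.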

      In the HVS, the receptive field of a visual neuron is defined as the specific light pattern over the photoreceptors of the retina which yields the maximum response of the neuron. We illustrate this operation with an example depicted in figure 1.4, using two different operators.

image

      Figure 1.4. Responses of the LOG and Gabor filters, which can be modeled as convolutions with an underlying image. Lena image © Playboy Enterprises, Inc.
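
      The statement that filter responses can be modeled as convolutions with an underlying image can be made concrete with a small sketch. The helper `convolve2d_valid`, the 3 x 3 center-surround kernel, and the toy image below are hypothetical stand-ins chosen for readability; they are not the operators or the image used in the figure.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Plain 'valid' 2D convolution: flip the kernel and slide it over the image."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Response of one model neuron whose receptive field covers this patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Hypothetical center-surround kernel (a crude stand-in for a LOG operator)
kernel = np.array([[-1.0, -1.0, -1.0],
                   [-1.0,  8.0, -1.0],
                   [-1.0, -1.0, -1.0]]) / 8.0

# Toy image: uniform background with one bright spot
image = np.ones((9, 9))
image[4, 4] = 5.0

response = convolve2d_valid(image, kernel)
peak = np.unravel_index(response.argmax(), response.shape)
```

      The response is zero over the uniform background and peaks where the bright spot is aligned with the excitatory center of the kernel, which is the sense in which a receptive field is the pattern that drives the neuron most strongly.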

      Next, the signal is transmitted to the lateral geniculate nucleus (LGN) of the thalamus, which is the main sensory processing area in the brain. The receptive field of the LGN is also centralized or circularly symmetric. After processing by the LGN, the signal is transmitted to the visual cortex at the back of the brain for subsequent processing steps. It is worth mentioning that, in contrast to the more than a hundred million photoreceptors in the retina, the number of ganglion or LGN cells is small, just over a million. That is to say, they work with compressed features from the retina, obtained after reducing the redundancy in the original data.

      The first place in the cortex where most of the signals go is the primary visual cortex, or V1 for short. The best understood type of cell in V1 is the simple cell, whose receptive fields have been well characterized (Ringach 2002). Simple cells have responses that depend on the direction and spatial frequency of the stimulus signal. These responses can be modeled as a Gabor function or Gaussian derivative. Hence, the receptive fields of simple cells are interpreted as Gabor-like or directional band-pass filters. The Gabor function can be regarded as the product of a Gaussian envelope and a sinusoidal function. Several parameters control the shape of a Gabor function. As with the LOG visualization, we describe the Gabor function in figure 1.5 with different parameter settings. Observe how the parameters affect the Gabor function.

image

      Figure 1.5. Visualization of the Gabor function.
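
      To make the role of the parameters concrete, here is a minimal sketch of a 2D Gabor function built as a Gaussian envelope multiplied by a sinusoid. The parameter names (`sigma`, `theta`, `wavelength`, `phase`, `gamma`) follow a common convention and are assumptions here; they are not necessarily the parameterization used for the figure.

```python
import numpy as np

def gabor_kernel(size=31, sigma=5.0, theta=0.0, wavelength=10.0,
                 phase=0.0, gamma=0.5):
    """Sample a 2D Gabor function (Gaussian envelope times a cosine carrier).

    sigma: width of the Gaussian envelope
    theta: orientation of the carrier, in radians
    wavelength: spatial period of the carrier, in pixels
    phase: phase offset of the carrier (0 gives an even, pi/2 an odd filter)
    gamma: aspect ratio of the envelope
    All names and defaults are illustrative assumptions.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier

even_gabor = gabor_kernel(phase=0.0)            # symmetric about the center
odd_gabor = gabor_kernel(phase=np.pi / 2)       # antisymmetric about the center
rotated_gabor = gabor_kernel(theta=np.pi / 2)   # same filter turned by 90 degrees
```

      Changing `theta` selects the preferred direction, `wavelength` sets the preferred spatial frequency, and `sigma` and `gamma` set the size and elongation of the receptive field.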

      With these selective characteristics, hundreds of millions of simple cells work together in V1. Neurobiologists have found that only a few cells are activated when a signal is presented, which means that simple cells implement a sparse coding scheme. After being processed in V1, the signal is transmitted to multiple destinations in the cortex for further processing. The destinations can be categorized into ‘where’ and ‘what’ pathways. The ‘where’ pathway, also known as the dorsal pathway, goes from V1/V2 through V3 to V5. It distinguishes moving objects and helps the brain to recognize where objects are in space. The ‘what’ pathway, namely the ventral pathway, runs from V1/V2 to V4 and the inferior temporal cortex (IT), where the HVS performs content discrimination and pattern recognition (Cadieu et al 2007). Given the emphasis of this book on medical imaging, we focus on the ‘what’ pathway, which is modeled as multi-layer perceptive operations that progress from simple to complex as the visual field becomes increasingly larger, as illustrated in figure 1.6.

image

      Figure 1.6. Multi-layer structures of HVS, perceiving the world in multiple stages from primitive to semantic.

      In addition to the simple cells, there are other kinds of visual neurons in the HVS. Another extensively studied kind of visual cell is the complex cell, mainly distributed in V1, V2, and V3. Complex cells integrate the outputs of nearby simple cells. They respond to specific stimuli located anywhere within the receptive field. There are also hypercomplex cells, called end-stopped cells, which are located in V1, V2, and V3 and respond maximally to stimuli of a given size in the receptive field. This kind of cell is thought to perceive corners, curves, and moving structures.
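
      One common textbook idealization of how a complex cell might integrate simple-cell outputs is the so-called energy model, in which the squared responses of an even and an odd Gabor filter are summed. This model is not stated in the text above and is used here only as an assumed illustration; the 1D setting, the function names, and all parameter values are hypothetical.

```python
import numpy as np

def gabor_pair_1d(size=61, sigma=5.0, wavelength=10.0):
    """Even and odd 1D Gabor filters (a quadrature pair of model simple cells)."""
    half = size // 2
    x = np.arange(-half, half + 1, dtype=float)
    envelope = np.exp(-x**2 / (2.0 * sigma**2))
    even = envelope * np.cos(2.0 * np.pi * x / wavelength)
    odd = envelope * np.sin(2.0 * np.pi * x / wavelength)
    return x, even, odd

def simple_and_energy(stimulus, even, odd):
    """Return one simple-cell response and the assumed complex-cell energy."""
    s_even = np.dot(even, stimulus)
    s_odd = np.dot(odd, stimulus)
    return s_even, s_even**2 + s_odd**2

x, even, odd = gabor_pair_1d()
phases = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
simple, energy = [], []
for phi in phases:
    # Grating at the preferred frequency, shifted within the receptive field
    grating = np.cos(2.0 * np.pi * x / 10.0 + phi)
    s, e = simple_and_energy(grating, even, odd)
    simple.append(s)
    energy.append(e)
simple = np.array(simple)
energy = np.array(energy)
```

      Under these assumptions the simple-cell output swings between positive and negative values as the grating shifts, while the summed energy stays nearly constant, which mirrors the idea that a complex cell responds to its preferred stimulus regardless of the exact position within the receptive field.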

      To date, our understanding of the brain remains far from complete. We have only partial knowledge of these areas, in particular of deeper layers such as V4 and the posterior regions. Generally speaking, visual cells in V1 and V2 detect primary visual features with selectivity for directions, frequencies, and phases. Some specific cells in V2 also provide stereopsis based on the difference in binocular cues, which helps recover the surface information of an object. In V4, the visual cells perceive the simple geometric shapes of objects in receptive fields larger than those of V2. This shape-oriented analysis capability is due to the selectivity of V4 cells for complex stimuli and is invariant with respect to spatial translation. In posterior regions of the visual pathway, such as the IT, image semantic structures are recognized, which depend on much larger receptive fields than those of V4. In general, billions of various visual neurons construct the hierarchically sophisticated visual system that analyzes and synthesizes visual features