Stephen Winters-Hilt

Informatics and Machine Learning



Figure 1.5 (Left) The general stochastic sequential analysis flow topology. (Center) The general signal processing flow in channel current analysis is typically Input ➔ tFSA ➔ Meta‐HMMBD ➔ SVM ➔ Output. (Right) In channel current cheminformatics the flow differs notably during state discovery: when EVA‐projection (emission variance amplification projection), or a similar method, is used to quantize the states, the flow becomes Input ➔ tFSA ➔ HMMBD/EVA (state discovery) ➔ meta‐HMMBD‐side ➔ SVM ➔ Output. In gene‐finding, by contrast, the flow is simply Input ➔ meta‐HMMBD‐side ➔ Output. In gene‐finding, however, the HMM internal “sensors” are sometimes replaced, locally, with profile‐HMMs [1, 3] (equivalent to position‐dependent Markov Models, or pMMs, see Chapter 7), or with SVM‐based profiling [1, 3], so the topology can differ not only in the connections between the boxes shown, but also in the ability of the boxes to embed in other boxes as part of an internal refinement.

      Source: Based on Winters‐Hilt [1, 3].

      The sequence of algorithmic methods used in the SSA Protocol, for the information‐processing flow topology shown in Figure 1.5, comprises a weak signal handling protocol as follows: (i) the weakness of the (fast) Finite State Automaton (FSA) methods will be shown to be their difficulty with nonlocal structure identification, for which HMM methods (and tuning metaheuristics) are the solution; (ii) for the HMM, in turn, the main weakness is in local sensing “classification,” due to conditional independence assumptions. Once in the setting of a classification problem, however, the problem can be solved by incorporating generalized SVM methods [1, 3]. If only a classification task is faced (data already preprocessed), the SVM will also be the method of choice in what follows. (iii) The weakness of the SVM, whether used for classification or clustering, but especially for the latter, is the need to optimize over algorithmic, model (kernel), chunking, and other process parameters during learning. This is solved via metaheuristics for optimization, such as simulated annealing and genetic algorithm optimization, which form stage (iv). The main weaknesses of the metaheuristic tuning effort are resolved partly by the “front‐end” methods, like the FSA, and partly by a knowledge discovery process using the SVM clustering methods. The SSA Protocol weak signal acquisition and analysis method thereby establishes a robust signal processing platform.
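
As a rough illustration of point (i), the sketch below shows a minimal two-state finite state automaton that acquires candidate blockade events from a sampled current trace using a fixed threshold. It is not the tFSA of the SSA Protocol: the function name `acquire_events`, the threshold, the `min_len` parameter, and the sample values are all hypothetical. It only shows why such a front end is fast (one pass, purely local decisions) and why it has no access to nonlocal structure.

```python
def acquire_events(samples, threshold, min_len=3):
    """Hypothetical two-state FSA (BASELINE <-> BLOCKADE).

    Returns a list of (start, end) index pairs, end exclusive, for runs
    of samples below `threshold` that last at least `min_len` samples.
    """
    events = []
    state = "BASELINE"
    start = None
    for i, x in enumerate(samples):
        if state == "BASELINE" and x < threshold:
            # Current dropped: enter the blockade state.
            state, start = "BLOCKADE", i
        elif state == "BLOCKADE" and x >= threshold:
            # Current recovered: keep the event only if it was long enough.
            if i - start >= min_len:
                events.append((start, i))
            state = "BASELINE"
    # Close an event that is still open at the end of the trace.
    if state == "BLOCKADE" and len(samples) - start >= min_len:
        events.append((start, len(samples)))
    return events


# Made-up current samples (arbitrary units) with two long dips and one glitch.
trace = [100, 99, 40, 42, 41, 43, 98, 30, 97, 45, 44, 46, 47]
events = acquire_events(trace, threshold=70, min_len=3)
print(events)
```

Each decision here depends only on the current sample and the current state, which is what makes the automaton cheap and also what leaves longer-range structure to the HMM stage.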

      The HMM methods are the central methodology or stage in the SSA Protocol, particularly in the gene finders and sometimes in the CCC protocol or implementation, in that the other stages can be dropped or merged with the HMM stage in many incarnations. For example, in some CCC analysis situations the tFSA methods could be eliminated entirely in favor of the more accurate (but time‐consuming) HMM‐based approaches to the problem, with signal states defined or explored in more or less the same setting, but with the optimized Viterbi path solution taken as the basis for the signal acquisition.
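
To make the role of the Viterbi path concrete, here is a minimal sketch of the standard Viterbi algorithm for a discrete‐emission HMM, written in log space. The two states (`open`, `blocked`), the quantized observations (`hi`, `lo`), and all probabilities are made‐up toy values chosen for illustration, not parameters from the text, and the sketch assumes every probability is nonzero.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for a discrete-emission HMM (log space).

    Assumes all probabilities are strictly positive.
    """
    log = math.log
    # score[s]: best log-probability of any path that ends in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log(trans_p[p][s]))
            ptr[s] = best_prev
            new_score[s] = (score[best_prev] + log(trans_p[best_prev][s])
                            + log(emit_p[s][o]))
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Toy model (hypothetical values): "sticky" states, noisy quantized levels.
states = ["open", "blocked"]
start_p = {"open": 0.5, "blocked": 0.5}
trans_p = {"open": {"open": 0.9, "blocked": 0.1},
           "blocked": {"open": 0.1, "blocked": 0.9}}
emit_p = {"open": {"hi": 0.9, "lo": 0.1},
          "blocked": {"hi": 0.2, "lo": 0.8}}

obs = ["hi", "hi", "lo", "hi", "hi", "lo", "lo", "lo"]
path, log_prob = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)
```

With these toy values the isolated `lo` at the third sample is absorbed into the `open` segment, while the sustained run of `lo` at the end is decoded as `blocked`. A purely local threshold would split the first segment; using the model's transition structure in this way is the sense in which the Viterbi path can serve as the basis for acquisition.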

      The HMM “sensor” capabilities can be significantly improved by switching from profile‐Markov Model (pMM) sensors to pMM/SVM‐based sensors, as indicated in [1, 3] and Chapter 7, where the improved performance and generalization capability of this approach are demonstrated.

      In standard band‐limited (and not time‐limited) signal analysis with periodic waveforms, sampling is done at the Nyquist rate to have a fully reproducible signal. If the sample information is needed elsewhere, it is then compressed (possibly with loss) and transmitted (a “smart encoder”). The received data is then decompressed and reconstructed (by simply summing wave components, i.e. a “simple” decoder). If the signal is sparse or compressible, then compressive sensing [190] can be used, where sampling and compression are combined into one efficient step to obtain compressive measurements (the encoding in [190] is simple since a set of random projections is employed), which are then transmitted (general details on noise in this context are described in [191, 192]). On the receiving end, the decompression and reconstruction steps are, likewise, combined using an asymmetric “smart” decoding step. This progression toward asymmetric compressive signal processing can be taken a step further if we consider signal sequences to be equivalent if they have the same stationary statistics. What is obtained is a method similar to compressive sensing, but involving stationary‐statistics generative‐projection sensing, where the signal processing is non‐lossy at the level of stationary statistics equivalence. In the SCW signal analysis the signal source is generative in that it is describable via use of a HMM, and the HMM’s Viterbi‐derived generative projections are used to describe the sparse components contributing to the signal source. In SCW encoding the modulation of stationary statistics can be man‐made or natural, with the latter in many experimental situations involving a flow phenomenology that has stationary statistics. If the signal is man‐made, the underlying stochastic process is usually still a natural source, and it is the changes in the stationary statistics that are under the control of the man‐made encoding scheme. Transmission and reception are then followed by generative projection via Viterbi‐HMM template matching or via Viterbi‐HMM feature extraction followed by separate classification (using SVM). So in the SCW approach the encoding is even simpler (possibly non‐existent, other than directly passing quantized signal) and is applicable to any noise source with stationary statistics (e.g. a stationary signal with reproducible statistics, the case for many experimental observations). The decoding must be even “smarter,” on the other hand, in that generalized Viterbi algorithms are used, and possibly other ML methods as well, SVMs in particular. An example of stationary statistics sensing with a ML‐based decoder is described in application to CCC studies in Chapter 14.
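
The notion of treating sequences as equivalent when they share the same stationary statistics can be sketched as follows. The example assumes a decoded state path is already available (for instance from a Viterbi decoding) and reduces it to a fixed‐length feature vector of state occupancy fractions and row‐normalized transition frequencies, which could then be passed to a classifier. The particular feature choice and the names `stationary_features` and `feature_vector` are illustrative assumptions, not the feature set used in the text.

```python
from collections import Counter


def stationary_features(path, states):
    """Occupancy fractions and row-normalized transition frequencies."""
    n = len(path)
    occ = Counter(path)
    occupancy = {s: occ[s] / n for s in states}
    # Count consecutive (from_state, to_state) pairs along the path.
    pair_counts = Counter(zip(path, path[1:]))
    transitions = {}
    for a in states:
        row_total = sum(pair_counts[(a, b)] for b in states)
        for b in states:
            transitions[(a, b)] = (pair_counts[(a, b)] / row_total
                                   if row_total else 0.0)
    return occupancy, transitions


def feature_vector(path, states):
    """Flatten the statistics into a fixed-length list of floats."""
    occupancy, transitions = stationary_features(path, states)
    return ([occupancy[s] for s in states]
            + [transitions[(a, b)] for a in states for b in states])


states = ["O", "B"]  # hypothetical labels: open, blocked
path = ["O", "O", "O", "B", "B", "O"]
features = feature_vector(path, states)
print(features)

# The time-reversed path differs sample by sample but has the same statistics.
reversed_features = feature_vector(path[::-1], states)
print(features == reversed_features)
```

In this toy case the path and its time reversal map to the same feature vector, so a decoder working only from these statistics would treat the two as equivalent.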

      1.9.1 Stochastic Carrier Wave (SCW) Analysis – Nanoscope Signal Analysis

      The Nanoscope described in Chapter 14 builds on nanopore detection, introducing reporter molecules to arrive at a nanopore transduction detection paradigm. By engineering reporter molecules that produce stationary statistics (a SCW), together with ML signal analysis methods designed for rapid analysis of such signals, we arrive at a functioning “nanoscope.”

      Nanopore detection is made possible by the following well‐established capabilities: (i) classic electrochemistry; (ii) a pore‐forming protein toxin in a bilayer; and (iii) a patch clamp amplifier. Nanopore transduction detection leverages the above detection platform with (iv) an event‐transducer pore‐blockader that has stationary statistics and (v) ML tools for real‐time SCW signal analysis. The meaning of “real‐time” is dependent on the application. In the Nanoscope implementation discussed in Chapter 14, each signal is usually identified in less than 100 ms, with a calling accuracy of 99.9% if rejection is employed; accuracy improves even further if, when a call is forced, a signal sample duration greater than 100 ms is used.
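
The trade‐off between rejection and calling accuracy can be illustrated with a small sketch. It assumes a binary classifier that returns a signed confidence score (such as an SVM decision value) and rejects any signal whose score magnitude falls below a threshold. The scores, labels, and thresholds below are invented for illustration and do not reproduce the 99.9% figure quoted above.

```python
def call_with_rejection(score, threshold):
    """Return +1 or -1 if |score| clears the threshold, else None (reject)."""
    if abs(score) < threshold:
        return None
    return 1 if score > 0 else -1


def evaluate(scores, labels, threshold):
    """Return (accuracy on called signals, fraction of signals rejected)."""
    calls = [call_with_rejection(s, threshold) for s in scores]
    kept = [(c, y) for c, y in zip(calls, labels) if c is not None]
    rejected = len(calls) - len(kept)
    accuracy = (sum(c == y for c, y in kept) / len(kept)
                if kept else float("nan"))
    return accuracy, rejected / len(calls)


# Hypothetical signed classifier scores and true labels (+1 / -1).
scores = [2.1, -1.8, 0.3, -0.2, 1.5, -0.4, 0.1, -2.5]
labels = [1, -1, -1, -1, 1, 1, 1, -1]

forced = evaluate(scores, labels, threshold=0.0)    # every signal is called
selective = evaluate(scores, labels, threshold=1.0)  # weak scores rejected
print(forced, selective)
```

In this toy data, forcing a call on every signal gives 75% accuracy, while rejecting the low‐confidence half leaves only correct calls. In practice a rejected signal can simply be observed for longer before a call is attempted again.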

      Nanopore transduction detection offers prospects for highly sensitive and discriminative biosensing. The NTD “Nanoscope” functionalizes a single nanopore with a channel current modulator that is designed to transduce events, such as binding to a specific target. Nanopore event transduction involves