Stephen Winters-Hilt

Informatics and Machine Learning




       Library of Congress Cataloging‐in‐Publication Data applied for:

      ISBN: 9781119716747

      Cover Design: Wiley

      Cover Image: © agsandrew/Shutterstock

       This book is dedicated to my family: Cindy, Nathaniel, Zachary, Sybil, Teresa, Eric, and Josh.

      Preface

      The material in this book draws from undergraduate and graduate coursework taken while I was a student at Caltech, and from further graduate coursework and studies at Oxford, the University of Wisconsin, and the University of California, Santa Cruz. The material also draws upon my teaching experience and research efforts while I was a tenured professor of computer science at the University of New Orleans, jointly appointed as principal investigator and director of a protein channel biosensor lab at the Research Institute for Children at Children’s Hospital in New Orleans.

      In this text I provide a background on various methods from Informatics and Machine Learning (ML) that together comprise a “complete toolset” for doing data analytics work at all levels, from a first‐year undergraduate introductory level to advanced topics in subsections suitable for graduate students seeking a deeper understanding (or a more detailed example). Numerous prior book, journal, and patent publications by the author are drawn upon extensively throughout the text [1–68]. Part of the objective of this book is to bring these examples together and demonstrate their combined use in typical signal processing situations. Numerous other journal and patent publications by the author [69–100] provide related material but are not directly drawn upon in this text. The application domain is practically everything in the digital domain, as mentioned above, but in this text the focus will be on core methodologies with specific application in informatics, bioinformatics, and cheminformatics (nanopore detection, in particular). Other disciplines can also be analyzed with informatics tools. Basic questions about human origins (anthrogenomics) and behavior (econometrics) can be explored with informatics‐based pattern recognition methods, with a huge impact on new research directions in anthropology, sociology, political science, economics, and psychology. The complete set of statistical learning tools can be used in any of these domains.

Schematic illustration of a Penrose tiling: a non‐repeating tiling with two tile shapes, fivefold local symmetry, and the golden ratio appearing both locally and globally (as an emergent property).

      It is common to need to acquire a signal whose properties are not known, or that is only suspected and not yet discovered, or whose properties are known but would be too much trouble to enumerate fully. There is, however, no common solution to the acquisition task, so the initial phases of acquisition methods unavoidably tend to be ad hoc. As with data dependency in non‐evolutionary search metaheuristics (where no search method is guaranteed to always work well), there is no optimal signal acquisition method known in advance. In what follows, methods are described for bootstrap optimization in signal acquisition, to enable the most general‐use, almost “common,” solution possible. The bootstrap algorithmic method involves repeated passes over the data sequence, with improved priors and trained filters, among other refinements, so that signal acquisition improves on each subsequent pass. The signal acquisition is guided by statistical measures to recognize anomalies. Informatics methods and information theory measures are central to the design of a good finite state automaton (FSA) acquisition method, and will be reviewed in the signal acquisition context in Chapters 2–4. Code examples are given in Python and C (with introductory Python described in Chapter 2 and Appendix A). Bootstrap acquisition methods may not automatically provide a common solution, but they appear to offer a process whereby a solution can be improved to some desirable level of general‐data applicability.
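The following is a minimal Python sketch of the multi‐pass idea. Each pass re‐estimates a background "prior" (mean and standard deviation) from the samples not yet flagged, then flags samples that deviate from that background by more than a z‐score threshold. This example uses iterative sigma clipping, a simple technique chosen only for illustration. It is not the author's FSA‐based acquisition method, and it omits trained filters. The function name `bootstrap_acquire` and the parameters `z_thresh` and `max_passes` are hypothetical.

```python
import statistics
from typing import List, Set, Tuple


def bootstrap_acquire(
    data: List[float], z_thresh: float = 3.0, max_passes: int = 10
) -> Tuple[List[int], float, float, int]:
    """Hypothetical multi-pass anomaly flagging by iterative sigma clipping.

    Each pass re-estimates a background "prior" (mean and population standard
    deviation) from the samples not flagged on the previous pass, then flags
    every sample lying more than z_thresh standard deviations from that mean.
    Stops when a pass leaves the flagged set unchanged, or after max_passes.
    Returns (sorted flagged indices, background mean, background std, passes).
    """
    flagged: Set[int] = set()
    mu = sigma = float("nan")
    passes = 0
    for _ in range(max_passes):
        # Improved prior: estimate the background from unflagged samples only.
        background = [x for i, x in enumerate(data) if i not in flagged]
        if len(background) < 2:
            break  # too few background samples to estimate a spread
        mu = statistics.fmean(background)
        sigma = statistics.pstdev(background)
        if sigma == 0.0:
            break  # constant background: z-scores are undefined
        passes += 1
        # Re-scan the whole sequence against the refined background.
        new_flagged = {i for i, x in enumerate(data) if abs(x - mu) > z_thresh * sigma}
        if new_flagged == flagged:
            break  # converged: this pass changed nothing
        flagged = new_flagged
    return sorted(flagged), mu, sigma, passes


if __name__ == "__main__":
    # Alternating +/-1 "noise", one strong spike (100.0) and one weak spike (6.0).
    signal = [1.0, -1.0] * 50 + [100.0, 6.0]
    hits, mu, sigma, passes = bootstrap_acquire(signal)
    print(hits, passes)  # → [100, 101] 3
```

In the demo, the first pass misses the weak spike at index 101. The strong spike inflates the background standard deviation to roughly 9.9, which pushes the weak spike below the threshold. The second pass excludes the strong spike from the background, so it flags the weak spike. The third pass confirms convergence. In a much simplified form, this shows how a later pass with an improved prior can acquire signal that an earlier pass missed.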