Tormod Næs

Multiblock Data Fusion in Statistics and Machine Learning



section id="u7af12078-ec0f-5b68-a3b3-69df8f500628">

      

      Applications in the Natural and Life Sciences

       Age K. Smilde

      Swammerdam Institute for Life Sciences, University of Amsterdam,

      Amsterdam, NL and

      Simula Metropolitan Center for Digital Engineering, Oslo, NO

       Tormod Næs

      Nofima

       Ås, NO

       Kristian Hovde Liland

      Norwegian University of Life Sciences

       Ås, NO

      © 2022 John Wiley & Sons Ltd

      All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

      The right of Age K. Smilde, Tormod Næs and Kristian Hovde Liland to be identified as the authors of this work has been asserted in accordance with law.

       Registered Offices

      John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

      John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

       Editorial Office

      The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

      For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

      Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

       Limit of Liability/Disclaimer of Warranty

      In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

      A catalogue record for this book is available from the Library of Congress

      Hardback ISBN: 9781119600961; ePDF ISBN: 9781119600985; epub ISBN: 9781119600992; Obook ISBN: 9781119600978

      Cover image: © Professor Age K. Smilde

      Cover design by Wiley

      Set in 10/12pt WarnockPro-Regular by Integra Software Services Pvt. Ltd, Pondicherry, India

      1  Cover

      2  Title page

      3  Copyright

      4  Foreword

      5  Preface

      6  List of Figures

      7  List of Tables

      8  Part I Introductory Concepts and Theory
          1 Introduction
            1.1 Scope of the Book
            1.2 Potential Audience
            1.3 Types of Data and Analyses
              1.3.1 Supervised and Unsupervised Analyses
              1.3.2 High-, Mid- and Low-level Fusion
              1.3.3 Dimension Reduction
              1.3.4 Indirect Versus Direct Data
              1.3.5 Heterogeneous Fusion
            1.4 Examples
              1.4.1 Metabolomics
              1.4.2 Genomics
              1.4.3 Systems Biology
              1.4.4 Chemistry
              1.4.5 Sensory Science
            1.5 Goals of Analyses
            1.6 Some History
            1.7 Fundamental Choices
            1.8 Common and Distinct Components
            1.9 Overview and Links
            1.10 Notation and Terminology
            1.11 Abbreviations
          2 Basic Theory and Concepts
            2.i General Introduction
            2.1 Component Models
              2.1.1 General Idea of Component Models
              2.1.2 Principal Component Analysis
              2.1.3 Sparse PCA
              2.1.4 Principal Component Regression
              2.1.5 Partial Least Squares
              2.1.6 Sparse PLS
              2.1.7 Principal Covariates Regression
              2.1.8 Redundancy Analysis
              2.1.9 Comparing PLS, PCovR and RDA
              2.1.10 Generalised Canonical Correlation Analysis
              2.1.11 Simultaneous Component Analysis
            2.2 Properties of Data
              2.2.1 Data Theory
              2.2.2 Scale-types
            2.3 Estimation Methods
              2.3.1 Least-squares Estimation
              2.3.2 Maximum-likelihood Estimation
              2.3.3 Eigenvalue Decomposition-based Methods
              2.3.4 Covariance or Correlation-based Estimation Methods
              2.3.5 Sequential Versus Simultaneous Methods
              2.3.6 Homogeneous Versus Heterogeneous Fusion
            2.4 Within- and Between-block Variation
              2.4.1 Definition and Example
              2.4.2 MAXBET Solution
              2.4.3 MAXNEAR Solution
              2.4.4 PLS2 Solution
              2.4.5 CCA Solution
              2.4.6 Comparing the Solutions
              2.4.7 PLS, RDA and CCA Revisited
            2.5 Framework for Common and Distinct Components
            2.6 Preprocessing
            2.7 Validation
              2.7.1 Outliers
                2.7.1.1 Residuals
                2.7.1.2 Leverage
              2.7.2 Model Fit
              2.7.3 Bias-variance Trade-off
              2.7.4 Test Set Validation
              2.7.5 Cross-validation
              2.7.6 Permutation Testing
              2.7.7 Jackknife and Bootstrap
              2.7.8 Hyper-parameters and Penalties
            2.8 Appendix
          3 Structure of Multiblock Data
            3.i General Introduction
            3.1 Taxonomy
            3.2 Skeleton of a Multiblock Data Set
              3.2.1 Shared Sample Mode
              3.2.2 Shared Variable Mode
              3.2.3 Shared Variable or Sample Mode
              3.2.4 Shared Variable and Sample Mode
            3.3 Topology of a Multiblock Data Set
              3.3.1 Unsupervised Analysis
              3.3.2 Supervised Analysis
            3.4 Linking Structures
              3.4.1 Linking Structure for Unsupervised Analysis
              3.4.2 Linking Structures