Group of authors

Handbook on Intelligent Healthcare Analytics



      Knowledge means the facts, information, and skills obtained through experience; in short, it is a better understanding of a subject. Healthcare practice is a knowledge-based process, so supplying the right knowledge during decision-making is very important. In healthcare, knowledge is used for diagnosing disease, prescribing medicine, and clinical management.

| Big data capability | Primary technologies | Developer | Description |
|---|---|---|---|
| Big data collection | Sqoop | Apache Software Foundation | Apache Sqoop transfers data between Apache Hadoop and structured data stores such as the relational databases Oracle and MySQL. It imports data from relational databases into the Hadoop file system and exports it back. |
| | Flume | Apache Software Foundation | Apache Flume is a service for collecting, aggregating, and moving huge volumes of log data to centralized data stores. Log files are collected from the internet and aggregated in the Hadoop Distributed File System (HDFS) for analysis. |
| | Kafka | Apache Software Foundation | Apache Kafka is used to collect data and to perform real-time analysis. Kafka is used by Uber, Netflix, and PayPal. |
| Big data storage | Hadoop Distributed File System (HDFS) | Apache Software Foundation | HDFS is the distributed storage system for Hadoop applications and a major component of Apache Hadoop. It accommodates large data sets and stores files across a distributed environment. |
| | NoSQL database | Oracle Corporation | A NoSQL database is also known as a non-SQL or non-relational database. It stores and retrieves data using models other than the tabular relations of a relational database. NoSQL is mainly used for storing big data, and it manages unstructured data using dynamic schemas. |
| | Apache HBase | Apache Software Foundation | Apache HBase is a distributed, non-relational database in the Hadoop framework. HBase provides a way to store sparse data sets. It dynamically distributes a table when the data set becomes too big to manage. |
| | Apache Cassandra | Apache Software Foundation | Apache Cassandra is a distributed database that manages huge amounts of data across many commodity servers. Cassandra is a scalable, high-performance NoSQL database. |
| Big data processing | MapReduce | Apache Software Foundation | Hadoop MapReduce is used for the distributed processing of large amounts of data in a reliable, fault-tolerant way. It is a programming model and software framework used to write applications that manage and process huge data sets. |
| | Apache Hadoop | Apache Software Foundation | The Apache Hadoop framework is used for storing and processing big data on a cluster of computers in a distributed environment rather than on a single computer. Hadoop uses the cluster to analyze large data sets in parallel. Hadoop consists of two main layers: a processing/computational layer (MapReduce) and a storage layer (HDFS). Its important modules are Hadoop MapReduce, Yet Another Resource Negotiator (YARN), Hadoop Common, and HDFS. YARN is used for resource management and job scheduling, and it allows HDFS data to be processed by different data processing engines. Hadoop Common consists of the common libraries and utilities needed to support the other Hadoop modules. |
| | Oracle Big Data Connector, Oracle Data Integrator | Oracle Corporation | The Oracle Big Data Connector is software that combines Apache Hadoop and Oracle Database. Apache Hadoop is used for data accumulation and pre-processing. The connector integrates data in Oracle Database with Apache Hadoop for analysis. |
| Statistical analysis capability | R, Oracle R Enterprise | Oracle Corporation | R is an open-source programming language and software environment for statistical analysis. Oracle R Enterprise brings R to Oracle Database for advanced analysis, combining R software with Oracle Database features. R users can use it to access data in the database without writing Structured Query Language (SQL). |
| Big data analytics | Apache Hive | Originally developed by Facebook; now Apache-licensed software | Apache Hive is open-source data warehouse software built on Apache Hadoop. It is used for storing, querying, summarizing, and analyzing large data sets. Hive is written in Java. |
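
As a rough illustration of the MapReduce programming model listed in the table, the following single-process Python sketch mimics the map, shuffle, and reduce phases by counting diagnoses in a handful of records. It does not use Hadoop. The sample records and the function names (`map_record`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical and were chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (patient_id, diagnosis). Not real data.
RECORDS = [
    ("p1", "diabetes"),
    ("p2", "asthma"),
    ("p3", "diabetes"),
    ("p4", "hypertension"),
    ("p5", "diabetes"),
    ("p6", "asthma"),
]


def map_record(record):
    """Map phase: emit a (diagnosis, 1) pair for one input record."""
    _patient_id, diagnosis = record
    yield diagnosis, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: combine the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce locally, in one process."""
    mapped = (pair for record in records for pair in map_record(record))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
    # prints {'diabetes': 3, 'asthma': 2, 'hypertension': 1}
```

On a real Hadoop cluster the map and reduce functions would run in parallel on different nodes over data stored in HDFS. Here everything runs in one process so that the three phases stay easy to follow.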

      Big data sets are analyzed to extract valuable information. This information is transformed into actionable insights, which are turned into knowledge using analytical tools. This knowledge can be used for