
which makes up big data. Most of this data is unstructured. The five common big data characteristics are volume, variety, veracity, velocity, and value. Beyond these five dimensions, many researchers have added further dimensions, such as volatility, validity, visualization, and variability. Big data is characterized by the following commonly adopted V's. Figure 3.1 and Table 3.1 present the big data features in terms of V's [24].

Figure 3.1 Schematic illustration of the dimensions of big data.

Table 3.1 Dimensions of big data.

Volume: Volume refers to quantity or amount. In big data, volume denotes the gigantic size of the data sets involved; big data is known for its voluminous size, with data produced daily from many places.
Variety: Variety means diversity. Data generated and collected from diversified sources constitutes variety in big data. Big data usually comes in a variety of forms, comprising structured, semi-structured, and unstructured data.
Velocity: Velocity means speed. In the big data context, velocity denotes the speed of data creation; big data arrives at a fast rate, like a stream of water.
Veracity: Veracity refers to the credibility, reliability, and accuracy of big data and the quality of its sources.
Variability: Variability is not the same as variety; it refers to data whose meaning and form are constantly changing. This is an important feature of big data.
Visualization: Big data is illustrated in pictorial form using data visualization tools.
Value: The potential benefit derived from big data is known as value.

      3.3.1 Big Data Value Chain

      The value chain of big data is the process of creating potential value from the data. The following steps make up the value chain process for creating knowledge and insights from big data [12, 25, 29]; a minimal end-to-end sketch follows the list.

       • Production of big data from various sources

       • Collection of big data from various sources

       • Transmission of raw big data to the infrastructure for storing and processing

       • Preprocessing of big data

       • Storage of big data

       • Data analytics
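
      To make these stages concrete, the following Python sketch chains them together as plain functions. Every function name, the record schema, and the in-memory "warehouse" are hypothetical illustrations, not a prescribed implementation; a real deployment would use dedicated tools at each stage.

# Minimal sketch of the value chain stages as composable Python functions.
# Every name and the record schema here are hypothetical illustrations.

def generate_records():
    """Generation: a source (sensor, clinical note, log) emits raw records."""
    yield {"patient_id": 1, "heart_rate": 72}
    yield {"patient_id": 1, "heart_rate": 72}    # duplicate record
    yield {"patient_id": 2, "heart_rate": None}  # incomplete record
    yield {"patient_id": 3, "heart_rate": 88}

def collect(source):
    """Collection: pull raw records from the source into one batch."""
    return list(source)

def transmit(batch):
    """Transmission: hand the batch to the preprocessing infrastructure
    (a pass-through here; in practice a network transfer)."""
    return batch

def preprocess(batch):
    """Preprocessing: drop duplicate and incomplete records."""
    seen, clean = set(), []
    for record in batch:
        key = tuple(sorted(record.items()))
        if key not in seen and None not in record.values():
            seen.add(key)
            clean.append(record)
    return clean

def store(batch, warehouse):
    """Storage: persist cleaned records for later analysis."""
    warehouse.extend(batch)

def analyze(warehouse):
    """Analytics: derive a simple insight (average heart rate)."""
    rates = [record["heart_rate"] for record in warehouse]
    return sum(rates) / len(rates)

warehouse = []
store(preprocess(transmit(collect(generate_records()))), warehouse)
print(analyze(warehouse))  # 80.0 for the sample records above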

      Big Data Generation

      Big data generation, the initial step in the value chain, refers to the production of data at various sources. Big data is known for its volume, and data is generated from numerous sources on a regular basis. Medical data, for example, is generated by clinical notes, IoT devices, social media, and so on. Big data includes an industry's internal data as well as Internet and IoT data.
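
      As one illustration of such a source, the snippet below simulates a hypothetical bedside IoT monitor emitting timestamped vital-sign readings; the device behavior and record schema are invented for illustration only.

import random
import time
from datetime import datetime, timezone

def iot_vitals_stream(patient_id, n_readings=3):
    """Simulate a bedside monitor (a hypothetical IoT source)
    producing semi-structured vital-sign records."""
    for _ in range(n_readings):
        yield {
            "patient_id": patient_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "heart_rate": random.randint(55, 110),          # beats per minute
            "spo2": round(random.uniform(90.0, 100.0), 1),  # oxygen saturation %
        }
        time.sleep(0.1)  # real devices emit readings continuously

for reading in iot_vitals_stream(patient_id=42):
    print(reading)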

      Big Data Collection

      The second phase of the value chain is the collection of data from many different places. Big data is generated at various sources, in various formats such as text, image, and email, and may be structured, semi-structured, or unstructured. The big data collection process retrieves this raw data from the various sources. The most common big data sources are computers, smartphones, and the Internet [14].
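
      The sketch below illustrates collection from heterogeneous sources: a structured CSV, a semi-structured JSON record, and unstructured free text are pulled into one raw staging list. The sample inputs and field names are hypothetical.

import csv
import io
import json

# Hypothetical raw inputs standing in for three common source formats.
structured_csv = "patient_id,heart_rate\n1,72\n2,88\n"
semi_structured_json = '{"patient_id": 3, "note": "post-op", "tags": ["icu"]}'
unstructured_text = "Patient reports mild dizziness after medication."

raw_staging = []  # collected raw records, kept close to their source form

# Structured: parse CSV rows into dicts.
for row in csv.DictReader(io.StringIO(structured_csv)):
    raw_staging.append({"source": "csv", "data": row})

# Semi-structured: parse the JSON record as-is.
raw_staging.append({"source": "json", "data": json.loads(semi_structured_json)})

# Unstructured: keep the raw text; interpretation is deferred to analytics.
raw_staging.append({"source": "text", "data": unstructured_text})

print(len(raw_staging), "raw records collected")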

      Big Data Transmission

      Once data collection is over, the data should be transmitted to the preprocessing infrastructure.
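
      A minimal sketch of this hand-off, assuming the preprocessing infrastructure exposes an HTTP ingest endpoint (the URL and payload shape are hypothetical); production systems typically use bulk transfer or message queues instead.

import json
import urllib.request

# Hypothetical ingest endpoint on the preprocessing cluster.
INGEST_URL = "http://preprocess.example.internal:8080/ingest"

def transmit_batch(batch):
    """Send one collected batch to the preprocessing infrastructure."""
    payload = json.dumps(batch).encode("utf-8")
    request = urllib.request.Request(
        INGEST_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # 200 expected on success

# transmit_batch([{"patient_id": 1, "heart_rate": 72}])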

      Big Data Pre-Processing

      Big data is collected from a variety of sources, so it may contain inconsistent, noisy, incomplete, and redundant records. The preprocessing phase improves data quality by removing such records, which in turn improves data integrity for analysis.
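
      A minimal cleaning sketch using pandas, one common choice (the column names and value ranges are hypothetical): duplicates are dropped, incomplete rows are removed, and an out-of-range reading is filtered out as noise.

import pandas as pd

# Hypothetical raw vital-sign batch with typical quality problems.
raw = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 4],
    "heart_rate": [72, 72, None, 400, 88],  # None = incomplete, 400 = noise
})

clean = (
    raw
    .drop_duplicates()                 # remove redundant records
    .dropna(subset=["heart_rate"])     # remove incomplete records
    .query("30 <= heart_rate <= 220")  # remove physiologically noisy values
)
print(clean)  # only patients 1 and 4 survive the cleaning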

      Big Data Storage

      The data should be stored for exploration. Big data refers to data so huge that it cannot be processed and stored using traditional information systems. Large volumes of big data can be stored and handled easily with the help of dedicated big data storage technologies, which also ensure data security and integrity.
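
      As one storage sketch, a cleaned batch can be written in a columnar format such as Parquet, which the Hadoop ecosystem reads natively. The file path is hypothetical, and a real deployment would write to a distributed file system rather than local disk.

import pandas as pd

# Hypothetical cleaned batch ready for long-term storage.
clean = pd.DataFrame({"patient_id": [1, 4], "heart_rate": [72, 88]})

# Parquet is a columnar format widely used across the Hadoop ecosystem;
# writing it requires the pyarrow (or fastparquet) engine to be installed.
clean.to_parquet("vitals_2024.parquet", index=False)

# Read the stored data back for later exploration.
stored = pd.read_parquet("vitals_2024.parquet")
print(stored)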

      Big Data Analysis

      The final phase of the value chain applies analytics techniques to the stored data to extract useful insights that support decision making.
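
      A minimal analysis sketch over a stored batch, again with hypothetical column names: a simple aggregation turns raw readings into a per-patient insight.

import pandas as pd

# Hypothetical stored readings loaded from the storage layer.
stored = pd.DataFrame({
    "patient_id": [1, 1, 4, 4, 4],
    "heart_rate": [72, 75, 88, 91, 86],
})

# A simple analytical insight: per-patient average and maximum heart rate.
summary = stored.groupby("patient_id")["heart_rate"].agg(["mean", "max"])
print(summary)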

      3.3.2 Big Data Tools and Techniques

      Big data technologies are used to analyze, process, and extract the useful insights available in complex raw data sets. The major technology categories are storage and analytics technologies. The most widely adopted big data tools include Apache Hadoop, Oracle NoSQL, etc. Big data technologies are classified into operational and analytical.

      Operational big data: This data is generated on a daily basis. Ticket booking, social media, and online shopping are examples of operational big data.

      Analytical big data: This is the analysis of operational big data to support complex, real-time decisions. Examples of analytical big data applications are the medical field and weather forecasting.
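
      The distinction can be shown with two queries against the same (hypothetical) bookings table, using the standard-library sqlite3 module: operational workloads touch individual records, while analytical workloads aggregate over many.

import sqlite3

# Hypothetical bookings table standing in for an operational data store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bookings (id INTEGER, city TEXT, fare REAL)")
db.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(1, "Chennai", 120.0), (2, "Delhi", 250.0), (3, "Chennai", 90.0)],
)

# Operational query: fetch one record for a day-to-day transaction.
print(db.execute("SELECT * FROM bookings WHERE id = 2").fetchone())

# Analytical query: aggregate across records to support a decision.
print(db.execute(
    "SELECT city, COUNT(*), AVG(fare) FROM bookings GROUP BY city"
).fetchall())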

      The technologies and tools used in big data are classified on the basis of the following steps [11]:

       • Gathering of big data

       • Processing the big data

       • Storing the big data

       • Analyzing the big data

       • Visualizing the big data

      Hadoop provides tools that are used to gather, preprocess, and investigate huge volumes of structured, semi-structured, and unstructured data. Hadoop is an open-source, highly scalable, fault-tolerant, and cost-effective framework for faster big data processing.
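
      As an illustration of the Hadoop programming style, below is the classic word-count pair written for Hadoop Streaming, which lets mappers and reducers be plain Python scripts that read stdin and write tab-separated key-value pairs. The scripts are standard practice, but the file names and data paths are examples only.

#!/usr/bin/env python3
# mapper.py -- emits "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming delivers mapper output sorted by key,
# so equal words arrive consecutively and can be summed in one pass.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word == current_word:
        count += int(value)
    else:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")

      A typical invocation passes both scripts to the Hadoop Streaming jar, for example: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /notes -output /counts; the jar location and data paths vary by installation.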