Seifedine Kadry

Big Data



Table 1.2 Data Mining vs. Big Data.

S. No. | Data mining | Big data
1 | Data mining is the process of discovering the underlying knowledge from data sets. | Big data refers to a massive volume of data characterized by volume, velocity, and variety.
2 | Structured data retrieved from spreadsheets, relational databases, etc. | Structured, unstructured, or semi‐structured data retrieved from non‐relational databases, such as NoSQL.
3 | Data mining is capable of processing large data sets, but the data processing costs are high. | Big data tools and technologies are capable of storing and processing large volumes of data at a comparatively lower cost.
4 | Data mining can process only data sets that range from gigabytes to terabytes. | Big data technology is capable of storing and processing data that range from petabytes to zettabytes.


      1.4.1 Volume

      Data generated and processed by big data systems are growing at an ever‐increasing pace. Volume grows exponentially because business enterprises continuously capture data to build better and bigger business solutions. Big data volume is measured from terabytes to zettabytes (1024 GB = 1 terabyte; 1024 TB = 1 petabyte; 1024 PB = 1 exabyte; 1024 EB = 1 zettabyte; 1024 ZB = 1 yottabyte). Capturing this massive data is cited as an extraordinary opportunity to achieve finer customer service and better business advantage. This ever‐increasing data volume demands highly scalable and reliable storage. The major sources contributing to this tremendous growth in volume are social media, point of sale (POS) transactions, online banking, GPS sensors, and sensors in vehicles. Facebook generates approximately 500 terabytes of data per day. Every time a link on a website is clicked, an item is purchased online, or a video is uploaded to YouTube, data are generated.
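      The unit ladder above can be worked through with a short calculation. The following is a minimal sketch in Python, assuming the 1024‐based units given in the text; the helper names `to_bytes` and `convert` are hypothetical, and the yearly figure simply extrapolates the quoted 500 terabytes per day over 365 days rather than reporting a measured value.

```python
# Storage units as listed in the text; each step up is a factor of 1024.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to bytes (1024-based)."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value between any two units in UNITS."""
    return to_bytes(value, from_unit) / 1024 ** UNITS.index(to_unit)


# Illustrative extrapolation of the 500 TB/day figure quoted above.
daily_tb = 500
yearly_pb = convert(daily_tb * 365, "TB", "PB")

print(convert(1024, "GB", "TB"))   # 1.0
print(round(yearly_pb, 2))         # 178.22
```

      At that rate a single source would produce roughly 178 petabytes in a year, which is why the text places big data volumes in the petabyte range and beyond.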

      1.4.2 Velocity


      1.4.3 Variety


      Multiple disparate data sources are responsible for the tremendous increase in the volume of big data. Much of the growth in data can be attributed to the digitization of almost everything around the globe. Paying e‐bills, shopping online, communicating through social media, exchanging e‐mail within organizations, and representing organizational data digitally are some examples of this digitization.

       Sensors: Sensors that contribute to the large volume of big data are listed below.
           Accelerometer sensors installed in mobile devices to sense the vibrations and other movements.
           Proximity Sensors used in public places to detect the presence of objects without physical contact with the objects.
           Sensors in vehicles and medical devices.

       Health care: The major sources of big data in health care are:
           Electronic Health Records (EHRs) collect and display patient information such as past medical history, prescriptions by the medical practitioners, and laboratory test results.
           Patient portals permit patients to access their personal medical records saved in EHRs.
           Clinical data repository aggregates individual patient records from various clinical sources and consolidates them to give a unified view of patient history.

       Black box: Data are generated by the black box in airplanes, helicopters, and jets. The black box captures the activities of flight, flight crew announcements, and aircraft performance information.

       Figure 1.5 Sources of big data.

       Web data: Data generated when a link on a website is clicked are captured by online retailers. Click stream analysis is performed on these data to analyze customer interests and buying patterns, to generate recommendations based on those interests, and to post relevant advertisements to consumers.

       Organizational data: E‐mail transactions and documents that are generated within the organizations together contribute to the organizational data.
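       The click stream analysis mentioned under Web data can be illustrated with a small example. This is only a sketch under stated assumptions: the click events, the customer identifiers, and the helper names `interest_profile` and `top_interest` are all hypothetical, and a real retailer would work on far larger logs with more elaborate recommendation logic.

```python
from collections import Counter, defaultdict

# Hypothetical click events: (customer_id, product_category).
clicks = [
    ("c1", "books"),
    ("c1", "books"),
    ("c1", "laptops"),
    ("c2", "shoes"),
    ("c2", "shoes"),
    ("c2", "books"),
]


def interest_profile(events):
    """Count how often each customer clicked on each category."""
    profile = defaultdict(Counter)
    for customer, category in events:
        profile[customer][category] += 1
    return profile


def top_interest(profile, customer):
    """Return the category the customer clicked most often."""
    return profile[customer].most_common(1)[0][0]


profile = interest_profile(clicks)
print(top_interest(profile, "c1"))  # books
print(top_interest(profile, "c2"))  # shoes
```

       The most frequently clicked category per customer is the simplest possible signal for a recommendation or a targeted advertisement; it stands in here for the buying‐pattern analysis the text describes.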