Alan R. Simon

Data Lakes For Dummies



        into your new data la...
        FIGURE 4-10: Your data lake feeding your data warehouse.
        FIGURE 4-11: Split-streaming data feeds to support both your data lake and your...
        FIGURE 4-12: Ongoing data interchange between your data lake and your data ware...
        FIGURE 4-13: A data lake that is much larger than a data warehouse.
        FIGURE 4-14: A data warehouse that is much larger than a data lake.
        FIGURE 4-15: Feeding external data into the data lake.
        FIGURE 4-16: On-demand access to external data for your analytics.
        FIGURE 4-17: Drilling-site sensors and a data lake at an energy exploration com...
        FIGURE 4-18: Edge analytics existing outside the control of the data lake.
        FIGURE 4-19: Remote data from edge analytics can also be sent to the data lake.

      5 Chapter 5
        FIGURE 5-1: Data flowing into your data lake bronze zone.
        FIGURE 5-2: Three different operational data feeds into your data lake bronze z...
        FIGURE 5-3: Multiple subscribers to sensor and video data streams.
        FIGURE 5-4: Using a streaming service to split-stream data into both a data lak...
        FIGURE 5-5: Under-the-covers “micro-batching” within streaming input to your da...
        FIGURE 5-6: The Lambda data ingestion architecture for your data lake.
        FIGURE 5-7: The Kappa data ingestion architecture for your data lake.
        FIGURE 5-8: Going for storage simplicity with only object storage in your bronz...
        FIGURE 5-9: Implementing a multi-component bronze zone.
        FIGURE 5-10: Ingesting data from a database: object storage versus database in ...
        FIGURE 5-11: Carrying a bronze zone database through to your data lake gold zon...
        FIGURE 5-12: Carrying bronze zone object storage through to your data lake gold...
        FIGURE 5-13: Going back to a database in a multi-component gold zone.
        FIGURE 5-14: Data streaming doing double duty as bronze zone storage for raw da...
        FIGURE 5-15: Three different models for linking your analytics with streaming d...

      6 Chapter 6
        FIGURE 6-1: Refining an image between the bronze zone and the silver zone.
        FIGURE 6-2: Enriching an image for storage in the data lake silver zone.
        FIGURE 6-3: Enriching a tweet by determining and attaching sentiment analysis.
        FIGURE 6-4: Building a master data taxonomy for your data lake.
        FIGURE 6-5: Decisions, decisions: What should you do with bronze zone data dest...
        FIGURE 6-6: Redefining your data lake zone boundaries rather than unnecessarily...
        FIGURE 6-7: Ingesting a raw tweet.
        FIGURE 6-8: Enriching a tweet followed by shifting your zone boundary rather th...
        FIGURE 6-9: Step 1: Ingesting raw data into your bronze zone.
        FIGURE 6-10: Step 2: Moving data into the silver zone rather than copying data.
        FIGURE 6-11: Deciding whether to keep a raw image after refinement and enhancem...
        FIGURE 6-12: Your data lake silver zone using Amazon S3.
        FIGURE 6-13: Dividing your silver zone content among three different flavors of...
        FIGURE 6-14: Carrying hierarchical storage back into your data lake bronze zone...
        FIGURE 6-15: Step 1: Refine and enrich an image in your data lake silver zone.
        FIGURE 6-16: Step 2: Move bronze zone image to S3 Glacier to save on storage co...

      7 Chapter 7
        FIGURE 7-1: Peeking inside the gold zone.
        FIGURE 7-2: Building a curated gold zone data package.
        FIGURE 7-3: Adding database data to object store data inside a gold zone curate...
        FIGURE 7-4: Using persistent data streams for your gold zone curated data.
        FIGURE 7-5: Using a specialized data store in your data lake gold zone.
        FIGURE 7-6: Relocating an infrequently used or retired data package to less-exp...

      8 Chapter 8
        FIGURE 8-1: Using the data lake sandbox for analytical development.
        FIGURE 8-2: Migrating curated data from the sandbox to the gold zone as analyti...
        FIGURE 8-3: Using a data lake sandbox to explore architectural options.
        FIGURE 8-4: Moving a graph database curated data package from the sandbox into ...
        FIGURE 8-5: Exploratory analytics and your data lake sandbox.

      9 Chapter 9
        FIGURE 9-1: Data lakes and passive analytics users.
        FIGURE 9-2: Light analytics user access to a data lake gold zone.
        FIGURE 9-3: Light analytics user access to a database within the data lake gold...
        FIGURE 9-4: A multistep gold zone integration process for a light analytics use...
        FIGURE 9-5: Using a data abstraction tool for data lake access simplicity.
        FIGURE 9-6: Using a data abstraction tool to integrate database and object data...

      10 Chapter 10
        FIGURE 10-1: Your hospital’s legacy systems environment.
        FIGURE 10-2: Selecting data mart dimensional models to retain for your new data...
        FIGURE 10-3: Replacing best-of-breed applications with an integrated EHR packag...
        FIGURE 10-4: Pairing your new EHR system with a data lake.
        FIGURE 10-5: Setting up curated data packages in your data lake gold zone.
        FIGURE 10-6: Delaying platform decisions until you gain a broader view of your ...
        FIGURE 10-7: Your EHR system using both streaming and batch feeds into your dat...
        FIGURE 10-8: Making key ingestion and bronze zone data set decisions.
        FIGURE 10-9: Streaming persistent data into the gold zone.
        FIGURE 10-10: Making different architectural decisions for various data streams...
        FIGURE 10-11: Putting your silver zone to work.
        FIGURE 10-12: Adding data pipelines to your data lake buildout.
        FIGURE 10-13: Bringing your data lake sandbox into the picture.

      11 Chapter 11
        FIGURE 11-1: Public versus private clouds: a visual analogy.
        FIGURE 11-2: Allocation of responsibilities for SaaS, PaaS, and IaaS.

      12 Chapter 12
        FIGURE 12-1: The fundamental structure of Amazon S3.
        FIGURE 12-2: Mimicking folders in Amazon S3 through filenames.
        FIGURE 12-3: Building your entire AWS data lake using only S3 for data storage.
        FIGURE 12-4: Using Glue Crawler and Glue Data Catalog to maintain up-to-date da...
        FIGURE 12-5: Using a Lake Formation blueprint for data lake ingestion.
        FIGURE 12-6: Using Amazon Kinesis Data Streams for hospital patient vital signs...
        FIGURE 12-7: Athena using the Glue Data Catalog to access S3 data with SQL.
        FIGURE 12-8: Using Amazon Redshift in your data lake’s gold zone.
        FIGURE 12-9: An end-to-end hospital data lake built on AWS services.

      13 Chapter 13
        FIGURE 13-1: Organization of the Azure cloud.
        FIGURE 13-2: An Azure data lake framework.
        FIGURE 13-3: ADLS Gen2, the best of both worlds.
        FIGURE 13-4: ADLS containers, folders, and files.
        FIGURE 13-5: Ingesting, copying, and sinking data along an ADF pipeline.
        FIGURE 13-6: Using Azure Event Hubs for a publish-and-subscribe model.
        FIGURE 13-7: Bidirectional messaging and streaming with Azure IoT Hub.
        FIGURE 13-8: Using Azure SQL Database in your Azure data lake.
        FIGURE 13-9: Azure data lake architecture for IoT analytics.
        FIGURE 13-10: Azure data lake architecture for industrial IoT predictive mainte...
        FIGURE 13-11: Azure data lake architecture for defect analysis and prevention.
        FIGURE 13-12: Azure data lake architecture for rideshare company forecasting.

      14 Chapter 14
        FIGURE 14-1: Your data lake four-element scorecard.
        FIGURE 14-2: Dividing each data lake evaluation criteria into scoreable element...
        FIGURE 14-3: Focus only on your raw data.
        FIGURE 14-4: Identifying your raw data hot spots.
        FIGURE 14-5: Diving deep into your data lake’s quality and governance.
        FIGURE 14-6: The ominous results.
        FIGURE 14-7: Grading your data velocity and latency.
        FIGURE 14-8: Good news on the data velocity and latency front.
        FIGURE 14-9: Grading your component architecture.
        FIGURE 14-10: Bringing together all of your data lake evaluation scores.

      15 Chapter 15
        FIGURE 15-1: The current hospital operational applications.
        FIGURE 15-2: Peer analytical solutions, one for administrative data and one for...
        FIGURE 15-3: A downstream data warehouse taking feeds from both Hadoop and AWS.
        FIGURE 15-4: The current state survey results.
        FIGURE 15-5: Cataloging and assigning data lake issues.
        FIGURE 15-6: A two-step process to migrate the hospital’s entire data lake onto...
        FIGURE 15-7: Introducing streaming to benefit both the medical operations appli...
        FIGURE 15-8: Adding shells for the silver and gold zones.
        FIGURE 15-9: Adding a data warehouse component into the overall data lake archi...
        FIGURE 15-10: Placing master data management in your silver zone.
        FIGURE 15-11: Addressing the data warehouse–versus–data lake controversy withou...
        FIGURE 15-12: The data lake remediation timeline.
        FIGURE 15-13: The inevitable trio of technology, human and organizational facto...

      16 Chapter 16
        FIGURE 16-1: The starting point for the operating room efficiency study.
        FIGURE 16-2: The first data pipeline to feed existing raw data into a curated g...
        FIGURE 16-3: Batch ETL of patient bedside data in the current hospital data lak...
        FIGURE 16-4: Streaming data and streaming analytics for the real-time patient d...
        FIGURE 16-5: Emergency room data fed through the bronze zone into the silver zo...
        FIGURE 16-6: Building the first emergency room and inpatient cross-reference wi...
        FIGURE 16-7: Replacing a batch data feed with split-streaming.
        FIGURE 16-8: The starting point for analyzing message content versus patient ou...
        FIGURE 16-9: Building a batch interface between the app and the data lake for m...
        FIGURE 16-10: Enriching semi-structured data and then repositioning the data in...
        FIGURE 16-11: Completing the curated data package and the associated analytics.

      17 Chapter 17
        FIGURE 17-1: Dividing your current-state assessment into data and analytics.
        FIGURE 17-2: Harvey balls for scoring.
        FIGURE 17-3: Parallel paths of your analytics assessment.
        FIGURE 17-4: A sample analytics scorecard.
        FIGURE 17-5: Your data architecture