Alan R. Simon

Data Lakes For Dummies



it came to analytics and data — between your IT organization and the business users who are supposed to be their customers. Not good!

      The data lake presents your organization with an opportunity for a fresh start. You can apply many of the best practices, along with the painful lessons, from 30-plus years of data warehousing to your data lake efforts and avoid repeating the mistakes and shortcomings of the past. As your data lake gets built, whether you’re on the IT side or the business side of your company, you can help rebuild that essential trust, especially when it comes to all-important analytics and the resulting data-driven insights.

      Sounds like a great idea, right?

Schematic illustration of the vision of an enterprise data warehouse.

      FIGURE 2-1: The vision of an enterprise data warehouse.

      Dealing with the data fragmentation problem

      Okay, so maybe the idea of “Do your own thing, and build your own data mart” got out of control. Now that you can see what a mess that approach created, why not just retire those data marts and fold them into your enterprise data warehouse, which is probably underutilized anyway?

      

A collection of independent data marts is almost always hampered by a lack of common master data (for example, a “customer” may mean one thing to sales and something different to your marketing team), different software packages and technologies across the data marts, and other challenges. Taken together, these challenges make it almost impossible to consolidate separate, independent data marts back into a single data warehouse. Most organizations instead throw their hands up in the air and say that they’re following a federated data warehouse approach. You “create” a federated data warehouse by simply declaring that some or all of your data marts are part of a “federation” that, taken as a whole, is sort of like a data warehouse. “Um … yeah, that’s our story, and we’re sticking to it. It’s magic!” (Not really … and not all that valuable from an enterprise-wide perspective.)
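      To see why missing master data makes consolidation so painful, consider a minimal Python sketch. Everything in it is made up for illustration: the two marts, their ID schemes, the company names, and the assumption that sales counts only buyers as customers while marketing also counts prospects. Stacking the marts together gives you a "customer" count that matches nobody's definition.

```python
# Hypothetical records: each data mart keeps its own notion of a "customer"
# and its own ID scheme. None of these names come from a real system.
sales_mart = [
    {"customer_id": "S-001", "name": "Acme Corp", "has_purchase": True},
    {"customer_id": "S-002", "name": "Globex", "has_purchase": True},
]
marketing_mart = [
    {"customer_id": "M-17", "name": "ACME Corporation", "has_purchase": False},
    {"customer_id": "M-18", "name": "Initech", "has_purchase": False},
    {"customer_id": "M-19", "name": "Globex", "has_purchase": True},
]


def naive_union(*marts):
    """Stack the marts together, keyed on each mart's own customer ID."""
    return {row["customer_id"]: row for mart in marts for row in mart}


def distinct_names(*marts):
    """Collect lowercased names, the only field the marts share."""
    return {row["name"].lower() for mart in marts for row in mart}


combined = naive_union(sales_mart, marketing_mart)

# Five IDs, because no ID is shared between the two marts.
customer_count = len(combined)

# Four names, because "Acme Corp" and "ACME Corporation" do not match,
# even though a person would recognize them as the same company.
name_count = len(distinct_names(sales_mart, marketing_mart))

print(customer_count, name_count)
```

      Only three companies appear in this toy data, yet the two counts come out as five and four. Without an agreed-upon master record for each customer, no amount of copying tables into one place turns the marts into a coherent warehouse.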

Schematic illustration of the reality of numerous stand-alone data marts.

      FIGURE 2-2: The reality of numerous stand-alone data marts.

      Decision point: Retire, isolate, or incorporate?

      What should you do about your proliferation of data marts now that your organization is building a data lake? The short answer: Get rid of the data marts … or at least most of them!

      You have three main options for how to deal with your proliferation of independent data marts as part of your data lake initiative:

       Retire some or all of the data marts, and replace them with data lake functionality.

       Isolate some of the data marts, and leave them in place alongside your new data lake.

       Incorporate some of your data marts as components of your data lake.

      Data mart retirement

      If your existing data marts are creaking and groaning and are now coming up short even for the analytical needs of their respective users, here’s a great idea: Get rid of them!

Schematic illustration of using a data lake to retire data marts.

      FIGURE 2-3: Using a data lake to retire data marts.

      

Chances are, most of your data marts, especially those that have been around for a while, support descriptive analytics (basic business intelligence functions such as drilling deeper into summarized data to gain additional insights from lower levels of your data). But what about advanced analytics such as machine learning, data mining, and other artificial intelligence–enabled capabilities? Probably not so much!

      So, why keep those aging data marts around? Redirect the data feeds from your source systems into your new data lake, and rebuild your analytics for accounting, your human resources (HR) organization, sales and marketing, and other parts of your enterprise within the data lake environment.
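      Redirecting feeds can be pictured as editing a routing table. The following Python sketch is only an illustration under stated assumptions: the Feed class, the redirect_feeds function, and every system and mart name are hypothetical, and a real migration involves far more than renaming targets. It does show one pleasant side effect: when two retired marts were both loading from the same source system, their feeds collapse into a single feed into the data lake.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Feed:
    """One data feed: a source system and the place its data lands."""
    source: str
    target: str


def redirect_feeds(feeds, retired_marts, lake="data_lake"):
    """Point every feed into a retired mart at the lake, dropping duplicates."""
    redirected = []
    for feed in feeds:
        if feed.target in retired_marts:
            feed = replace(feed, target=lake)
        if feed not in redirected:  # two marts may have shared one source
            redirected.append(feed)
    return redirected


# Hypothetical starting point: five feeds into five independent data marts.
feeds = [
    Feed("general_ledger", "accounting_mart"),
    Feed("hr_system", "hr_mart"),
    Feed("crm", "sales_mart"),
    Feed("crm", "marketing_mart"),
    Feed("erp", "strategic_planning_mart"),
]

# Marts chosen for retirement; the strategic planning mart is left alone.
retired = {"accounting_mart", "hr_mart", "sales_mart", "marketing_mart"}

new_feeds = redirect_feeds(feeds, retired)
for feed in new_feeds:
    print(f"{feed.source} -> {feed.target}")
```

      In this sketch, the two separate feeds from the hypothetical CRM system become one, and the feed into the mart you decided not to retire is untouched.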

      Data mart isolation

      What if one of your existing data marts is an absolute work of genius? Suppose that three or four years ago, your company built a data mart to support your annual strategic planning cycle. Your strategic planning data mart has data feeds from numerous applications and systems around your enterprise. Do you really want to reinvent the wheel just because you’re now building a data lake?

      Great news: You don’t have to throw away your data mart baby along with the data lake water! (Okay, maybe not the best metaphor, but you get the idea.)

Schematic illustration of leaving a data mart intact and alongside your data lake.

      FIGURE 2-4: Leaving a data mart intact and alongside your data lake.

      Data mart incorporation

Schematic illustration of incorporating a data mart into the data lake.

      FIGURE 2-5: Incorporating a data mart into your data lake.