Lillian Pierson

Data Science For Dummies



calculus helps as well. Foolish or not, it’s my high hope that all readers have subject matter expertise to which they can apply the skills presented in this book. Because data scientists need to know the implications and applications of the data insights they derive, subject matter expertise is a major requirement for data science.

      As you make your way through this book, you see the following icons in the margins:

      

The Tip icon marks tips (duh!) and shortcuts you can use to make subject mastery easier.

      

Remember icons mark information that’s especially important to know. To siphon off the most important information in each chapter, just skim the material represented by these icons.

      

The Technical Stuff icon marks information of a highly technical nature that you can normally skip.

      

The Warning icon tells you to watch out! It marks important information that may save you headaches.

      Data Science For Dummies, 3rd Edition, comes with a handy Cheat Sheet that lists helpful shortcuts as well as abbreviated definitions for essential processes and concepts described in the book. You can use this feature as a quick-and-easy reference when doing data science. To download the Cheat Sheet, simply go to www.dummies.com and type "data science for dummies cheat sheet" in the Search box.

      If you’re new to data science, you’re best off starting from Chapter 1 and reading the book from beginning to end. If you already know the data science basics, I suggest that you read the last part of Chapter 1, skim Chapter 2, and then dig deep into all of Parts 3 and 4.

      Getting Started with Data Science

       Get introduced to the field of data science.

       Delve into vital data engineering details.

       Discover your inner data superhero archetype.

      Wrapping Your Head Around Data Science

      IN THIS CHAPTER

      

Deploying data science methods across various industries

      

Piecing together the core data science components

      

Identifying viable data science solutions to business challenges

      

Exploring data science career alternatives

      For over a decade now, everyone has been absolutely deluged by data. It’s coming from every computer, every mobile device, every camera, and every imaginable sensor — and now it’s even coming from watches and other wearable technologies. Data is generated in every social media interaction we humans make, every file we save, every picture we take, and every query we submit; data is even generated when we do something as simple as ask a favorite search engine for directions to the closest ice cream shop.

      Although data immersion is nothing new, you may have noticed that the phenomenon is accelerating. Lakes, puddles, and rivers of data have turned to floods and veritable tsunamis of structured, semistructured, and unstructured data that’s streaming from almost every activity that takes place in both the digital and physical worlds. It’s just an unavoidable fact of life within the information age.

      In its truest form, data science represents the optimization of processes and resources. Data science produces data insights: actionable, data-informed conclusions or predictions that you can use to understand and improve your business, your investments, your health, and even your lifestyle and social life. Using data science insights is like being able to see in the dark. For any goal or pursuit you can imagine, you can find data science methods to help you predict the most direct route from where you are to where you want to be, and to anticipate every pothole in the road between the two.

      The terms data science and data engineering are often misused and confused, so let me start off by clarifying that these two fields are, in fact, separate and distinct domains of expertise. Data science is the computational science of extracting meaningful insights from raw data and then effectively communicating those insights to generate value. Data engineering, on the other hand, is an engineering domain that’s dedicated to building and maintaining systems that overcome data processing bottlenecks and data handling problems for applications that consume, process, and store large volumes, varieties, and velocities of data. In both data science and data engineering, you commonly work with these three data varieties:

       Structured: Data that is stored, processed, and manipulated in a traditional relational database management system (RDBMS). An example is a MySQL database that uses a tabular schema of rows and columns, which makes it easier to identify specific values within the stored data.

       Unstructured: Data that is commonly generated from human activities and doesn’t fit into a structured database format. Examples of unstructured data include email documents, Word documents, and audio or video files.

       Semistructured: Data that doesn’t fit into a structured database system but is nonetheless organizable by tags that are useful for creating a form of order and hierarchy in the data. XML and JSON files are examples of data that comes in semistructured form.
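      To make the three varieties concrete, here is a minimal Python sketch that looks up one value in each kind of data. Treat it as an illustration rather than a recommended workflow. Python's built-in sqlite3 module stands in for a full RDBMS such as MySQL, and the table name, the sample customers, and the email text are all made up for this example.

```python
import json
import sqlite3

# Structured: rows and columns in a relational table.
# Assumption: an in-memory SQLite database stands in for an RDBMS such as MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],  # hypothetical records
)
# The fixed schema makes it easy to pick out one specific value.
structured_city = conn.execute(
    "SELECT city FROM customers WHERE name = ?", ("Grace",)
).fetchone()[0]
conn.close()

# Semistructured: no table schema, but tags (JSON keys) give order and hierarchy.
raw_json = '{"customer": {"name": "Grace", "orders": [{"sku": "A-100", "qty": 2}]}}'
record = json.loads(raw_json)
semistructured_sku = record["customer"]["orders"][0]["sku"]

# Unstructured: free text with no schema and no tags, so finding a value
# means searching the content itself.
email_body = "Hi team, Grace from New York called about her order. Thanks!"
unstructured_mentions_city = "New York" in email_body

print(structured_city)
print(semistructured_sku)
print(unstructured_mentions_city)
```

      Notice how the effort shifts as structure drops away: the table answers a precise query, the JSON record has to be navigated key by key, and the email can only be searched as raw text.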