calculus helps as well. Foolish or not, it’s my high hope that all readers have subject matter expertise to which they can apply the skills presented in this book. Because data scientists need to know the implications and applications of the data insights they derive, subject matter expertise is a major requirement for data science.
Icons Used in This Book
As you make your way through this book, you see the following icons in the margins:
The Tip icon marks tips (duh!) and shortcuts you can use to make subject mastery easier.
Remember icons mark information that’s especially important to know. To siphon off the most important information in each chapter, just skim the material represented by these icons.
The Technical Stuff icon marks information of a highly technical nature that you can normally skip.
The Warning icon tells you to watch out! It marks important information that may save you headaches.
Beyond the Book
Data Science For Dummies, 3rd Edition, comes with a handy Cheat Sheet that lists helpful shortcuts as well as abbreviated definitions for essential processes and concepts described in the book. You can use this feature as a quick-and-easy reference when doing data science. To download the Cheat Sheet, simply go to www.dummies.com and search for data science for dummies cheat sheet in the Search box.
Where to Go from Here
If you’re new to data science, you’re best off starting from Chapter 1 and reading the book from beginning to end. If you already know the data science basics, I suggest that you read the last part of Chapter 1, skim Chapter 2, and then dig deep into all of Parts 3 and 4.
Part 1
Getting Started with Data Science
IN THIS PART …
Get introduced to the field of data science.
Delve into vital data engineering details.
Discover your inner data superhero archetype.
Chapter 1
Wrapping Your Head Around Data Science
IN THIS CHAPTER
Deploying data science methods across various industries
Piecing together the core data science components
Identifying viable data science solutions to business challenges
Exploring data science career alternatives
For over a decade now, everyone has been absolutely deluged by data. It’s coming from every computer, every mobile device, every camera, and every imaginable sensor — and now it’s even coming from watches and other wearable technologies. Data is generated in every social media interaction we humans make, every file we save, every picture we take, and every query we submit; data is even generated when we do something as simple as ask a favorite search engine for directions to the closest ice cream shop.
Although data immersion is nothing new, you may have noticed that the phenomenon is accelerating. Lakes, puddles, and rivers of data have turned to floods and veritable tsunamis of structured, semistructured, and unstructured data that’s streaming from almost every activity that takes place in both the digital and physical worlds. It’s just an unavoidable fact of life within the information age.
If you’re anything like I was, you may have wondered, “What’s the point of all this data? Why use valuable resources to generate and collect it?” Just two decades ago, no one was in a position to make much use of most of the data being generated, but the tide has definitely turned. Specialists known as data engineers are constantly finding innovative and powerful new ways to capture, collate, and condense unimaginably massive volumes of data, and other specialists, known as data scientists, are leading change by deriving valuable and actionable insights from that data.
In its truest form, data science represents the optimization of processes and resources. Data science produces data insights — actionable, data-informed conclusions or predictions that you can use to understand and improve your business, your investments, your health, and even your lifestyle and social life. Using data science insights is like being able to see in the dark. For any goal or pursuit you can imagine, you can find data science methods to help you predict the most direct route from where you are to where you want to be — and to anticipate every pothole in the road between both places.
Seeing Who Can Make Use of Data Science
The terms data science and data engineering are often misused and confused, so let me start off by clarifying that these two fields are, in fact, separate and distinct domains of expertise. Data science is the computational science of extracting meaningful insights from raw data and then effectively communicating those insights to generate value. Data engineering, on the other hand, is an engineering domain that’s dedicated to building and maintaining systems that overcome data processing bottlenecks and data handling problems for applications that consume, process, and store large volumes, varieties, and velocities of data. In both data science and data engineering, you commonly work with these three data varieties:
Structured: Data that’s stored, processed, and manipulated in a traditional relational database management system (RDBMS). An example is a MySQL database that uses a tabular schema of rows and columns, which makes it easy to pinpoint specific values stored in the database.
Unstructured: Data that’s commonly generated from human activities and doesn’t fit into a structured database format. Examples include email messages, Word documents, and audio or video files.
Semistructured: Data that doesn’t fit into a structured database system but is nonetheless organizable by tags that are useful for creating a form of order and hierarchy in the data. XML and JSON files are common examples of semistructured data.
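To make the three data varieties a little more concrete, here’s a minimal sketch in Python that uses only the standard library. It’s an illustration under a few assumptions, not a recipe from this book. An in-memory SQLite database stands in for the MySQL example (SQLite is a different RDBMS, swapped in so the code runs without a database server), and the customers table, the JSON and XML records, and the email text are all hypothetical sample data.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# --- Structured: rows and columns in a relational table ---
# In-memory SQLite stands in for a MySQL server; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Lisbon"), ("Ben", "Denver")],
)
# Because the schema is fixed, a specific value can be pinpointed with a query.
city = conn.execute(
    "SELECT city FROM customers WHERE name = ?", ("Ben",)
).fetchone()[0]
conn.close()

# --- Semistructured: no table schema, but tags impose order and hierarchy ---
# JSON: keys act as tags, and nesting expresses the hierarchy.
raw_json = '{"customer": {"name": "Ana", "orders": [{"item": "book", "qty": 2}]}}'
record = json.loads(raw_json)
first_item = record["customer"]["orders"][0]["item"]

# XML: element tags play the same role.
raw_xml = "<customer><name>Ana</name><city>Lisbon</city></customer>"
xml_city = ET.fromstring(raw_xml).find("city").text

# --- Unstructured: free-form text with no schema and no tags ---
email_body = "Hi team, the Denver shipment arrived late again. Can we follow up?"
# Nothing labels the city here, so you have to search the raw text itself.
mentions_denver = "Denver" in email_body

print(city, first_item, xml_city, mentions_denver)
```

The specific calls matter less than the pattern they reveal: structured data answers a precise query, semistructured data is navigated through its tags, and unstructured data has to be searched or interpreted before it yields anything useful.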
It used to be that only large tech companies with massive funding had the skills and computing resources required to implement data science methodologies to optimize and improve their businesses, but that hasn’t been the case for quite a while now.