Alex J. Gutman

Becoming a Data Head



the like are churning out critical thinkers at lightning speed. And if working in data is all about uncovering the truth, then Data Heads want to do just that.

      What does it mean, then, when they sit down to a project that doesn't whet their appetite? What does it mean for them to have to work on a poorly defined issue where their skills become bragging rights for executives but don't actually solve meaningful problems?

       Lack of clear question to answer (30.4% of respondents experienced this)

       Results not used by decision makers (24.3%)

       Lack of domain expert input (19.6%)

       Expectations of project impact (15.8%)

       Integrating findings into decisions (13.6%)

      This has obvious consequences. Those who aren't satisfied in their roles leave.

      The very premise and structure of this book is to teach you to ask more probing questions. It starts with the most important, and sometimes hardest, question: “What's the problem?”

      When these questions are answered, you are ready to get to work.

      1 A robust data strategy can help companies mitigate these issues. Of course, an important component of any data strategy is to solve meaningful problems, and that's our focus in this chapter. If you'd like to learn more about high-level data strategy, see Jagare, U. (2019). Data science strategy for dummies. John Wiley & Sons.

      2 2017 Kaggle Machine Learning & Data Science Survey. Data is available at www.kaggle.com/kaggle/kaggle-survey-2017. Accessed on January 12, 2021.

       “If we have data, let's look at data. If all we have are opinions, let's go with mine.”

       —Jim Barksdale, former Netscape CEO

      Many people work with data without having a dialect for it. However, we want to ensure we're all speaking the same language to make the rest of the book easier to follow. So, in this chapter, we'll give you a brief crash course on data and data types. If you've had a basic statistics or analytics course, you'll know the terms that follow, but parts of our discussion may not have been covered in your class.

      The terms data and information are often used interchangeably. In this book, however, we make a distinction between the two.

      An Example Dataset

      A table of data, like Table 2.1, is called a dataset.

      Notice that it has both rows and columns that serve specific functions in how we understand the table. Each row of the table (running horizontally, under the header row) is a measured instance of associated information. In this case, it's a measured instance of information for a marketing campaign. Each column of the table (running vertically) is a list of information we're interested in, organized into a common encoding so that we can compare each instance.

      The rows of each table are commonly referred to as observations, records, tuples, or trials. Columns of datasets often go by the names features, fields, attributes, predictors, or variables.

      Know Your Audience

      Data is studied in many different fields, each with its own lingo, which is why there are many names for the same things. Some data workers, when talking about the columns in a dataset, might prefer “features” while others say “variables” or “predictors.” Part of being a Data Head is being able to navigate conversations within these groups and their preferences.

      A data point is the intersection of an observation and a feature. For example, 150 units sold on 2021-02-01 is a data point.

Date        Ad Spending  Units Sold  Profit  Location
2021-01-01  2000         100         10452   Print
2021-02-01  1000         150         15349   Online
2021-03-01  3000         200         25095   Television
2021-04-01  1000         175         12443   Online
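
      For readers who like to see the vocabulary in code, here is a minimal sketch of the table above using only Python's standard library. The names TABLE_2_1, observations, features, and data_point are our own illustrative choices, not terms from any particular tool, and the values are copied directly from the table.

```python
import csv
import io

# The example table written out as CSV text (values copied from the table).
TABLE_2_1 = """\
Date,Ad Spending,Units Sold,Profit,Location
2021-01-01,2000,100,10452,Print
2021-02-01,1000,150,15349,Online
2021-03-01,3000,200,25095,Television
2021-04-01,1000,175,12443,Online
"""

# Each row under the header becomes one observation:
# a dict mapping feature name -> value (csv reads every value as a string).
observations = list(csv.DictReader(io.StringIO(TABLE_2_1)))

# The features are the column names taken from the header row.
features = list(observations[0].keys())


def data_point(date, feature):
    """Return the value of `feature` for the observation recorded on `date`."""
    for row in observations:
        if row["Date"] == date:
            return row[feature]
    raise KeyError(f"no observation dated {date}")


print(len(observations), "observations;", len(features), "features")
print(data_point("2021-02-01", "Units Sold"))  # the data point from the text
```

      The lookup on the last line picks out one observation (the row dated 2021-02-01) and one feature (Units Sold); their intersection is the data point of 150 units mentioned earlier. Note that the csv module returns it as the text "150", so it would need converting with int() before doing arithmetic.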