Dana K. Keller

The Tao of Statistics



Even with the most evenhanded intentions, unconscious biases can creep into the best of research designs and processes. We will touch on this point several times in later chapters.

      The high school principal has student records in electronic form, meaning that his data collection will be inexpensive. Having electronic student records also means that the principal has access to a wide variety of data for his students. Throughout the years, the school system has collected demographic data on its students. The principal also has the funds for conducting a survey on his essentially captive audience. Although he is not new as a principal, the extent of the electronic data available has him a bit intimidated. When he had to get the data from students’ physical files in the office, his “research” questions were quite modest and constrained. Now that he can get hundreds of times as much information with only a few mouse clicks, he is more reflective, less impulsive, and less likely to “just run the data” than he had expected to be.

      The director of public health has all of the state’s Medicaid information available to her electronically, a body of data that has greatly expanded since the 2014 provisions of the ACA were implemented. She also is authorized to conduct a single, limited survey if it can be seamlessly appended to one that is currently required by the state. She has less information on each person than does the high school principal, but the information she has is for a much larger number of people. When she accesses the data warehouse, she always pulls highly detailed (i.e., disaggregated) data. She knows that she can always collapse (i.e., aggregate) the data later, but she cannot do the reverse.
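      The one-way nature of aggregation can be illustrated with a minimal Python sketch. The records, field names, and the aggregate helper below are hypothetical and invented for illustration; they are not drawn from any real Medicaid data set.

```python
from collections import defaultdict

# Hypothetical disaggregated records: one row per person.
records = [
    {"county": "Adams", "age_group": "0-17", "visits": 2},
    {"county": "Adams", "age_group": "18-64", "visits": 1},
    {"county": "Adams", "age_group": "18-64", "visits": 4},
    {"county": "Baker", "age_group": "0-17", "visits": 3},
    {"county": "Baker", "age_group": "65+", "visits": 5},
]


def aggregate(rows, key):
    """Collapse detailed rows into total visits per value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["visits"]
    return dict(totals)


# The same detailed rows can be collapsed in more than one way.
by_county = aggregate(records, "county")
by_age = aggregate(records, "age_group")

print(by_county)
print(by_age)
```

      Starting from the county totals alone, there would be no way to recover the age-group totals, let alone the individual rows. That is why pulling the detailed data first keeps every later option open.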

      Having data for large numbers of people and access to computers allows her to address important public health questions that would have gone unanswered not many years ago. Just as the principal has access to far more information than he used to have, the director of public health had that increase in access several years earlier. She is used to the volume of data and has started to understand its strengths and limitations.

      Both the principal and the director of public health face the issue of data privacy for the individuals for whom they have data. Being in public health, the director is under far more scrutiny for confidentiality than the principal is, due to the ever more challenging provisions of the Health Insurance Portability and Accountability Act (HIPAA). Well-established protocols exist for the proper handling of these issues for the principal, but the director finds herself needing to update her protocols almost annually. Remember, data privacy has ethical and legal standing. Expected processes and procedures exist for research involving people and their data, and they are regularly updated. Keep the importance of this issue in mind when using or reporting human subjects’ data. Ignorance is not a valid excuse, and the penalties for knowingly, or even unknowingly, releasing personal health information or personally identifiable information can be severe.

      4. Data—Measurement

       Perceivable

       Describable

       Scores

      If you can perceive it, you can measure it. A measurement is an assigned value for a single characteristic. The way a characteristic is captured and, therefore, the way its data should be interpreted determine the measure being used to address the question at hand. Some measures are more accurate than others. Perfect measurement exists only in fantasy; we do the best we can.

      Good measurement not only is sufficiently accurate but also places its objects into mutually exclusive categories or scores (or “codes”). Some measures divide people into categories, such as gender. Other measures are more abstract continua, such as perception scales that ask the extent to which a respondent agrees with a statement. Regardless of the type of measurement, sufficient accuracy and mutual exclusivity are needed. The rest of measurement is an extension of those simple concepts.
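      Mutual exclusivity can be checked mechanically. The following Python sketch assumes hypothetical age bands and made-up ages; the function names are illustrative only. It confirms that each value lands in exactly one category and shows how a carelessly drawn boundary breaks that property.

```python
# Hypothetical age bands, each a half-open interval [low, high).
bands = {"0-17": (0, 18), "18-64": (18, 65), "65+": (65, 200)}


def categories_for(age, bands):
    """Return the names of every band that contains `age`."""
    return [name for name, (low, high) in bands.items() if low <= age < high]


def is_mutually_exclusive(values, bands):
    """True if every value falls into exactly one band."""
    return all(len(categories_for(v, bands)) == 1 for v in values)


ages = [5, 17, 18, 64, 65, 90]
well_formed = is_mutually_exclusive(ages, bands)

# A flawed scheme: the first band's upper bound overlaps the second band,
# so an 18-year-old would be counted in two categories.
overlapping = {"0-18": (0, 19), "18-64": (18, 65), "65+": (65, 200)}
flawed = is_mutually_exclusive(ages, overlapping)

print(well_formed, flawed)
```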

      Data . . . the who, what, where, when, why, and how. Put the pieces together like a jigsaw puzzle, and voilà, you have meaningful information from what was a pile of otherwise useless facts. Be careful: In statistics, as in a jigsaw puzzle, cutting corners and force-fitting pieces can result in a very misleading picture. These shortcuts are sometimes difficult to notice and even more difficult to resolve.

      Along with grades, the principal’s school keeps information on standardized test scores, disciplinary actions, health records, extracurricular activities (e.g., clubs), and sporting achievements. Depending on state and federal laws, the principal will have varying levels of access to student records. To sidestep some of the more sensitive aspects of HIPAA, he avoids using health and personal information in his work whenever possible.

      The director of public health has access to all of the public health and some other state databases. Again, her access is legally limited because the data are about health issues, such as immunizations and outbreaks of certain reported diseases. State and federal laws are quite strict on the access to and use of these types of data.

      5. Data Structure—Levels of Measurement

       What can be built?

       Ask the ground

       Turn over rocks

       Dig in the dirt

      The grounding for statistics is the level of measurement of the data. Some statistics are appropriate for some levels of measurement; others are not. This is an area where one needs to understand the deeper structure of the data to know which statistics would be meaningful. For example, the data’s level of measurement limits the choice of the most often-used statistic: the average, which statisticians call a measure of central tendency. There are three common choices of averages: the mean, median, and mode (with somewhat esoteric versions within each). These different types of averages are not equally appropriate for data at different levels of measurement. Specific levels of measurement and the impact of each on the choice of statistics will be discussed soon.
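      As a rough illustration of the three averages named above, the following Python sketch uses the standard library’s statistics module. The test scores and club names are made up for the example.

```python
import statistics

# Hypothetical numeric data: five students' test scores.
scores = [70, 75, 75, 80, 100]

mean_score = statistics.mean(scores)      # sum of the scores divided by their count
median_score = statistics.median(scores)  # middle value once the scores are sorted
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)

# Hypothetical category data: each student's favorite club.
clubs = ["chess", "band", "band", "drama"]

# Category labels have no arithmetic meaning, so a mean of club names
# would be meaningless; the most common label can still be reported.
mode_club = statistics.mode(clubs)

print(mode_club)
```

      Note that the single high score of 100 pulls the mean above the median, a small sign that the three averages can tell different stories about the same data.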

      The topic sounds complicated, but is not. Once you understand how data differ according to their level of measurement, you will quickly grasp which statistics are appropriate for a given set of conditions. Fortunately, many statistical techniques have options that can account for the various levels of measurement.

      Through the questions being asked by the high school principal and the director of public health, we will encounter four levels of measurement (i.e., nominal, ordinal, interval, and ratio, to be explained next) in their various data sets or from potential survey responses. They, and we, will accommodate these levels of measurement as we progress through this book.

      Getting the statistics correct depends on recognizing and accommodating each variable’s level of measurement. Even researchers with decades of experience occasionally will be embarrassed by having used a statistic in a way that was inconsistent with that statistic’s level of measurement requirements. Though an issue with level of measurement rarely is a knockout punch, such issues are limitations of varying importance on the confidence that researchers (should) have in their results.

      5.A. Nominal

       Nominal says different

       No more does it claim

       Others shouldn’t either

      The nominal level of