Robert J. Moffat

Planning and Executing Credible Experiments



representative of the IAU population was the poll? For a randomized sample of 424 members out of 11,000, a typical confidence interval could be the claimed 5%. A confidence interval surrounds an estimate; a voice vote, however, provides no estimate at all. With a savvy, engaged population of 11,000, why not take a vote of the full IAU population? Why rely on a poll to redefine international standards?
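      As a rough check on the quoted figure, the margin of error for a simple random sample can be sketched as follows. This is a minimal sketch under stated assumptions that do not come from the text: a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and a finite population correction. The function name `margin_of_error` is illustrative only.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion.

    Assumes a simple random sample of size n drawn without replacement
    from a finite population (hence the finite population correction).
    """
    standard_error = math.sqrt(p * (1.0 - p) / n)
    finite_population_correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * finite_population_correction


# Sample size and population size quoted in the text above.
moe = margin_of_error(n=424, population=11_000)
print(f"margin of error: {moe:.1%}")
```

      Under these assumptions the result is a little under 5%, which is consistent with the claimed figure. It says nothing about whether the sample was actually random, and it does not apply to a voice vote, which yields no counted estimate.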

      The IAU clearly has the right to set its own standards and definitions. Through the 2006 Pluto poll, however, the IAU leaders decided in favor of discriminating communication within their own community. How well did the IAU poll represent all science teachers when it demoted Pluto? One consequence of the decision was to make existing textbooks obsolete, affecting the budgets of school systems worldwide.

      Did the IAU leadership poorly serve its members, other scientists, and students across the world by relying on a poll instead of a full vote? From history, can we find examples where polls with small margins of error failed to predict the vote?

      Another survey of note, conducted by the journal Nature, hearkened back to the Reproducibility Crisis announced by Ioannidis. In 2016, Nature published a survey of researchers across the life and physical sciences asking “Is There a Reproducibility Crisis?” The author, Monya Baker, highlights a contradiction within the survey (Baker 2016): although researchers expressed high confidence in their own fields, they admitted an inability to replicate results a majority of the time.

      Physical measurements are subject to errors from many sources. Try as we may, we can never entirely correct for all the possible error sources. Sometimes we may simply decide it is too expensive to correct for a small possible error. Sometimes, we don’t know an error is present. In the best of situations, we must admit that even our corrected value is probably not exact. To deal with this situation we need a quantitative way to estimate the possible residual error and to describe it to potential users of the data. This problem was recognized by Airy (1879), who used the term “uncertainty,” which he defined as “The possible value that the residual error may have.”

      Most engineering experiments involve several measurements, and the result of the experiment is derived from the measured values using a set of equations called the “data interpretation program.” The challenge is to estimate the uncertainty in the derived result as a consequence of the recognized uncertainties in the measurements.

      Low uncertainty, by itself, is no assurance of accuracy in our results. The mathematical process by which we estimate the uncertainty in the result is referred to as “uncertainty analysis.”

      Uncertainty analysis is a powerful tool that, properly used, can help an experimenter develop a credible experiment. For example, we can use uncertainty analysis during the planning phase to select the approach offering the least uncertainty. We can use it to choose the most appropriate instruments. In the “shakedown and debugging phase” of an experiment, it helps identify which residual errors cause differences between our result and the expected result.

      Uncertainty analysis was introduced to the American technical literature in a paper by Kline and McClintock (1953). Uncertainty analysis focuses on estimating how much uncertainty there is in the derived result as a consequence of the acknowledged uncertainties in the measurements. It follows a well‐defined set of operations involving the data interpretation equations and the input data. The basic mathematics has not changed since that first description, but the techniques have been elaborated and extended considerably. See the references at the end of this chapter and Chapters 10 and 11.
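      The kind of operation described above can be illustrated with a root‐sum‐square combination, the form commonly associated with Kline and McClintock. The following is a minimal sketch, not the authors' own procedure: it assumes independent input uncertainties stated at the same confidence level, and it estimates the sensitivities by central finite differences. The function `propagate_uncertainty` and the electrical power example, including its numbers, are hypothetical.

```python
import math


def propagate_uncertainty(func, values, uncertainties, rel_step=1e-6):
    """Root-sum-square estimate of the uncertainty in func(*values).

    Each sensitivity (partial derivative) is approximated by a central
    finite difference; the inputs are assumed to be independent.
    """
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, uncertainties)):
        h = rel_step * (abs(x) if x != 0 else 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = x + h
        lower[i] = x - h
        sensitivity = (func(*upper) - func(*lower)) / (2.0 * h)
        total += (sensitivity * dx) ** 2
    return math.sqrt(total)


# Hypothetical data interpretation equation: electrical power P = V * I.
def power(voltage, current):
    return voltage * current


values = [10.0, 2.0]          # volts, amperes (illustrative numbers)
uncertainties = [0.1, 0.05]   # same units, same confidence level

u_power = propagate_uncertainty(power, values, uncertainties)
print(f"P = {power(*values):.2f} +/- {u_power:.2f} W")
```

      In this example the current measurement contributes 0.5 W and the voltage measurement 0.2 W to the combined value, so improving the ammeter would reduce the uncertainty in the result more than improving the voltmeter would.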

      Uncertainty analysis is an essential element in experiment planning, before and during execution. Uncertainty analysis is a tool by which we can identify the significance of small changes in the output. If the observed difference is larger than the derived uncertainty, this is evidence that the process being studied is not well modeled by the equations used – in other words, something is going on that we don't know about. Chapters 10 and 11 are devoted to uncertainty analysis.
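      The comparison described above reduces to a simple test. The sketch below is illustrative only; the function name `exceeds_uncertainty` and the numbers are hypothetical, and the uncertainty is assumed to have been estimated beforehand at the confidence level of interest.

```python
def exceeds_uncertainty(observed, expected, uncertainty):
    """True when the observed-minus-expected difference is larger than
    the estimated uncertainty in the result."""
    return abs(observed - expected) > uncertainty


# Hypothetical results, each compared with an expected value of 20.0
# and an estimated uncertainty of 0.54 in the same units.
print(exceeds_uncertainty(observed=20.9, expected=20.0, uncertainty=0.54))
print(exceeds_uncertainty(observed=20.3, expected=20.0, uncertainty=0.54))
```

      The first case signals that something not captured by the equations may be at work; the second is a difference that the acknowledged measurement uncertainties could explain by themselves.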

      Reporting the uncertainty of every reported value is essential for others to believe our science.

      1 Airy, S.G.B. (1879). Theory of Errors of Observation. London, UK: Macmillan and Company.

      2 Baals, D.D. and Corliss, W.R. (1981). Wind Tunnels of NASA. NASA SP‐440.

      3 Baker, M. (2016). Is there a reproducibility crisis? Nature 533: 452–454.

      4 Button, K.S., Ioannidis, J.P.A., Mokrysz, C. et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14 (5): 365–376.

      5 Feynman, R. (1963). The Feynman Lectures on Physics, Volumes 1–3. MA: Addison‐Wesley.

      6 International Astronomical Union (2006). IAU 2006 General Assembly: Result of the IAU Resolution votes. https://www.iau.org/news/pressreleases/detail/iau0603.

      7 Ioannidis, J.P.A. (2005a). Contradicted and initially stronger effects in highly cited clinical research. JAMA 294 (2): 218–228. https://doi.org/10.1001/jama.294.2.218. PMID 16014596.

      8 Ioannidis, J.P.A. (2005b). Why most published research findings are false. PLoS Medicine 2 (8): e124.

      9 Ioannidis, J.P.A. (2012). Why science is not necessarily self‐correcting. Perspectives on Psychological Science 7 (6): 645–654.

      10 Kline, S.J. and McClintock, F.A. (1953). Describing the uncertainties in single‐sample experiments. Mechanical Engineering 75: 3–8.

      11 Muller, R. (2016). Now: The Physics of Time. NY: W.W. Norton & Company Ltd. ISBN‐13: 978‐0393285239.

      12 Penrose, R. (2005). The Road to Reality: A Complete Guide to the Laws of the Universe. Knopf. ISBN‐13: 978‐0679454434.

      13 Penrose, R. (2016). Fashion, Faith, and Fantasy in the New Physics of the Universe. NJ: Princeton University Press. ISBN‐13: 978‐0‐691‐11979‐3.

      14 Penzias, A.A. and Wilson, R.W. (1965). A Measurement of Excess Antenna Temperature at 4080 Mc/s. Astrophysical Journal 142: 419–421.

      15 Pomeroy, S.R. (2012). The key to science (and life) is being wrong. Scientific American. https://blogs.scientificamerican.com/guest‐blog/the‐key‐to‐science‐and‐life‐is‐being‐wrong/.

      16 Quantum Coffee (2014). Nobel Prizes in physics: Theorists vs. experimentalists. https://quantumcoffee.wordpress.com/2014/06/08/nobel‐prizes‐in‐physics‐theorists‐vs‐experimentalists

      17 Simera, I., Moher, D., Hoey, J. et al. (2010). A catalogue of reporting guidelines for health research. European Journal of Clinical Investigation 40 (1): 35–53.

      18 Skedung, L., Arvidsson, M., Chung, J.Y. et al. (2013). Feeling Small: Exploring the Tactile Perception Limits. Scientific Reports 3: 2617.

      19 Smolin, L. (2006). The Trouble with Physics: The Rise of String Theory, the Fall of a Science, and