a finding so important it is worth changing one’s practice. Although many treatments, pharmaceuticals, and practices are associated with statistically significant differences, most of those differences are too small to lead doctors to prescribe a different drug or use a different treatment modality (Leyva De Los Rios, 2017). This is because for every treatment a physician begins, there is usually another treatment that must be withdrawn. Similarly, starting a new reading program usually means withdrawing an old reading program. Therefore, the decision maker must consider not only the impact of adding a new program but also the impact of withdrawing the old one.
The failure to distinguish clinical significance from statistical significance was first illustrated during the Race to the Top era (2009–2017), during which federal grant incentives led districts to pile one program on top of another. Each program might have been significant when compared with no program, but almost none showed value when embedded in a constellation of many duplicative programs. I observed schools with three different data-analysis protocols, two different mathematics programs, and seven different literacy programs, all vying for the time and attention of teachers who, not surprisingly, were unable to implement any of these new and expensive programs well. There were no grants for educational leaders who decided to stop doing something. My research (Reeves, 2011b), based on more than two thousand school plans, demonstrates that when schools have more than six instructional initiatives, student performance declines, even as those schools spend more money and acquire more programs. Moreover, the most fragmented schools—those burdened by more and more programs—are most likely to be schools with high percentages of students from low-income families, high percentages of students who are learning English, and high percentages of students with special needs (Reeves, 2011b). In brief, the schools most in need of focused leadership are the least likely to have it. It is no wonder that programs claiming “significance” in the laboratory or another controlled setting did not exhibit similar results in the real world of teachers and students overwhelmed by demands on their time from many different programs.
To be clear, I am not suggesting that the differences between the research claims of educational program advocates and the reality teachers and administrators experience are due to malice or corruption on the part of vendors. Rather, I am observing that the environment of the research that salespeople cite may be substantially different from the environment of the practitioners who actually try to use these programs in the real world. Therefore, it is essential for people making buying decisions to inquire about the actual environment of the research. Moreover, major decisions about curriculum, assessment, instructional practices, leadership techniques, and financial commitments are always better informed if leaders follow the discipline of mutually exclusive decision making (Lafley, Martin, Rivkin, & Siggelkow, 2012). Good leaders can make bad decisions if they fail to practice the fundamental disciplines of gathering information and considering alternative hypotheses (Campbell, Whitehead, & Finkelstein, 2009).
In order to be more critical consumers of research, leaders must persistently ask, “If I am going to decide to implement X, then what will I give up—in time, money, and professional energy?” One of the great traps in this line of inquiry is the myth that because a grant funds a new initiative, it is therefore free. But no decision is free. Even if there is no impact on the budget, there is definitely an impact on time and attention. Because leaders cannot monitor and focus on more than about half a dozen major initiatives, every additional initiative beyond that threshold, even if it appears to be cost-free, not only takes a toll on the leader’s time and attention but also encroaches on every other initiative in the system.
The only remedy for this is organizational and leadership focus. Effective strategic plans are not merely an accumulation of programs and tasks to be implemented. Rather, strategy is also the art of deciding what not to do. My experience suggests that the primary complaint teachers have is time. They are intelligent and hardworking, but simply overwhelmed by the sheer quantity of tasks they are expected to accomplish. Therefore, effective leaders recognize that time is a zero-sum game—every hour allocated to one task is an hour not available for another. The focused leader who, for example, wants to encourage collaborative scoring of student work in order to deliver consistent expectations of students and reliable scoring by teachers provides time in staff meetings and collaborative team meetings to accomplish that important work. That means the focused leader is deciding not only that collaborative scoring is vital but also that competing activities in those meetings—like the primitive practice of making verbal announcements—will be discarded. While many leaders claim to value focus, few can articulate how they will save time by discontinuing announcements, ending the expectation that texts and emails be answered within minutes of receipt, and banning classroom activities—such as twenty-year-old word search puzzles—that have zero educational value. When leaders decide what to stop doing, teachers know that they and their time are respected.
Argument 5: “Results Are Distorted Because the Criteria for Success Are Too Low”
Studies of successful high-poverty schools sometimes receive criticism because readers claim the schools’ standards for success are too low and not reflective of success in the real world. For example, the New York Regents exam provides a four-point scale for students, and a score of three or four is regarded as passing (Pondiscio, 2019). It could be argued that the bar should be higher, but the plain fact is that only a minority of urban schools in New York meet the standard of a three. In the original equity and excellence research (Reeves, 2004), the criterion for meeting standards was only the basic level, which was the criterion the state used at that time. Some critics have approached me in meetings and argued that this bar is too low to count as success. But in that review of 135 high-poverty schools, only seven met the basic criteria. Few people argue against setting the bar high for student achievement, with classroom expectations to match, but when only seven of 135 schools meet a criterion, meeting it seemed to me, to put it mildly, evidence of comparative success. Those schools met the state criteria at a far higher level than most schools with similar demographic characteristics. The criticism is nevertheless well taken. In many states, students can answer only 40 percent of test items correctly and still be labeled proficient. Part of the flaw in descriptions of proficiency is that it is a moving target. Even in states claiming a commitment to standards-based education for more than two decades, some change the cut scores—the percentage of correct answers the state deems adequate—every year. If too few students and schools do well, the state lowers the cut score. If too many do well, the state raises the cut score. This procedure is precisely the opposite of standards-based assessment. Safety professionals do not, for example, relax or strengthen the criteria for left-hand turns for teenage drivers or safe landings for pilots based on annual variations in other drivers’ or pilots’ performance. The standard is the standard. This is also why the best and most reliable measurements of student achievement are based on consistent criteria during the same year in a class with largely the same students, teacher, curriculum, and assessments. Although this emphasis on consistency is imperfect, it is far superior to attempts to draw inferences about student success when the tests and criteria for proficiency change from one year to the next, and when the students compared are also different from one year to the next.
In the absence of a national assessment of student performance accompanied by a systematic analysis of teaching and leadership practices in every school, the best data we have come from districts and states. This leads to inevitable variation in what success really means. While that is a legitimate concern, it does not change the fact that when the same assessment is given in the same subject and grade to a wide variety of students of the same socioeconomic status, some do better than others. The explanation is neither money nor zip code but teaching and leadership practices. The United States does administer the National Assessment of Educational Progress (NAEP), labeled the “Nation’s Report Card,” but it offers nothing in the way of school-by-school analysis of instructional and leadership practices.
Argument 6: “The Funding Is Higher for Successful Schools”
It is true that successful high-poverty schools often have higher levels of funding than other schools without high populations