dual‐route models. The authors assert the value of maintaining the approach but improving it, guided by the following methodological commitments:
Nested, incremental theorizing. Each model should account for the same phenomena as the previous one as well as new ones. Each model should bear a transparent relation to the ones it supersedes.
Emphasis on critical findings that can discriminate between theories.
Evaluation of model performance against a single “gold standard” study of each phenomenon.
The number of phenomena a model simulates as the main criterion for success.
Regarding continuity, Perry et al. (2007) correctly noted the historical importance of the dual‐route theory and expressed the goal of extending the lineage. However, theories are mainly judged by criteria such as whether they address important questions, explain empirical phenomena in principled ways, and yield insights that advance understanding. Elements of previous models are worth retaining if they advance these goals, not merely out of allegiance. As we have observed, the dual‐route model’s defining assumptions are incompatible with basic behavioral phenomena. The value of maintaining continuity with the approach is therefore unclear.
In practice, the hybrid models are mainly noteworthy for replacing the GPC nonlexical route with networks that are variants of the connectionist models described earlier. The orthography➔phonology pathway consists of layers of units connected by weights that are adjusted by a connectionist learning procedure. As in the SM89 and later models, the orthography➔phonology network in Perry et al. (2007) generated correct pronunciations for almost all words (about 90% of those tested), including regular and exception words, as well as nonwords. It is therefore categorically unlike the nonlexical route in dual‐route models, which could not pronounce exceptions to the rules, necessitating a second, lexical route. With the extirpation of these rules, and an orthography➔phonology network that correctly reads 90% of all words, that rationale no longer applies. In fact, the lexical route plays little role in the hybrid models. Its main function is to fulfill the a priori commitment to continuity with DRC.
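To make the kind of architecture being described concrete, the sketch below shows a minimal network of this type: layers of units connected by weighted links, with the weights adjusted by an error-correcting learning procedure. It is a toy illustration only; the word representations, layer sizes, and training details are invented placeholders and do not correspond to the actual SM89 or CDP+ implementations.

```python
# Illustrative sketch only: a tiny orthography -> phonology network trained by
# backpropagation. Encodings, vocabulary size, and dimensions are hypothetical;
# the real models use far richer orthographic and phonological representations.
import numpy as np

rng = np.random.default_rng(0)

# Toy codes: each "word" is a fixed orthographic input vector paired with a
# target phonological output vector (random bit patterns stand in for real
# grapheme/phoneme codes).
n_words, n_orth, n_hidden, n_phon = 20, 12, 16, 10
orthography = rng.integers(0, 2, size=(n_words, n_orth)).astype(float)
phonology = rng.integers(0, 2, size=(n_words, n_phon)).astype(float)

# Weights for a single orthography -> hidden -> phonology pathway.
W1 = rng.normal(0, 0.1, size=(n_orth, n_hidden))
W2 = rng.normal(0, 0.1, size=(n_hidden, n_phon))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(5000):
    # Forward pass for all words at once.
    h = sigmoid(orthography @ W1)
    out = sigmoid(h @ W2)
    # Backpropagate squared error; weight changes accumulate over presentations.
    err = phonology - out
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 += lr * h.T @ d_out / n_words
    W1 += lr * orthography.T @ d_h / n_words

final = sigmoid(sigmoid(orthography @ W1) @ W2)
print("proportion of output features correct:",
      np.mean((final > 0.5) == phonology).round(2))
```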
A better approach would be to determine whether any empirical phenomena demand the inclusion of a lexical route in the hybrid model. In Perry et al.’s (2007) model, the lexical route is used to store the frequency of each word (estimated from norms) as a parameter on each lexical node. However, this is unnecessary. Word frequency effects arise in connectionist networks employing distributed representations because the number of times a word is presented affects the settings of the weights. The effects arise from learning and using words and are modulated by similarities across words. Thus, the effects are dynamic rather than reflecting a fixed parameter. Similarly, continuity between models can be assessed by examining whether they account for phenomena in the same way. In the DRC models, regularity and consistency effects were attributed to conflicting output from the lexical and nonlexical routes. The CDP models retain the lexical route but not its role in producing these effects, which arise wholly within their connectionist network. The lexical route is a vestigial organ whose removal has little impact on performance.
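The frequency point can be illustrated with a toy simulation: if training items are presented in proportion to assumed corpus frequencies, the more frequent items end up better learned (lower error) without any stored frequency parameter. Everything below (vocabulary size, encodings, the specific frequencies) is an invented placeholder; the sketch shows only the mechanism by which frequency effects can fall out of learning.

```python
# Sketch of how frequency effects can emerge from learning rather than from a
# stored frequency parameter. All names and numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

n_words, n_orth, n_phon = 30, 15, 12
orthography = rng.integers(0, 2, size=(n_words, n_orth)).astype(float)
phonology = rng.integers(0, 2, size=(n_words, n_phon)).astype(float)

# Hypothetical corpus frequencies: the first 10 words are "high frequency".
freq = np.ones(n_words)
freq[:10] = 20.0
p_present = freq / freq.sum()

# A single layer of modifiable connections; hidden units are omitted for brevity.
W = rng.normal(0, 0.1, size=(n_orth, n_phon))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.2
for step in range(20000):
    i = rng.choice(n_words, p=p_present)      # presentation probability ~ frequency
    out = sigmoid(orthography[i] @ W)
    err = phonology[i] - out
    W += lr * np.outer(orthography[i], err * out * (1 - out))

out_all = sigmoid(orthography @ W)
sse = ((phonology - out_all) ** 2).sum(axis=1)
print("mean error, high-frequency words:", sse[:10].mean().round(3))
print("mean error, low-frequency words: ", sse[10:].mean().round(3))
```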
The “nested incremental” modeling claim is puzzling. The idea is that each model should account for the same phenomena as a previous model, plus additional ones, with clear explanations of how the model was changed. However, the DRC models did not produce correct simulations of basic phenomena to build on. Perry et al. (2007) also asserted that modeling should focus on “critical phenomena.” We are sympathetic to this view: The goal is to identify essential properties of word recognition, not to simulate as many effects as possible, which can include ones that are artifactual or unrepresentative. However, the emphasis on “critical phenomena” resulted in dropping numerous effects from consideration (e.g., pseudohomophone effects, position of irregularity effects), allowing the researchers to focus on improving the treatment of consistency, regularity, and nonword pronunciation. Thus, the models are not “nested” with respect to coverage of the data.
Finally, Perry et al. (2007) embraced Coltheart et al.’s (2001) strategy of evaluating models using a benchmark study of each phenomenon. Why studies such as Paap and Noel (1991) and Weekes (1997) are treated as “gold standards” is unclear. Their methods were not more advanced than those in other research, and their results were not highly representative. The “gold standard” approach also obviates the requirement to report other simulations that fail, raising the “file‐drawer” issue again. These weak criteria for assessing model performance also vitiate the importance assigned to the number of phenomena simulated.
Philosophy aside, how well does the CDP+ model perform? We have conducted numerous simulations with it that can be repeated using publicly available data (see archive). The picture is mixed. The model produces consistency effects for words, whereas the DRC model did not; that is an advance. It produces the consistency effect in their “gold standard” study (an experiment by Jared, 2002) but missimulates other studies, including the Jared (1997) study that Coltheart et al. (2001) took as their benchmark. The model performs much better on nonwords than the DRC, reproducing the nonword consistency effects from Glushko (1979), though see Pritchard et al. (2012) for other concerns. Like DRC, CDP+ produces an overall length effect for nonwords but not words, yet it misses the effect observed for lower frequency words.
In general, the CDP+ model yields better coverage of basic naming phenomena than the DRC model, although some anomalies remain. The CDP+ model performs better because it employs an orthography➔phonology processing architecture that incorporates properties that were central to the triangle model’s successful simulation of the same phenomena. Although the developers of the hybrid models emphasize their continuity with the dual‐route approach, what is striking is their similarity to the triangle approach. The connectionist network in the Perry et al. (2007) model was correct on about 90% of the tested words; the Perry et al. (2019) version learned 80% of the words in a 32,000‐word vocabulary. As they noted, “The remaining 20% are too irregular (e.g., yacht, aisle, chef) to be learned through decoding” (p. 387). The words it misses require input from another source, for which they use a phonological lexicon. The similarities to the division of labor account developed by Plaut et al. (1996) are clear, though unacknowledged. In that theory, the additional input arrives from the orthography➔semantics➔phonology parts of the triangle. This account has several advantages over the CDP+ approach of using a phonological lexicon. It correctly predicts the semantic effects on reading aloud discussed previously. It employs the same components and learning procedure as the rest of the model, rather than requiring a second type of architecture as in the hybrid models. Semantics and its mappings to orthography and phonology are needed for independent reasons (e.g., their roles in comprehension, disambiguation of homophones such as plane – plain, and spelling). Finally, the orthography➔semantics computation is “lexical” in the sense that people can only learn such mappings for words, and frequency has a bigger impact on these mappings than on orthography➔phonology because they are mostly arbitrary (e.g., cat could as well have represented dog or lump), at least for monomorphemic words. Thus, the functions that hybrid models assign to the lexical route can be subsumed by the orthography➔semantics➔phonology parts of the triangle model, while accounting for additional phenomena.
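A miniature version of this division-of-labor idea can be sketched as follows. A shared orthography➔phonology mapping is trained alongside a word-specific contribution to the phonological units, standing in (very loosely) for input arriving from orthography➔semantics➔phonology. The representations, sizes, learning rates, and the small weight decay used to keep the word-specific pathway from doing all the work are assumptions of the toy, not features of Plaut et al.’s (1996) simulations. The expected pattern is that removing the word-specific input mainly harms the arbitrary (“exception”) items, because the shared pathway has absorbed the consistent spelling‐sound structure.

```python
# Illustrative division-of-labor sketch (assumptions throughout, not the Plaut
# et al. implementation). "Regular" words follow a shared spelling-sound rule
# (identity here); "exception" words have arbitrary pronunciations.
import numpy as np

rng = np.random.default_rng(3)

n_reg, n_exc, dim = 40, 10, 16
orth = rng.integers(0, 2, size=(n_reg + n_exc, dim)).astype(float)
phon = orth.copy()                                    # regular: pronounce "by rule"
phon[n_reg:] = rng.integers(0, 2, size=(n_exc, dim))  # exceptions: arbitrary

W = rng.normal(0, 0.1, size=(dim, dim))   # shared orthography -> phonology weights
S = np.zeros((n_reg + n_exc, dim))        # word-specific ("semantic") input to phonology

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr_w, lr_s, decay = 0.5, 0.05, 1e-4
for epoch in range(20000):
    net = orth @ W + S                    # both pathways sum at the phonological units
    out = sigmoid(net)
    d = (phon - out) * out * (1 - out)
    W += lr_w * orth.T @ d / len(orth)
    S += lr_s * d - decay * S             # decay discourages S from absorbing the regulars

def accuracy(with_semantics):
    net = orth @ W + (S if with_semantics else 0.0)
    correct = ((sigmoid(net) > 0.5) == phon).all(axis=1)
    return correct[:n_reg].mean(), correct[n_reg:].mean()

# Removing the word-specific pathway should mainly hurt the exception items.
print("intact (regular, exception):    ", accuracy(True))
print("semantics removed (reg., exc.): ", accuracy(False))
```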
This analysis of the lexical route as a placeholder for the orthography➔semantics➔phonology side of the triangle gains additional support from research by Perry et al. (2019). This implementation of the CDP+ model employed a simpler orthography➔phonology architecture than other CDP+ models: a two‐layer network with direct connections between orthography and phonology and no hidden layers. With its reduced capacity, this network can encode simple mappings but not more complex ones, increasing dependence on the lexical system. Perry et al. (2019) related this reduction in capacity to developmental dyslexia.
This is again the division of labor account from the triangle theory. Seidenberg and McClelland (1989) and Harm and Seidenberg (1999) reduced the capacity of the orthography➔phonology network by decreasing the number of hidden units rather than wholly eliminating them. The network is then limited to learning relatively simple spelling‐sound correspondences, requiring additional input from orthography➔semantics➔phonology. Plaut et al. (1996) provided simulations and formal analyses of these effects.
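The capacity point can be illustrated with a standard toy example. With no hidden layer, a network of the kind described can learn only linearly separable input-output relations; adding hidden units trained by backpropagation removes that limit. The XOR problem below is merely a stand-in for a correspondence that exceeds the reduced network's capacity; none of the details correspond to the actual models.

```python
# Sketch of the capacity point: with direct input -> output connections only,
# a nonlinearly separable mapping (XOR) cannot be learned; with hidden units
# and backpropagation it typically can. Sizes and learning rates are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

# Inputs carry an extra constant 1 so each unit has a bias weight.
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])   # XOR targets: not linearly separable

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_two_layer(epochs=20000, lr=0.5):
    """Direct input -> output connections only (analogous to removing hidden units)."""
    W = rng.normal(0, 1.0, size=(3, 1))
    for _ in range(epochs):
        out = sigmoid(X @ W)
        W += lr * X.T @ ((y - out) * out * (1 - out))
    return sigmoid(X @ W)

def train_with_hidden(n_hidden=8, epochs=20000, lr=0.5):
    """Same task with a hidden layer trained by backpropagation."""
    W1 = rng.normal(0, 1.0, size=(3, n_hidden))
    W2 = rng.normal(0, 1.0, size=(n_hidden + 1, 1))
    for _ in range(epochs):
        h = np.hstack([sigmoid(X @ W1), np.ones((4, 1))])   # hidden acts + bias
        out = sigmoid(h @ W2)
        d_out = (y - out) * out * (1 - out)
        d_h = (d_out @ W2[:-1].T) * h[:, :-1] * (1 - h[:, :-1])
        W2 += lr * h.T @ d_out
        W1 += lr * X.T @ d_h
    return sigmoid(np.hstack([sigmoid(X @ W1), np.ones((4, 1))]) @ W2)

print("no hidden layer:  ", train_two_layer().ravel().round(2))   # hovers near 0.5
print("with hidden units:", train_with_hidden().ravel().round(2)) # typically near 0, 1, 1, 0
```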