the weight of evidence for any style shift can be considered cumulatively after any identified break in style.
A further point in TG’s preliminary evaluation is that each email text was relatively short. At the most basic level, the problem of dealing with short texts is that they do not provide the analyst with as much material as longer texts, from which distinctive and consistent features might be identified. Generally, more evidence is simply better.4 Slightly more technically, the issue is that linguistic observations in less text will give rise to fewer examples of the feature, and this means that generalization into a pattern of use will be less reliable.
For example, imagine trying to predict the bias of a weighted coin: if you flipped it only a few times you would be unlikely to be able to estimate the bias correctly, but if you flipped it a few hundred times you might have a very good estimate. The same thing happens when you measure the relative frequency of a word (i.e., its percentage out of the total words in the text). If one looks at a single, short sentence from a text, the word ‘the’ might occur once in five words, but we would not want to generalize from such an observation that the word occurs once every five words across the entire text. Only after we have seen a sufficient number of tokens or instances of a word can we start to make such estimations. Texts that are fewer than 500 words long are therefore generally seen as being too short for the application of stylometric approaches to authorship analysis (although recently this number has been decreasing; see Grieve et al., 2019), and, often in a forensic context, the entire data set might be smaller than this.
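The coin analogy can be made concrete with a short simulation. The following is a minimal sketch rather than part of the casework: it treats each token as an independent weighted coin flip, which real text is not, and the 6% rate, the function names, and the sample sizes are all illustrative assumptions.

```python
import random
import statistics


def relative_frequency(word, text):
    """Share of whitespace-separated tokens in `text` equal to `word`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens)


def estimate_spread(true_rate, sample_size, trials=1000, seed=0):
    """Standard deviation of the observed rate over repeated samples.

    Each trial draws `sample_size` tokens, each of which is the target
    word with probability `true_rate` (the "weighted coin").
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        hits = sum(rng.random() < true_rate for _ in range(sample_size))
        estimates.append(hits / sample_size)
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    # One token in five: an estimate we would not want to generalize.
    print(relative_frequency("the", "the cat sat down quietly"))
    # HYPOTHETICAL true rate of 6%; the spread shrinks as samples grow.
    for size in (5, 50, 500):
        print(size, round(estimate_spread(0.06, size), 4))
```

Under these assumptions, the spread of the estimate narrows as the sample grows, which is the sense in which generalization from a short text is less reliable.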
Finally, one last complication with this data was that, although it consisted of emails, the police provided us with access only to screenshots of the texts. Because these were simple images, they could not be automatically analyzed computationally. As a result, we needed to convert these images into text using optical character recognition software, which was a relatively time-consuming process and required thorough checking against the image files to ensure that even minor punctuation features were correctly digitized.
The outcome of TG’s evaluation phase of the analysis was the judgment that this data set as a whole was well suited for analysis. Cases like this, with small, closed sets of authors, sufficient data, and register control, do occur with some regularity, despite claims to the contrary sometimes made, particularly in the stylometry literature (e.g., Luyckx & Daelemans, 2011). Law enforcement agencies can often provide these types of problem, especially now that online language use leaves essentially permanent records. Researchers with relatively little forensic experience appear to focus their efforts on more and more challenging problems. For practical casework problems, these more complex research projects are less relevant. Such academic authorship studies are, of course, important, but many issues around the “easier” sorts of cases have not yet been resolved. By sharing actual investigative linguistic casework with researchers and the public, the forensic linguistic community can help provide a picture of the landscape of actual forensic problems.
ANALYSIS
As noted already, the purpose of separating the analysis into stages was to allow TG to pass the data in the case to JG in a controlled way. Specifically, in line with the protocol published in Grant (2012) and given the time-series nature of the data, TG began by providing JG with only the two sets of known writings for Debbie and Jamie Starbuck. TG had requested from the police contact that he, too, should not be informed of any particular suspected breakpoint in the data series. In spite of this, the emails were provided to TG in two files of known and disputed emails. To resolve this, TG removed the last few emails from Debbie’s known emails and added them to the disputed set to create a blind test set of emails for JG’s analysis. The advantage of having a second party manage the data access for the primary analyst is that it allows for practical issues such as this to be taken from the hands of the police, who may not fully understand the requests to provide data in certain ways to assist in the outcome.
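The construction of the blind test set can be sketched in a few lines. This is an illustration only: it assumes both lists are already in chronological order, and the function name and the `n_moved` parameter are hypothetical, since the number of emails TG actually moved is not specified here.

```python
def make_blind_test_set(known, disputed, n_moved):
    """Move the last `n_moved` known emails to the front of the disputed set.

    Both lists are assumed to be in chronological order. The inputs are
    left unmodified; two new lists are returned.
    """
    if not 0 <= n_moved <= len(known):
        raise ValueError("n_moved must be between 0 and len(known)")
    cut = len(known) - n_moved
    retained_known = list(known[:cut])
    # The moved emails predate the disputed ones, so they come first.
    blind_test = list(known[cut:]) + list(disputed)
    return retained_known, blind_test
```

Because the analyst receiving the blind set does not know how many emails were moved, the handover from known to disputed material no longer reveals a suspected breakpoint.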
JG analyzed the known writings, primarily by hand, to identify a linguistic feature set that showed pairwise distinctiveness between the two possible authors; that is to say, features were identified that were consistently used by one author, but not by the other (Table 2.1). Most notably, this approach prevented confirmation bias in favor of any hypothesis as to who had written the disputed material. This is especially important in the context of the careful stylistic analysis of texts in forensic linguistics, which relies almost entirely on the judgment of the analyst, as opposed to quantitative stylometric approaches (e.g., see Grieve, 2007), which generally involve the use of preselected feature sets (e.g., function word frequencies).
Table 2.1 Linguistic Feature Examples
Feature | Debbie | Jamie |
---|---|---|
Sentence length | Long sentences (24 words per sentence average). Example: “I’m now back in Oz, after 5 weeks In NZ—had a good time, though it felt so much more remote than here (guess it is!) and I really felt that, being there.” | Short sentences (10 words per sentence average). Example: “I knew I’d forget something. 2 things in fact.” |
One-word sentences | No tokens | Occasional use. Example: “Sorry. I thought I’d replied.” |
Run-on sentences | Relatively common. Example: “Are you enjoying your new car, what is it?” | No tokens |
Awhile | No tokens | 3 tokens. Example: “Shouldv’e done that awhile ago.” |
Inserts | Relatively uncommon. Example: “ha ha—you’re entirely responsible for how or where it goes” | Relatively common. Example: “Umm….you haven’t actully apologised for anthing despite your insistence otherwise.” |
Emoticon usage | No tokens | 9 tokens. Example: “Its gorgeous:) hope you enjoyed your holiday.)” |
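Some of the features in Table 2.1 lend themselves to simple automated counts. The sketch below is an assumption-laden illustration and not the procedure used in the case, where the features were identified primarily by hand. The sentence splitter and emoticon pattern are deliberately crude, the function names are hypothetical, and features such as run-on sentences and inserts are omitted because they require analyst judgment.

```python
import re

# Crude patterns, assumed for illustration only.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # split after ., ! or ?
EMOTICON = re.compile(r"[:;]-?[()DP]")       # e.g. :) ;-) :D
AWHILE = re.compile(r"\bawhile\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence split on terminal punctuation followed by whitespace."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def extract_features(text):
    """Count a few surface features of the kind listed in Table 2.1."""
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]  # words per sentence
    return {
        "mean_sentence_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "one_word_sentences": sum(1 for n in lengths if n == 1),
        "awhile_tokens": len(AWHILE.findall(text)),
        "emoticons": len(EMOTICON.findall(text)),
    }


if __name__ == "__main__":
    print(extract_features("Sorry. I thought I’d replied."))
```

Counts of this kind would still need checking against the texts, since a naive splitter can miscount sentences around abbreviations, ellipses, and emoticons.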
One basic distinction between a stylistic approach and a stylometric approach is that the stylistic approach generally involves a data-driven generation of a case-specific feature set, whereas stylometric analysis tends to rely on predesigned feature sets. The strength of one approach can be the weakness of the other: feature sets arising from stylistic approaches resist generic validation studies but lead to explanation-rich outcomes that are more accessible to non-specialists such as police, lawyers, or juries. In contrast, stylometric features can be validated in independent testing, so that they can be applied consistently by researchers and minimize the need for analysts to rely on their own judgment, but the abstract nature of these analyses can resist informative explanation.
The feature set produced by this initial stage of JG’s analysis in the Starbuck case was then provided to TG and also sent to the police. The purpose was to provide an evidence trail that the feature set had been “locked” prior to JG receiving the disputed material. This step in itself did not strengthen the mitigation of confirmation bias, but it did strengthen the robustness of the analysis as an evidential product.
JG identified a wide range of linguistic forms that distinguished between the styles of these two authors, features that were predominantly used by one or the other in the known sets of emails. The process through which these features were identified involved a combination of close manual stylistic analysis and also computational analysis giving rise to some stylometric features.
The stylistic approach involved carefully reading the texts and identifying apparent differences in their