Various authors

Methodologies and Challenges in Forensic Linguistic Casework



over to identify seemingly distinctive linguistic forms being used relatively consistently across their own writings. The texts were read both without constraint and with a focus on specific levels of linguistic analysis: word choice, punctuation, spelling, sentence grammar, and discourse structure. Lists of unusual and distinctive features were compiled for both authors and then compared, and a list of potentially distinctive features was produced.

      This procedure, the standard procedure in forensic authorship analysis, is far from perfect. It depends on the expertise of the analyst, it is based primarily on positive evidence, and it is not replicable. Two analysts may, in good faith, arrive at two different subsets of linguistic features, and without attention to the broader design of the analysis, the selection may be biased and the results unreliable. The main advantage of manual feature selection is that it seems to be better at identifying the most unusual and thus, potentially, the most distinctive features. Unlike a computational approach to feature selection, a human reader is especially good at spotting entirely new feature types that may never have been considered before. Furthermore, once these features are identified, the writing samples can be searched both by hand and computationally to ensure that all occurrences of these forms have been extracted. In stylistic authorship analysis, feature selection is manual, but feature counting need not be and, where possible, should not be.
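      The split between manual selection and computational counting can be illustrated with a minimal Python sketch. The pattern names and regular expressions below are hypothetical stand-ins for a feature list that an analyst would compile by hand; they are not the feature set used in this case.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only. In practice the list comes
# from the analyst's manual reading of the known writings.
FEATURE_PATTERNS = {
    # "a lot" spelled as one word
    "alot_one_word": re.compile(r"\balot\b", re.IGNORECASE),
    # "a lot" spelled as two words
    "a_lot_two_words": re.compile(r"\ba lot\b", re.IGNORECASE),
    # a sentence that begins with the word "And"
    "sentence_initial_and": re.compile(r"(?:^|[.!?]\s+)And\b"),
}


def count_features(text):
    """Count every occurrence of each manually selected feature in a text."""
    return Counter(
        {name: len(pattern.findall(text)) for name, pattern in FEATURE_PATTERNS.items()}
    )


if __name__ == "__main__":
    sample = "And I like it alot. And so on. He ate a lot."
    for name, count in count_features(sample).items():
        print(name, count)
```

      Counting by pattern makes it less likely that an occurrence is overlooked, but features that cannot be captured by a simple pattern would still need to be checked by hand.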

      In total, JG identified 51 different feature types that appeared to distinguish between the possible writings of Debbie and Jamie Starbuck, which were informally classified as belonging to nine levels of analysis:

       Text level (average text length, common email openings and closings)

       Paragraph level (average paragraph length, common paragraph initial words)

       Sentence level (average sentence length, common sentence initial words)

       Phrase level (common two-word n-grams, common three-word n-grams)

       Word level (average word length, common function words)

       Abbreviations, acronyms, and emoticons (common text messaging acronyms, common emoticons)

       Contractions (common standard contracted forms, common nonstandard contracted forms)

       Spelling and case (common spelling errors, repetition of letters for emphasis)

       Punctuation (common use of exclamation marks, nonstandard semicolon usage)

      Some of these feature types consist of a single measurement (e.g., average word length in characters or spelling of “a lot” as one or two words), whereas others consist of a large number of individual features (e.g., frequency of common function words or words that are commonly used in sentence initial position). Examples of these features are provided in Table 2.1. In addition, JG also recorded general holistic impressions of the two authors. For example, he found Debbie’s style to be more narrative and informal than Jamie’s.
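      A few of the single-measurement and frequency-based feature types can be sketched in Python as follows. The tokenizer and sentence splitter here are deliberately naive assumptions made for illustration; the source does not specify how words and sentences were delimited in the original analysis.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive word tokenizer (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def average_word_length(text):
    """Mean word length in characters (apostrophes count as characters here)."""
    words = tokenize(text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


def average_sentence_length(text):
    """Mean sentence length in words."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def top_ngrams(text, n=2, k=5):
    """Most frequent word n-grams; note that n-grams may span sentence boundaries."""
    words = tokenize(text)
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams).most_common(k)


if __name__ == "__main__":
    sample = "I am here. You are there now."
    print(average_word_length(sample))
    print(average_sentence_length(sample))
    print(top_ngrams(sample, n=2, k=3))
```

      Measurements like these are only comparable across authors when the same tokenization choices are applied to every text.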

      ATTRIBUTION

      After the known writings had been analyzed and JG had identified the feature set that distinguished between the known emails of Jamie and Debbie Starbuck, only then did TG pass to JG the remaining set of 29 potentially disputable emails. In this stage, JG went through each of the texts by hand and searched for each of the feature types systematically. As there were relatively few texts, and because a number of the features were difficult to search computationally, this process was primarily carried out by hand to ensure that no evidence was missed. For each text, the number of features that matched each author was recorded, and then the length of these lists and the nature of the features were considered to come to an attribution judgment.

      In some emails, the evidence was mixed: they contained features associated with both Debbie’s and Jamie’s known styles. For example, some disputed emails contained features associated with Debbie’s known emails (they were relatively long or started sentences with the word “And”). However, where these emails predominantly contained features associated with Jamie’s style, we still attributed them to Jamie. If the mix was more balanced or if there were very few features in either direction, no attribution was made.
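      The logic of this decision can be approximated in a short Python sketch. It is a simplification: the judgment described above also weighed the nature of the features, not only their number, and the thresholds `min_total` and `min_share` are hypothetical values chosen for illustration, not figures from the case.

```python
def attribute(matches_a, matches_b, label_a="A", label_b="B",
              min_total=5, min_share=0.7):
    """Return an author label, or None when no attribution should be made.

    matches_a / matches_b: number of features in a disputed text that match
    each candidate author's known style.
    min_total and min_share are hypothetical thresholds (assumptions).
    """
    total = matches_a + matches_b
    # Very few features in either direction: no attribution.
    if total < min_total:
        return None
    # One author's features clearly predominate: attribute to that author.
    if matches_a / total >= min_share:
        return label_a
    if matches_b / total >= min_share:
        return label_b
    # The mix is too balanced: no attribution.
    return None


if __name__ == "__main__":
    print(attribute(2, 8, label_a="Debbie", label_b="Jamie"))
    print(attribute(3, 3, label_a="Debbie", label_b="Jamie"))
    print(attribute(1, 0, label_a="Debbie", label_b="Jamie"))
```

      Returning no label for balanced or sparse evidence mirrors the practice of declining to attribute rather than forcing a decision.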

      Overall, the feature set proved clear enough to attribute a substantial number of the texts across the timeline to Jamie Starbuck. For example, one of the most distinctive features was sentence length, together with the way that longer sentences were constructed. In general, Debbie used long sentences, with frequent sentence coordination and often run-on sentences linked with comma splices and dashes. In contrast, Jamie often used short sentences, including one-word sentences, and very rarely used run-on sentences (Table 2.1). Jamie’s sentential patterns were far more common in the questioned documents, offering strong evidence that they were more likely to have been written by Jamie. Various other features associated strongly with Jamie’s writing samples were also present in these texts, including the deletion of apostrophes before the genitive marker in possessives, the use of highly informal features like interjections and emoticons, and the presence of several common compounded words including awhile, in between, and up-to-date, with those specific spellings.

      These two pieces of evidence clearly pointed to Jamie as the author of the questioned documents. The break point was indicated at the second email in the disputed group. Following this second email, that is to say by May 3, 2010, it appeared that Jamie alone was writing emails from Debbie’s account. This placed the date of the account takeover to be before the couple