topic.
Louise Kennedy did not go to a coach, but to the public. Media around the world then mocked the Australian immigration authorities, who did not react at all. Instead, the company providing the automatic language tests merely pointed out that the requirements for potential immigrants were very high. Of course, it was not the performance standards that prevented a young and highly qualified native speaker from getting permanent residency; it was the algorithmic system, which was simply unable to process an Irish accent correctly. The voice recognition software used in Australia is not yet capable of testing sentence structure, vocabulary and the ability to logically render complex information. That is the heart of the problem. The refusal to admit the obvious makes the incident look like a parody.
Yet the story has a very serious side. Ultimately, the software neither safeguards the Australian state’s legitimate interests when it comes to immigration nor provides justice for those individuals who have worked diligently in the hope of gaining a residence permit. Alice Xu and Louise Kennedy found ways to circumvent the deficient algorithmic system. One exploited the software’s weaknesses and told it exactly what it wanted to hear. The other married an Australian, allowing her to stay in the country permanently. But people should not have to adapt to meet the needs of a faulty algorithm; dysfunctional software should be adapted to meet people’s needs instead.
Wrong conclusions: Algorithms misinterpret data
Today, social media are an important source of information for many people, and users get the messages on their screen that interest them the most. Facebook, for example, tries to ensure that people will spend as much time as possible on the social network, viewing as many texts, videos and photos as possible and commenting on them. The messages that the site’s algorithms automatically present to each user should therefore be as relevant as possible to her or him. But how do you define and measure relevance?
For Facebook, the key indicator is individual user behavior. If someone lingers even a moment longer on a message, clicks on a button or calls up a video, the platform sees this as a sign of increased interest. The more intensively and the longer people interact with content, the more relevant that content must be, or so the assumption goes. Using this sort of analysis, the algorithm calculates who it will supply with which news from which source in the future. The problem is that the more disturbing a post, the more likely it is that someone will spend time with it. The software will in turn read this as interest and send the user additional messages of the same kind. Measuring relevance in a way that benefits society would require a different approach. Basic values that are important to the common good, such as truth, diversity and social integration, play a subordinate role here at best. What counts instead is getting attention and screen time (see Chapter 13).
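The feedback loop described above can be made concrete with a small sketch. It is not Facebook's actual ranking code, which is not public; the `Interaction` record, the weights in `engagement_score` and the sample history are all hypothetical and chosen only to show how attention alone can push one kind of content to the top.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    topic: str
    dwell_seconds: float
    clicked: bool
    commented: bool


def engagement_score(i: Interaction) -> float:
    # Hypothetical weights: dwell time counts directly,
    # a click adds 5 points and a comment adds 10.
    return i.dwell_seconds + 5.0 * i.clicked + 10.0 * i.commented


def topic_weights(history):
    # Sum engagement per topic and normalize so the weights add up to 1.
    totals = {}
    for i in history:
        totals[i.topic] = totals.get(i.topic, 0.0) + engagement_score(i)
    total = sum(totals.values())
    return {t: s / total for t, s in totals.items()} if total else {}


def rank_feed(candidates, weights):
    # candidates: list of (post_id, topic); highest-weighted topics come first.
    return sorted(candidates, key=lambda c: weights.get(c[1], 0.0), reverse=True)


# Invented history: the user lingered longest on a disturbing post.
history = [
    Interaction("disturbing", 40.0, True, True),
    Interaction("local news", 5.0, False, False),
    Interaction("science", 8.0, True, False),
]
weights = topic_weights(history)
feed = rank_feed(
    [("p1", "science"), ("p2", "disturbing"), ("p3", "local news")], weights
)
```

Nothing in the score asks whether a post is true or useful. Time spent and clicks are the only inputs, so the disturbing post is ranked first and the loop reinforces itself with every further interaction.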
Facebook tries to find out not only what users like best but also what they do not like at all. This led to a long-standing misinterpretation because the wrong signal was being measured: If a user clicked on the “hide post” option, the algorithm interpreted this as a clear sign of dissatisfaction and accordingly did not show the person any further messages of a similar kind. This remained the case until 2015, when someone took a closer look and discovered that 5 percent of Facebook users were responsible for 85 percent of the hidden messages.
These so-called super hiders were a mystery. They hid almost everything that appeared in their news stream, even posts they had commented on shortly before. Surveys then revealed that the super hiders were by no means dissatisfied. They just wanted to clear away read messages, just as some people keep their inbox clean by continually deleting e-mails. Having discovered what was going on, Facebook changed its approach. Since then, it no longer necessarily interprets hiding a post as a strong signal of displeasure.4
In this case, the algorithm did what it was told, but with an unwanted result. Wrong criteria led to wrong conclusions. The algorithm was unable to detect the super-hider phenomenon. An investigation initiated and evaluated by humans was required to uncover what was truly happening. Anyone who uses algorithmic systems is well advised to regularly question and check the systems’ logic and meaningfulness.
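A check of the kind that exposed the super hiders can be sketched in a few lines. The function name `top_share` and the counts below are assumptions made for illustration; the synthetic numbers merely mirror the proportions quoted in the text and are not Facebook data.

```python
def top_share(hides_per_user, top_fraction=0.05):
    """Share of all hide actions caused by the most active fraction of users.

    hides_per_user maps a user id to that user's number of hidden posts
    (users who never hide anything are included with a count of 0).
    """
    counts = sorted(hides_per_user.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always look at least at one user, even in tiny samples.
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / total


# Synthetic sample of 20 users: one hides 85 posts,
# fifteen hide one post each, four hide nothing.
hides = {"u0": 85}
hides.update({f"u{n}": 1 for n in range(1, 16)})
hides.update({f"u{n}": 0 for n in range(16, 20)})

share = top_share(hides)
```

A value this lopsided suggests that a "hide" click means something different for a small group of users than for everyone else. The number alone does not explain why; as in the Facebook case, that still takes a survey or another investigation carried out by people.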
Discriminatory data: Algorithms amplify inequalities
It is a mild spring day in Fort Lauderdale, Florida, in the US. Brisha B. is late, hurrying to school to pick up her stepsister.5 She sees an unlocked child’s bicycle at the roadside. The 18-year-old takes it, rides a few yards on the bike, which is far too small for her, and leaves it lying on the ground. Someone yells after her: “That belongs to my son!” Too late. Neighbors recognize Brisha and call the police. She is charged with stealing. At the preliminary hearing, the judge sets bail at $1,000 although the prosecutor has not requested any bail at all. Brisha has to spend two nights in prison before her family can find the money to free her.
Also in Florida, in the same county, a few months earlier: Vernon P. is caught at the hardware store taking tools worth $86.35, roughly the value of the bicycle Brisha took. Vernon’s criminal past, however, reads more dramatically than Brisha’s. The 41-year-old has been convicted several times and has already spent five years behind bars for armed robbery. Unlike Brisha, who in the past had only committed a few minor offenses, he initially remains at large.
In both cases, the COMPAS algorithmic system described in Chapter 1 assisted the judges in deciding on detention, bail or freedom. The software calculates and quantifies the probability of a suspect’s recidivism and has been used in many US states for several years in preliminary hearings and pre-trial motions. Brisha’s risk of recidivism in the near future was estimated by COMPAS at 8 on a scale from 1 to 10 – a fairly high score. Vernon’s likelihood of reoffending, on the other hand, was rated only 3 by the algorithm.
Two similar minor offenses. A young woman who has previously only come under scrutiny for misdemeanors. A grown man, convicted multiple times of robbery. She is classified as a high-risk criminal. His risk of reoffending is considered low. She is black. He is white. COMPAS was introduced to ensure that skin color no longer plays a role in determining penalties. That, in any case, was one of the hopes when the software was first adopted. While judges formerly made inconsistent decisions or unconsciously discriminated against individuals, law enforcement authorities expected the algorithm would assess every person neutrally and without bias, regardless of background or skin color. As a result, the software was not allowed to use these parameters when recommending a sentence.
Although New York City managed to reduce the total number of inmates with the help of COMPAS because more defendants were released on bail or probation, the hope that there would be no more discrimination has not yet been fulfilled. While Brisha did not attract attention again after taking the bicycle, Vernon ended up in prison for eight years a short time later after committing another serious theft. Brisha and Vernon are not isolated cases. The ill-judged results follow a clear pattern: The software systematically overestimates the risk of black people reoffending, just as it underestimates the probability that whites will do so.
Proof of this imbalance was provided by the US non-profit newsroom ProPublica.6 The organization, which is dedicated to investigative journalism in the public interest and was awarded the Pulitzer Prize in 2010, examined 7,000 uses of COMPAS in Florida. The investigation revealed that skin color plays a decisive role in calculating the probability of recidivism (see Chapter 12), even though the software is not allowed to include this characteristic at all and was introduced precisely with the aim of preventing discrimination.
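One way to make such an imbalance visible is to compare error rates between groups. The sketch below is not ProPublica's actual analysis and uses no real case data: the records are invented, the groups are simply called "A" and "B", and the threshold of 5 for "flagged as high risk" is an assumption.

```python
def error_rates(records, threshold=5):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, risk_score, reoffended) tuples.
    A person counts as flagged if risk_score >= threshold (assumed cut-off).
    """
    stats = {}
    for group, score, reoffended in records:
        s = stats.setdefault(group, {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
        flagged = score >= threshold
        if reoffended:
            s["pos"] += 1
            s["fn"] += not flagged  # reoffended, but was rated low risk
        else:
            s["neg"] += 1
            s["fp"] += flagged  # did not reoffend, but was rated high risk
    return {
        g: {
            "false_positive_rate": s["fp"] / s["neg"] if s["neg"] else None,
            "false_negative_rate": s["fn"] / s["pos"] if s["pos"] else None,
        }
        for g, s in stats.items()
    }


# Invented records: (group, score on the 1-10 scale, reoffended?)
records = [
    ("A", 8, False), ("A", 7, False), ("A", 3, False), ("A", 9, True), ("A", 6, True),
    ("B", 3, False), ("B", 2, False), ("B", 6, False), ("B", 3, True), ("B", 7, True),
]
rates = error_rates(records)
```

In this toy data, people in group A who did not reoffend are flagged twice as often as comparable people in group B, while reoffenders in group B are more often rated low risk. That is the shape of the pattern described above: overestimation for one group, underestimation for the other.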
Obviously, a suspect’s race finds its way into the algorithmic prognosis through the back door. One reason for this is that many of the criteria that COMPAS considers correlate with skin color in the US. Social environment, for example, or housing. In addition, the software compares each case with a control group of over 7,000 imprisoned criminals. The greater the similarity between the personal circumstances of a suspect and those of the convicted criminals,