How to investigate your government through algorithms

Some kinds of reporting-by-the-numbers are anything but lazy. Take investigations looking into algorithms — examining the formulas used by the government to determine who is more likely to commit a crime or how likely your building is to have a fire inspection.

Speaking at the recent International Festival of Journalism, Nick Diakopoulos, assistant professor at the University of Maryland’s Philip Merrill College of Journalism and a member of its Human Computer Interaction Lab, gave a solid primer on how to get started.

He’s been studying the wider reach of algorithms in society, government and industry for about four years, coming at it from a computer science background as a “techie who worked my way into journalism.” Boyish, bespectacled and occasionally prone to professorial turns of phrase like “algorithmic accountability,” Diakopoulos offered a look into the numbers that shape our lives.

What they are

Photo: Jon Oropeza via Flickr. CC-licensed.

At the most basic level algorithms are like recipes, Diakopoulos says. They have ingredients, assembly instructions and a sequence or order for those instructions — your basic how-to method for doing something. Where the analogy falters, he says, is that unlike the sequence that results in a good plate of pasta al pomodoro, algorithms are decision-making formulas. “The crux of algorithmic power is how they make decisions, or have the potential to make decisions, potentially without any human involvement.”

These break down broadly into four types of decisions: prioritization, classification, association and filtering. Familiar examples include search engines ranking better sources of information at the top; YouTube’s strainer for picking up copyrighted material; connecting terms, as in the slander lawsuit over Google’s auto-complete; and filtering, whereby some news sources rank higher than others.
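To make the four decision types concrete, here is a minimal Python sketch. Everything in it (the sample articles, the scores, the 0.5 threshold, the past queries) is invented for illustration and does not come from the talk or from any real system.

```python
from collections import Counter

# Toy data, invented for illustration only.
articles = [
    {"title": "City budget explained", "score": 0.92, "flagged": False},
    {"title": "Celebrity rumor roundup", "score": 0.35, "flagged": True},
    {"title": "School zoning changes", "score": 0.78, "flagged": False},
]
past_queries = [
    "fire inspection schedule",
    "fire inspection schedule",
    "fire inspection cost",
    "parole hearing",
]


def prioritize(items):
    """Prioritization: put some items ahead of others."""
    return sorted(items, key=lambda a: a["score"], reverse=True)


def classify(item, threshold=0.5):
    """Classification: assign an item to a category (hypothetical threshold)."""
    return "high" if item["score"] >= threshold else "low"


def associate(prefix, queries, limit=3):
    """Association: link a typed prefix to the queries most often seen with it."""
    counts = Counter(q for q in queries if q.startswith(prefix))
    return [q for q, _ in counts.most_common(limit)]


def filter_items(items):
    """Filtering: include or exclude items according to a rule."""
    return [a for a in items if not a["flagged"]]


print([a["title"] for a in prioritize(articles)])
print(classify(articles[1]))
print(associate("fire", past_queries))
print([a["title"] for a in filter_items(articles)])
```

Each function is only a few lines long, yet each embeds a choice (which score, which threshold, which rule) that a human made and that a reporter could question.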

Why you should care

Far from being impartial gatekeepers or shortcuts, algorithms are designed by humans — often with built-in bias that can shape our daily lives. They are deciding what schools kids attend, who gets released on parole and who your next date is.

“It’s time to start getting skeptical about algorithms,” Diakopoulos says. “It’s time to start asking questions to learn more about how these systems function and get more details on how they work.”

That’s where algorithmic accountability, pulling back the curtain on the formulas, comes in. Diakopoulos cites a ProPublica investigation into software used in crime cases that asks a number of seemingly benign questions (“What neighborhood do you live in?” “What’s your education level?” “Are you in touch with your family?”) to arrive at a flight or future crime risk. Looking at the results in 7,000 cases, reporters discovered that the resulting “risk assessments” are not only biased against black defendants but also only slightly more accurate than a coin toss at predicting who will commit more crimes.
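The two findings above rest on simple arithmetic: how often the prediction matched what happened, and whether the mistakes fall more heavily on one group. The Python sketch below shows one way such a check could be set up. It is not ProPublica’s method, and the eight records and the group labels “A” and “B” are invented for the example.

```python
from collections import defaultdict

# Invented records: (group, predicted_high_risk, actually_reoffended).
records = [
    ("A", True, True), ("A", True, False), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, False), ("B", False, False),
]


def accuracy(rows):
    """Share of cases where the prediction matched the outcome."""
    correct = sum(1 for _, predicted, actual in rows if predicted == actual)
    return correct / len(rows)


def false_positive_rate(rows):
    """Share of people who did not reoffend but were labeled high risk."""
    predictions = [predicted for _, predicted, actual in rows if not actual]
    if not predictions:
        return float("nan")
    return sum(predictions) / len(predictions)


# Split the records by group so error rates can be compared.
by_group = defaultdict(list)
for row in records:
    by_group[row[0]].append(row)

COIN_TOSS = 0.5  # baseline: guessing at random between two outcomes
print(f"overall accuracy: {accuracy(records):.3f} (coin toss: {COIN_TOSS})")
for group, rows in sorted(by_group.items()):
    print(f"group {group}: false positive rate {false_positive_rate(rows):.3f}")
```

In this toy data the overall accuracy looks tolerable, but the false positive rate differs sharply between the two groups, which is the kind of gap a single headline accuracy figure can hide.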

“Algorithmic accountability means investigating these systems and trying to understand how these quantifications affect people,” he says. His team’s investigations have led to articles including “How Google shapes the news you see about the candidates” and “Uber seems to offer better service in areas with more white people.”

Diakopoulos shows waiting times by area in the Uber investigation.

Where to start

There are a few main ways to investigate algorithms (more on these below). But because Diakopoulos realized that most people don’t know where to start, he and a team devised algorithmtips.org. “We wanted to lower the bar for journalists getting involved,” he says.

First, they amassed potentially newsworthy algorithms in use by the U.S. federal government, starting with 5,000 leads for algorithms that aren’t even talked about on the websites. Then they filtered, tagged and enriched those leads, fleshing out descriptions, why these algorithms matter, which level of government they influence, whether they are proprietary and so on. They’ve got around 170 algorithms to get you started, from Zika treatment for pregnant women to highway planning and a hiring system for government employees.

Diakopoulos wants to get journalists in the habit of looking for algorithms, so he put the room of about 30 people to work showing how relatively simple it is to find them. In 10 minutes, using the terms “automatic assessment site:*.gov”, the room turned up a number of interesting items. My group, searching with the terms “valutazione automatica site:*.gov.it”, came out with a byzantine formula for calculating the cultural value of movies seeking funding from the Italian Cultural Ministry; another group, testing out .gov.uk, found a formula for evaluating risk of recidivism for domestic violence. If you want to take it further, there’s an entire spreadsheet of terms on algorithmtips.org. (They are also looking to add examples from governments worldwide; see the “volunteer” button.)
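If you want to run the same exercise across several terms and government domains, the queries are easy to generate. The Python sketch below only builds the search strings; the two terms are the ones used in the session, while pairing the English term with *.gov.uk is an assumption made for the example. Paste the results into the search engine of your choice.

```python
# Search terms grouped by the government domain they should be restricted to.
# The term/domain pairings are illustrative; extend them with your own.
searches = {
    "*.gov": ["automatic assessment"],
    "*.gov.it": ["valutazione automatica"],
    "*.gov.uk": ["automatic assessment"],  # assumed pairing, not from the session
}


def build_queries(searches):
    """Return one 'term site:domain' query string per term and domain."""
    return [
        f"{term} site:{domain}"
        for domain, terms in searches.items()
        for term in terms
    ]


for query in build_queries(searches):
    print(query)
```

The `site:` operator limits results to the given domain, which is what keeps the search focused on government pages.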

Digging deeper

Diakopoulos gave an overview of a few main ways to dig into algorithms, along with the pitfalls of each.

  • Code audit

Requires specialized knowledge, versioning, system setup, FOIA access, data access

“Algorithms run on computers, so at one level of algorithm monitoring, you could try to read someone else’s code and try to understand what that code is doing. But there’s all kinds of challenges, because you may need really specialized knowledge to understand code and you can run into version issues — like which version of the code is actually running on the system. In some cases, there can be difficulty getting the code, if it’s proprietary, there’s going to be a lot of resistance… Even if the government is using some system they built themselves, how responsive are freedom of information requests at getting code? Not terribly. There are some examples in the US where journalists have been able to get source code from the government but there’s no sort of uniform application of the law to source code.”

  • Noninvasive user audit

User surveys about their experience. Non-experimental, sampling size issues, self-reporting

“Say you’re studying the news feed algorithm on Facebook, you simply create a survey and ask people, ‘What’s your experience with this algorithm? How often do you see personalization in your newsfeed? How often do you see advertising in your newsfeed?’ This could be useful to understand people’s experience, but you can run into issues like sampling. Are you randomly sampling people? You also have the issue of self-reporting bias, so the people who respond to your survey might have a different experience than the people who didn’t respond to your survey.”

  • Scraping audit

Analyze input/output. End-user license agreement and Computer Fraud and Abuse Act (CFAA) issues

“The idea is that if you think an algorithm’s a black box with inputs going in on one end and outputs or decisions coming out, a scraping audit gathers data on all the different inputs and all the different outputs. Then it tries to relate them or correlate inputs to outputs. There are legal issues right now (some court cases in the U.S.). I warn journalists to be aware that if you’re scraping a website, there are some laws and rules depending on your jurisdiction.”

  • Sock puppet

Impersonate users

“If you wanted to study the Facebook algorithm to know how your newsfeed is being personalized, maybe create 1,000 or 10,000 fake accounts and impersonate different types of users and then look at how the newsfeed looks to all those different versions of users.”

  • Crowdsourced audit

Have real users report back data about platform
Combine with software reporting

“Here you work with the public and real users report back, say what their newsfeed looks like… In fact, the New York Times has run investigations like this where you actually install a browser plug-in, and you give it permission so whenever you load certain pages, that browser plug-in is reporting data back to that journalistic organization.”
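Whichever method gathers the data, the analysis step of a scraping audit comes down to relating inputs to outputs. The Python sketch below computes a Pearson correlation between one input and one output. The five observations are invented, and the variable meanings (an area-level attribute as the input, a waiting time in minutes as the output) are assumptions chosen to echo the Uber example, not data from that investigation.

```python
from math import sqrt

# Invented observations: (input fed to the black box, output observed).
# Here: (share of an area's residents with some attribute, wait in minutes).
observations = [(0.1, 4.0), (0.3, 5.0), (0.5, 6.5), (0.7, 7.0), (0.9, 9.0)]


def pearson(pairs):
    """Pearson correlation coefficient between the inputs and the outputs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    spread_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return covariance / sqrt(spread_x * spread_y)


r = pearson(observations)
print(f"correlation between input and output: {r:.2f}")
```

A value near 1 or -1 means the output moves closely with the input, and a value near 0 means it does not. A strong correlation is a lead to report out, not proof that the algorithm uses that input.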

Algorithms will be a steady line of work for muckrakers with patience. “If you’re used to journalism where you find a document and it nails someone, you’ll be disappointed with this kind of work,” he cautions. For journalists without a computer science background, he says, much can still be done. “Some data and some critical thinking and you start a conversation.”

Cover Photo // CC BY NC