One of the first things they teach you in law school is that circumstantial evidence is just as powerful as direct evidence. In fact, they say, the distinction is irrelevant and misleading. “But that’s only circumstantial evidence!” is the kind of thing non-lawyers think is a winning argument. It isn’t.
Why is this?
Let’s start with the distinction: Technically, direct evidence is evidence from someone’s direct sense impressions: I saw or heard the suspect do this. Circumstantial evidence, by contrast, is based on logical inference, not sense impressions. The classic example is two subway passengers. Jenny enters the subway from the street, where it is raining directly on her. She has direct evidence that it’s raining. Helmut is already in the subway waiting for the train. He sees people coming down the escalator holding wet umbrellas. He infers, from circumstantial evidence, that it must be raining outside.
Of course, there could be another explanation: maybe there’s been a symbolic protest involving umbrellas, or maybe there was a water pipe leak. You never know. But Helmut compares what he saw to the general background knowledge of human experience: how likely is it that people would be holding wet umbrellas for a reason other than the fact it’s raining? The answer is: extremely unlikely. As they say, if you hear hoofbeats behind you, assume it’s a horse, not a zebra.
I’ve been thinking a lot about circumstantial evidence lately, for a few reasons. One is the Jens Soering case, in which Soering claims that DNA testing has exonerated him. Here is an excellent German-language description (g) of why this claim can’t be taken at face value. I’ll get to that in a later post.
While reading it, I was struck by the similarity between DNA identification and circumstantial evidence: both build on sequences of compound probability.
I. Compound Probabilities in DNA
First, a bit about DNA: The US system for DNA identification, CODIS, was built around 13 separate DNA markers. Each marker sits at a specific location, or locus (plural: loci), on a chromosome. Here are the 13 CODIS marker positions on the human chromosomes:
At each of the 13 loci, every person carries 2 alleles, which are variant forms of the genetic marker found there. One allele comes from your father, the other from your mother. The 2 alleles may be identical copies, or 2 different versions. The number of potential alleles (i.e., different forms of the marker) varies from locus to locus, but large-scale DNA studies have found about 10-20 possible allele variants at each location CODIS uses.
So how does this generate matches? First, you take the DNA of the suspect. The profile might look something like this:
This shows the pairs of alleles, some identical, some different, at each of the 13 loci. The last entry, AMEL (the amelogenin gene), identifies the sex of the DNA donor: a male shows both an X and a Y version, a female two X versions.
After you have developed the suspect’s DNA profile, you must compare it to the population as a whole (or, depending on factors too complex to get into here, a sub-group of the population to which the suspect belongs). How many people in the population as a whole have these specific allele combinations at these specific loci? That “frequency” is shown underneath each genotype in the chart above. Population studies have determined how many separate alleles can exist, and how frequently they arise, in many different population groups all across the world. Here’s the Strider Database for European countries.
Comparing the defendant’s DNA with these underlying frequencies is what makes DNA testing work. It is the multiplication of compound probabilities that allows such precise determinations. To figure the compound probability of independent events (which is basically what we have here), you multiply their individual probabilities together. Let’s just take the first 3 loci in our example. Turning the percentages into decimals, we multiply .082 x .044 x .017, yielding 0.000061336. That is, only .0061336%, or about 1 in 16,000 people, will share the same lineup of alleles at these 3 loci. And that’s only 3 loci; multiplying in the frequencies of the remaining 10 loci quickly gets you into incredibly tiny probabilities: 1 in billions or even quadrillions of people. Since 2017, the American DNA database has used 20 core loci, which yields even more extreme figures.
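For readers who want to check the arithmetic, here is a minimal Python sketch of the product rule for the first 3 loci. It assumes, as the paragraph does, that the loci are independent; the function name is purely illustrative, and the three frequencies are the ones from the example profile.

```python
from math import prod

def random_match_probability(frequencies):
    """Multiply per-locus genotype frequencies (product rule).
    Assumes the loci are statistically independent."""
    return prod(frequencies)

# Genotype frequencies at the first 3 loci of the example profile.
first_three = [0.082, 0.044, 0.017]
p = random_match_probability(first_three)

print(f"{p:.9f}")            # 0.000061336
print(f"1 in {1 / p:,.0f}")  # 1 in 16,304, i.e. about 1 in 16,000
```

Adding more loci just means appending more frequencies to the list, which is why the figure shrinks so quickly.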
Yet, as any DNA expert will tell you, the test does not prove the defendant was the one who left the DNA at the crime scene. It simply says the suspect cannot be excluded as the donor of the DNA. It is always possible that another person with those same DNA markers left the DNA at the crime scene. Assuming the DNA at the crime scene only yielded the first 3 loci, then the suspect could argue: “Yes, I have the same alleles at these 3 loci, but so do 1 in 16,000 other people”. That means, in the USA, there are about 20,000 other people who could have left that DNA at the crime scene.
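The “about 20,000 other people” figure is simply the population size multiplied by the random match probability. A rough sketch, assuming a US population of about 330 million (a round figure supplied here for illustration, not taken from the post):

```python
from math import prod

def expected_coincidental_matches(population, match_probability):
    """Expected number of people in `population` who would share the
    profile purely by chance: population size x random match probability."""
    return population * match_probability

US_POPULATION = 330_000_000  # assumed round figure, for illustration only

three_loci = prod([0.082, 0.044, 0.017])
print(round(expected_coincidental_matches(US_POPULATION, three_loci)))  # 20241

# A full-profile figure like 1 in a trillion leaves essentially no one:
print(expected_coincidental_matches(US_POPULATION, 1e-12))  # well under 1
```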
That’s not a very strong argument for your innocence, but still, it’s an argument. However, a match of all 13 loci usually yields a figure like 1 in a trillion: that is, if you randomly sampled trillions of humans, you would, on average, find one per trillion who had the same alleles at the same loci. Since there aren’t trillions of humans, this is, for all intents and purposes, an indisputable match. Now, reliable DNA typing requires a lot of preconditions to be met: the DNA databases must be accurate and well-maintained, you must choose the right population sub-group to compare the suspect with, the DNA shouldn’t be contaminated or degraded, etc. There are still some open questions on these issues. But by and large, DNA testing is reliable. Even if an extremely conservative interpretation of the results yields a match of 1 in a million instead of 1 in a trillion, that’s still quite powerful evidence.
II. Compound Probabilities in Circumstantial Evidence
This is also exactly how circumstantial evidence works. There are various ways of categorizing circumstantial evidence, but none is definitive. Most lawyers categorize it based on its strength (i.e., how powerful an inference of the suspect’s guilt it permits):
First, evidence which fails to exclude the suspect. For instance, the suspect has no alibi for the time of the crime, and was close enough to the crime scene to have reached it. This fails to exclude the suspect, but there may well be thousands of other people who fit these criteria.
Second, circumstances which are consistent with the defendant committing the crime. For instance, the suspect drives the same color car as witnesses reported seeing. The suspect was known to possess a knife like the one used in the crime. The suspect spent a lot of money right after the robbery.
Third, circumstances which are unlikely or extremely unlikely unless the suspect committed the crime. For instance, a car with the suspect’s license plate number was reported leaving the crime scene, or the victim’s blood was found in the suspect’s home. DNA is this kind of circumstantial evidence: it doesn’t prove the suspect was at the crime scene; it merely shows that it is extremely unlikely the suspect was not there.
Let’s take an example. John is suspected of stealing Jane’s car from the street and selling it to a chop shop. The police have the following evidence:1
- John has no alibi; he says he was at home alone watching TV at the time of the theft. He lives a 10-minute walk from where Jane parked her car.
- John has a previous conviction for car theft.
- A person matching John’s description was seen walking on the street near Jane’s car shortly before the car theft.
- A CCTV camera caught a masked person driving Jane’s car to a chop shop where John used to work.
- John bought a boat for $10,000 in cash one week after Jane’s car was stolen, and says he won the money ‘a long time ago in a poker game’ he can no longer recall.
- When questioned by police, John was found to be in possession of Jane’s mobile phone (John, like most criminals, isn’t very bright).
Let’s assign some probabilities here. John has no alibi and lives nearby, but then again, so do thousands of other people. We’ll assign this factor 1: it doesn’t really change any probabilities, it just fails to exclude John.
Previous conviction for the same crime: Only a small number of people have ever been convicted of car theft, and there’s a 30% recidivism rate. However, this is only a statistical association, and most car thieves don’t re-offend. So we’ll again be charitable and assign this a probability of .75: that is, there is a 75% likelihood that John’s criminal record is unrelated to the theft of Jane’s car.
Person seen nearby: Let’s also give this a generous 75% probability. That is, we’ll say that there’s a probability of 75% that the person seen near the car was not John.
CCTV camera image: Again, this is only moderately strong evidence — the man in the car was wearing a mask, hundreds of people used to work at that chop shop, certainly someone who never worked there might also take a stolen car there to sell. We’ll also assign this 75%.
Boat purchase: Now we’re getting warmer. The car was likely worth around $15,000 on the black market. John has no regular income, spends everything he gets immediately, and cannot provide any details about the poker game. He paid cash for the boat, which is highly unusual. All of these circumstances suggest involvement in the car theft. So we’ll assign this a probability of 40%: that is, there’s a 60% probability that the $10K came from the car theft, since John, despite having every incentive to explain the money, cannot offer any alternative source. There is, however, still a 40% probability that the money came from somewhere else that John, for reasons of his own, chooses not to disclose.
Mobile phone: This is obviously the strongest piece of evidence. Jane had left the phone in her car and reported it stolen along with the car. The car thief was smart enough to remove the battery after noticing the phone on the passenger seat, so its location could not be traced after the theft. Nevertheless, John had the phone when the police questioned him. Further investigation showed that the phone, fitted with a different SIM card, was located in John’s home a few days after the car theft. John claims he bought the phone from ‘some guy’ on the street for $100. Once again, though, he cannot provide any proof or details. We’ll assign this a probability of 10%: that is, there is only a 10% chance that John’s possession of Jane’s phone after the theft is explained by something other than his having stolen Jane’s car.
Do we convict John on this evidence? To do that, we need to compound these probabilities. We assume the null hypothesis: that these events are all independent (i.e., that they are all mere coincidences, and not all explained by the fact that John stole Jane’s car), and multiply the probabilities accordingly:
1 x .75 x .75 x .75 x .4 x .1 = 0.016875.
This leaves us with a figure of 1.7 percent. That is, there is only a likelihood of 1.7% that all of these circumstances were coincidences: i.e., that they were not explained by the fact that John stole Jane’s car. By contrast, there is a 98.3 percent likelihood that all of these circumstances, taken together, were related: they were all explained by the fact that John stole Jane’s car. Would this be enough to arrest John? In most countries, certainly. I would say this is enough for ‘probable cause’ (Anfangsverdacht) in the USA and Germany. You could get a warrant to arrest John and to search his house based on this evidence.
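The whole calculation for John fits in a few lines of Python. This is only a sketch of the post’s own method: the probabilities are the ones assigned above, the labels are shorthand added here, and treating the items as independent is the same simplifying assumption the post makes.

```python
from math import prod

# Probability that each piece of evidence is a mere coincidence, i.e. NOT
# explained by John having stolen Jane's car (figures assigned in the post).
coincidence_probabilities = {
    "no alibi, lives nearby": 1.0,
    "prior car-theft conviction": 0.75,
    "person matching description seen nearby": 0.75,
    "masked driver on CCTV at former workplace": 0.75,
    "unexplained $10,000 cash boat purchase": 0.40,
    "possession of Jane's phone": 0.10,
}

# Treating the items as independent, the chance that ALL are coincidences
# is the product of the individual probabilities.
all_coincidence = prod(coincidence_probabilities.values())

print(f"all coincidences: {all_coincidence:.6f}")             # 0.016875
print(f"explained by the theft: {1 - all_coincidence:.1%}")  # 98.3%
```

Changing any single figure shows how much (or how little) one piece of evidence moves the overall result.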
Would this be enough to convict John at a criminal trial? That standard is, of course, much higher. I would give the prosecution 5 to 1 odds of convicting John at trial on this evidence, but every lawyer you ask would cite different odds.2
III. Looking at All the Evidence
But the case against John only works when all the evidence is considered. This is what many people get wrong when they buy into innocence claims: they look at single pieces of evidence in isolation, without considering the overall picture they yield.
Plenty of people, even judges, make this mistake in many different contexts. Here’s a recent example: Curtis Flowers, a man on Mississippi’s death row (who probably actually is innocent), claimed that prosecutors illegally struck black prospective jurors from his trial jury. Flowers listed many factors supporting this: the prosecutor had a history of excluding black jurors, had excluded black jurors for reasons that also applied to white jurors, asked black jurors different questions than white jurors, etc. The Mississippi Supreme Court addressed each of these factors, and found that, taken on their own, they didn’t show the prosecutor had acted unlawfully. The US Supreme Court, however, disagreed:
To reiterate, we need not and do not decide that any one of those four facts alone would require reversal. All that we need to decide, and all that we do decide, is that all of the relevant facts and circumstances taken together establish that the trial court at Flowers’ sixth trial committed clear error in concluding that the [prosecutor did not discriminate]. (emphasis added)
Circumstantial evidence must be considered together, not in isolation. This is why German critiques of Jens Soering’s trial are unconvincing: They pick at individual elements of the case against Soering, without considering the overall picture. Let’s look at one piece of evidence Soering’s supporters often focus on: the bloody sock print, which the prosecution — correctly — argued at trial was consistent with Soering’s foot. The defense — also correctly — argued that it was doubtless also consistent with thousands of other people’s feet,3 and thus wasn’t very strong evidence on its own. This is “consistent with” evidence.
Yet the case against Soering does not stand or fall on whether the sock print at the crime scene can be positively identified as exclusively Jens Soering’s. The question is whether Soering is guilty, considering that (1) a sock print found at the crime scene is consistent with his foot; (2) he fled the country without warning when asked to give blood and fingerprint samples to the police (wiping his fingerprints off surfaces in his car and home beforehand); (3) he had injuries directly after the murders consistent with an armed struggle; (4) he had means, motive, and opportunity to commit the crime; (5) he had no alibi for the time of the crime; (6) he had discussed violent acts against the victims with his girlfriend before the crime, etc. etc. Looking at “all of the relevant facts and circumstances taken together”, to quote the Supreme Court, the picture is clear. Or put another way, the probability that all of these circumstances could be explained by mere coincidence is tiny, just as the probability that all the circumstantial evidence against John was mere coincidence is tiny. The probability that all of these circumstances are explained by the fact that Jens Soering killed the Haysoms, by contrast, is very high.
And that’s not even counting his repeated confessions — which, by the way, are direct evidence, not circumstantial.
- Not all of this evidence would necessarily be admissible in court. In particular, John’s previous conviction would likely not be admissible unless he testifies. Also, police would have to prove they had legal cause to seize John’s phone.
- What happens if, at John’s trial, he provides irrefutable CCTV evidence that he was at a bar on the other side of the city at the exact time the car was stolen? Well, that is a circumstance which is 0% compatible with his being guilty. It overrides the simple equation above: no matter what the other factors multiply out to, the probability of guilt drops to 0, which proves all the circumstantial evidence was, indeed, coincidence. John, to his delight, goes free.
- One of the mistakes Soering’s supporters constantly make is to complain that the prosecution made some sort of flimsy argument (such as that the sock print matched Soering’s foot), which obviously bamboozled the jury. What they rarely mention is that Soering had two defense attorneys who actively challenged all of these prosecution arguments. They didn’t just sit there; they constantly objected, and made their own powerful counter-arguments. The jury wasn’t misled. It heard two different interpretations of most of the evidence Soering complains about, and simply chose the one it found more convincing.