Another mass shooting and the sick sad tragedy is again in full swing. How could this have been prevented? Did anyone see it coming? Can we predict and intervene before the next one happens? These questions arise each time the cycle repeats. They are worthy questions to ask. Successfully predicting and intervening in one of these attacks means potentially dozens of lives saved. It is worth the time of data scientists to explore the issue.
The shooting at Marjory Stoneman Douglas High School on February 14, 2018 differs in at least one respect from the multitude of mass shootings since Columbine: there was more than enough evidence lying around to predict and prevent it. The police had been called on the shooter multiple times, he was known to the FBI, and he had purchased numerous guns. People who knew the shooter warned law enforcement about him directly. There were multiple times when the interventions could have been much harsher or more effective. There were warnings. But the pieces weren’t put together by the authorities. Or, more disastrously, they lacked the conviction to act.
So the question remains, can we predict when some individual will commit this kind of terror?
On the surface, gun violence and earthquake prediction don’t seem to have much in common. We’re talking about human psychology, culture, and access to firearms versus geological movements and stress on tectonic plates. In the former, you have millions of school-aged children growing up in a society with relatively easy access to guns, and predicting which one among them will turn to these weapons (and when) is daunting. For the latter, you have all of geological time to draw from and the physical world to investigate. Between 12,000 and 14,000 earthquakes are recorded each year from around the world. Since the mass shooting at Columbine High School there have been 193 school shootings. Source
Let’s start with earthquake prediction
We are most interested in predicting the city-shaking Great Earthquakes. But because they are less frequent, there are fewer data about them. So what’s the problem with that? We don’t want more powerful earthquakes to occur, right? Well, the fewer the data, the poorer our models perform. Models built on the relatively plentiful small-quake datasets are not predictive of the bigger quakes. To be blunt: there isn’t enough data. Source
Think about the problem a bit more. What are we trying to predict exactly? We are trying to predict where, when, and how strong the next significant earthquake will be. We need to be able to warn people as far in advance as is possible. We’d like to shut down power stations and evacuate cities if necessary. Timely earthquake prediction means many thousands of lives saved.
So what do we know? We know where past earthquakes have occurred, so we can narrow down the locations to the fault lines. We know how long it’s been since the last quake, and we know that they are typically cyclical. It’s well known that the West Coast of the United States is overdue for a large quake. It is likely this earthquake will occur in the next few years and will cause significant amounts of damage. But we don’t know where along the thousands of miles of the West Coast the quake will strike, and we don’t know when. Nate Silver, current head of election forecasting company FiveThirtyEight, wrote a book about predictions some years ago in which he discussed this issue:
In theory, all these models should be competing in a test of which best forecasts seismic activity. In reality, there just aren’t that many high-magnitude earthquakes in the world that can serve as tests. That’s good news for anyone living on a fault line, but not for seismologists, who would like to know whether their models are correctly calibrated to pick up the greater risk just before a major earthquake. So instead the tests generally focus on whether the models correctly predict the frequency of lower-magnitude earthquakes. Globally, there is a reliable relationship between the rate at which these occur and the rate at which major earthquakes occur. But in any given spot, that relationship may not apply — and not all models can be tested globally because not all places have the level of measurement of crust structure and seismic activity that, say, California does. Also, a model tuned to pick up small quakes may not pick up bigger ones. Source
So the models are not able to generalize from the small quakes to the larger ones. And the data we can collect isn’t great. We cannot yet measure the pressure on the tectonic plates; we just can’t get close enough to the plates with current technology. We are groping in the dark in terms of data.
In many ways this is similar to predicting who is going to commit harsh acts of violence.
As convenient as it would be, there is no one-size-fits-all profile of who carries out mass shootings in the United States. About the only thing almost all of them have in common is that they are men.
The shooter who committed the atrocities at Marjory Stoneman Douglas High School was like a fault line, slowly building pressure, extremely dangerous but mostly hidden below the surface. In hindsight, all the telltale signs were there, but no one could predict when the plates inside him would shift. What can we know about people like him? We know that he posted his hate regularly. Couldn’t we use this information to isolate the list of individuals who are likely to do this? To answer a question with a question: how many other teenage boys are talking about these kinds of things on forums or with their friends? As online magazine Vox points out, after the terrible shooting in Sweden the government there commissioned a report to look into finding lone-wolf terrorists. They found:
To produce fully automatic computer tools for detecting lone wolf terrorists on the Internet is, in our view, not possible, both due to the enormous amounts of data (which is only partly indexed by search engines) and due to the deep knowledge that is needed to really understand what is discussed or expressed in written text or other kinds of data available on the Internet, such as videos or images. Source
The math is against us
Not only is it difficult to predict who might be ready to cause extreme violence, but the prediction would also need to be extremely precise. In other words, let’s say that we could use that data and other information to build a model. How accurate would that model need to be to make it actionable?
To make our predictions actionable the prediction accuracy needs to be more than 99%.
As Vox points out, a prediction over the US population that is even 99% accurate would still flag 1% of individuals as false positives. One percent of three hundred million people is three million people. The next mass shooter may be among those three million, but nearly all of the others will have been identified falsely.
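The base-rate arithmetic here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 300 million and the 1% false positive rate come from the paragraph above, while the figure of 100 genuine threats and the perfect detection rate are hypothetical numbers chosen to be generous to the model, and the function name is made up for this example.

```python
def screening_outcomes(population, true_cases, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening model.

    precision is the share of flagged people who are genuine threats.
    """
    negatives = population - true_cases
    true_positives = true_cases * sensitivity
    false_positives = negatives * false_positive_rate
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Assumptions: 300 million people and a 1% false positive rate (from the text);
# 100 genuine threats and perfect detection are hypothetical, optimistic inputs.
tp, fp, precision = screening_outcomes(
    population=300_000_000,
    true_cases=100,
    sensitivity=1.0,
    false_positive_rate=0.01,
)

print(f"Correctly flagged: {tp:,.0f}")
print(f"Falsely flagged:   {fp:,.0f}")
print(f"Share of flagged people who are real threats: {precision:.4%}")
```

Even with every genuine threat caught, roughly three million people are flagged in error, so only about one flagged person in 30,000 is a real threat under these assumptions.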
And here’s the problem with identifying these false positives: It takes significant resources away from other useful activities. We don’t have the resources as a nation to spend on following all of these people.
In the years after 9/11, the NSA passed to the FBI thousands of tips per month; every one of them turned out to be a false alarm. The cost was enormous, and ended up frustrating the FBI agents who were obligated to investigate all the tips. We also saw this with the Suspicious Activity Reports —or SAR — database: tens of thousands of reports, and no actual results. And all the telephone metadata the NSA collected led to just one success: the conviction of a taxi driver who sent $8,500 to a Somali group that posed no direct threat to the U.S. — and that was probably trumped up so the NSA would have better talking points in front of Congress. Source
Taking a step forward: Use what you know
Where does this leave us? The shooter at Marjory Stoneman Douglas High School was known to local and federal officials. The FBI was twice warned, and local law enforcement had visited this young man several times. It seems like we can use this kind of information to make some actionable decisions and prevent these incidents. What other troubled individuals are sitting at the bottom of stacks of paper, waiting either to be investigated or to explode?
Given the uncertainty in our predictions, and the fact that no interventions were applied even after this individual was known to be a threat, the best thing we can do is to take steps to actively reduce these incidents by other means or deal with the repercussions. In the case of earthquakes, it’s easier to know what we should do.
We can build more earthquake-resistant structures in the future and invest in early warning systems that measure seismic activity. Even though we cannot predict the next earthquake’s strength, knowing what magnitude earthquakes have happened historically will clue us in to the standards we should require for new construction. Source
As for school shootings, dealing with the repercussions of these attacks has become unbearable and now more than ever seems like a time ripe for policy change.
This final quote about earthquakes may, unfortunately, ring true about identifying individuals who are likely to commit atrocities:
“I would not be at all surprised if earthquakes are just practically, inherently unpredictable”