Monday, May 15, 2006

 

Mything the point of "Positive Results"

President Bush defended his NSA data mining program by saying, "We are not mining through the personal lives of millions of innocent Americans."

I believe he is telling the truth.

For starters, at least 16 million American children are too young to use telephones. We also know that Qwest refused to give their records to the NSA.  That accounts for another 14 million Americans.  So at least ten percent of Americans were not included in this round of data mining. Of course, that raises an important question...

What about the rest of us?

Unfortunately, at least 200 million Americans probably did have their personal information examined by the NSA.   When you consider this program has been around for a few years, it is likely they looked at trillions of individual calling records.

Many reports about the program suggest the only information provided was phone numbers. However, a class action lawsuit already filed indicates there was more going on.  In addition to the number of origin and the number called, the records also included date, time, and duration of calls.  Obviously, the records are being subjected to some sort of social network analysis.

Network analysis can be useful in things like penetration detection.  Social network analysis can also be useful for looking at criminal organizations.  However, to be useful, you start from a known node and work your way out.  You don't start with a trillion events and sift through them for "suspicious patterns" to locate targets.  That's called "data mining."
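To make the "start from a known node and work your way out" idea concrete, here is a minimal sketch of a breadth-first walk over call records. The function name `contacts_within`, the toy `calls` table, and the two-hop limit are all made up for illustration; nothing here reflects how any agency actually does it.

```python
from collections import deque

def contacts_within(call_graph, start, max_hops):
    """Breadth-first walk outward from one known number.

    call_graph maps a number to the numbers it exchanged calls with.
    Returns a dict of every number reached within max_hops, mapped to
    its hop distance from the starting number.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        number = queue.popleft()
        if hops[number] == max_hops:
            continue  # do not expand past the hop limit
        for other in call_graph.get(number, ()):
            if other not in hops:
                hops[other] = hops[number] + 1
                queue.append(other)
    return hops

# Hypothetical toy data: letters stand in for phone numbers.
calls = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

reachable = contacts_within(calls, "A", 2)
print(reachable)
```

The point of the sketch is the shape of the search: it only ever touches records connected to the known starting number, instead of scanning every record for patterns.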

I realize that marketeers love to talk about data mining like it is something smart.  I hate to be the bearer of bad news, but in scientific circles "data mining" is a pejorative term.  It means you are running through the data looking for results that appear to be statistically significant even though you did not have a clear prediction about the data before you ran the experiment.  It's a polite way of labeling someone's data as bullshit.

Here's the problem with data mining:
Pick whatever criteria you want and you will run up against the bugaboo of data mining... false positives.

How big a problem is that? According to one article: "Assume the software is very accurate and produces just one false lead in every 1,000 queries and misses one real lead in every 1,000 queries. Assume the software sifts through 1 trillion entries ... and assume there are 10 real plots.  This system will generate a billion false alarms for every real terrorist plot it uncovers."
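If you want to check the arithmetic yourself, here is a rough sketch using the figures exactly as quoted. The function name `false_alarm_stats` and its parameter names are mine, not the article's, and the model is the simplest one possible: every entry that is not a real plot gets flagged at the false lead rate.

```python
from fractions import Fraction

def false_alarm_stats(entries, real_plots, false_positive_rate, false_negative_rate):
    """Return (false alarms, real plots found, false alarms per plot found)."""
    innocent = entries - real_plots
    false_alarms = innocent * false_positive_rate
    plots_found = real_plots * (1 - false_negative_rate)
    return false_alarms, plots_found, false_alarms / plots_found

# Figures as quoted: 1 trillion entries, 10 real plots,
# one false lead per 1,000 queries, one missed lead per 1,000.
false_alarms, plots_found, per_plot = false_alarm_stats(
    entries=10**12,
    real_plots=10,
    false_positive_rate=Fraction(1, 1000),
    false_negative_rate=Fraction(1, 1000),
)

print(f"false alarms in total:     {float(false_alarms):,.0f}")
print(f"real plots found:          {float(plots_found):.2f}")
print(f"false alarms per plot:     {float(per_plot):,.0f}")
```

With these inputs the total comes to about a billion false alarms, which works out to roughly 100 million per real plot found. A false lead rate of 1 in 100 would push that to about a billion per plot. Either way, the real leads are buried.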

Just based on arithmetic it is clear this system is not going to provide the sort of magic bullet people imply.   It may have made sense immediately after September 11th, but this far out in time it is of little or no value to be looking at billions of calls from several years ago.   Besides, if this was so damn powerful how is it that Qwest got away with turning them down?  I think there is something else going on here.

Those with long memories will recall this is not the first time someone tried to spring this sort of Big Brother operation on unsuspecting Americans.  Ten years ago, when cell phones were just taking off, the FBI tried to subvert the Communications Assistance for Law Enforcement Act (CALEA) and create a national tracking system out of the wireless telephone network.  This is a never-ending battle.  Unfortunately, this time around the battle is being fought behind a cloak of executive secrecy and we are supposed to blindly trust the same people who have repeatedly demonstrated they cannot be trusted.

I may not know much, but one thing I am positive about is this: I have as much confidence in the Bush administration's ability to constrain itself to legal activity as I do in Gen. Hayden's grasp of the Fourth Amendment.
