Thursday, January 11, 2007

Data Mining turns Political!

This issue started out in Congress as the "Data Mining Moratorium Act" a few years ago but has now turned into the Federal Agency Data Mining Reporting Act of 2007, introduced on Jan 10 by Senators Russ Feingold (D-Wisc) and John Sununu (R-NH).

In essence, it requires Federal agencies to report to Congress on their data mining activities. According to the text of the bill, "the controversial data analysis technology known as data mining" can be used to dig deep into the records of fellow Americans using public and private data. Recently, phone-record data mining hit the news headlines.

The Bill acknowledges the success of data mining on the commercial front in the finance industry, such as "to identify people committing fraud." However, the Bill questions the potential of data mining for counterterrorism. This is not because the suitability of the algorithms is in question, but because we do not have enough history of such activities to train the data mining models. Cases of terrorism are rare events, much like the challenge posed by training machines to predict or detect cancer.

It also seems that regional law enforcement agencies (such as sheriff's offices and police departments) will not be impacted, since the Bill states that Federal agencies are required to report on their data mining activities to Congress. The law requires reporting of a
"thorough description of the data mining technology that is being used or will be used, including the basis for determining whether a particular pattern or anomaly is indicative of terrorist or criminal activity."

The downside of this law is that criminals will get more public information on which data mining based counterterrorism measures are being used to foil their attempts.

Shyam Varan Nath
(Thanks to Dr Colleen McCue for sending me this link)


Dean Abbott said...

I've always been somewhat puzzled as to why data mining in particular is singled out for extra scrutiny by Feingold. After reading older versions of bills sponsored by him a few years ago, I suspect that part of the problem is a lack of understanding of the distinctions between data mining and data joining (i.e., putting data together that shouldn't be put together for privacy reasons).

But it also seems to me that as long as the deployed system doesn't contain any data mining models, it escapes the scrutiny. Is that so? For example, if you built a series of decision trees, cherry-picked some rules, and built your own expert system from those rules, you aren't actually using "data mining" in the final model. In that case, data mining technology didn't determine whether a particular pattern is indicative of criminal activity; a business rule did.
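To make the distinction concrete, here is a minimal Python sketch of what such a deployed "expert system" rule looks like. The field names and thresholds are purely hypothetical illustrations, invented for this example: the point is that the deployed artifact is an ordinary conditional, with no model object in sight.

```python
# A hypothetical "business rule" cherry-picked from a decision tree.
# The fields and thresholds below are invented for illustration only.
def flag_transaction(amount, num_countries_24h):
    """Deployed expert-system rule: no trained model ships with the system,
    only this plain conditional, even if a tree originally suggested it."""
    return amount > 9000 and num_countries_24h >= 3
```

Whether a regulator would call this "data mining" is exactly the ambiguity in question: the mining happened offline, and only the rule was deployed.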

It also seems to me that whether the rule was developed by a "data mining" algorithm or by someone's experience is irrelevant. What Feingold should pursue is not data mining but validation: how does one determine that a model or a rule is valid? Simulation... resampling... theory... all of these.
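As one illustration of validation by resampling, the toy Python sketch below bootstraps a synthetic labeled history to estimate how often a rule fires, with a rough 95% interval. The rule, the data, and the thresholds are all invented for this example; it is a sketch of the resampling idea, not any agency's method.

```python
import random

# Hypothetical rule whose behavior we want to validate; the 0.9
# threshold is an illustrative assumption.
def rule(score):
    return score > 0.9

def bootstrap_fire_rate(data, trials=1000, seed=0):
    """Estimate a rough 95% interval for the rule's firing rate
    by resampling the historical data with replacement."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        sample = [rng.choice(data) for _ in data]
        rates.append(sum(rule(x) for x in sample) / len(sample))
    rates.sort()
    # 2.5th and 97.5th percentile of the bootstrap distribution
    return rates[trials // 40], rates[-(trials // 40)]

# Synthetic history: 90 benign scores, 10 suspicious ones.
history = [0.1] * 90 + [0.95] * 10
lo, hi = bootstrap_fire_rate(history)
```

The same resampling machinery could just as well estimate a false positive rate against labeled outcomes, which is the validation question that actually matters for a counterterrorism rule.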

Thanks for your blog by the way--it is fascinating and an important contribution to the data mining community.

Edit said...

Adding to the previous comment by Dean Abbott about misconceptions of data mining:

As a long-time practitioner of DM, in my view the DM algorithm has everything to do with the current inability to handle rare case scenarios. My DM model, for example – GT data mining – can catch such cases. It can do so through a number of mechanisms: early detection of a suspect's ground preparations, background, and typical camouflage patterns of behavior, to name a few.
There are other DM shortcomings, but not the ones mentioned in the article.

I don't understand what the fuss is about DM abilities, which are in effect no more than a straightforward use of one's own data: the digestion and comprehension of operations data. That's the essence of the TI data advantage.