Text search and text mining are integral parts of Business Intelligence, and text mining is often tightly coupled with data mining. Here is an example where a text search for "sugar" led to sweet success in solving a crime at the NYPD recently:
"Best in class 2007: New York City Police Department"
Text searching capabilities are an integral part of crime-related systems. In this case, one of the witnesses recalled the word "sugar" in the tattoo on the suspect's neck, and the detectives searched the Real Time Crime Center (RTCC) database for this string. Free text is categorised as unstructured data and is often the most challenging aspect of a database-backed software system. Databases are great at handling structured data, that is, data in columns that can be easily queried. In recent times, however, search capabilities for free text and phrases have been gaining importance. The Googles and Yahoos of the world have mastered the art of searching text, so what is the big deal about searching for text in a database?
Free text consists of many words and phrases, and in order to search it quickly the text engine creates a text index. Unlike an index on a structured field, such as a numeric column or a character column with a few categories of data, a text index has to extract all the important words, discard stop words such as {a, an, the, it, this, etc.}, and then store the occurrence, frequency and relative positions of the remaining words in every row/record of data. Such a text index allows a quick search for a word like "sugar" in the descriptions of suspects among 120 million or so crime and arrest records.
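To make the idea of a text index concrete, here is a minimal sketch in Python. It is only an illustration of the general technique (an inverted index that skips stop words and records word positions), not how Oracle Text or the RTCC system is actually implemented. The record ids, the narratives, the stop-word list and the function names are all made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real text engine ships a much longer one.
STOP_WORDS = {"a", "an", "the", "it", "this", "on", "of", "in", "and"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(records):
    """Build a mapping: word -> {record_id: [positions of the word]}."""
    index = defaultdict(dict)
    for record_id, text in records.items():
        for position, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue  # stop words are not indexed
            index[word].setdefault(record_id, []).append(position)
    return index


def search(index, word):
    """Return {record_id: frequency} for every record containing the word."""
    postings = index.get(word.lower(), {})
    return {record_id: len(positions) for record_id, positions in postings.items()}


# Made-up narratives standing in for crime and arrest records.
records = {
    101: "Suspect has a tattoo reading Sugar on the neck",
    102: "Stolen vehicle recovered; no tattoo noted on the driver",
    103: "Witness heard the name Sugar; sugar packets found at the scene",
}

index = build_text_index(records)
print(search(index, "sugar"))  # {101: 1, 103: 2}
```

The lookup touches only the entry for "sugar" instead of scanning every narrative, which is what keeps the search quick as the number of records grows. The stored positions are what would let an engine answer phrase or proximity queries.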
Realizing the growing importance of text search and text mining of data stored in databases, companies like Oracle have tightly coupled a text engine, Oracle Text, with the database. This text engine also works closely with Oracle Data Mining to allow looking for patterns in textual descriptions. In crime incidents, the narrative of the crime report holds a wealth of unstructured data. Now this "wealth" can be mined using the marriage of the in-database text engine and the mining engine.
Monday, March 5, 2007
Thursday, January 11, 2007
Data Mining turns Political!
This issue started out in Congress as the "Data Mining Moratorium Act" a few years ago, but it has now turned into the Federal Agency Data Mining Reporting Act of 2007, which was introduced on Jan 10 by Senators Russ Feingold (D-Wisc) and John Sununu (R-NH).
In essence, it requires Federal agencies to report to Congress on their data mining activities. According to the text of the bill, "the controversial data analysis technology known as data mining" can be used to dig deep into the records of fellow Americans using public and private data. Recently, phone data mining hit the news headlines.
The Bill acknowledges the success of data mining on the commercial front, in the finance industry for example, "to identify people committing fraud." However, the Bill questions the potential of data mining for counter-terrorism. This is not because the suitability of the algorithms is in question but because we do not have enough history of such activities to train the data mining models. Terrorism is a rare-case scenario, which poses the same problem as training machines to predict or detect cancer.
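A small sketch of why rare cases are hard to learn from is shown below. The numbers are entirely hypothetical (5 positive cases in 100,000 records) and the "model" is a deliberately naive one that never flags anything. It is meant only to illustrate the rare-case problem, not any real counter-terrorism or medical data.

```python
def evaluate(labels, predictions):
    """Return (accuracy, recall) for binary labels where 1 marks the rare case."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for actual, predicted in pairs if actual == predicted)
    true_positives = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
    actual_positives = sum(labels)
    accuracy = correct / len(labels)
    recall = true_positives / actual_positives if actual_positives else 0.0
    return accuracy, recall


# Hypothetical data set: 5 rare cases among 100,000 records.
labels = [1] * 5 + [0] * 99_995

# A "model" that never flags anything.
always_negative = [0] * len(labels)

accuracy, recall = evaluate(labels, always_negative)
print(f"accuracy={accuracy:.5f} recall={recall:.1f}")  # accuracy=0.99995 recall=0.0
```

The naive model looks almost perfect on accuracy while finding none of the cases that matter. With so few positive examples there is very little history for a model to learn a better rule from.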
It also seems that regional law enforcement agencies (such as Sheriff's Offices and Police Departments) will not be impacted, as the Bill states that Federal agencies are required to report on their data mining activities to Congress. The law requires reporting of a
"thorough description of the data mining technology that is being used or will be used, including the basis for determining whether a particular pattern or anomaly is indicative of terrorist or criminal activity."
The downside of this law is that criminals will get more public information on which data-mining-based counter-terrorism measures are being used to foil their attempts.
Shyam Varan Nath
(Thanks to Dr Colleen McCue for sending me this link)
Monday, January 8, 2007
OneDOJ - Single view of Crime!
DOJ Pushes To Broaden Data Sharing
Agency will use central database to make crime info widely available
Various companies have used Business Intelligence techniques to create a 'single version of the truth' or a '360 view of the Customer.' OneDOJ, being pushed by the US Dept of Justice, is a similar initiative. Criminals often exploit the decentralised powers that the different states have. We have heard stories of an out-of-state traffic ticket having no impact on the driver's license in the other state. Similarly, when different law enforcement agencies do not share criminal information, criminals get a lot of leeway as they move across state boundaries.
The closest analogy I can think of is the National DNA database. Just as credit scores are accessible anywhere we go to figure out whether a person has bankruptcies or other bad debt, a person's criminal history should also move across law enforcement jurisdictions and should be available to USCIS as well.
Well, it remains to be seen how soon the OneDOJ database becomes a reality!
Thursday, January 4, 2007
Database biometrics support helps Motorola fight crime
The different ways technology is being used to help fight crime continue to amaze me, as I read this article today: Oracle 10g, biometrics help Motorola fight crime
Motorola products, as seen in CSI, used for fingerprints, palm prints, etc. are powered by the Oracle 10g database. The database allows saving the images of the fingerprints and the associated textual data with search capabilities. Increasingly, the database is becoming a content management system where image and textual content can be securely stored for quick retrieval using querying tools. Apart from fingerprints, tongue and iris recognition are other biometrics in use these days. Disney has used two-finger scans for a while to prevent multiple guests from using the same ticket and to prevent misuse of its annual passes.
Similarly, the mug shots and crime scene images used by law enforcement agencies can now be stored inside the database. These images can be searched based on their textual annotations or even by image data mining. As long as there are easy-to-use graphical user interfaces and performant databases at the back end driving them, detectives and other law enforcement officers will be willing to give these technologies a shot in crime analysis. Oracle Corp. provides such an application suite, called Oracle Protect for Law Enforcement, that can be used for Crimestats (crime reporting), crime analysis using the geo-spatial display of crime, and pattern search using Oracle Data Mining.
Will computers eventually become the new Sherlock Holmes?
Wednesday, January 3, 2007
News Flash-U.S. intelligence chief to switch jobs
This blog is just a day old and there is some intelligence related action in DC!
U.S. intelligence chief to switch jobs
The news is hot off the press. National Intelligence Director John Negroponte will resign from the office that he took over in 2005. This is a new position, created subsequent to the 9/11 Commission recommendations. The successor to Negroponte is not yet known.
Thanks
Shyam
Tuesday, January 2, 2007
Welcome to the Crime Analysis and Data Mining Blog
This blog will be used to discuss the use of predictive analytics technology for Crime Analysis. Crime Analysis traces its roots to some really old times; read the History of Crime Analysis. Just as criminals have become more savvy and intelligent over the years, so has the field of crime analysis changed and resorted to some cutting-edge technologies. We plan to discuss some of these topics here.
I conceived the idea of this blog while brainstorming with Dr Colleen McCue, the author of the book that I have been reading these days, "Data Mining and Predictive Analysis." I have done some work in the area of Crime Data Mining while working at Oracle Corporation, and the IEEE Computer Society recently published my work. I met Ralf on my recent visit to Germany to speak on Crime Data Mining, and he added a comment on my presentation at this blog.
The other books on this topic that I like are Investigative Data Mining for Security and Criminal Detection by Jesus Mena and Crime Analysis and Mapping by Dr Rachel Boba. Dr Boba is a faculty member in Criminal Justice at the same university where I went to grad school.
And finally, I would like to acknowledge Chuck Dodson from Oracle who is a "living breathing encyclopedia" of Criminal Justice. I consider him my Crime Analysis Guru. You can see some of his Crime related work here.
Thanks and a Happy New 2007 to everyone!
Shyam Varan Nath