Monday, March 5, 2007

Crime Analytics at NYPD

Text search and text mining are integral parts of Business Intelligence, and text mining is often tightly coupled with data mining. Here is an example where a text search for "sugar" led to sweet success in solving a crime at the NYPD recently:
"Best in class 2007: New York City Police Department"

Text searching capabilities are an integral part of crime-related systems. In this case, one of the witnesses recalled the word "sugar" in the tattoo on the suspect's neck, and the detectives searched the Real Time Crime Center (RTCC) database for this string. Free text is categorised as unstructured data, and it is often the most challenging aspect of a database-backed software system. Databases are great at handling structured data, that is, data in columns that can be easily queried. In recent times, however, the ability to search free text for words or phrases has been gaining importance.

Google and Yahoo have mastered the art of searching text, so what is the big deal about searching for text in a database? Free text consists of many words and phrases, and to make searching it fast, the text engine creates a text index. Unlike an index on a structured field, such as a numeric column or a character column with a few categories of data, a text index has to extract all the important words, discard stop words such as {a, an, the, it, this, etc.}, and then store the occurrence, frequency and relative positions of these words for every row/record of data. Such a text index allows a quick search for a word like "sugar" in the descriptions of suspects, among 120 million or so crime and arrest records.
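The indexing steps described above can be sketched in a few lines of Python. This is only a toy illustration of the idea of a text index (often called an inverted index), not how Oracle Text or the RTCC system is actually implemented; the record IDs, the sample narratives, the stop word list and the function names are all made up for the example.

```python
import re
from collections import defaultdict

# A small, illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "it", "this", "on", "of", "in", "with", "and", "has"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(records):
    """Map each non-stop word to {record_id: [positions of the word in that record]}.

    Positions are counted over all words, stop words included, so the
    relative positions of the indexed words are preserved.
    """
    index = defaultdict(dict)
    for record_id, text in records.items():
        for position, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue
            index[word].setdefault(record_id, []).append(position)
    return index


def search(index, word):
    """Return the sorted IDs of the records that contain the word."""
    return sorted(index.get(word.lower(), {}))


def frequency(index, word, record_id):
    """Return how many times the word occurs in the given record."""
    return len(index.get(word.lower(), {}).get(record_id, []))


# Hypothetical records standing in for crime report narratives.
records = {
    101: "Suspect has a tattoo on the neck with the word Sugar",
    102: "Victim reported a stolen bicycle",
    103: "Witness saw sugar spilled; suspect nicknamed Sugar",
}

index = build_text_index(records)
matches = search(index, "sugar")
print(matches)
```

With the index built once, looking up "sugar" is a single dictionary access instead of a scan over every narrative, which is what makes searching a very large table of records practical. A real text engine adds much more on top of this, such as stemming, phrase queries and relevance scoring.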

Recognizing the growing importance of text search and text mining for data stored in databases, companies like Oracle have tightly coupled their text engine, Oracle Text, with the database. This text engine also works closely with Oracle Data Mining to allow looking for patterns in textual descriptions. In crime incidents, the narrative of the crime report holds a wealth of unstructured data. Now this "wealth" can be mined using the marriage of the in-database text engine and the mining engine.


Kelvin Leong said...

It's a great blog. Is the crime rate in your study area high?

Sandro Saitta said...

Hello Shyam,

Your blog is very interesting, so I hope you will have time to update it in the near future.


alanedwards1984 said...

I have developed fuzzy matching logic for DB analysis in the financial arena. I am interested to see whether these algorithms can be applied to crime prediction.

Tas said...

very nice information thanks for sharing it what about silica.
