Monday, March 5, 2007

Crime Analytics at NYPD

Text search and text mining are an integral part of Business Intelligence, and text mining is often tightly coupled with data mining. Here is an example where a text search for "sugar" recently led to sweet success in solving a crime at NYPD:
"Best in class 2007: New York City Police Department"

Text search capabilities are an integral part of crime-related systems. In this case, one of the witnesses recalled the word "sugar" in the tattoo on the suspect's neck, and the detectives searched the Real Time Crime Center (RTCC) database for this string. Free text is categorised as unstructured data and is often the most challenging aspect of a database-backed software system. Databases are great at handling structured data, that is, data in columns that can easily be queried. In recent times, however, the ability to search free text for words and phrases has been gaining importance. Google and Yahoo have mastered the art of searching text, so what is the big deal about searching for text in a database?

Free text consists of many words and phrases, and to make it fast to search, the text engine builds a text index. An index on a structured field, such as a numeric column or a character column with a few categories of data, is comparatively simple. A text index, by contrast, has to extract all the important words, discard stop words like {a, an, the, it, this, etc.}, and then store the occurrence, frequency and relative positions of these words in every row/record of data. Such a text index allows a quick search for a word like "sugar" in the descriptions of suspects among 120 million or so crime and arrest records.
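The indexing step described above (extract words, drop stop words, record where each word occurs in each record) is essentially an inverted index. Below is a minimal Python sketch of that idea, assuming a tiny set of hypothetical crime narratives and an illustrative stop-word list. The function names (`build_text_index`, `search`, `frequency`, `phrase_search`) are hypothetical; this is not how Oracle Text or the RTCC system is actually implemented.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption); real text engines use much
# larger, language-specific lists.
STOP_WORDS = {"a", "an", "the", "it", "this", "on", "in", "of", "and", "with"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(records):
    """Build an inverted index: word -> {record_id: [word positions]}.

    Stop words are skipped, but positions count every token, so the
    relative positions of the indexed words are preserved.
    """
    index = defaultdict(lambda: defaultdict(list))
    for record_id, text in records.items():
        for position, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word][record_id].append(position)
    return index


def search(index, word):
    """Ids of records containing `word`: one lookup, no scan of the text."""
    return sorted(index.get(word.lower(), {}))


def frequency(index, word, record_id):
    """How many times `word` occurs in one record."""
    return len(index.get(word.lower(), {}).get(record_id, []))


def phrase_search(index, phrase):
    """Ids of records where the words of `phrase` appear consecutively.

    Simplification: phrases containing stop words are not supported.
    """
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    hits = []
    for record_id, starts in index[words[0]].items():
        for start in starts:
            # Every following word must sit at the next position.
            if all(start + offset in index[w].get(record_id, [])
                   for offset, w in enumerate(words[1:], start=1)):
                hits.append(record_id)
                break
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical narratives, not real RTCC data.
    records = {
        101: "Suspect has a tattoo reading sugar on the neck",
        102: "Witness saw a tall male in a red jacket",
        103: "Sugar tattoo on left arm, sugar written twice",
    }
    index = build_text_index(records)
    print(search(index, "sugar"))                # [101, 103]
    print(frequency(index, "sugar", 103))        # 2
    print(phrase_search(index, "sugar tattoo"))  # [103]
```

Because a query goes straight from a word to the records that contain it, the cost is roughly one dictionary lookup plus the size of the result rather than a scan of every narrative. Storing positions, and not just counts, is what makes phrase queries such as "sugar tattoo" possible.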

Realizing the growing importance of text search and text mining of data stored in databases, companies like Oracle have tightly coupled the Oracle Text engine with the database. This text engine also works closely with Oracle Data Mining to find patterns in textual descriptions. In crime incidents, the narrative of the crime report holds a wealth of unstructured data. Now this "wealth" can be mined by marrying the in-database text engine with the mining engine.


Kelvin Leong said...

It's a great blog. Is the crime rate in your study area high?

Sandro Saitta said...

Hello Shyam,

Your blog is very interesting, so I hope you will have time to update it in the near future.


Awynn said...

I have developed fuzzy matching logic for DB analysis in the financial arena. I am interested to see whether these algorithms can be applied to crime prediction.
