Java – Instant search of petabytes of data

Instant search of petabytes of data… here is a solution to the problem.

Instant search of petabytes of data

I need to search through more than a petabyte of data stored in CSV files. When indexed with Lucene, the index file is twice the size of the original file. Is it possible to reduce the size of the index? How are Lucene index files distributed across Hadoop and used in a search environment? Or, if necessary, should I use Solr to distribute the Lucene indexes? What I need is instant search over petabytes of files.

Solution

Hadoop and MapReduce are built around a batch-processing model; you will not get instant response times from them, because that is not what the tools are designed for. You might be able to use Hadoop to speed up index building, but it will not serve the interactive queries you want (see the sketch below).
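As a rough illustration of the "use Hadoop only for offline index building" idea, here is a minimal MapReduce sketch that partitions CSV rows into shards so that each reducer could build one Lucene index per shard. The class names, the fixed shard count, and the input/output paths are assumptions for illustration only; the actual index writing in the reducer is not shown.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Hypothetical sketch: route CSV rows into shards so a reducer per shard
 *  could feed its rows into a local Lucene IndexWriter offline. */
public class CsvShardJob {

  public static class ShardMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final int NUM_SHARDS = 16; // assumption: fixed shard count

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Assign each CSV row to a shard key; the reducer (not shown) would
      // write the rows of its shard into one Lucene index.
      int shard = Math.floorMod(line.toString().hashCode(), NUM_SHARDS);
      ctx.write(new Text("shard-" + shard), line);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "csv-shard-for-indexing");
    job.setJarByClass(CsvShardJob.class);
    job.setMapperClass(ShardMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input CSV directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // sharded output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This only prepares the data in batch; serving instant queries afterwards would still require a separate search layer such as Solr or a Cassandra-backed Lucene index.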

Look at Lucandra, which is a Cassandra-based backend for Lucene. Cassandra is another distributed data store; if I remember correctly, it came out of Facebook and was designed for faster access in a more query-oriented access model than Hadoop.
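On the index-size part of the question: one common way to keep a Lucene index much smaller than the source data is to index the searchable columns without storing their values, keeping only a small stored key that points back to the original CSV row. The sketch below assumes a hypothetical three-column row (id, name, description) and an arbitrary index path; it is an illustration of the technique, not the asker's actual schema.

```java
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class CsvIndexer {
  public static void main(String[] args) throws IOException {
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    try (IndexWriter writer =
             new IndexWriter(FSDirectory.open(Paths.get("/tmp/csv-index")), cfg)) {
      // One hypothetical CSV row: id,name,description
      String[] row = {"42", "example", "some long free-text description"};

      Document doc = new Document();
      // Store only the small key needed to look the full row up in the CSV later...
      doc.add(new StoredField("id", row[0]));
      // ...and index the searchable text WITHOUT storing it, which keeps the
      // index considerably smaller than using Field.Store.YES on every column.
      doc.add(new TextField("name", row[1], Field.Store.NO));
      doc.add(new TextField("description", row[2], Field.Store.NO));
      writer.addDocument(doc);
    }
  }
}
```

The trade-off is that search results can only return the stored key, so the full row must be fetched from the original file or another store at query time.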
