Create your own search engine!

PraveenKumar Purushothaman · 2012-01-12T14:30:40+00:00

Updated: Oct 21, 2024

Is there anyone who uses the Internet but not Google? When we search for anything on the web, the first site that comes to mind is Google. Ever thought of having your own search engine? Not one like Google, but a simple one that runs on your desktop and searches your files. Or what if you need to add search functionality to your website?

Apache Lucene (https://lucene.apache.org/) can meet both needs. Lucene is a rich and powerful full-text search library written in Java.

In this post, I will briefly explain how to index and search files with Lucene.

The first step in implementing full-text searching with Lucene is to build an index. This is easy - you just specify a directory and an analyzer class. The analyzer breaks text fields into indexable tokens; this is a core part of Lucene.

Several types of analyzers are provided out of the box. Some of the more interesting ones are listed below.
Lucene analyzers

StandardAnalyzer: A sophisticated general-purpose analyzer.
WhitespaceAnalyzer: A very simple analyzer that just separates tokens using white space.
StopAnalyzer: Removes common English words that are not usually useful for indexing.
SnowballAnalyzer: An interesting experimental analyzer that works on word roots (a search on rain should also return entries with raining, rained, and so on).
There are even a number of language-specific analyzers, including analyzers for German, Russian, French, Dutch, and others.
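
To make the idea of "breaking text into indexable tokens" concrete, here is a minimal plain-Java sketch of what a whitespace-style and a stop-word-style analyzer roughly do. It is a toy illustration, not Lucene code: the class name ToyAnalyzers, its methods, and the tiny stop-word list are all made up for this example, and the real analyzers tokenize more carefully.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only; this is NOT how Lucene implements its analyzers.
class ToyAnalyzers {
    // Hypothetical, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "and"));

    // Whitespace-style analysis: split on runs of whitespace, keep case.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stop-word-style analysis: lowercase each token, then drop stop words.
    static List<String> stopTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("The rain is heavy")); // [The, rain, is, heavy]
        System.out.println(stopTokens("The rain is heavy"));       // [rain, heavy]
    }
}
```

The same sentence yields different tokens depending on the analyzer, which is why the choice of analyzer matters for what a search can later find.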

Replies

PraveenKumar Purushothaman

Member • Jan 12, 2012
Next, we need to create an IndexWriter object. The IndexWriter is used to create the index and to add new entries to it. You can create an IndexWriter with the StandardAnalyzer as follows (this constructor belongs to the Lucene 2.x API; later releases configure the writer through an IndexWriterConfig instead):
```
IndexWriter indexWriter = new IndexWriter("index", new StandardAnalyzer(), true);
```
The first argument is the directory location in the file system where the index files should be located. The second argument is a StandardAnalyzer object. The third argument is a boolean parameter set to true, which tells the IndexWriter to rebuild the index from scratch if it already exists.

The next step is to index the business objects. For this, we use the Document class.

A Document is a container that holds a set of indexed fields.
```
Document document = new Document();
// "file" identifies the file whose contents are indexed; the Reader streams them.
Reader reader = new FileReader(file);
// FIELD_CONTENTS is a String constant with the value "contents"; it is the field name.
// The field value is the file's contents, supplied through the reader.
document.add(new Field(FIELD_CONTENTS, reader));
```
In the snippet above, a Field is created and added to the Document. A field is made up of a name and a value (the first two parameters of the constructor). The value may be a String, or a Reader if the object to be indexed is a file. Field has many overloaded constructors for various needs. For more details on Field, refer to the #-Link-Snipped-#.
PraveenKumar Purushothaman

Member • Jan 12, 2012
Now, add the document to the index writer.
```
indexWriter.addDocument(document);
```
So far, we have created an index writer and added the document to it.
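
To get a feel for what adding a document to an index means, here is a minimal plain-Java sketch of an inverted index, the general data structure behind full-text search. It is only a conceptual toy: the class ToyIndex and its methods are hypothetical names for this example, and Lucene's actual index format is far more elaborate.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; Lucene's real index is far more elaborate.
class ToyIndex {
    // Maps each term to the ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Tokenize the contents and record every term against a new document id.
    int addDocument(String contents) {
        int docId = nextDocId++;
        for (String term : contents.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Return the ids of all documents containing the given term.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("Lucene builds an index");
        index.addDocument("Search the index with Lucene");
        System.out.println(index.lookup("lucene")); // [0, 1]
        System.out.println(index.lookup("search")); // [1]
    }
}
```

Because the work of splitting text into terms happens once, at indexing time, a later lookup is a map access rather than a scan over every file.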
The only remaining step is to search the indexed values. For this, Lucene provides the IndexSearcher and QueryParser classes. We pass an analyzer object to the QueryParser; note that this must be the same analyzer that was used during indexing. You also specify the field that you want to search and the (user-provided) full-text query.
```
// directory is the location of the index that was written earlier
IndexReader indexReader = IndexReader.open(directory);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);

Analyzer analyzer = new StandardAnalyzer();
QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
Query query = queryParser.parse(searchString); // searchString is supplied by the user
Hits hits = indexSearcher.search(query);
System.out.println("Number of hits: " + hits.length());
```
In the snippet above, we use the QueryParser to create a new Query and then pass this Query object to the IndexSearcher's search() method. The search method returns a Hits object, which contains the documents matching searchString. The length() method gives the number of matches. (Hits is part of the Lucene 2.x API; later releases return a TopDocs object from search() instead.)
Voila! Our search engine is ready!
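
The point about using the same analyzer for searching as for indexing can be shown with a small plain-Java sketch. This is not Lucene code: the two "analyzers" are stand-in functions invented for this example (one lowercases terms, the other leaves them unchanged), and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Toy illustration only; not Lucene code.
class AnalyzerMismatchDemo {
    // Two hypothetical "analyzers": one lowercases each term, one keeps its case.
    static final Function<String, String> LOWERCASING = s -> s.toLowerCase(Locale.ROOT);
    static final Function<String, String> CASE_PRESERVING = s -> s;

    // Index side: store every whitespace-separated term after normalizing it.
    static Set<String> indexTerms(String text, Function<String, String> analyzer) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(analyzer.apply(token));
        }
        return terms;
    }

    // Query side: normalize the query term, then look for an exact term match.
    static boolean matches(Set<String> index, String query, Function<String, String> analyzer) {
        return index.contains(analyzer.apply(query));
    }

    public static void main(String[] args) {
        Set<String> index = indexTerms("Apache Lucene", LOWERCASING);
        System.out.println(matches(index, "Lucene", LOWERCASING));     // true
        System.out.println(matches(index, "Lucene", CASE_PRESERVING)); // false
    }
}
```

The index only ever contains the terms the indexing analyzer produced, so a query processed differently can miss documents that actually contain the word.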
PraveenKumar Purushothaman

Member • Jan 12, 2012
If you want to see the exact matches, use an Iterator of Hit type on hits and iterate over it to get the documents that matched the search string.

The code would look somewhat like this:
```
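Iterator<Hit> it = hits.iterator();
while (it.hasNext()) {
    Hit hit = it.next();
    Document document = hit.getDocument();
    // Read the required value from a stored field of the document.
    // A field built from a Reader is indexed but not stored, so a stored
    // field has to be added at indexing time for get() to return a value.
    // FIELD_PATH is a hypothetical constant naming such a stored field.
    String matchedValue = document.get(FIELD_PATH);
    System.out.println("Hit: " + matchedValue);
}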
```
Simple, isn’t it?

I know a lot of new terms have come into the picture - Document, Field, IndexWriter, IndexSearcher, and so on. But once you build a sample Java project, things will get simpler.
K!r@nS!ngu

Member • Jan 12, 2012

Awesome, bro. Let me give it a try...

PraveenKumar Purushothaman

Member • Jan 15, 2012

K!r@nS!ngu
Awesome, bro. Let me give it a try...
Sure, let us know how it goes... 😀

greatcoder

Member • Mar 22, 2012

No need to do such hard work... Just download Google Desktop and you can search your entire computer with any keyword. It searches not only text files but all sorts of files (ppt, doc), including the content written inside each file. It also searches Outlook emails!

Best tool to keep with you. After all, you cannot compete with Google in searching 😎

PraveenKumar Purushothaman

Member • Mar 23, 2012

greatcoder
No need to do such hard work... Just download Google Desktop and you can search your entire computer with any keyword. It searches not only text files but all sorts of files (ppt, doc), including the content written inside each file. It also searches Outlook emails!

Best tool to keep with you. After all, you cannot compete with Google in searching 😎
Dude, the point is to learn how to build it yourself... We know such tools already exist... Imagine how it feels when others use something that you made...

i.e., creating is better than using! 😀
